paper fetch
Legal open-access PDF downloader by DOI — Unpaywall, arXiv, PMC, bioRxiv. Multi-platform Agent Skill.
paper-fetch: Download scientific papers automatically
What it does
- Downloads paper PDFs from a DOI (or a batch file of DOIs) via open-access sources
- 6-source fallback chain: Unpaywall → Semantic Scholar openAccessPdf → arXiv → PubMed Central OA → bioRxiv/medRxiv → Sci-Hub mirrors (last resort, on by default)
- Zero dependencies: pure Python standard library, no pip install needed
- Auto-named output: {first_author}_{year}_{journal_abbrev}_{short_title}.pdf (journal omitted if unknown; multi-word journals get ISO-style initials, e.g. Proceedings of the National Academy of Sciences → PNAS)
- Batch mode: pass a file of DOIs with --batch, or pipe them in with --batch -
- Agent-native: stable JSON envelope on stdout, NDJSON progress on stderr, a machine-readable schema subcommand (with a deprecations slot for forward-compatible drift detection), TTY-aware default output format, idempotent retries via --idempotency-key, typed exit codes (0/1/3/4), partial-success batches with next retry hints, and per-source diagnostics in result.source_detail (e.g. which Sci-Hub mirror won, so an orchestrator can pin it via PAPER_FETCH_SCIHUB_MIRRORS)
- Safely retriable: re-running skips already-downloaded files; --idempotency-key replays the exact envelope without any network I/O
- Self-updating: when installed via git clone, the agent runs a synchronous git pull --ff-only on the first invocation per conversation, throttled to once per 24h via <skill_dir>/.last_update. Updates apply immediately, with no user action required. To force an immediate check, run rm <skill_dir>/.last_update.
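The naming rule above can be sketched in Python. This is a minimal illustration, not the code in scripts/fetch.py: the stop-word list, the four-word cutoff for short_title, the hyphens inside short_title, the "Unknown"/"nd" fallbacks, and the helper names (ascii_words, journal_abbrev, paper_filename) are all assumptions.

```python
import re
import unicodedata

# Assumed stop words dropped when building journal initials.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}


def ascii_words(text):
    """Fold accents to ASCII and return the alphanumeric words in `text`."""
    folded = unicodedata.normalize("NFKD", text or "").encode("ascii", "ignore").decode()
    return re.findall(r"[A-Za-z0-9]+", folded)


def journal_abbrev(journal):
    """Keep one-word journals as-is; reduce multi-word ones to initials."""
    words = ascii_words(journal)
    if len(words) <= 1:
        return words[0] if words else ""
    return "".join(w[0].upper() for w in words if w.lower() not in STOP_WORDS)


def paper_filename(first_author, year, journal, title, title_words=4):
    """Build {first_author}_{year}_{journal_abbrev}_{short_title}.pdf."""
    parts = [
        # "Unknown" and "nd" are hypothetical placeholders for missing metadata.
        "".join(ascii_words(first_author)) or "Unknown",
        str(year) if year else "nd",
        journal_abbrev(journal),  # empty string when the journal is unknown
        "-".join(ascii_words(title)[:title_words]),
    ]
    # Skip empty fields so an unknown journal does not leave a double underscore.
    return "_".join(p for p in parts if p) + ".pdf"
```

For example, paper_filename("Jumper", 2021, "Nature", "Highly accurate protein structure prediction with AlphaFold") yields Jumper_2021_Nature_Highly-accurate-protein-structure.pdf, and "Proceedings of the National Academy of Sciences" abbreviates to PNAS.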
Discipline Coverage
The skill is discipline-agnostic — it works for any field, not just life sciences or computer science.
| Source | Discipline scope |
|---|---|
| Unpaywall | ✅ All disciplines (covers every Crossref DOI: humanities, social sciences, physics, chemistry, economics, etc.) |
| Semantic Scholar | ✅ All disciplines (cross-domain academic graph) |
| arXiv | Physics, math, CS, statistics, quantitative finance, economics, EE |
| PubMed Central | Biomedical only |
| bioRxiv / medRxiv | Biology / medicine preprints only |
| Sci-Hub mirrors | ✅ All disciplines (last-resort fallback when every OA / institutional source misses) |
In practice, Unpaywall + Semantic Scholar alone cover OA papers in chemistry, materials, economics, psychology, humanities, and every other field via institutional repositories, SSRN, RePEc, and publisher-hosted OA copies. arXiv/PMC/bioRxiv are additional fallbacks for their specific domains, and Sci-Hub is the universal last resort.
Multi-Platform Support
Works with all major AI coding agents that support the Agent Skills format:
| Platform | Status | Details |
|----------|--------|---------|
| Claude Code | ✅ Full support | Native SKILL.md format |
| OpenClaw / ClawHub | ✅ Full support | metadata.openclaw namespace |
| Hermes Agent | ✅ Full support | Installable under research category |
| pi-mono | ✅ Full support | metadata.pimo namespace |
| OpenAI Codex | ✅ Full support | agents/openai.yaml sidecar |
| SkillsMP | ✅ Indexed | GitHub topics configured |
Comparison
vs No Skill (native agent)
| Feature | Native agent | This skill |
|---------|-------------|------------|
| Resolve DOI to PDF | Ad-hoc web search | Deterministic 6-source chain |
| Unpaywall integration | No | Yes — highest OA coverage |
| arXiv / PMC / bioRxiv fallback | Manual | Automatic |
| Batch download | No | Yes — --batch dois.txt or --batch - (stdin) |
| Consistent filenames | No | Yes — author_year_title.pdf |
| Machine-readable schema | No | Yes — fetch.py schema |
| Structured output | No | Stable JSON envelope + NDJSON progress |
| Idempotent retries | No | --idempotency-key replays cached envelope |
| Typed exit codes | No | 0/1/3/4 — orchestrator can route failures |
| Dependencies | Varies | Python stdlib only |
Prerequisites
- Python 3.8+ (standard library only, no extra packages)
- Unpaywall contact email (optional but recommended) — set once:
export [email protected]
Add it to ~/.zshrc / ~/.bashrc to persist. Without it, Unpaywall is skipped and the remaining sources (Semantic Scholar, arXiv, PMC, bioRxiv/medRxiv, then the Sci-Hub mirrors unless disabled) are still tried.
Skill Installation
Claude Code
# Global install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch
# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .claude/skills/paper-fetch
OpenClaw / ClawHub
clawhub install paper-fetch
# Or manual
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.openclaw/skills/paper-fetch
Hermes Agent
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.hermes/skills/research/paper-fetch
Or add to ~/.hermes/config.yaml:
skills:
  external_dirs:
    - ~/myskills/paper-fetch
pi-mono
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.pimo/skills/paper-fetch
OpenAI Codex
# User-level
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.agents/skills/paper-fetch
# Project-level
git clone https://github.com/Agents365-ai/paper-fetch.git .agents/skills/paper-fetch
SkillsMP
skills install paper-fetch
Installation paths summary
| Platform | Global path | Project path |
|----------|-------------|--------------|
| Claude Code | ~/.claude/skills/paper-fetch/ | .claude/skills/paper-fetch/ |
| OpenClaw | ~/.openclaw/skills/paper-fetch/ | skills/paper-fetch/ |
| Hermes Agent | ~/.hermes/skills/research/paper-fetch/ | Via external_dirs |
| pi-mono | ~/.pimo/skills/paper-fetch/ | — |
| OpenAI Codex | ~/.agents/skills/paper-fetch/ | .agents/skills/paper-fetch/ |
| SkillsMP | N/A (installed via CLI) | N/A |
Usage
Single DOI:
python scripts/fetch.py 10.1038/s41586-021-03819-2
Custom output directory:
python scripts/fetch.py 10.1038/s41586-021-03819-2 --out ~/papers
Batch mode:
cat > dois.txt <<EOF
10.1038/s41586-021-03819-2
10.1126/science.abj8754
10.1101/2023.01.01.522400
EOF
python scripts/fetch.py --batch dois.txt --out ~/papers
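If your DOI list comes from a reference manager or a web page, a small pre-processing step can produce a clean dois.txt. The sketch below assumes fetch.py expects bare DOIs, one per line, as in the example above; stripping https://doi.org/ and doi: prefixes and de-duplicating case-insensitively are choices made here, and clean_dois / write_batch_file are hypothetical helper names.

```python
import re

# Resolver-URL or "doi:" prefixes that some tools put in front of a DOI.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def clean_dois(lines):
    """Return bare DOIs in first-seen order, dropping blanks, non-DOIs, and duplicates."""
    seen = set()
    dois = []
    for line in lines:
        doi = DOI_PREFIX.sub("", line.strip()).strip()
        if not doi.startswith("10."):
            continue  # blank line or not a DOI
        key = doi.lower()  # DOIs compare case-insensitively
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois


def write_batch_file(lines, path="dois.txt"):
    """Write one DOI per line, ready for: python scripts/fetch.py --batch dois.txt"""
    dois = clean_dois(lines)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("".join(doi + "\n" for doi in dois))
    return dois
```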
Dry-run (preview without downloading):
python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run
Human-readable text output:
python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text
Pipe DOIs from another tool:
echo 10.1038/s41586-021-03819-2 | python scripts/fetch.py --batch -
Safely retriable batch (replay cached envelope on retry):
python scripts/fetch.py --batch dois.txt --out ~/papers \
--idempotency-key monday-review-batch
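One way to picture --idempotency-key is a cache keyed by the caller's string: the first run computes and stores the envelope, and any later run with the same key returns the stored copy without touching the network. The sketch below is a hypothetical illustration of that pattern only; the cache directory, the SHA-256 file naming, and run_idempotent itself are assumptions, not how fetch.py stores its envelopes.

```python
import hashlib
import json
from pathlib import Path


def run_idempotent(key, cache_dir, compute):
    """Return the stored envelope for `key`, or run `compute()` once and store it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the caller's key so any string maps to a safe file name.
    path = cache_dir / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        # Replay: same envelope, no network I/O.
        return json.loads(path.read_text(encoding="utf-8"))
    envelope = compute()  # e.g. the real batch download
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(envelope), encoding="utf-8")
    tmp.replace(path)  # rename into place so readers never see a half-written file
    return envelope
```

Because the replay returns whatever was stored first, reusing a key after a failed batch reproduces that failure; a fresh key is needed to actually re-attempt the downloads.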
Machine-readable self-description (for agents):
python scripts/fetch.py schema --pretty
Streaming NDJSON (one result per line as each DOI resolves):
python scripts/fetch.py --batch dois.txt --stream
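An orchestrator can consume --stream output incrementally. This sketch assumes the NDJSON results arrive on stdout while progress stays on stderr (left attached to the terminal here); the per-record fields are not documented in this README, so the code only parses each line into a dict. stream_batch and iter_ndjson are hypothetical helper names.

```python
import json
import subprocess
import sys


def iter_ndjson(lines):
    """Yield one parsed JSON object per non-blank line (NDJSON)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def stream_batch(batch_file, script="scripts/fetch.py"):
    """Run fetch.py with --stream and yield each result as soon as it is printed."""
    cmd = [sys.executable, script, "--batch", batch_file, "--stream"]
    # stdout is piped for parsing; stderr (progress) is inherited from the parent.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        yield from iter_ndjson(proc.stdout)


# Usage (requires the skill checked out locally):
# for record in stream_batch("dois.txt"):
#     print(record)
```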
Or just ask your agent naturally:
Download the AlphaFold2 paper PDF to my ~/papers folder
Fetch the PDF for DOI 10.1038/s41586-020-2649-2
Download these three papers: 10.1038/s41586-021-03819-2, 10.1126/science.abj8754, 10.1101/2023.01.01.522400
Check if this paper has an open-access PDF available: 10.1038/s41586-020-2649-2
Batch download all DOIs from my dois.txt file into ~/papers
Resolution Order
- Unpaywall: best OA location across all publishers (highest hit rate)
- Semantic Scholar: openAccessPdf field + externalIds lookup
- arXiv: if the paper has an arXiv ID
- PubMed Central OA subset: if the paper has a PMCID
- bioRxiv / medRxiv: DOI prefix 10.1101/
- Publisher direct: institutional mode only (PAPER_FETCH_INSTITUTIONAL=1); your subscription IP / cookies / EZproxy authorize the fetch
- Sci-Hub mirrors: last resort, on by default. Tries PAPER_FETCH_SCIHUB_MIRRORS (or the built-in defaults: sci-hub.ru, sci-hub.st, sci-hub.su, sci-hub.box, sci-hub.red, sci-hub.al, sci-hub.mk, sci-hub.ee); on a full miss, scrapes https://www.sci-hub.pub/ once for fresh mirrors. Disable with PAPER_FETCH_NO_SCIHUB=1.
- Otherwise: report failure with metadata (title/authors) so you can file an interlibrary loan (ILL) request
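The ordered chain above is a first-hit-wins fallback. A minimal sketch, assuming each source is wrapped as a function that returns a PDF URL or None; the resolver names and the (name, outcome) diagnostics list are illustrative and only loosely mirror result.source_detail. The 10.1101/ prefix is a hint rather than proof of a preprint, since Cold Spring Harbor Laboratory also uses it for journals such as Genome Research.

```python
from typing import Callable, List, Optional, Tuple

# A resolver maps a DOI to a PDF URL, or returns None on a miss.
Resolver = Callable[[str], Optional[str]]


def is_biorxiv_doi(doi: str) -> bool:
    """bioRxiv/medRxiv DOIs start with 10.1101/ (also used by some CSHL journals)."""
    return doi.startswith("10.1101/")


def resolve_pdf_url(doi: str, resolvers: List[Tuple[str, Resolver]]):
    """Try each (name, resolver) in order; return (source, url, diagnostics)."""
    diagnostics = []
    for name, resolver in resolvers:
        try:
            url = resolver(doi)
        except Exception as exc:  # one broken source must not abort the chain
            diagnostics.append((name, f"error: {exc}"))
            continue
        if url:
            return name, url, diagnostics
        diagnostics.append((name, "miss"))
    # Every source missed: the caller reports failure (e.g. for an ILL request).
    return None, None, diagnostics
```

Keeping each source behind the same small interface is what makes the order easy to change, for example inserting the publisher-direct step only when institutional mode is enabled.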
Files
- SKILL.md: the only required file; loaded by all platforms
- scripts/fetch.py: the downloader (pure stdlib Python)
- agents/openai.yaml: OpenAI Codex sidecar configuration
- README.md: this file
- README_CN.md: Chinese documentation
Known Limitations
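- Some publisher redirects return an HTML landing page instead of a PDF; the script validates the %PDF header and fails cleanly in that case
- No built-in authentication: logging in to institutional proxies (EZproxy / OpenAthens) is not supported in this version; the opt-in publisher-direct step (PAPER_FETCH_INSTITUTIONAL=1) only succeeds when your IP, cookies, or proxy already grant access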
- SSRF defense — every outbound fetch rejects private IPs, non-http(s) schemes, non-80/443 ports, and cloud-metadata hostnames
- 50 MB size limit — per-PDF download cap to prevent runaway downloads
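The SSRF rules and download checks listed above can be sketched with the standard library. This is an assumption-laden illustration, not the script's code: the metadata hostname set is deliberately short, the DNS check is best-effort (a hostname can resolve differently at fetch time, so a hardened client would also pin the address it connects to), and is_safe_url / check_pdf are hypothetical names.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

MAX_PDF_BYTES = 50 * 1024 * 1024  # 50 MB per-PDF cap
# Illustrative cloud-metadata hosts; a real deny-list would be longer.
METADATA_HOSTS = {"169.254.169.254", "metadata", "metadata.google.internal"}


def is_safe_url(url, resolve=True):
    """Reject non-http(s) schemes, non-80/443 ports, metadata hosts, and non-public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host or host in METADATA_HOSTS:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
    except ValueError:  # malformed port such as ":99999"
        return False
    if port not in (80, 443):
        return False
    try:
        addresses = [ipaddress.ip_address(host)]  # literal IP in the URL
    except ValueError:
        if not resolve:
            return True  # hostname accepted without a DNS check
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            return False
        # Drop any IPv6 zone suffix ("%eth0") before parsing the address.
        addresses = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    # is_global is False for private, loopback, link-local, and reserved ranges.
    return all(addr.is_global for addr in addresses)


def check_pdf(data: bytes) -> None:
    """Raise ValueError unless `data` looks like a PDF within the size cap."""
    if len(data) > MAX_PDF_BYTES:
        raise ValueError("download exceeds the 50 MB cap")
    if not data.startswith(b"%PDF"):
        raise ValueError("not a PDF (probably an HTML landing page)")
```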
License
MIT
Author
Agents365-ai
- Bilibili: https://space.bilibili.com/441831884
- GitHub: https://github.com/Agents365-ai