Auto claude code research in sleep
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works with Claude Code, Codex, OpenClaw, or any LLM agent.
Documentation
Auto-claude-code-research-in-sleep (ARIS ⚔️🌙)
💡 Use ARIS in Claude Code / Cursor / Trae as a skill-based workflow, or get the full experience with the standalone CLI — enjoy any way you like!
🤖 AI agents: Read AGENT_GUIDE.md instead — structured for LLM consumption, not human browsing.
🔥 ARIS-Code CLI: standalone install edition · English | ⬇️ Download
📰 ARIS-Code v0.4.4 (2026-04-20): Setup UX + reviewer routing fixes (resolves #158, #162) | `/setup` no longer forces Bearer for Anthropic + custom URL (fixes ModelScope / code.newcli.com etc.) | Provider-aware proxy URL hints | Stale state no longer leaks across provider switches | LlmReview smart fallback
v0.4.3 (2026-04-17): Third-party Anthropic-compat proxy support (Bedrock etc.) | Skip beta flags that proxies reject | Propagate custom base URL for `anthropic` provider | Credit @screw-44
v0.4.2 (2026-04-17): Auto-compaction corruption fix | Compaction summary preserved on OpenAI-compat executors | Shell-provided API keys no longer erased on launch
v0.4.1 (2026-04-15): Plan mode (`/plan`) | Cooperative Ctrl+C interrupt | Auto-retry (429/5xx/network) | Research Wiki 📚 (persistent knowledge base) | Self-Evolution 🧬 (`/meta-optimize`) | Local models (LM Studio/Ollama) | 62 skills synced
v0.3.11 (2026-04-13): Reviewer Anthropic-compatible mode (Claude via proxy)
v0.3.9 (2026-04-11): Proxy/custom base URL (CCSwitch) | Local models (LM Studio/Ollama) | Windows (experimental)
v0.3.5 (2026-04-08): Research Wiki (persistent papers/ideas/experiments/claims + relationship graph) | Meta-Optimize self-evolution (analyze logs → propose SKILL.md patches)
v0.3.0 (2026-04-03): Multi-file memory index | Rich task system (TodoWrite) | `/plan` | Security hardening
v0.2.2 (2026-04-03): `/plan` step-by-step planning | `/tasks` persistent tracking
v0.2.1 (2026-04-03): Persistent Memory | Kimi K2.5 multi-turn fix | CJK cursor fix
v0.2.0 (2026-04-02): Open source | Kimi + MiniMax + GLM support | Smart LlmReview routing | CI/CD
v0.1.0 (2026-04-02): Initial release | Multi-executor & reviewer | 42 bundled skills
Chinese README | English
🌙 Let Claude Code do research while you sleep. Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten — autonomously.
🪶 Radically lightweight: zero dependencies, zero lock-in. The entire system is plain Markdown files. No framework to learn, no database to maintain, no Docker to configure, no daemon to babysit. Every skill is a single `SKILL.md` readable by any LLM: swap Claude Code for Codex CLI, OpenClaw, Cursor, Trae, Antigravity, Windsurf, or your own agent and the workflows still work. Fork it, rewrite it, adapt it to your stack.
💡 ARIS is a methodology, not a platform. What matters is the research workflow, so take it wherever you go. 🌱
Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate cross-model collaboration — Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer. 🔀 Also supports alternative model combinations (Kimi, LongCat, DeepSeek, etc.) — no Claude or OpenAI API required. For example, MiniMax-M2.7 + GLM-5 or GLM-5 + MiniMax-M2.7. 🤖 Codex CLI native — full skill set also available for OpenAI Codex. 🖱️ Cursor — works in Cursor too. 🖥️ Trae — ByteDance AI IDE. 🚀 Antigravity — Google's agent-first IDE. 🆓 Free tier via ModelScope — zero cost, zero lock-in.
💭 Why not self-play with a single model? Using Claude Code subagents or agent teams for both execution and review is technically possible, but tends to fall into local minima — the same model reviewing its own patterns creates blind spots.
Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate) — and adversarial bandits are fundamentally harder to game.
💭 Why two models, not more? Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns — the biggest gain is going from 1→2, not 2→4.
Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles — speed × rigor — produce better outcomes than either model talking to itself.
🧿 Want the strongest possible reviewer? Add `— reviewer: oracle-pro` to any skill to route reviews through GPT-5.4 Pro via Oracle MCP. Pro-level reasoning for proof verification, experiment auditing, and final stress tests. Works with API key or free browser mode. Setup →
🎯 More Than Just a Prompt
These are full pipelines — you can also use each workflow independently. Already have an idea? Skip to Workflow 1.5. Have results? Jump to Workflow 3. Got reviews? Jump to Workflow 4. Want persistent memory? Enable Research Wiki. See Quick Start for all commands and Workflows for the full breakdown.
Basic mode — give ARIS a research direction, it handles everything:
/research-pipeline "factorized gap in discrete diffusion LMs"
🔥 Targeted mode — got a paper you want to improve? Give ARIS the paper + the code:
/research-pipeline "improve method X" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/project
ARIS reads the paper → finds its weaknesses → clones the codebase → generates ideas that specifically fix those weaknesses with that code → runs experiments → writes your paper. Like telling a research assistant: "read this paper, use this repo, find what's missing, and fix it."
Mix and match: `ref paper` only = "what can be improved?", `base repo` only = "what can I build with this code?", both = "improve this paper using this code."
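The commands above all share one shape: a slash command, a quoted argument, then an optional `— key: value, key: value` tail. ARIS skills are described as plain Markdown read by the agent, so the sketch below is not the project's parser; it is only an illustration of how that tail decomposes and how the three "mix and match" cases follow from which keys are present. The function names `split_invocation` and `describe_mode` are hypothetical.

```python
import re
from typing import Dict, Tuple


def split_invocation(command: str) -> Tuple[str, Dict[str, str]]:
    """Split '/skill "arg" — key: value, key: value' into (head, overrides)."""
    head, sep, tail = command.partition(" — ")
    overrides: Dict[str, str] = {}
    if sep:
        # Split only on commas that are followed by a "key: " label, so a
        # value such as "all, gemini" is kept in one piece.
        for part in re.split(r",\s*(?=[A-Za-z][A-Za-z ]*:\s)", tail):
            key, _, value = part.partition(":")  # first colon only; URLs survive
            overrides[key.strip()] = value.strip()
    return head.strip(), overrides


def describe_mode(overrides: Dict[str, str]) -> str:
    """Map the presence of 'ref paper' / 'base repo' to the README's wording."""
    has_paper = "ref paper" in overrides
    has_repo = "base repo" in overrides
    if has_paper and has_repo:
        return "improve this paper using this code"
    if has_paper:
        return "what can be improved?"
    if has_repo:
        return "what can I build with this code?"
    return "basic mode: research direction only"


head, overrides = split_invocation(
    '/research-pipeline "improve method X" — '
    "ref paper: https://arxiv.org/abs/2406.04329, "
    "base repo: https://github.com/org/project"
)
print(head)
print(overrides)
print(describe_mode(overrides))
```

Splitting on the first colon of each part is what keeps the `https://` in the URL values intact.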
🔥 Rebuttal mode — reviews just dropped? Don't panic. ARIS reads every concern, builds a strategy, and drafts a rebuttal that's grounded, structured, and under the character limit:
/rebuttal "paper/ + reviews" — venue: ICML, character limit: 5000
| Parameter | Default | What it does |
|-----------|---------|-------------|
| venue | ICML | Target venue (ICML/NeurIPS/ICLR/CVPR/ACL/AAAI/ACM) |
| character limit | — | Required. Hard character limit for rebuttal text |
| quick mode | false | Stop after parsing + strategy (Phase 0-3). See what reviewers want before drafting |
| auto experiment | false | Auto-run supplementary experiments via /experiment-bridge when reviewers ask for new evidence |
| max stress test rounds | 1 | How many times GPT-5.4 xhigh stress-tests the draft |
| max followup rounds | 3 | Per-reviewer follow-up round limit |
Three safety gates — rebuttal will NOT finalize if any fails:
- 🔒 No fabrication — every claim maps to paper/review/user-confirmed result
- 🔒 No overpromise — every promise is user-approved
- 🔒 Full coverage — every reviewer concern is tracked
Two outputs: PASTE_READY.txt (exact char count, paste to venue) + REBUTTAL_DRAFT_rich.md (extended version for manual editing).
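The three gates and the hard character limit can be read as a simple all-must-pass check. The sketch below assumes a made-up data model (`RebuttalDraft`, its fields, and the source labels are all hypothetical); ARIS describes its skills as Markdown instructions, so this is an illustration of the rule, not the project's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical labels mirroring "paper/review/user-confirmed result".
ALLOWED_SOURCES = {"paper", "review", "user-confirmed"}


@dataclass
class RebuttalDraft:
    text: str
    claim_sources: Dict[str, Optional[str]]  # claim -> where it is grounded
    promise_approved: Dict[str, bool]        # promise -> user approved it?
    concern_tracked: Dict[str, bool]         # reviewer concern -> tracked?


def failed_gates(draft: RebuttalDraft) -> List[str]:
    """Return the names of the safety gates that the draft does not pass."""
    failed = []
    if any(src not in ALLOWED_SOURCES for src in draft.claim_sources.values()):
        failed.append("no fabrication")
    if not all(draft.promise_approved.values()):
        failed.append("no overpromise")
    if not all(draft.concern_tracked.values()):
        failed.append("full coverage")
    return failed


def can_finalize(draft: RebuttalDraft, char_limit: int) -> bool:
    """Finalize only if every gate passes and the text fits the hard limit."""
    return not failed_gates(draft) and len(draft.text) <= char_limit


draft = RebuttalDraft(
    text="We thank the reviewers...",
    claim_sources={"ablation in Table 3": "paper", "new baseline": None},
    promise_approved={"add runtime table": True},
    concern_tracked={"R1-W1": True, "R2-W3": True},
)
print(failed_gates(draft))
print(can_finalize(draft, char_limit=5000))
```

Here the ungrounded "new baseline" claim trips the first gate, so the draft would not be finalized even though it is well under the limit.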
After acceptance — your paper is in, now prepare the presentation:
/paper-slides "paper/" # → Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/" # → A0/A1 poster PDF + editable PPTX + SVG
💡 From idea to paper to podium — one toolchain. 🌱
🏆 Papers Built with ARIS
| Paper | Score | Venue | Author | Stack |
|-------|:-----:|-------|--------|-------|
| CS Paper | 8/10 "clear accept" | CS Conference | @DefanXue & @Monglitay | Claude Code + GPT-5.4 |
| AAAI Paper | 7/10 "good paper, accept" | AAAI 2026 Main Technical | @xinbo820-web | Pure Codex CLI |
| UAV-CC | Under review | IEEE TGRS | @wxx827 | Claude Opus 4.6 + Codex 5.4 xhigh + Cursor |
🎉 Built with ARIS — from idea to submission. Full details + PDFs →
📢 What's New
- 2026-05-04: 🪲 `/research-wiki` and 8 caller skills now resolve the helper via a fallback chain (#204). Bug: after `bash tools/install_aris.sh` the helper lives at `.aris/tools/research_wiki.py` (symlink), but skills hard-coded `tools/research_wiki.py` and silently failed when invoked, so `research-wiki/` stayed empty across full W1 runs. Fix: 3-layer chain (`.aris/tools/` → `tools/` → `$ARIS_REPO/tools/`) codified in `shared-references/wiki-helper-resolution.md`. The manual-copy workaround at `<project>/tools/research_wiki.py` is layer 2, so users who `cp`-installed the helper as a temporary fix continue to work. ⚠️ Existing users: rerun `bash tools/install_aris.sh` once; this also picks up a separate Python 3.9 `ImportError` fix in the helper.
- 2026-05-03: 🎨 Opt-in `— style-ref: <source>` for writer-side skills (#202). `/paper-{plan,write,writing,illustration,poster,slides}`, `/grant-proposal`, and `/auto-paper-improvement-loop` accept an optional `— style-ref: <source>` argument that mimics a reference paper's structural style (section ordering, theorem/figure density, sentence cadence, citation style) without copying its prose, claims, or terminology. Sources: local `.tex` dir/file, local PDF, arXiv id (`2501.12345` or `arxiv:2501.12345`), HTTP/HTTPS URL. Overleaf URLs/IDs are rejected; clone via `/overleaf-sync setup <id>` first. Default OFF; existing behavior unchanged when the flag is absent. Reviewer / auditor sub-skills (`/proof-checker`, `/paper-claim-audit`, `/citation-audit`, the improvement-loop reviewer) never see the style ref, so cross-model review independence is preserved. ⚠️ Existing ARIS users: the helper ships at `tools/extract_paper_style.py`, distributed via the `.aris/tools` symlink (`install_aris.sh` Phase 0, added in #192). Re-run `bash tools/install_aris.sh` once to refresh the symlink and pick up the helper. Manual fallback: `cp <ARIS-repo>/tools/extract_paper_style.py <your-project>/tools/`. Without either, the writer skill aborts with a clear error pointing here.
- 2026-05-02: 🪨 Community spotlight: rosetta by @SyntaxSmith. Programmatic access to ChatGPT Pro / `gpt-5.5-pro` / DeepResearch from Node, via Chrome CDP Fetch interception + WebSocket second-leg streaming; ships an MCP server for Claude Code / Codex / Cline. Alternative implementation path to Oracle MCP for ARIS users invoking `— reviewer: oracle-pro`: same target capability (Pro-tier reviewer), different mechanics. Indexed under Awesome Community Skills & Extensions. 🌟 if you're using it!
- 2026-05-02: 💎🧿 Model & MCP routing updates. (a) `/gemini-search` default bumped to `gemini-3-pro-preview` (strongest Gemini, out-of-box). ⚠️ Action required: requires `gemini-cli` v0.40+ (run `gemini --version`; upgrade with `npm i -g @google/gemini-cli` if older). Legacy override: `/gemini-search "topic" — model: gemini-2.5-pro`. Other overrides: `gemini-3-flash-preview` (faster), `auto-gemini-3` (load-routed). (b) `/idea-discovery` Phase 1 now includes Gemini in its literature survey by default (#199): it auto-injects `— sources: all, gemini` into `/research-lit` unless the user passed an explicit `— sources:`; graceful skip if `gemini-cli` is not installed. (c) Oracle MCP upstream PR queue (`steipete/oracle/pulls`) is the first triage stop when invoking `— reviewer: oracle-pro` (especially `o3-deep-research` / `gpt-5.5-pro`). ARIS does not vendor Oracle MCP; check upstream first if behavior surprises you (`reviewer-routing.md`).
- 2026-05-02: 🛠️🔗 Tools-infrastructure migration started. (a) `install_aris.sh` creates optional `.aris/tools` symlink (#192, closes #174): Phase 0 of the 4-step tools-stability plan (#174 → #176 → #177 → #178); idempotent, zero impact until rerun. (b) `/experiment-queue` orchestration paths repaired (#193), the first real user of the symlink; 7 cascading bugs fixed via 3 rounds of Codex MCP `gpt-5.5` xhigh audit. Pure prose + docstring; `queue_manager.py` logic untouched. Windows `install_aris.ps1` parallel update tracked as follow-up.
- 2026-05-02: 🔬 Three new opt-in audit flags via fast-path delegated-agent workflow (#187, #188, #189). `/citation-audit --uncited` surfaces bib entries with no `\cite{}` reference (detect-only). `/proof-checker --deep-fix` adds a repair-grade plan to the Phase 1 reviewer prompt (corrected statement / patch plan / closure tests + Schur/quadratic-form algebra sanity). `/proof-checker --restatement-check` adds Phase 3.6 cross-location theorem drift detection (6 drift signatures). Zero behavior change when flags unset. Plus doc PRs #190 (thread-policy) + #191 (auto-loop xref). Delegated-agent + maintainer-fixup pattern; Codex MCP `gpt-5.5` xhigh review caught 6+ blockers.
- 2026-05-01: 🔍 Gemini + OpenAlex literature sources for `/research-lit` (#175, community contribution by @stdAri). Two opt-in sources: `/gemini-search` (AI-driven discovery via `jamubc/gemini-mcp-tool` MCP) and `/openalex` (250M+ work open citation graph, no API key). Triggered via `— sources: gemini` or `— sources: openalex`; zero behavior change when default `all` (both excluded). Maintainer fixups: corrected `@google/gemini-cli` npm name; added `try/except ImportError` + bash preflight for graceful OpenAlex skip when `requests` is missing.
- 2026-04-30: 📝 `/rebuttal` per-reviewer thread mode + transferable patterns (`SKILL.md`). Adds `VENUE_MODE` (`single_document` | `per_reviewer_thread`) for OpenReview-style venues, `reviewer_priority: pivotal` routing, `structural_distinction` response mode, 5 reviewer-defensive heuristics, 2 Phase 5 lints, and severity-scaled stress rounds. Default `VENUE_MODE = single_document` keeps ICML-style behavior, so nothing changes for existing users. Three rounds of cross-model review before/after merge.
- 2026-04-30: 🪞 Codex skill mirror rebuilt + dedicated install/update chain (#179, community contribution by @No-518). `skills/skills-codex/` now mirrors all 67 mainline skills; replaces `mcp__codex__codex` reviewer path with Codex-native `spawn_agent` + `send_input`. New `tools/install_aris_codex.sh` + `tools/smart_update_codex.sh` handle project-local symlinks with manifest tracking. Anti-drift: `tests/test_codex_skill_mirror.py` + `tests/test_codex_install_update.py` (26 failure paths). Open discussion in #184.
- 2026-04-24: 🎨 `/paper-illustration-image2`: Codex-native image generation as Phase 2b illustration backend (#166, community contribution by @kbr19-thu, Tsinghua). Uses ChatGPT Plus/Pro quota via local Codex app-server MCP bridge; no `GEMINI_API_KEY` required. Triggered by `/paper-writing — illustration: codex-image2`; default stays `figurespec` (zero behavior change). Async-only API, sandboxed writes to `figures/ai_generated/`, integration-contract-compliant helper. Marked experimental (Codex debug app-server is unstable upstream).
- 2026-04-21: 📚 Research Wiki ingest actually works now (`research_wiki.py`, `/research-wiki`). Fixes user-reported bug where `/research-wiki init` left `papers/` empty forever (`ingest` subcommand had no implementation; paper-reading skills had no wiki hook). New canonical `python3 tools/research_wiki.py ingest_paper` helper owns slugging / metadata fetch / dedup / page render; all 6 paper-reading skills wired to it. Manual backfill via `sync --arxiv-ids` or `sync --from-file`. Ships with `integration-contract.md` formalizing the six-component pattern every cross-skill integration must follow.
- 2026-04-21: 🛡️ Assurance Gate: `— effort: beast | max` now really runs mandatory audits (`assurance-contract.md`, `tools/verify_paper_audits.sh`). Fixes silent-skip of `/proof-checker` / `/paper-claim-audit` / `/citation-audit` at high effort. New `assurance` axis (`draft` | `submission`) independent from `effort`: `lite`/`balanced` → `draft` (zero behavior change), `max`/`beast` → `submission`. At submission the 3 audits emit a JSON artifact with 6-state verdict; `paper-writing` Phase 6 runs the external verifier as source of truth (non-zero exit blocks Final Report). SHA256 input hashing catches stale audits. Escape hatch: `— effort: beast, assurance: draft`.
- 2026-04-20: 🩹 Project install: flat layout + manifest tracking. Fixes a real bug where the previous nested install (`.claude/skills/aris/`) hid skills from Claude Code's slash-command discovery (CC only scans one directory level). Anyone who ran `install_aris.sh` before this date was silently affected. New `install_aris.sh` creates one symlink per skill at `.claude/skills/<name>`, writes a versioned manifest to `.aris/installed-skills.txt`, and is re-runnable to reconcile new/removed upstream skills. Defense-in-depth: 13 safety rules (no-symlinked-parents, exact-target revalidation, slug regex, atomic same-dir manifest rename, no-overwrite-real-files, mkdir-based portable lock, ADOPT for crash recovery, …). Granular `--adopt-existing` / `--replace-link` flags replace the all-or-nothing `--force`. Migration paths: `--from-old` for legacy nested symlink, `--migrate-copy keep-user|prefer-upstream` for legacy nested copy. `smart_update.sh --target-subdir .claude/skills/aris` is now deprecated with a redirect to `install_aris.sh`. Stale-file bug in `cp -r` overlay also fixed (now `rm -rf && cp -r` for safe-update path).
- 2026-04-19: 🔗 `/overleaf-sync`: two-way bridge between local ARIS paper directory and an Overleaf project via the official Overleaf Git bridge (Premium). Lets collaborators keep editing in the Overleaf web UI while ARIS audit/edit pipelines (`/paper-claim-audit`, `/citation-audit`, `/auto-paper-improvement-loop`) keep running locally. Sub-commands: `setup` (one-time, user-driven so the agent never sees the token) / `pull` (with diff-protocol: flags half-sentences, typos, claim/cite changes that should re-trigger audits) / `push` (with confirmation gate before writing to shared Overleaf state) / `status` (3-way divergence check). Token never touches the agent or any file: it is primed once into macOS Keychain via the user's terminal, then auth-free for all subsequent agent operations.
- 2026-04-19: 📚 `/citation-audit`: fourth and final layer of the evidence-and-claim assurance stack (`experiment-audit` → `result-to-claim` → `paper-claim-audit` → `citation-audit`). Fresh cross-family reviewer (gpt-5.4 via Codex MCP) with web/DBLP/arXiv lookup verifies every `\cite{...}` along three independent axes: existence (paper resolves at claimed arXiv ID/DOI/venue), metadata correctness (authors/year/venue/title match canonical sources), and context appropriateness (the cited paper actually establishes the claim it supports; this is the most diagnostic check). Per-entry verdicts: KEEP / FIX / REPLACE / REMOVE. Auto-integrated into Workflow 3 Phase 5.8 as the pre-submission bibliography gate. Empirical motivation: in a real submission run, several real papers were cited in contexts they did not actually support, and at least one entry shipped with `author = "Anonymous"`; none were caught by metadata-only checks.
- 2026-04-17: 🔀 `/experiment-queue` integrated into Workflow 1.5 + research-pipeline. `experiment-bridge` Phase 4 Deploy now auto-routes by milestone job count: ≤5 jobs → `/run-experiment`, ≥10 jobs or phase dependencies → `/experiment-queue` (with OOM retry, stale-screen cleanup, wave-transition gating, crash-safe state). New `--- batch: queue` override for global force-queue mode. Large multi-seed sweeps from `EXPERIMENT_PLAN.md` (e.g., 36-cell `N × seed × n_train` grids) now get proper orchestration without manual queue invocation.
- 2026-04-17: 🔗 Project-local symlink install (resolves #118), the new recommended default install. `bash tools/install_aris.sh` auto-detects platform (Claude Code / Codex CLI), creates `.claude/skills/aris` or `.agents/skills/aris` symlink to the ARIS repo, adds a managed `<!-- ARIS:BEGIN -->` block to `CLAUDE.md` / `AGENTS.md` telling the agent to use only project-local skills, and records install metadata in `.aris/skill-source.txt`. Solves the skill collision problem when ARIS is mixed with Superpowers / OpenHands / other community packs in the same global skill directory. PowerShell version (`install_aris.ps1`) ships with junction support for Windows. `smart_update.sh --target-subdir` flag added for `.agents/skills/aris` (Codex) project-copy installs; symlinked installs now correctly refuse `smart_update` and direct users to `git pull`. Global install remains supported for power users.
- 2026-04-16: 🎨 `/figure-spec`: deterministic JSON→SVG renderer packaged as a first-class skill. Preferred default for architecture/workflow/pipeline/audit-cascade figures in papers. Shape-aware edge clipping (rect/circle/ellipse/diamond), self-loops, curved edges, multi-line labels with CJK width estimation. Editable vector output, reproducible (same spec → same SVG), no external API. Phase 2b in Workflow 3 restored: `illustration: figurespec` (new default) / `gemini` / `mermaid` / `false`, a 4-way illustration selector with complementary strengths.
- 2026-04-16: ⚙️ `/experiment-queue`: SSH job queue for multi-seed/multi-config ML experiments. Designed from real 36-cell NeurIPS sweep pain points: OOM-aware retry with backoff, stale-screen cleanup, wave-transition race prevention, teacher→student phase dependencies, crash-safe scheduler that resumes from JSON state. Declarative grid specs expand automatically (e.g., `N × seed × n_train → 36 jobs`). Configurable `conda_hook` + `gpu_free_threshold_mib` for non-standard environments. Use for ≥10 jobs; `/run-experiment` stays for ad-hoc runs.
- 2026-04-15: 🛡️ Paper Writing Pipeline Hardening: 10 empirically-motivated patches from a real NeurIPS run. `REVIEWER_BIAS_GUARD=true`: every review round uses a fresh thread (codex-reply inflated 3→8/10). Reviewer Independence Protocol: no fix summaries to reviewer. Step 4.5 Restatement Regression Test: catches theorem drift across fix rounds. Step 5.5 Kill Argument Exercise: final-round adversarial attack/defense for theory papers. Location-aware overfull blocking. Theory Paper Consistency Pass in `/paper-write`. Enforced Bib Hygiene with DBLP/CrossRef validation. Phase 5.5 Mandatory Final Claim Audit as submission gate. Review Tracing Protocol: full prompt/response pairs saved to `.aris/traces/` for reviewer-independence audit (`review-tracing.md`, `save_trace.sh`). Inspired by community contribution from @李傲龍.
- 2026-04-15: 🎨 FigureSpec Renderer v2: deterministic JSON→SVG figure generation for academic papers. Shape-aware edge clipping (rect/circle/ellipse/diamond), self-loops, curved edges, multi-line labels with CJK width estimation, comprehensive validation (type checks, structure, palette). Went through 5 rounds of Codex review (3/10→7/10). All architecture and workflow diagrams in the ARIS tech report were generated with this pipeline. New `--- mode: vector` for `/paper-illustration` skill.
- 2026-04-14: 📋 `/paper-claim-audit`: zero-context paper-to-evidence verification. Fresh reviewer with NO prior context compares every number in the paper against raw result files. Catches rounding inflation, best-seed cherry-pick, config mismatch, delta errors, scope overclaim. Auto-integrated into Workflow 3 (Phase 4.7). Completes the 3-layer audit chain: `/experiment-audit` (code) → `/result-to-claim` (science) → `/paper-claim-audit` (reporting). 👁️ Visual PDF review also added to improvement loop: the reviewer now sees the compiled PDF, not just LaTeX source. Inspired by Hermes Agent.
- 2026-04-13: 🧿 GPT-5.4 Pro via Oracle: `— reviewer: oracle-pro` on any skill for the strongest available reviewer. API mode (fast) or browser mode (free). Supported on: `/research-review`, `/auto-review-loop`, `/experiment-audit`, `/proof-checker`, `/rebuttal`, `/idea-creator`, `/research-lit`. Default stays Codex xhigh. Not installed = zero impact. Setup →
- 2026-04-13: 🔬 `/proof-checker`: rigorous mathematical proof verification via cross-model review. 20-category issue taxonomy, two-axis severity, side-condition checklists (DCT/MCT/Fubini/IFT/...), counterexample red team, proof-obligation ledger. Auto-integrated into Workflow 3: detects `\begin{theorem}` and runs before improvement loop. Complements `/proof-writer`.
- 2026-04-10: ⚡ Effort Levels: `— effort: lite | balanced | max | beast`. Controls work intensity across all skills: papers found, ideas generated, review rounds, writing depth. Codex reasoning stays `xhigh` always. `beast` = every knob to maximum for top-venue sprints. Default `balanced` = zero change for existing users. Details →
- 2026-04-10: 🔎 DeepXiv integration: progressive paper retrieval via DeepXiv CLI. Opt-in: `— sources: deepxiv` or `— sources: all, deepxiv`. Staged reading: search → brief → head → section. `pip install deepxiv-sdk` to enable. Community contribution by @DreamEnding.
- 2026-04-10: 🛡️ `/experiment-audit`: cross-model experiment integrity verification. GPT-5.4 reads your eval scripts and results directly, checks for fake ground truth, self-normalized scores, phantom results, and scope inflation (#131, #57). Advisory: warns loudly, never blocks. `/result-to-claim` auto-reads audit if present. New experiment-integrity.md shared reference. The executor must never judge its own integrity.
- 2026-04-10: 🧠 `tools/smart_update.sh`: intelligent skill updater. Compares local vs upstream, detects personal customizations (server paths, API keys), only updates safe skills. `bash tools/smart_update.sh --apply`
- 2026-04-10: 🏆 Community paper: UAV-CC, the first community paper with full PDF archived. UAV change captioning benchmark for IEEE TGRS by @wxx827. Stack: Claude Opus 4.6 + Codex 5.4 xhigh + Cursor. Papers now archived in `community_papers/`.
- 2026-04-08: 📚 `/research-wiki`: persistent research knowledge base inspired by Karpathy's LLM Wiki. Accumulates papers, ideas, experiments, and claims across the entire research lifecycle with typed relationships. Wiki-aware hooks in `/research-lit` (ingest papers), `/idea-creator` (read wiki + write ideas back), and `/result-to-claim` (update claim status + trigger re-ideation). Failed ideas become anti-repetition memory. ARIS now learns from its mistakes.
- 2026-04-05: 🧬 `/meta-optimize`: outer-loop harness optimization for ARIS. Passively logs skill invocations, tool calls, failures, and parameter overrides via Claude Code hooks. Run `/meta-optimize` to analyze accumulated usage data and propose SKILL.md improvements (reviewer-gated, user-approved). Inspired by Meta-Harness (Lee et al., 2026). ARIS now optimizes itself.
- 2026-04-04: 🔧 Codex Plugin deep integration: `/codex:rescue` now auto-invoked when experiments fail (Workflow 1.5) or LaTeX won't compile (Workflow 3). GPT independently diagnoses the bug before Claude retries; two AI debuggers are better than one. Optional: `codex exec` powers nightmare review, `/codex:rescue` powers auto-debug. Setup →
- 2026-04-03: ☁️ Modal serverless GPU: no GPU? `gpu: modal` in CLAUDE.md, one command (`modal run launcher.py`), no SSH, no Docker, auto scale-to-zero. $30/month free tier, enough to try ARIS experiments without any hardware. `pip install modal && modal setup` and go. Community contribution by @zeyuzhangzyz.
- 2026-04-03: 🎮 Reviewer Difficulty Levels: `medium` (default, unchanged), `hard` (reviewer memory + debate protocol), `nightmare` (GPT reads repo directly via `codex exec`, so Claude can't hide anything). `— difficulty: nightmare` for maximum stress test before submission.
- 2026-03-30: 🔥 Auto-debug & exhaust-before-surrender: experiment-bridge auto-diagnoses failures (OOM, import, CUDA, NaN) and retries up to 3×. Inspired by PUA.
- 2026-03-30: ☁️ Vast.ai GPU rental: `gpu: vast` auto-rents cheapest GPU. By @YIHONG-JIN. 🔧 MiniMax M2.7 upgrade by @octo-patch.
- 2026-03-27: 📄 IEEE venue support (9 families). 🔎 Semantic Scholar. By @ypd666.
- 2026-03-26: 📄 Document-based input: `RESEARCH_BRIEF.md` auto-detect.
- 2026-03-24: 📝 Workflow 4: `/rebuttal` (7-phase pipeline, 3 safety gates).
- 2026-03-23: 🔧 `/training-check`, `/result-to-claim`, `/ablation-planner` integrated. 📦 `compact` mode. By @JingxuanKang & @couragec.
- 2026-03-22: 📋 Templates: input templates for every workflow. 📄 7 venue templates: CVPR, ACL, AAAI, ACM MM added. 🛡️ Anti-hallucination fix: Workflow 2 enforces DBLP → CrossRef → [VERIFY]. 🔗 `base repo`: clone a GitHub repo as base codebase (`— base repo: https://github.com/org/project`).
- 2026-03-22: 🔍 Codex + Gemini review guide: Codex executes, Gemini reviews via local `gemini-review` MCP bridge. CN
- 2026-03-20: 🚀 Antigravity adaptation guide: use ARIS skills in Google Antigravity (agent-first IDE). Community contribution by @PeppaPigw.
- 2026-03-20: 🖥️ Trae adaptation guide: use ARIS skills in Trae (ByteDance AI IDE). Community contribution by @Prometheus-cotigo. 🔢 `formula-derivation`: community contribution by @Falling-Flower.
- 2026-03-19: 🖼️ `paper-poster`: conference poster. Community contribution by @dengzhe-hou.
- 2026-03-19: 🔗 Workflow 1.5 upgraded: `/experiment-bridge` GPT-5.4 code review. 📊 W&B fix.
- 2026-03-18: 🎤 `paper-slides` + 🔁 Codex+Claude bridge + 🖱️ Cursor guide + 🤖 Codex CLI skills + 📝 `grant-proposal` + 🎨 `paper-illustration` (Gemini) + 📊 CitationClaw.
- 2026-03-17: 🔧 Git code sync + 🆓 ModelScope guide + parameter pass-through.
- 2026-03-16: 🔬 `research-refine` + `experiment-plan`: turn vague ideas into problem-anchored proposals with claim-driven experiment roadmaps. Now integrated into Workflow 1 (`/idea-discovery`). Community contribution by @zjYao36.
- 2026-03-16: 🇨🇳 Alibaba Coding Plan guide: one API key, 4 models (Kimi-K2.5 + Qwen3.5+ + GLM-5 + MiniMax-M2.7), dual-endpoint setup. Community contribution by @tianhao909.
- 2026-03-15: 🔀 Bring your own model! Any OpenAI-compatible API now works as reviewer via `llm-chat` MCP server. GLM, MiniMax, Kimi, LongCat, DeepSeek all tested; zero Claude or OpenAI API needed.
- 2026-03-15: 🐾 OpenClaw adaptation guide: use ARIS research workflows in OpenClaw without Claude Code slash skills.
- 2026-03-15: 📐 `proof-writer`: community skill for rigorous theorem proof drafting. 📚 Anti-hallucination citations: `/paper-write` now fetches real BibTeX from DBLP/CrossRef instead of LLM-generated entries (on by default, zero install).
- 2026-03-14: 📱 Feishu/Lark integration: three modes (off/push/interactive), mobile notifications for experiments, reviews, and checkpoints.
- 2026-03-13: 🛑 Human-in-the-loop: configurable `AUTO_PROCEED` checkpoints across all workflows. Full autopilot or step-by-step approval.
- 2026-03-12: 🔗 Zotero + Obsidian + local PDFs + arXiv/Scholar: multi-source literature search with cross-model novelty verification.
- 2026-03-12: 🚀 Three end-to-end workflows complete: one prompt → top-venue-style paper. `/research-pipeline` chains idea discovery → auto review → paper writing autonomously.
- 2026-03-12: 📝 `/paper-writing` workflow: narrative report → structured outline → figures → LaTeX → compiled PDF → 2-round auto-improvement (4/10 → 8.5/10).
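Two of the changelog entries above describe rules that are easy to misread in prose: the job-count routing in `experiment-bridge` Phase 4 (2026-04-17) and the automatic expansion of declarative grid specs (2026-04-16). The sketch below restates both under stated assumptions. The function names are hypothetical, the grid values are invented so that the product is 36, and the changelog does not say what happens for 6 to 9 jobs, so that case is returned as "unspecified" rather than guessed.

```python
from itertools import product
from typing import Dict, List, Sequence


def route_deploy(job_count: int,
                 has_phase_dependencies: bool = False,
                 force_queue: bool = False) -> str:
    """Pick a deploy skill from the thresholds quoted in the changelog."""
    # force_queue stands in for the global force-queue override.
    if force_queue or has_phase_dependencies or job_count >= 10:
        return "/experiment-queue"
    if job_count <= 5:
        return "/run-experiment"
    return "unspecified"  # 6-9 jobs: not stated in the changelog


def expand_grid(spec: Dict[str, Sequence]) -> List[Dict[str, object]]:
    """Expand a declarative grid into one config dict per job."""
    keys = list(spec)
    return [dict(zip(keys, combo)) for combo in product(*(spec[k] for k in keys))]


# Invented example values: 3 x 4 x 3 = 36 cells.
grid = {"N": [8, 16, 32], "seed": [0, 1, 2, 3], "n_train": [1000, 5000, 10000]}
jobs = expand_grid(grid)
print(len(jobs), route_deploy(len(jobs)))
print(jobs[0])
```

A small ad-hoc run such as `route_deploy(3)` stays on `/run-experiment`, while any run with teacher→student style phase dependencies is routed to the queue regardless of size.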
🚀 Quick Start
# 1. Install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
mkdir -p ~/.claude/skills/ # create if it doesn't exist (new Claude Code versions)
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
# 1b. Update skills (when upstream has new versions)
cd Auto-claude-code-research-in-sleep && git pull
bash tools/smart_update.sh # dry-run: shows what's new/changed/safe
bash tools/smart_update.sh --apply # apply: adds new + updates safe ones
# Optional Codex mirror managed project install
bash tools/install_aris_codex.sh ~/your-codex-project
# Managed Codex project update
cd Auto-claude-code-research-in-sleep && git pull
bash tools/install_aris_codex.sh ~/your-codex-project --reconcile
# Copied Codex installs only (not for projects installed by install_aris_codex.sh)
bash tools/smart_update_codex.sh --local ~/.codex/skills
bash tools/smart_update_codex.sh --local ~/.codex/skills --apply
# 2. Set up Codex MCP (for review skills)
npm install -g @openai/codex
codex setup # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
# 3. Use in Claude Code
claude
> /idea-discovery "your research direction" # Workflow 1 — be specific! not "NLP" but "factorized gap in discrete diffusion LMs"
> /experiment-bridge # Workflow 1.5 — have a plan? implement + deploy + collect results
> /auto-review-loop "your paper topic or scope" # Workflow 2: review → fix → re-review overnight
> /paper-writing "NARRATIVE_REPORT.md" # Workflow 3: narrative → polished PDF
> /rebuttal "paper/ + reviews" — venue: ICML # Workflow 4: parse reviews → draft rebuttal → follow-up
> /research-pipeline "your research direction" # Full pipeline: Workflow 1 → 1.5 → 2 → 3 end-to-end
> /research-wiki init # 📚 Enable persistent research memory (one-time)
> /meta-optimize # Meta: analyze usage logs → propose skill improvements
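Before running the steps above, it can help to confirm that the command-line tools they rely on (`git`, `npm`, `claude`, `codex`) are on your `PATH`. The following is a minimal sketch, not part of ARIS: `check_tools` is a hypothetical helper name, and it relies only on the POSIX `command -v` builtin.

```shell
# Hypothetical helper (not shipped with ARIS): report which tools are on PATH.
# Prints one "ok:" or "missing:" line per tool; returns 1 if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run in your normal terminal):
#   check_tools git npm claude codex
```

A `missing:` line for `codex` or `claude` suggests that step 2 or step 3 will fail until that tool is installed.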
📚 Research Wiki (optional): Give ARIS persistent memory across sessions. Papers, ideas, failed experiments — nothing is forgotten:
# In Claude Code:
> /research-wiki init # creates research-wiki/ in your project
# That's it. From now on, /research-lit auto-ingests papers, /idea-creator reads
# the wiki before brainstorming (and writes ideas back), /result-to-claim updates
# claim status. Failed ideas become anti-repetition memory for future ideation.
See Research Wiki for the full guide.
🧬 Meta-optimization (optional): Run these in your normal terminal (not inside Claude Code) to enable passive usage logging:
# One-time setup in your project directory
mkdir -p .claude .aris/meta tools/meta_opt
cp Auto-claude-code-research-in-sleep/templates/claude-hooks/meta_logging.json .claude/settings.json
cp Auto-claude-code-research-in-sleep/tools/meta_opt/*.sh tools/meta_opt/
chmod +x tools/meta_opt/*.sh
# Then start Claude Code — hooks are active immediately
claude
Events are logged to both project-level (`.aris/meta/events.jsonl`) and global (`~/.aris/meta/events.jsonl`) logs. After 5+ workflow runs, run `/meta-optimize` to see data-driven improvement proposals. Use `/meta-optimize --global` to analyze trends across all your projects. See Workflow M for details.
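To confirm that the hooks are actually writing to the two logs, you can count the lines in each file. This is a rough sketch that assumes one event per line (the usual JSONL convention) and makes no assumption about the event schema; `count_events` is a hypothetical helper name. Note that the count is of logged events, not workflow runs.

```shell
# Hypothetical helper: count events in a JSONL log, assuming one event per line.
# Prints 0 when the file does not exist yet.
count_events() {
  if [ -f "$1" ]; then
    # tr strips the padding some wc implementations add
    wc -l < "$1" | tr -d ' '
  else
    echo 0
  fi
}

# Usage, from your project directory:
#   echo "project: $(count_events .aris/meta/events.jsonl)"
#   echo "global:  $(count_events "$HOME/.aris/meta/events.jsonl")"
```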
📝 Templates available! See `templates/` for ready-to-use input templates for every workflow — research brief (Workflow 1), experiment plan (Workflow 1.5), narrative report (Workflow 3), paper plan (Workflow 3).
🔎 Optional: DeepXiv progressive retrieval
pip install deepxiv-sdk
Then use `/deepxiv` directly or opt into it from `/research-lit` with `— sources: deepxiv` or `— sources: all, deepxiv`.
🔎 Optional: Exa AI-powered web search
pip install exa-py
export EXA_API_KEY=your-key-here
Then use `/exa-search` directly or opt into it from `/research-lit` with `— sources: exa` or `— sources: all, exa`. Covers blogs, docs, news, and research papers with built-in content extraction.
🗑️ Uninstall: To remove ARIS skills without affecting your own personal skills:
cd Auto-claude-code-research-in-sleep && ls skills/ | xargs -I{} rm -rf ~/.claude/skills/{}
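The uninstall command above deletes every entry under `~/.claude/skills/` whose name matches an entry in the repo's `skills/` folder. If you want to see what would be removed first, a dry-run along these lines may help. It is a sketch only: `preview_uninstall` is a hypothetical helper, not an ARIS tool, and it prints paths without deleting anything.

```shell
# Hypothetical helper: list installed skills that share a name with a skill
# in the repo, i.e. the paths the uninstall command would delete.
preview_uninstall() {
  repo_skills="$1"   # e.g. Auto-claude-code-research-in-sleep/skills
  installed="$2"     # e.g. ~/.claude/skills
  for path in "$repo_skills"/*; do
    name=$(basename "$path")
    if [ -e "$installed/$name" ]; then
      printf '%s\n' "$installed/$name"
    fi
  done
}

# Usage:
#   preview_uninstall Auto-claude-code-research-in-sleep/skills ~/.claude/skills
```

Skills that exist only in `~/.claude/skills/` (your own) never appear in the output, because the loop iterates over the repo's names.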
Tip: All pipeline behaviors are configurable via inline overrides — append `— key: value` to any command:
| Parameter | Default | What it does |
|-----------|---------|-------------|
| `AUTO_PROCEED` | `true` | Auto-continue at idea selection gate. Set `false` to manually pick which idea to pursue before committing GPU time |
| `human checkpoint` | `false` | Pause after each review round so you can read the score, give custom modification instructions, skip specific fixes, or stop early |
| `sources` | `all` | Which literature sources to search: `zotero`, `obsidian`, `local`, `web`, `semantic-scholar`, `deepxiv`, `exa`, or `all`. Note: `semantic-scholar`, `deepxiv`, and `exa` must be explicitly listed — not included in `all` |
| `arxiv download` | `false` | Download top relevant arXiv PDFs during literature survey. When `false`, only fetches metadata (title, abstract, authors) |
| `DBLP_BIBTEX` | `true` | Fetch real BibTeX from DBLP/CrossRef instead of LLM-generated entries. Eliminates hallucinated citations. Zero install |
| `code review` | `true` | GPT-5.4 xhigh reviews experiment code before GPU deployment. Set `false` to skip |
| `wandb` | `false` | Auto-add W&B logging to experiment scripts. Set `true` + configure `wandb_project` in CLAUDE.md. `/monitor-experiment` pulls training curves from W&B |
| `illustration` | `gemini` | AI illustration in Workflow 3: `gemini` (default, needs `GEMINI_API_KEY`), `mermaid` (free), or `false` (skip) |
| `venue` | `ICLR` | Target venue: `ICLR`, `NeurIPS`, `ICML`, `CVPR`, `ACL`, `AAAI`, `ACM`. Determines LaTeX style file and page limit |
| `base repo` | `false` | GitHub repo URL to clone as base codebase (e.g., `— base repo: https://github.com/org/project`). No code? Build on top of an open-source project |
| `gpu` | `local` | GPU target: `local` (default), `remote` (SSH server), or `vast` (rent on-demand from Vast.ai — auto-provision, auto-destroy) |
| `compact` | `false` | Generate compact summary files (`IDEA_CANDIDATES.md`, `findings.md`, `EXPERIMENT_LOG.md`) for short-context models and session recovery |
| `ref paper` | `false` | Reference paper to build on (PDF path or arXiv URL). Summarized first, then ideas extend/improve it. Combine with `base repo` for paper+code workflows |
| `effort` | `balanced` | Work intensity: `lite` (0.4x tokens), `balanced` (default), `max` (2.5x), `beast` (5-8x). Controls breadth/depth/iterations. Codex reasoning is always `xhigh`. See Effort Levels |
| `reviewer` | `codex` | Reviewer backend: `codex` (GPT-5.4 xhigh, default), `oracle-pro` (GPT-5.4 Pro via Oracle — strongest reasoning). See Setup |
| `difficulty` | `medium` | Reviewer adversarial level: `medium` (default), `hard` (+ memory + debate), `nightmare` (+ GPT reads repo via `codex exec`) |
/research-pipeline "your topic" — AUTO_PROCEED: false # pause at idea selection gate
/research-pipeline "your topic" — human checkpoint: true # pause after each review round to give feedback
/research-pipeline "your topic" — sources: zotero, web # only search Zotero + web (skip local PDFs)
/research-pipeline "your topic" — sources: all, deepxiv # default sources plus DeepXiv progressive retrieval
/research-pipeline "your topic" — sources: all, exa # default sources plus Exa AI-powered web search
/research-pipeline "your topic" — arxiv download: true # download top arXiv PDFs during literature survey
/research-pipeline "your topic" — difficulty: nightmare # maximum adversarial review before submission
/research-pipeline "your topic" — effort: beast # all knobs to maximum — top-venue sprint
/research-pipeline "your topic" — effort: beast, reviewer: oracle-pro # beast + GPT-5.4 Pro reviewer — ultimate mode
/research-pipeline "your topic" — effort: lite # quick exploration, save tokens
/research-pipeline "your topic" — effort: max, review_rounds: 3 # max effort but cap review at 3 rounds
/research-pipeline "your topic" — AUTO_PROCEED: false, human checkpoint: true # combine options
/proof-checker "paper/" — reviewer: oracle-pro # Pro-level proof verification
Important: Codex MCP uses the model from `~/.codex/config.toml`, not from skill files. Make sure it says `model = "gpt-5.4"` (recommended). Other options: `gpt-5.3-codex`, `gpt-5.2-codex`, `o3`. Run `codex setup` or edit the file directly.
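A quick way to see which model Codex MCP will pick up is to print the `model` line from that config file. The sketch below assumes the setting sits on a single line of the form `model = "..."`; `codex_model` is a hypothetical helper name, and it uses only `grep` and `head`.

```shell
# Hypothetical helper: print the first `model = ...` line of a Codex config.
# The pattern requires whitespace or "=" right after "model", so keys that
# merely start with "model" are ignored.
codex_model() {
  if [ ! -f "$1" ]; then
    echo "no config file at $1" >&2
    return 1
  fi
  grep -E '^[[:space:]]*model[[:space:]]*=' "$1" | head -n 1
}

# Usage:
#   codex_model "$HOME/.codex/config.toml"
```

If nothing is printed, the file has no top-level `model` line in that form, and running `codex setup` or editing the file directly is the next step.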
Want Codex to execute but Claude Code to review? See `docs/CODEX_CLAUDE_REVIEW_GUIDE.md`. That path installs the base `skills/skills-codex/*`, then overlays `skills/skills-codex-claude-review/*`, and routes review-heavy skills through the local `claude-review` MCP bridge.
Want Codex to execute but Gemini to review locally? See `docs/CODEX_GEMINI_REVIEW_GUIDE.md` and its Chinese (CN) version. That path installs the base `skills/skills-codex/*`, then overlays `skills/skills-codex-gemini-review/*`, and routes the reviewer-aware predefined skills through the local `gemini-review` MCP bridge, using the direct Gemini API by default.
Want the Codex mirror install chain? Use `tools/install_aris_codex.sh` for managed project installs and `tools/smart_update_codex.sh` for copied Codex installs. The Claude scripts remain the mainline entry points for Claude projects.
See the full setup guide for details, including alternative model combinations if you don't have Claude or OpenAI API access.
🧠 Update skills later? Smart update analyzes what's safe:
cd Auto-claude-code-research-in-sleep
git pull
bash tools/smart_update.sh # dry-run: shows what's new/changed/safe
bash tools/smart_update.sh --apply # apply: adds new + updates safe ones
Smart update compares local skills with upstream, detects personal customizations (server paths, API keys, etc.), and only updates skills that are safe to replace. Skills with your personal info are flagged for manual review.
✨ Features
- 📊 31 composable skills — mix and match, or chain into full pipelines (`/idea-discovery`, `/auto-review-loop`, `/paper-writing`, `/research-pipeline`)
- 🔍 Literature & novelty — mul