pentest agents
Bug bounty agent framework for Claude Code, Codex, Gemini, Cursor, Windsurf, Copilot, and OpenClaw: 50 agents, 26 commands, 19 CLI tools, 2 MCP servers, autonomous hunt loops, exploit chain builder.
~760 files · ~118k lines · 50 agents · 26 commands · 19 CLI tools · 11 skills · 2 MCP servers (16 bug-bounty platforms + BYO writeup search) · 2,500 payload lines
A complete bug bounty framework. Battle-tested hunting methodology with concrete payloads, 7-Question Gate validation, autonomous hunt loops, A→B exploit chain building, persistent brain with endpoint tracking, optional semantic writeup search (bring your own index), automatic cost tracking via CC hooks, live platform integration, and a cross-IDE installer that emits the native format for Claude Code, Codex, Gemini, Cursor, Windsurf, VS Code Copilot, and OpenClaw.
Quick Start
# MCP servers are launched via `uv run --with mcp` — no global pip install required.
export HACKERONE_USERNAME=you HACKERONE_TOKEN=your_token
uv run python3 tools/scaffold.py hackerone tesla --type web-app
cd ~/bounties/hackerone-tesla && claude
/model opus # Opus 4.7 [1M] — subagents inherit via model: "inherit"
/sync hackerone tesla
/brain init && /status
/hunt tesla.com
Install (Claude Code + 6 other AI coding tools)
The framework ships pre-rendered for every supported tool. There are two ways to use it:
1. Use the bundles directly (no install step)
git clone https://github.com/H-mmer/pentest-agents-suite
cd pentest-agents-suite/pentest-agents/providers/codex
codex # or: cd ../gemini && gemini, etc.
The providers/<id>/ tree contains a fully-translated, ready-to-use bundle
for each non-Claude target. Path references inside use .. to reach the
repo's tools/, rules/, and mcp-*-server/ — so the bundle works as
long as it stays inside the cloned repo.
2. Run the installer (writes into your own project or ~/.codex/ etc.)
python3 -m tools.installer install --targets all --scope project
python3 -m tools.installer install --targets codex --scope global
Install mode rewrites paths to absolute references back into the cloned pentest-agents repo, so the install works no matter where the user's own project lives.
| Target | Agents | Slash commands | Rules | MCP | Scopes |
|---|---|---|---|---|---|
| Claude Code | native .claude/agents/*.md | .claude/skills/<name>/SKILL.md | CLAUDE.md | .mcp.json / ~/.claude.json | global + project |
| OpenAI Codex | native .codex/agents/*.toml | ~/.codex/prompts/*.md (user-only — no project-scope prompt loading) | AGENTS.md (≤32 KiB) | [mcp_servers.*] in config.toml | global + project (prompts always go to ~/.codex/) |
| Google Gemini | native .gemini/agents/*.md | TOML in .gemini/commands/ | GEMINI.md | mcpServers in settings.json | global + project |
| Cursor | → skills .cursor/skills/agent-*/SKILL.md (no native subagents) | → skills .cursor/skills/cmd-*/SKILL.md | .cursor/rules/*.mdc + AGENTS.md | .cursor/mcp.json | global + project |
| Windsurf | → skills | Workflows | .windsurf/rules/*.md (≤12 KiB / file) | ~/.codeium/windsurf/mcp_config.json | global + project |
| VS Code Copilot | .github/agents/*.agent.md (≤30 KiB / agent) | .github/prompts/*.prompt.md | .github/copilot-instructions.md + .github/instructions/* | .vscode/mcp.json | project + global-MCP |
| OpenClaw | → skills | → skills | ~/.openclaw/workspace/AGENTS.md or <proj>/AGENTS.md | mcp.servers in ~/.openclaw/openclaw.json | global + project (MCP is user-level) |
Cursor, Windsurf, and OpenClaw have no native subagent concept; Claude-format
agents render as skills/rules. Codex slash commands only load from
~/.codex/prompts/; project-scope prompt loading is unsupported by Codex.
providers/ directory (in the cloned repo):
providers/
├── codex/ AGENTS.md + .codex/{agents,prompts,config.toml}
├── gemini/ GEMINI.md + .gemini/{agents,commands} + settings.json
├── cursor/ AGENTS.md + .cursor/{rules,skills,mcp.json}
├── windsurf/ AGENTS.md + .windsurf/{rules,workflows,skills} + mcp_config.json
├── copilot/ .github/{copilot-instructions.md,instructions,prompts,agents} + .vscode/mcp.json
└── openclaw/ AGENTS.md + .agents/skills/ + openclaw.json
providers/ is generated, not edited by hand. Re-render after editing
.claude/, rules/, or skills/ source:
python3 -m tools.installer render --targets all
python3 -m tools.installer render --check # exits 1 if drift
The test_committed_providers_match_render pytest case enforces drift
detection locally — there is no GitHub Actions CI by project policy.
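A drift gate of this kind can be sketched as a byte-for-byte comparison between a fresh render and the committed tree. This is a minimal illustration, not the installer's actual code: `find_drift` and the demo file contents are hypothetical, and only the "exit 1 if drift" contract comes from the text above.

```python
import filecmp
import tempfile
from pathlib import Path


def find_drift(rendered: Path, committed: Path) -> list[str]:
    """List relative paths that differ between a fresh render and the committed tree."""
    fresh = {p.relative_to(rendered) for p in rendered.rglob("*") if p.is_file()}
    on_disk = {p.relative_to(committed) for p in committed.rglob("*") if p.is_file()}
    # A file present on only one side is drift.
    drift = {rel.as_posix() for rel in fresh ^ on_disk}
    # A file present on both sides must match byte for byte.
    for rel in fresh & on_disk:
        if not filecmp.cmp(rendered / rel, committed / rel, shallow=False):
            drift.add(rel.as_posix())
    return sorted(drift)


# Demo on throwaway directories: one stale file, one file missing from the commit.
with tempfile.TemporaryDirectory() as tmp:
    rendered, committed = Path(tmp, "rendered"), Path(tmp, "committed")
    for root in (rendered, committed):
        (root / "codex").mkdir(parents=True)
    (rendered / "codex" / "AGENTS.md").write_text("new rules\n")
    (committed / "codex" / "AGENTS.md").write_text("old rules\n")
    (rendered / "codex" / "config.toml").write_text("[mcp_servers]\n")
    drift = find_drift(rendered, committed)

print(drift)
exit_code = 1 if drift else 0  # mirrors the "exit 1 if drift" contract
```

Comparing file contents rather than timestamps keeps the check stable across fresh clones, where modification times carry no meaning.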
What gets translated
When .claude/ content is rendered for non-Claude targets, the translator:
- Drops the `model:` field: each target uses its own default model.
- Strips Claude-specific prose: "Claude Code" → "the AI coding tool", "the Agent tool" → "the subagent dispatch tool", `model: "inherit"` is removed entirely.
- Rewrites `$CLAUDE_PROJECT_DIR`: to `..` in `providers/` (relative to the cloned repo), or to absolute paths into the cloned source repo when installing into a user's project.
- Maps `effort:` frontmatter to `model_reasoning_effort` in Codex TOML.
- Caps body length: Copilot agents are truncated at 30,000 chars (Copilot's hard limit). Windsurf rules are chunked at 12,000 chars (workspace) / 6,000 chars (global).
- Adds Copilot subagent links: orchestrator agents (chain-builder, correlator, recon-ranker) get an `agents:` list of siblings so Copilot wires the dispatch graph.
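The translation rules above can be sketched as plain dictionary and string operations. This is a rough illustration under stated assumptions, not the translator's real implementation: the function names are hypothetical, frontmatter is assumed to be already parsed into a dict, and only the field names, replacement phrases, and size limits come from the list above.

```python
COPILOT_AGENT_LIMIT = 30_000       # chars, per the list above
WINDSURF_WORKSPACE_LIMIT = 12_000  # chars, per the list above


def translate_for_codex(frontmatter: dict, body: str) -> tuple[dict, str]:
    """Apply the Codex-facing rewrites to one agent (hypothetical helper)."""
    out = dict(frontmatter)
    out.pop("model", None)  # each target uses its own default model
    if "effort" in out:
        out["model_reasoning_effort"] = out.pop("effort")
    body = body.replace("Claude Code", "the AI coding tool")
    body = body.replace("the Agent tool", "the subagent dispatch tool")
    body = body.replace("$CLAUDE_PROJECT_DIR", "..")  # providers/ render mode
    return out, body


def cap_for_copilot(body: str) -> str:
    """Truncate an agent body to the Copilot limit."""
    return body[:COPILOT_AGENT_LIMIT]


def chunk_for_windsurf(body: str, limit: int = WINDSURF_WORKSPACE_LIMIT) -> list[str]:
    """Split a rule body into pieces no longer than the limit."""
    return [body[i:i + limit] for i in range(0, len(body), limit)]


meta, text = translate_for_codex(
    {"name": "rce-hunter", "model": "inherit", "effort": "high"},
    "Use the Agent tool in Claude Code; read $CLAUDE_PROJECT_DIR/rules/hunting.md",
)
print(meta)
print(text)
```

A real chunker would likely split on paragraph or heading boundaries instead of raw character offsets; fixed offsets are used here only to keep the sketch short.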
Installer management
pentest-agents list # detect which targets are installed
pentest-agents install --targets claude_code,codex --scope global
pentest-agents install --dry-run # preview every file + JSON merge
pentest-agents verify # check manifest vs. disk (drift)
pentest-agents uninstall # reverse, restore .pa-backup files
pentest-agents render --targets all # regenerate providers/<id>/
pentest-agents render --check # drift gate (exit 1 if dirty)
Every install records a manifest (.pentest-agents/manifest.json for project
scope, ~/.config/pentest-agents/manifest.json for global). Uninstall only
removes files we wrote and surgically strips only the MCP/JSON keys we merged —
your other settings are never touched. Conflicting writes back up the original
as <path>.pa-backup and are restored on uninstall.
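The manifest-driven uninstall can be pictured with the sketch below. It is an assumption-laden illustration: the manifest layout (`created`, `backed_up`, `merged_keys`), the helper names, and the `mcpServers` key used in the demo are all hypothetical; only the `.pa-backup` suffix and the "remove only what we wrote" behaviour come from the description above.

```python
import json
import tempfile
from pathlib import Path

BACKUP_SUFFIX = ".pa-backup"


def install_file(path: Path, content: str, manifest: dict) -> None:
    """Write a file, backing up any pre-existing copy and recording what happened."""
    if path.exists():
        backup = path.with_name(path.name + BACKUP_SUFFIX)
        backup.write_bytes(path.read_bytes())
        manifest["backed_up"].append(str(path))
    else:
        manifest["created"].append(str(path))
    path.write_text(content)


def merge_mcp_server(config_path: Path, name: str, spec: dict, manifest: dict) -> None:
    """Add one server entry to a JSON config and remember the key we merged."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = spec
    config_path.write_text(json.dumps(config, indent=2))
    manifest["merged_keys"].append([str(config_path), name])


def uninstall(manifest: dict) -> None:
    """Reverse only the recorded actions; everything else on disk is left alone."""
    for config_file, name in manifest["merged_keys"]:
        config = json.loads(Path(config_file).read_text())
        config.get("mcpServers", {}).pop(name, None)
        Path(config_file).write_text(json.dumps(config, indent=2))
    for created in manifest["created"]:
        Path(created).unlink(missing_ok=True)
    for original in manifest["backed_up"]:
        Path(original + BACKUP_SUFFIX).replace(original)


# Demo: the user's own rules file and unrelated settings survive a round trip.
with tempfile.TemporaryDirectory() as tmp:
    rules, config = Path(tmp, "AGENTS.md"), Path(tmp, "mcp.json")
    rules.write_text("my rules\n")
    config.write_text(json.dumps({"mcpServers": {"mine": {"command": "x"}}, "theme": "dark"}))
    manifest = {"created": [], "backed_up": [], "merged_keys": []}
    install_file(rules, "framework rules\n", manifest)
    merge_mcp_server(config, "bounty-platforms", {"command": "uv"}, manifest)
    uninstall(manifest)
    restored_rules = rules.read_text()
    restored_config = json.loads(config.read_text())
    backup_left = Path(str(rules) + BACKUP_SUFFIX).exists()

print(restored_rules, restored_config, backup_left)
```

Recording merged keys individually, rather than backing up the whole config file, is what lets settings the user changes after install survive the uninstall.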
Workflow
New program: /new → /sync → /brain init → /analyze → /surface → /hunt
Returning: /resume <target> → /hunt or /autopilot
After finding: /validate → /chain → /report → /dupcheck → /submit → /learn
Batch triage: /triage (7-Question Gate on all findings)
MCP Servers (2)
bounty-platforms (16 platforms)
HackerOne (full API), Bugcrowd, Intigriti, Immunefi (public), YesWeHack + 11 stubs. 7 MCP tools: list_platforms, get_program_scope, get_program_policy, search_hacktivity, sync_program, draft_report, submit_report.
writeup-search (BYO index)
Searchable knowledge base agents query during hunting and validation. 4 MCP tools:
- `search_writeups`: semantic search (FAISS) or keyword search for prior art
- `get_writeup`: full writeup content by ID
- `search_techniques`: exploitation techniques by vuln class
- `search_payloads`: curated payloads from `rules/payloads.md`

The writeup index is not bundled. Bulk-redistributing scraped hacktivity violates most platform ToS, so this repo ships the server only. The `search_payloads` + `search_techniques` fallback works out of the box; the semantic/keyword layers activate once you point the server at your own index.
Three search modes (auto-detected, graceful fallback):
| Mode | Requires | Searches |
|------|----------|----------|
| FAISS (semantic) | faiss-cpu, sentence-transformers, your metadata.db + index.faiss | Your writeup corpus via vector embeddings |
| SQLite (keyword) | Your metadata.db only | Your writeup corpus via LIKE over the text column |
| Local (default) | Nothing — zero deps | rules/payloads.md + skills/ shipped in this repo |
Point the server at your index by dropping metadata.db (+ optionally index.faiss) into ~/.local/share/pentest-writeups/, or set WRITEUP_DB_DIR=/path/to/dir.
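The auto-detection with graceful fallback might look roughly like this. It is a sketch, not the server's code: `detect_mode`, `semantic_deps_available`, and `resolve_db_dir` are hypothetical names, while the file names, the default directory, and the `WRITEUP_DB_DIR` variable come from the text above.

```python
import importlib.util
import os
import tempfile
from pathlib import Path

DEFAULT_DB_DIR = Path.home() / ".local" / "share" / "pentest-writeups"


def semantic_deps_available() -> bool:
    """True when both optional packages can be imported."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("faiss", "sentence_transformers")
    )


def resolve_db_dir(env=os.environ) -> Path:
    """WRITEUP_DB_DIR wins over the default location."""
    return Path(env.get("WRITEUP_DB_DIR", DEFAULT_DB_DIR))


def detect_mode(db_dir: Path, have_semantic_deps: bool) -> str:
    """Pick the richest mode the environment supports, falling back step by step."""
    has_db = (db_dir / "metadata.db").is_file()
    has_index = (db_dir / "index.faiss").is_file()
    if has_db and has_index and have_semantic_deps:
        return "faiss"
    if has_db:
        return "sqlite"
    return "local"


# Demo: the mode upgrades as files appear in the directory.
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp)
    modes = [detect_mode(db_dir, have_semantic_deps=True)]
    (db_dir / "metadata.db").touch()
    modes.append(detect_mode(db_dir, have_semantic_deps=True))
    (db_dir / "index.faiss").touch()
    modes.append(detect_mode(db_dir, have_semantic_deps=False))
    modes.append(detect_mode(db_dir, have_semantic_deps=True))

print(modes)
```

Passing the dependency check in as an argument keeps the decision logic testable on machines where `faiss-cpu` is not installed.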
Expected schema (metadata.db): a SQLite file with at least one table containing columns id, title, url, and one text column (content / text / body / writeup). Row order in the table must match vector order in index.faiss when using semantic mode.
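As a concrete illustration of that schema, the sketch below builds a tiny in-memory database and runs a `LIKE` query over its text column. The table name `writeups`, the sample rows, and the helper names are placeholders chosen for this example; a real index may name the table and the text column differently, within the options listed above.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create one table with the four expected columns (table name is hypothetical)."""
    conn.execute(
        "CREATE TABLE writeups ("
        "id INTEGER PRIMARY KEY, title TEXT, url TEXT, content TEXT)"
    )


def keyword_search(conn: sqlite3.Connection, term: str, limit: int = 5) -> list[tuple]:
    """Keyword mode: a parameterised LIKE over the text column."""
    cursor = conn.execute(
        "SELECT id, title, url FROM writeups WHERE content LIKE ? ORDER BY id LIMIT ?",
        (f"%{term}%", limit),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Placeholder rows. In semantic mode, row N here must correspond to vector N in index.faiss.
conn.executemany(
    "INSERT INTO writeups (id, title, url, content) VALUES (?, ?, ?, ?)",
    [
        (1, "Placeholder A", "https://example.com/a", "SSRF through a PDF renderer"),
        (2, "Placeholder B", "https://example.com/b", "IDOR on an invoice endpoint"),
        (3, "Placeholder C", "https://example.com/c", "Blind ssrf via webhook URL"),
    ],
)
conn.commit()

hits = keyword_search(conn, "SSRF")
print(hits)
```

SQLite's `LIKE` is case-insensitive for ASCII by default, which is why the lowercase "ssrf" row is returned as well.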
Build your own index — rag-builder/
The repo now ships a local RAG/FAISS builder under rag-builder/ that turns a list of GitHub / GitLab repositories into a metadata.db + index.faiss pair the writeup-search MCP server consumes. Destructive operations (clone, embed, write) are always gated behind --execute — running the CLI without it prints the plan and changes nothing, so you can never wipe an existing index by accident.
cd rag-builder
# 1. Inspect the plan — no network, no writes.
python3 build.py status
python3 build.py ingest # dry-run (the default)
# 2. Opt-in pre-flight: probe every URL with `git ls-remote` (network).
python3 build.py ingest --check-remotes # ~5s for 141 repos at 16 workers
# 3. Actually clone + index every repo from repos.yaml into ./data/.
python3 build.py ingest --execute
python3 build.py ingest --execute --check-remotes # skip unreachable first
# 4. Point the MCP server at the output.
export WRITEUP_DB_DIR="$PWD/data"
python3 ../mcp-writeup-server/server.py --test
rag-builder/repos.yaml ships with a 146-entry seed covering CTF archives, bug-bounty reports, payload collections, and research aggregators — edit freely. repos-skipped.yaml is loaded automatically as an exclusion list (override with --skip-list or --no-skip-list). config.yaml controls the embedding model (all-MiniLM-L6-v2 by default), host allowlist, clone size cap, and file-size ceiling. See rag-builder/README.md for the full reference.
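The dry-run-by-default pattern behind `--execute` can be sketched with `argparse`. This is an illustration of the pattern only, not the contents of `build.py`: the parser layout and `plan_ingest` are assumptions, and the repository URL is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Subcommands mirror the usage above; destructive work needs an explicit flag."""
    parser = argparse.ArgumentParser(prog="build.py")
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("status")
    ingest = commands.add_parser("ingest")
    ingest.add_argument("--execute", action="store_true",
                        help="actually clone, embed, and write (default: print the plan)")
    ingest.add_argument("--check-remotes", action="store_true",
                        help="probe each URL before planning")
    return parser


def plan_ingest(repos: list[str], execute: bool) -> list[str]:
    """Describe each step; only the caller decides whether to act on it."""
    verb = "clone + index" if execute else "would clone + index"
    return [f"{verb}: {url}" for url in repos]


dry = build_parser().parse_args(["ingest"])
real = build_parser().parse_args(["ingest", "--execute", "--check-remotes"])
plan = plan_ingest(["https://example.com/org/repo.git"], dry.execute)
print(plan)
```

Because `store_true` flags default to `False`, forgetting the flag always yields the safe path.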
CC Hooks (automatic cost tracking)
Configured in settings.json, fires automatically:
- SubagentStop → `cost_hook.py` logs agent name + session to `cost-tracking.json`
- Stop → logs session end
- SessionStart → welcome message
Statusline shows live cost from session token data: $0.57
Commands (26)
Hunting & Analysis
| Command | Description |
|---------|-------------|
| /hunt <target> [--vuln-class] | Active hunting — searches writeup DB for techniques first, then tests with concrete payloads |
| /autopilot <target> | Autonomous loop with --paranoid/--normal/--yolo checkpoints |
| /surface <target> | P1/P2/Kill ranked attack surface |
| /chain | Build A→B→C exploit chains via chain-builder agent (9 capability rows + 4 documented deep chains in rules/chain-table.md) |
| /analyze <target> | AI analysis: crown jewels, attack paths, blind spots |
| /mindmap <target> | Attack surface tree with brain status |
| /sast <repo> | Source-code vulnerability hunting (entry → flow → gap → exploit pipeline) |
Validation & Reporting
| Command | Description |
|---------|-------------|
| /validate <finding> | 7-Question Gate → PASS/KILL/DOWNGRADE/CHAIN REQUIRED |
| /triage | Batch-validate ALL findings, kill weak ones |
| /quality <draft> | Score report 1-10 (blocks below 7) |
| /report [format] | Reports (hard gate: requires /validate PASS) |
| /dupcheck <desc> | Hacktivity + writeup DB for duplicates |
| /submit <finding> | Submit (hard gate: /validate PASS + /quality ≥ 7) |
Session & Memory
| Command | Description |
|---------|-------------|
| /resume <target> | Resume — untested endpoints + suggestions |
| /remember | Log finding/pattern for cross-target learning |
| /learn <id> <status> | Record response — auto-boosts paid techniques |
| /brain | init, brief, status, endpoint, endpoints, record, exhausted |
Infrastructure
| Command | Description |
|---------|-------------|
| /new, /sync, /status | Setup + dashboard |
| /pipeline, /quickscan, /fullscan | Scanning pipelines |
| /correlate | Chain discovery across findings |
| /cost, /monitor | Cost tracking, target change detection |
Agents (50)
H1 Weakness Specialists (19)
xss-hunter (#60/#61/#62), sqli-hunter (#67), csrf-hunter (#57), ssrf-hunter (#75), ssti-hunter (#74), idor-hunter (#55), auth-tester (#27), info-disclosure (#18), open-redirect (#38), rce-hunter (#70), xxe-hunter (#63), file-upload (#39), cors-hunter (#58), subdomain-takeover (#145), business-logic (#28), race-condition (#29), privilege-escalation (#26), oauth-hunter (#1/#22/#106/#137), llm-ai-hunter (chains under #18/#55/#61/#70/#106)
Hunting & Analysis (3)
- validator — 7-Question Gate + never-submit list (PASS/KILL/DOWNGRADE/CHAIN)
- chain-builder — A→B chain walk against the capability table, searches writeup DB for proven chains
- recon-ranker — P1/P2/Kill surface ranking
Infrastructure / Recon (10)
recon, vuln-scanner, config-auditor, cloud-recon, js-analyzer, waf-profiler, graphql-audit, nuclei-writer, browser-agent (Burp MCP), browser-stealth-agent (Camoufox)
Meta / Validation (9)
brain, correlator, quality-check, monitor, poc-builder, report-writer, scope-check, browser-verifier (client-side PoC proof), dast-devils-advocate (adversarial downgrade)
SAST Pipeline (8)
sast-file-ranker, sast-entry-mapper, sast-danger-mapper, sast-flow-tracer, sast-gap-analyzer, sast-devils-advocate, sast-hunter, sast-exploit-builder
Specialized (1)
web3-auditor — Solidity grep arsenal, Foundry PoC, DeFi patterns
Hunting Skills (5 deep methodology skills + 6 reference skills = 11)
The hunt-* skills are vuln-class-specific methodology files distilled from
public bug-bounty reports. Each has a verified 2024-2026 CVE catalog and
sub-techniques. The matching specialist agent reads its skill via
Read $CLAUDE_PROJECT_DIR/skills/hunt-<class>/SKILL.md before testing.
| Skill | Lines | Pairs With | Highlights |
|-------|-------|------------|------------|
| skills/hunt-rce/SKILL.md | 1,135 | rce-hunter | 1,218-report distillation. RSC CVE-2025-55182, runc Leaky Vessels, BentoML pickle, LangChain REPL, Tekton/OpenProject git arg injection, ingress-nginx, container/runtime, ML serving, agentic LLM tool-use, OSS supply chain |
| skills/hunt-idor/SKILL.md | 969 | idor-hunter | 1,117-report distillation. Sam Curry automotive chain, OneUptime CVE-2026-30956, Zitadel V2Beta/Mgmt API, Inforcer tenant enum, Apache Answer UUIDv1 prediction, Indico BOLA, GraphQL field-level pivots, agentic AI cross-tenant |
| skills/hunt-xss/SKILL.md | 968 | xss-hunter | DOMPurify mXSS family, Auth0 nextjs-auth0 returnTo, RSC DoS family, markdown-to-jsx, listmonk admin-ATO, Trix rich-text editor (H1 #2819573 / #2521419), Jupyter notebook XSS (GHSA-rch3-82jr-f9w9), n8n MCP OAuth XSS (GHSA-537j-gqpc-p7fq), LinkedIn-class iframe-in-article (H1 #2212950), 10 sub-techniques (A-J), Semgrep / ast-grep / ripgrep / CodeQL patterns |
| skills/hunt-oauth/SKILL.md | 770 | oauth-hunter | 365-report distillation. ruby-saml parser differentials, Authentik regex redirect_uri, workers-oauth-provider PKCE downgrade, Entra ID actor token, Hono JWT alg confusion, nOAuth, Tekton token exfil, Argo CD project token, tinyauth |
| skills/hunt-llm-ai/SKILL.md | 930 | llm-ai-hunter | OWASP LLM Top 10 v2025 + Agentic AI Top 10. Microsoft 365 Copilot ASCII Smuggling, LangChain GmailToolkit indirect injection (CVE-2025-46059), LangChain PythonREPLTool semantic RCE (CVE-2025-68613), BentoML pickle, Ollama RCE family, Open WebUI SSE injection, MLflow path traversal |
Reference skills (read by methodology-aware agents): hunting-methodology,
recon-methodology, report-writing, sast-methodology,
triage-validation, vuln-classes.
CLI Tools (19)
| Tool | Purpose |
|------|---------|
| brain.py | Brain with endpoint tracking + circuit breaker |
| intel_engine.py | Hacktivity patterns + tech→vuln mapping |
| journal.py | JSONL session journal for /resume |
| target_selector.py | Program ROI ranking |
| cost_hook.py | CC hook: auto-logs agent completions via SubagentStop |
| statusline.py | Dashboard (--compact/--watch/--json) |
| scope_check.py | Scope validation with --list |
| scope_hook.py | PreToolUse hook: blocks out-of-scope Bash commands (exact + wildcard) |
| cvss_version_guard.py | Enforces H1 = CVSS 3.1, other platforms = CVSS 4.0 |
| file_path_guard.py | Blocks hallucinated file paths in reports |
| file_safety.py | Shared safety checks for agent-written files |
| dedup_findings.py | Dedup + hacktivity cross-reference |
| global_brain.py | Cross-engagement knowledge (incremental hash-based sync) |
| response_tracker.py | Response learning + auto-boost paid techniques |
| scaffold.py | Workspace scaffolding with update mode |
| capture.py | Screenshots + video (WSL2) |
| cost.py | Token cost tracking + ROI |
| camofox_ctl.sh | Camoufox (stealth Firefox) lifecycle: Cloudflare/Akamai bypass |
| pentest-statusline.sh | CC statusline: findings, brain, context, cost |
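The exact + wildcard matching that `scope_hook.py` is described as doing can be approximated with `fnmatch`. This is a simplified sketch, not the hook itself: the helper names are hypothetical, scope patterns are passed in as a list instead of being read from `scope.yaml`, and only hosts that appear inside `http(s)://` URLs are detected, so bare hostnames passed to tools such as nmap would need extra parsing.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s'\"]+")


def hosts_in_command(command: str) -> list[str]:
    """Pull hostnames out of any URLs in a shell command."""
    hosts = (urlsplit(url).hostname for url in URL_RE.findall(command))
    return [host for host in hosts if host]


def in_scope(host: str, patterns: list[str]) -> bool:
    """Exact entries match literally; entries such as *.example.com match subdomains."""
    host = host.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)


def allow_command(command: str, patterns: list[str]) -> bool:
    """Allow only when every detected host is in scope."""
    return all(in_scope(host, patterns) for host in hosts_in_command(command))


scope = ["tesla.com", "*.tesla.com"]
allowed = allow_command("curl -s https://shop.tesla.com/api/v1/items", scope)
blocked = allow_command("curl https://shop.tesla.com && curl https://evil.example/x", scope)
print(allowed, blocked)
```

Note that `*.tesla.com` does not match the apex `tesla.com`, which is why the demo scope lists both.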
Rules Library (rules/)
Single source of truth for every agent — all hunters, validators, and report-writers read the relevant files at session start.
| File | Lines | Purpose |
|------|-------|---------|
| hunting.md | 267 | 23 hunting rules (Rule 0 harm check, Rule 8 sibling check, Rule 9 A→B signal, Rule 19 never-submit) |
| payloads.md | 2,500 | XSS / SSRF / SQLi / IDOR / OAuth / upload / race / SSTI / deser / JWT / LFI / prototype pollution / NoSQLi / DeFi |
| techniques.md | 389 | Proven attack techniques extracted from real paid engagements |
| waf-bypass-protocol.md | 166 | WAF bypass iteration ladder for Akamai/Cloudflare/Imperva |
| vendor-status.md | 126 | Patched vendor vectors, framework fingerprints, cooldown tables |
| chain-table.md | 79 | Capability→next-bug chain table for /chain (9 capability rows + 4 documented deep chains) |
| never-submit.md | 42 | Never-submit list + conditionally-valid-with-chain table |
| mistakes.md | 661 | Top 10 most common mistakes — every agent reads this at session start |
Key Features
- Writeup search MCP: Agents query prior art during hunting — bring your own FAISS/SQLite writeup index, or fall back to the shipped payload/technique library
- CC hooks: SubagentStop/Stop auto-log costs, statusline shows live `$X.XX` from token data
- PreToolUse scope hook: Bash commands are matched (exact + wildcard) against `scope.yaml`; out-of-scope targets are blocked before the tool call fires
- 7-Question Gate: Every finding validated, first NO = KILL
- Depth Engine: `/autopilot` enforces an anti-shallow protocol, with no claim of "exhausted" until the exhaustion matrix is complete
- Stacked-encoding mandate: `/hunt` and `/autopilot` require multi-layer encoding in every payload attempt before declaring a surface clean
- CVSS policy guard: HackerOne findings use CVSS 3.1; every other platform uses CVSS 4.0, enforced by `cvss_version_guard.py`
- Circuit breaker: 5× consecutive 403/429 → auto-backoff 60s
- Endpoint tracking: Brain records every endpoint tested per target
- Hard validation gates: /report and /submit refuse without /validate PASS
- Never-submit filter: Pipeline auto-kills informational findings
- Incremental sync: Global brain hash-based, skips unchanged files
- Feedback loop: /learn auto-boosts paid techniques globally
- Session journal: JSONL log for /resume continuity
Requirements
- Python 3.10+, `uv` (MCP servers launch via `uv run --with mcp`)
- Optional: `uv pip install faiss-cpu sentence-transformers` (for writeup semantic search)
- Security tools: nmap, httpx, subfinder, nuclei, ffuf, katana, sqlmap
- GraphQL hunter tools: `graphql-path-enum`, installed with `cargo install --git https://gitlab.com/dee-see/graphql-path-enum` (auto-installed by `setup-mcp.sh` if `cargo` is present)
- Evidence: grim/scrot, wf-recorder/ffmpeg
- jq (for statusline)
License
For authorized security testing only. Follow responsible disclosure.