skylos

Open-source Python, TypeScript, and Go SAST with dead code detection. Finds secrets, exploitable flows, and AI regressions. VS Code extension, GitHub Action, and MCP server for AI agents.

★ 422 · by @duriantaco · Apache-2.0 · GitHub

Transport

sse

Tools (20)

Documentation

📖 Website · Documentation · Blog · GitHub Action · VS Code Extension · MCP Server

English | 中文


What is Skylos?

Skylos is an open-source static analysis tool and PR gate for Python, TypeScript, and Go. It helps teams detect dead code, hardcoded secrets, exploitable flows, and AI-generated security regressions before they land in main.

If you use Vulture for dead code, Bandit for security checks, or Semgrep/CodeQL for CI enforcement, Skylos combines those workflows with framework-aware dead code detection and diff-aware regression detection for AI-assisted refactors.

The core use case is straightforward: run it locally, add it to CI, and gate pull requests on real findings with GitHub annotations and review comments. Advanced features like AI defense, remediation agents, VS Code, MCP, and cloud upload are available, but you do not need any of them to get value from Skylos.

Best for

  • Python teams that want dead code detection with fewer false positives than Vulture
  • Repositories using Cursor, Copilot, Claude Code, or other AI coding assistants
  • CI/CD pull request gates with GitHub annotations and review comments
  • Python and TypeScript LLM applications that need OWASP LLM Top 10 checks

Available as

  • CLI for local scans and CI/CD workflows
  • GitHub Action for pull request gating and annotations
  • VS Code extension for in-editor findings and AI-assisted fixes
  • MCP server for AI agents and coding assistants

Start here

| Goal | Command | What you get |
|:---|:---|:---|
| Run everything locally | skylos suite . | Static findings, technical debt hotspots, Python AI defense, and provenance summary in one report |
| Scan a repo | skylos . -a | Dead code, risky flows, secrets, and code quality findings |
| Gate pull requests | skylos cicd init | A GitHub Actions workflow with a quality gate and inline annotations |
| Audit an LLM app | skylos defend . | Optional AI defense checks for Python and direct TypeScript LLM integrations |

If you only remember 4 commands

  • skylos suite . for the full local overview
  • skylos . for the focused static scan
  • skylos cicd init for CI setup
  • skylos agent scan . for hybrid static + LLM review

Why teams adopt it

  1. Better dead code signal on real frameworks: Skylos understands FastAPI, Django, Flask, pytest, Next.js, React, and more, so dynamic code produces less noise.
  2. Diff-aware AI regression detection: Skylos can catch removed auth decorators, CSRF, rate limiting, validation, logging, and other controls that disappear during AI-assisted refactors.
  3. One workflow instead of three tools: Dead code, security scanning, and PR gating live in the same CLI and CI flow.
  4. Local-first by default: You can keep scans on your machine and add optional AI or cloud features later if you need them.
  5. Self-explaining output: Every table prints a legend explaining what each column and number means, so no manual is required.
  6. Monorepo-aware TS resolution and reachability: Skylos uses declared workspaces and importer-local direct tsconfig project references during TypeScript package resolution, and it keeps workspace package entrypoints alive during dead-file and unnecessary-export analysis.
  7. AI defense now reaches TS repos too: skylos discover and skylos defend can now pick up direct TypeScript LLM integrations in Node / Next-style codepaths as a beta surface.

Benchmark Status

Skylos currently passes the checked-in regression benchmark suites, but those suites are not independent proof of state-of-the-art performance. Independent benchmarking needs frozen labels, external corpora, holdout splits, and before/after runs captured before analyzer fixes are applied.

Current status: the in-repo suite is a regression gate, not a golden benchmark. BENCHMARK.md now tracks the missing golden-benchmark requirements explicitly. The independent corpus lives in a separate sibling workspace (../skylos-benchmarks) so frozen labels and external corpora can evolve outside the analyzer implementation repo. The current independent manifest set is frozen as golden-v0.2, with exact manifest, harness, and result hashes in ../skylos-benchmarks/benchmark.lock.json. Benchmark output is reported by suite -> language -> tool -> score; unsupported scanner/language pairs are shown as not applicable instead of being scored.

| Suite | Skylos Result | Baseline Comparison |
|:---|:---|:---|
| Dead code regression | 16 cases, TP=41 FP=0 FN=0 TN=59 | Vulture: score 77.29; Ruff: score 62.35 |
| Security regression | 20 cases, TP=11 FP=0 FN=0 TN=10 | Bandit: score 47.14 on Python-applicable cases |
| Quality regression | 6 cases, score 100.0 | Regression gate only |
| Agent review | 25 cases, score 100.0 | Regression gate only |

Current external frozen result: on OWASP Benchmark Java, Skylos scores 61.38 with TP=17 FP=0 FN=103 TN=120, so Java security-flow coverage is the main known external gap.

Benchmark methodology and caveats →

🚀 New to Skylos? Start with CI/CD Integration

# Generate a GitHub Actions workflow in 30 seconds
skylos cicd init

# Commit and push to activate
git add .github/workflows/skylos.yml && git push

What you get:

  • Automatic dead code detection on every PR
  • Security vulnerability scanning (SQLi, secrets, dangerous patterns)
  • Quality gate that fails builds on critical issues
  • Inline PR review comments with file:line links
  • GitHub Annotations visible in the "Files Changed" tab

No configuration needed - works out of the box with sensible defaults. See CI/CD section for customization.


Quick Start

If you are evaluating Skylos, start with the core workflow below. The LLM and AI defense commands are optional.

Core Workflow

| Objective | Command | Outcome |
| :--- | :--- | :--- |
| Everything local | skylos suite . | One report for static findings, technical debt, Python AI defense, and provenance |
| First scan | skylos . | Dead code findings with confidence scoring |
| Audit risk and quality | skylos . -a | Dead code, risky flows, secrets, quality, and SCA findings |
| Higher-confidence dead code | skylos . --trace | Cross-reference static findings with runtime activity |
| Review only changed lines | skylos . --diff origin/main | Focus findings on active work instead of legacy debt |
| Local staged hook | skylos agent pre-commit . | Fast staged check for security, secrets, and high-signal quality regressions |
| Gate locally | skylos --gate | Fail on findings before code leaves your machine |
| Set up CI/CD | skylos cicd init | Generate a GitHub Actions workflow in 30 seconds |
| Gate in CI | skylos cicd gate --input results.json | Fail builds when issues cross your threshold |

Optional Workflows

| Objective | Command | Outcome |
| :--- | :--- | :--- |
| Detect Unused Pytest Fixtures | skylos . --pytest-fixtures | Find unused @pytest.fixture across tests + conftest |
| AI-Powered Analysis | skylos agent scan . --model gpt-4.1 | Fast static + LLM file review with dead-code verification available on demand |
| Dead Code Verification | skylos agent verify . --model gpt-4.1 | Dead-code-only second pass: static findings reviewed by the LLM |
| Security Audit | skylos agent scan . --security | Staged security audit with repo map, file facts, and verifier-backed evidence |
| Auto-Remediate | skylos agent remediate . --auto-pr | Scan, fix, test, and open a PR, end to end |
| Code Cleanup | skylos agent remediate . --standards | LLM-guided code quality cleanup against coding standards |
| PR Review | skylos agent scan . --changed | Analyze only git-changed files |
| PR Review (JSON) | skylos agent scan . --changed --format json -o results.json | LLM review with code-level fix suggestions |
| Local LLM | skylos agent scan . --base-url http://localhost:11434/v1 --model codellama | Use Ollama/LM Studio (no API key needed) |
| PR Review (CI) | skylos cicd review -i results.json | Post inline comments on PRs |
| AI Defense: Discover | skylos discover . | Map all LLM integrations in your codebase |
| AI Defense: Defend | skylos defend . | Check LLM integrations for missing guardrails |
| AI Defense: CI Gate | skylos defend . --fail-on critical --min-score 70 | Block PRs with critical AI defense gaps |
| Whitelist | skylos whitelist 'handle_*' | Suppress known dynamic patterns |

Cloud Uploads

| Objective | Command | What uploads |
| :--- | :--- | :--- |
| Upload one code scan | skylos . --danger --quality --upload | One Code Scan with danger, quality, and dead_code data |
| Upload one defense scan | skylos defend . --upload | One AI Defense scan |
| Upload one debt scan | skylos debt . --upload | One Technical Debt scan |
| Upload the full suite | skylos suite . --upload | Separate Code Scan, AI Defense, and Technical Debt scans linked as one suite bundle |
| Upload everything except danger | skylos suite . --upload --families static,defense,debt --static-categories quality,secrets,dead_code,dependency | Code/debt/defense uploads without danger findings in the code-scan payload |

Skylos now prints an explicit upload manifest before every non-JSON upload so it is clear which scan family is being sent.

For monorepos, run Skylos from the service root you want to own in Cloud, for example cd apps/api && skylos suite . --upload. The CLI sends the repo-relative project root (apps/api) with the upload so Skylos Cloud can route the scan to the matching repo + project root binding instead of mixing all services in one project.

SonarQube Migration

If you already have sonar-project.properties, generate a Skylos migration report:

skylos sonar import sonar-project.properties

Write the mapped Skylos exclusion/gate config:

skylos sonar import sonar-project.properties --write-config

The first importer maps Sonar project metadata, sonar.sources, test paths, and common exclusion keys into a Skylos migration report plus .skylos/config.yaml. Quality profiles, issue history, and coverage thresholds still require manual review.

Security Taskflow

skylos agent scan . --security now runs as an internal security taskflow instead of a single opaque LLM pass:

  • repo_map derives repo context, entrypoint hints, trust boundaries, and framework-aware file facts such as sources, sinks, and guards
  • audit performs whole-file security review with those facts in prompt context
  • verify re-reviews surviving findings and records evidence such as hypothesis, review_supported, and refuted
  • a candidate ledger keeps stable finding IDs and stage transitions internally so later workflow/reporting layers can build on the same evidence model

Example

from fastapi import FastAPI, Request
from urllib.parse import urlparse
import httpx

app = FastAPI()

@app.get("/proxy")
async def proxy(request: Request):
    target = request.query_params.get("url")
    return httpx.get(target).text

@app.get("/safe")
async def safe_proxy(request: Request):
    target = request.query_params.get("url")
    if urlparse(target).netloc not in {"internal.local"}:
        target = "https://internal.local/health"
    return httpx.get(target).text

skylos agent scan . --security --format json -o security.json

{
  "findings": [
    {
      "rule_id": "SKY-D216",
      "severity": "critical",
      "message": "Possible SSRF: tainted URL passed to HTTP client.",
      "location": {
        "file": "app.py",
        "line": 10
      },
      "symbol": "proxy",
      "metadata": {
        "security_evidence": "review_supported",
        "review_verdict": "SUPPORTED",
        "review_reason": "challenge pass resolved the SSRF flow"
      }
    }
  ],
  "summary": "Found 1 issues: 1 critical"
}

The same run also writes taskflow artifacts under .skylos/runs/<run-id>/:

  • repo_map.json
  • candidates.json
  • verified.json
  • summary.json
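
A report in the shape of the security.json example above is straightforward to post-process. The following is a minimal sketch, not part of Skylos: it loads the example payload, keeps findings whose metadata.security_evidence is review_supported, and renders them as file:line entries. The helper names supported_findings and format_finding are hypothetical, and the field names are taken from the example output rather than from a published schema.

```python
import json

# Payload copied from the example output above.
REPORT = """
{
  "findings": [
    {
      "rule_id": "SKY-D216",
      "severity": "critical",
      "message": "Possible SSRF: tainted URL passed to HTTP client.",
      "location": {"file": "app.py", "line": 10},
      "symbol": "proxy",
      "metadata": {
        "security_evidence": "review_supported",
        "review_verdict": "SUPPORTED",
        "review_reason": "challenge pass resolved the SSRF flow"
      }
    }
  ],
  "summary": "Found 1 issues: 1 critical"
}
"""


def supported_findings(report, severities=("critical", "high")):
    """Keep findings at the given severities that the verify stage supported."""
    selected = []
    for finding in report.get("findings", []):
        evidence = finding.get("metadata", {}).get("security_evidence")
        if finding.get("severity") in severities and evidence == "review_supported":
            selected.append(finding)
    return selected


def format_finding(finding):
    """Render one finding as 'file:line rule_id message'."""
    location = finding["location"]
    return f'{location["file"]}:{location["line"]} {finding["rule_id"]} {finding["message"]}'


report = json.loads(REPORT)
for item in supported_findings(report):
    print(format_finding(item))
```

A script like this could run after the scan in CI to print only verifier-backed findings; the severity filter is an illustrative default, not a Skylos setting.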

Technical Debt Hotspots

Use skylos debt <path> to rank structural debt hotspots without collapsing everything into a single urgency number.

  • score is the project-level structural debt score.
  • priority is the hotspot triage score used for ordering fix candidates.
  • --changed limits the visible hotspot list to changed files, but keeps the structural debt score anchored to the whole project.

# Full project debt scan
skylos debt .

# Review only changed hotspots without distorting the project score
skylos debt . --changed

# Upload debt results to Skylos Cloud
skylos debt . --upload

# Compare the current project against a saved debt baseline
skylos debt . --baseline

# Save a repo-level debt baseline
skylos debt . --save-baseline

Debt policy files such as skylos-debt.yaml are discovered from the scan target upward, and explicit CLI flags like --top override policy defaults.
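
Upward discovery and flag precedence can be pictured with a short sketch. This is an illustration under stated assumptions, not Skylos's actual lookup code: find_policy and effective_top are hypothetical helpers, the fallback value of 10 is invented for the example, and the policy file content is a placeholder.

```python
import tempfile
from pathlib import Path


def find_policy(start, filename="skylos-debt.yaml"):
    """Walk from the scan target up to the filesystem root; return the first policy file."""
    start = Path(start).resolve()
    # Start at the directory itself, or at the parent when the target is a file.
    current = start if start.is_dir() else start.parent
    for directory in (current, *current.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def effective_top(cli_top, policy_top, fallback=10):
    """An explicit CLI value wins over the policy default; fallback is a made-up default."""
    if cli_top is not None:
        return cli_top
    if policy_top is not None:
        return policy_top
    return fallback


# Demo: the policy sits at the repo root, the scan target is a nested service.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "skylos-debt.yaml").write_text("# placeholder policy\n", encoding="utf-8")
    target = root / "apps" / "api"
    target.mkdir(parents=True)
    found = find_policy(target)
    demo_result = found.relative_to(root).as_posix() if found else None
    print(demo_result)
```

The nearest policy file wins because the walk starts at the scan target, which is one plausible reading of "from the scan target upward".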

Debt uploads land in Skylos Cloud as their own Technical Debt scan family. They do not get mixed into the recurring code issue inbox.

Demo

Backup (GitHub): https://github.com/duriantaco/skylos/discussions/82

Key Capabilities

The core product is dead code detection, security scanning, and PR gating. The AI-focused features below are optional layers on top of that baseline workflow.

Security Scanning (SAST)

  • Taint Analysis: Traces untrusted input from API endpoints to databases to prevent SQL Injection and XSS.
  • Secrets Detection: Hunts down hardcoded API keys (AWS, Stripe, OpenAI) and private credentials before commit.
  • Vulnerability Checks: Flags dangerous patterns like eval(), unsafe pickle, and weak cryptography.

AI-Generated Code Guardrails

Skylos can also flag common AI-generated code mistakes. Every finding includes vibe_category and ai_likelihood (high/medium/low) metadata so you can filter them separately if you want.

  • Phantom Call Detection: Catches calls to security functions (sanitize_input, validate_token, check_permission, etc.) that are never defined or imported, including local-module references like security.require_auth(). (vibe_category: hallucinated_reference, ai_likelihood: high)
  • Phantom Decorator Detection: Catches security decorators (@require_auth, @rate_limit, @authenticate, etc.) that are never defined or imported, including local-module decorators like @guards.require_auth. (vibe_category: hallucinated_reference, ai_likelihood: high)
  • Unfinished Generation: Detects functions with only pass, ..., or raise NotImplementedError, which are AI-generated stubs that silently do nothing in production. (vibe_category: incomplete_generation, ai_likelihood: medium)
  • Undefined Config: Flags os.getenv("ENABLE_X") referencing feature flags that are never defined anywhere in the project. (vibe_category: ghost_config, ai_likelihood: medium)
  • Stale Mock Detection: Catches mock.patch("app.email.send_email") where send_email no longer exists, as happens when AI renames a function but leaves tests pointing at the old name. (vibe_category: stale_reference, ai_likelihood: medium)
  • Security TODO Scanners: Flags # TODO: add auth placeholders that AI left behind and nobody finished.
  • Disabled Security Controls: Detects verify=False, @csrf_exempt, DEBUG=True, and ALLOWED_HOSTS=["*"].
  • Credential & Randomness Checks: Catches hardcoded passwords and random.choice() used for security-sensitive values like tokens and OTPs.
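
To make the phantom decorator idea concrete, here is a small sketch built on Python's ast module. It is not the Skylos rule: the watch list is limited to the decorator names quoted above, phantom_decorators is a hypothetical helper, and only bare-name decorators within a single file are handled (attribute forms such as @guards.require_auth are left out).

```python
import ast

# Watch list taken from the examples above; a real rule would be broader.
SECURITY_DECORATORS = {"require_auth", "rate_limit", "authenticate"}

SAMPLE = '''
from guards import authenticate

@require_auth
def delete_user(user_id):
    pass

@authenticate
def profile():
    pass

@rate_limit(10)
def search(q):
    pass
'''


def phantom_decorators(source):
    """Return (name, line) for security decorators that are never defined or imported."""
    tree = ast.parse(source)
    known = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            known.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                known.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    known.add(target.id)

    phantoms = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for decorator in node.decorator_list:
                # Unwrap calls such as @rate_limit(10) to reach the decorator name.
                target = decorator.func if isinstance(decorator, ast.Call) else decorator
                if (
                    isinstance(target, ast.Name)
                    and target.id in SECURITY_DECORATORS
                    and target.id not in known
                ):
                    phantoms.append((target.id, decorator.lineno))
    return sorted(phantoms, key=lambda item: item[1])


print(phantom_decorators(SAMPLE))
```

In the sample, authenticate is imported and therefore accepted, while require_auth and rate_limit have no definition or import in the file.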

Prompt Injection and Content Scanning

These checks run under --danger and look for prompt injection patterns or obfuscated instructions in repository content.

  • Multi-File Prompt Injection Scanner: Scans Python, Markdown, YAML, JSON, TOML, and .env files for hidden instruction payloads: instruction overrides ("ignore previous instructions"), role hijacking ("you are now"), AI-targeted suppression ("do not flag", "skip security"), data exfiltration prompts, and AI-targeting phrases.
  • Text Canonicalization Engine: NFKC normalization, whitespace folding, and confusable replacement neutralize obfuscation before pattern matching.
  • Zero-Width & Invisible Unicode: Detects zero-width spaces, joiners, BOM, and bidi overrides (U+200B–U+202E) that hide payloads from human reviewers.
  • Base64 Obfuscation Detection: Automatically decodes base64-encoded strings and re-scans for injection content.
  • Homoglyph / Mixed-Script Detection: Flags Cyrillic and Greek characters mixed with Latin text (e.g., Cyrillic 'а' in password) that bypass visual review.
  • Location-Aware Severity: Findings in README files, HTML comments, and YAML prompt fields get elevated severity. Test files are automatically skipped.
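
The canonicalize-then-match approach described in the bullets above can be sketched with the standard library. This is an illustration, not the Skylos scanner: the phrase list is limited to the strings quoted above, the invisible-character set is narrowed to zero-width and bidi control code points plus the BOM (an assumption made here so that ordinary punctuation in the U+2010 to U+2027 range is not flagged), and confusable replacement and base64 decoding are left out.

```python
import re
import unicodedata

# Zero-width characters (U+200B to U+200F), bidi controls (U+202A to U+202E), and the BOM.
INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\ufeff]")

# Phrases quoted in the feature list above; a real scanner would use a larger set.
PHRASES = (
    "ignore previous instructions",
    "you are now",
    "do not flag",
    "skip security",
)


def canonicalize(text):
    """NFKC-normalize, drop invisible characters, fold whitespace, and lowercase."""
    text = unicodedata.normalize("NFKC", text)
    text = INVISIBLE.sub("", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def scan(text):
    """Return a list of finding labels for one piece of repository content."""
    findings = []
    if INVISIBLE.search(text):
        findings.append("invisible-unicode")
    canonical = canonicalize(text)
    findings.extend(f"phrase:{phrase}" for phrase in PHRASES if phrase in canonical)
    return findings


# A zero-width space splits the keyword and the whitespace is irregular.
print(scan("Ig\u200bnore   previous\ninstructions and continue"))
```

Matching runs on the canonical form, so a payload hidden behind fullwidth letters or a zero-width space is compared against the same plain phrase list.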

Advanced: AI Defense for LLM Apps

Static analysis for AI application security that maps every LLM call in your codebase and checks for missing guardrails. Python support is mature; direct TypeScript / TSX discovery and shared guardrail checks are now available in beta.

# Discover all LLM integrations
skylos discover .

# Check defenses and get a scored report
skylos defend .

# CI gate: fail on critical gaps, require 70% defense score
skylos defend . --fail-on critical --min-score 70

# JSON output for dashboards and pipelines
skylos defend . --json -o defense-report.json

# Upload defense results to Skylos Cloud
skylos defend . --upload

# Filter by OWASP LLM Top 10 category
skylos defend . --owasp LLM01,LLM04

13 checks across defense and ops:

| Check | Severity | OWASP | What it detects |
|:---|:---|:---|:---|
| no-dangerous-sink | Critical | LLM02 | LLM output flowing to eval/exec/subprocess |
| untrusted-input-to-prompt | Critical | LLM01 | Raw user input in prompt with no processing |
| tool-scope | Critical | LLM04 | Agent tools with dangerous system calls |
| tool-schema-present | Critical | LLM04 | Agent tools without typed schemas |
| output-validation | High | LLM02 | LLM output used without structured validation |
| prompt-delimiter | High | LLM01 | User input in prompts without delimiters |
| rag-context-isolation | High | LLM01 | RAG context injected without isolation |
| output-pii-filter | High | LLM06 | No PII filtering on user-facing LLM output |
| model-pinned | Medium | LLM03 | Model version not pinned (floating alias) |
| input-length-limit | Low | LLM01 | No input length check before LLM call |
| logging-present | Medium | Ops | No logging around LLM calls |
| cost-controls | Medium | Ops | No max_tokens set on LLM calls |
| rate-limiting | Medium | Ops | No rate limiting on LLM endpoints |

Defense and ops scores are tracked separately, so adding logging won't inflate your security score.

Custom policy via skylos-defend.yaml:

rules:
  model-pinned:
    severity: critical    # Upgrade severity
  input-length-limit:
    enabled: false        # Disable check
gate:
  min_score: 70
  fail_on: high

Supports OpenAI, Anthropic, Google Gemini, Cohere, Mistral, Ollama, Together AI, Groq, Fireworks, Replicate, LiteLLM, LangChain, LlamaIndex, CrewAI, and AutoGen.

TypeScript beta scope today is intentionally narrow: direct SDK patterns in .ts / .tsx / .js / .jsx for providers such as OpenAI, Anthropic, Google Gemini, LiteLLM, and Vercel AI SDK. Wrapper-heavy code and full TS agent parity are follow-up work.

Dead Code Detection & Cleanup

  • Find Unused Code: Identifies unreachable functions, orphan classes, and unused imports with confidence scoring.
  • Smart Tracing: Distinguishes between truly dead code and dynamic frameworks (Flask/Django routes, Pytest fixtures).
  • Safe Pruning: Uses LibCST to safely remove dead code without breaking syntax.

Advanced: Agents, Reviews, and Remediation

  • Context-aware audits: Combines static analysis speed with LLM reasoning to validate findings and filter noise.
  • Remediation workflow: skylos agent remediate can scan, generate fixes, run tests, and optionally open a PR.
  • Local model support: Supports Ollama and other OpenAI-compatible local endpoints if you want code to stay on your machine.

CI/CD and PR Gating

  • 30-Second Workflow Setup: skylos cicd init generates GitHub Actions workflows with sensible defaults.
  • Diff-Aware Enforcement: Gate only the lines that changed, fail on severity thresholds, and keep legacy debt manageable with baselines.
  • PR-Native Feedback: GitHub annotations, inline review comments, and optional dashboard upload keep findings where teams already work.
  • Corpus Guard: Require the Corpus Guard workflow on PRs to catch dead-code precision regressions against curated framework and language fixtures.

Safe Cleanup and Workflow Controls

  • CST-safe removals: Uses LibCST to remove selected imports or functions (handles multiline imports, aliases, decorators, async, etc.)
  • Logic Awareness: Deep integration for Python frameworks (Django, Flask, FastAPI) and TypeScript (Tree-sitter) to identify active routes and dependencies.
  • Granular Filtering: Skip lines tagged with # pragma: no skylos, # pragma: no cover, or # noqa

Operational Governance & Runtime

  • Coverage Integration: Auto-detects .skylos-trace files to verify dead code with runtime data
  • Quality Gates: Enforces hard thresholds for complexity, nesting, and security risk via pyproject.toml to block non-compliant PRs
  • Interactive CLI: Manually verify and remove/comment-out findings through an inquirer-based terminal interface
  • Security-Audit Mode: Leverages an independent reasoning loop to identify security vulnerabilities

Pytest Hygiene

  • Unused Fixture Detection: Finds unused @pytest.fixture definitions in test_*.py and conftest.py
  • Cross-file Resolution: Tracks fixtures used across modules, not just within the same file

Multi-Language Support

| Language | Parser | Dead Code | Security | Quality |
|----------|--------|-----------|----------|---------|
| Python | AST | ✅ | ✅ | ✅ |
| TypeScript/TSX | Tree-sitter | ✅ | ✅ | ✅ |
| Java | Tree-sitter | ✅ | ✅ | ✅ |
| Go | Standalone binary | ✅ | - | - |

Languages are auto-detected by file extension. Mixed-language repos work out of the box. No Node.js or JDK is required: the TypeScript and Java parsers are built in via Tree-sitter.

TypeScript Rules

| Rule | ID | What It Catches |
|------|-----|-----------------|
| Dead Code | | |
| Functions | - | Unused functions, arrow functions, and overloads |
| Classes | - | Unused classes, interfaces, enums, and type aliases |
| Imports | - | Unused named, default, and namespace imports |
| Methods | - | Unused methods (lifecycle methods excluded) |
| Security | | |
| eval() | SKY-D201 | eval() usage |
| Dynamic exec | SKY-D202 | exec(), new Function(), setTimeout with string |
| XSS | SKY-D226 | innerHTML, outerHTML, document.write(), dangerouslySetInnerHTML |
| SQL injection | SKY-D211 | Template literal / f-string in SQL query |
| Command injection | SKY-D212 | child_process.exec(), os.system() |
| SSRF | SKY-D216 | fetch()/axios with variable URL |
| Open redirect | SKY-D230 | res.redirect() with variable argument |
| Weak hash | SKY-D207/D208 | MD5 / SHA1 usage |
| Prototype pollution | SKY-D510 | __proto__ access |
| Dynamic require | SKY-D245 | require() with variable argument |
| JWT bypass | SKY-D246 | jwt.decode() without verification |
| CORS wildcard | SKY-D247 | cors({ origin: '*' }) |
| Internal URL | SKY-D248 | Hardcoded localhost/127.0.0.1 URLs |
| Insecure random | SKY-D250 | Math.random() for security-sensitive ops |
| Sensitive logs | SKY-D251 | Passwords/tokens passed to console.log() |
| Insecure cookie | SKY-D252 | Missing httpOnly/secure flags |
| Timing attack | SKY-D253 | ===/== comparison of secrets |
| Storage tokens | SKY-D270 | Sensitive data in localStorage/sessionStorage |
| Error disclosure | SKY-D271 | error.stack/.sql sent in HTTP response |
| Secrets | SKY-S101 | Hardcoded API keys + high-entropy strings |
| Quality | | |
| Complexity | SKY-Q301 | Cyclomatic complexity exceeds threshold |
| Nesting depth | SKY-Q302 | Too many nested levels |
| Function length | SKY-C304 | Function exceeds line limit |
| Too many params | SKY-C303 | Function has too many parameters |
| Duplicate condition | SKY-Q305 | Identical condition in if-else-if chain |
| Await in loop | SKY-Q402 | await inside for/while loop |
| Unreachable code | SKY-UC002 | Code after return/throw/break/continue |

Framework-aware: Next.js convention exports (page.tsx, layout.tsx, route.ts, middleware.ts), config exports (getServerSideProps, generateMetadata, revalidate), React patterns (memo, forwardRef), and exported custom hooks (use*) are automatically excluded from dead code reports.

TypeScript dead code detection tracks: callbacks, type annotations, generics, decorators, inheritance (extends), object shorthand, spread, re-exports, and typeof references. Benchmarked at 95% recall with 0 false positives on alive code.

Installation

Basic Installation

## from pypi
pip install skylos

## with LLM-powered features (agent verify, agent remediate, etc.)
## (extras are quoted so shells such as zsh do not expand the brackets)
pip install "skylos[llm]"

## with Rust-accelerated analysis (up to 63x faster)
pip install "skylos[fast]"

## both
pip install "skylos[llm,fast]"

## or from source
git clone https://github.com/duriantaco/skylos.git
cd skylos

pip install .

skylos[fast] installs an optional Rust backend that accelerates clone detection (63x), file discovery (5x), coupling analysis, and cycle detection. Same results, just faster. Pure Python works fine without it; the Rust module is auto-detected at runtime.

skylos[llm] installs litellm for LLM-powered features (skylos agent verify, skylos agent remediate, --llm). Core static analysis works without it.

Container Image

Skylos publishes an official first-party multi-arch CLI image to GHCR: ghcr.io/duriantaco/skylos

# floating tag for the latest stable release
docker pull ghcr.io/duriantaco/skylos:latest

# exact release tag (replace X.Y.Z with a real release like 4.5.0)
docker pull ghcr.io/duriantaco/skylos:X.Y.Z

# check the CLI inside the container
docker run --rm ghcr.io/duriantaco/skylos:latest --version

# scan the current repository from a container
docker run --rm \
  -v "$PWD":/work \
  -w /work \
  ghcr.io/duriantaco/skylos:latest \
  . --json --no-provenance

Stable releases publish latest, major, major.minor, and full major.minor.patch tags. Pre-releases publish the exact version tag only. For CI, prefer an exact X.Y.Z tag instead of latest.
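
The tagging rule can be written down as a small function, which may help when pinning images in CI. This is a sketch of the rule as stated above, not the project's release tooling: release_tags is a hypothetical helper, and treating any suffix after major.minor.patch as a pre-release marker is an assumption.

```python
import re


def release_tags(version):
    """Return the image tags a release would publish under the rule described above."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(.*)", version)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch, suffix = match.groups()
    if suffix:
        # Pre-release (assumed to carry a suffix such as rc1): exact tag only.
        return [version]
    return ["latest", major, f"{major}.{minor}", f"{major}.{minor}.{patch}"]


print(release_tags("4.5.0"))
print(release_tags("4.6.0rc1"))
```

Only the last entry of a stable release's list is immutable in practice, which is why an exact X.Y.Z tag is the safer choice for CI.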

🎯 What's Next?

After installation, we recommend:

  1. Set up CI/CD (30 seconds):

    skylos cicd init
    git add .github/workflows/skylos.yml && git push
    

    This will automatically scan every PR for dead code and security issues.

  2. Run your first scan:

    skylos .                              # Dead code only
    skylos . --danger --secrets           # Include security checks
    
  3. Keep scans focused on active work:

    skylos . --diff origin/main
    
  4. Try advanced workflows only if you need them:

    skylos agent scan . --model gpt-4.1
    skylos defend .
    

See all commands in the Quick Start table


Benchmarks and Evaluation

Skylos has checked-in regression benchmarks for dead code, security, quality, and agent review. These are designed to keep known behavior from regressing. They are not independent proof that Skylos is universally state of the art.

Current local regression results:

| Suite | Cases | Skylos Result | Baseline Comparison |
|:---|---:|:---|:---|
| Dead code | 16 | TP=41 FP=0 FN=0 TN=59, score 100.0 | Vulture score 77.29; Ruff score 62.35 |
| Security | 20 | TP=11 FP=0 FN=0 TN=10, score 100.0 | Bandit score 47.14 on Python-applicable cases |
| Quality | 6 | score 100.0 | Regression gate only |
| Agent review | 25 | score 100.0 | Regression gate only |

The external /Users/oha/skylos-demo dead-code target currently scores TP=12 FP=0 FN=0 TN=12, but still has 92 unlabeled findings. That target is therefore a realistic smoke test, not strict ground truth.

The separate frozen corpus at ../skylos-benchmarks reports scores as suite -> language -> tool -> score. Its current external OWASP Java security slice scores Skylos at 61.38 with TP=17 FP=0 FN=103 TN=120, which points to Java security-flow recall as the next benchmark-driven gap.

For credible independent claims, Skylos needs a separate benchmark corpus with frozen labels, external sources, holdout splits, competitor configs, and before/after runs captured before analyzer fixes. The intended methodology is:

  • keep the in-repo benchmarks as regression gates
  • build independent dead-code cases from real OSS cleanup PRs, seeded real-repo injections, documented language/tool semantics, and JS/TS project graph cases comparable to Knip-style checks
  • build independent security cases from OWASP Benchmark, NIST Juliet/SARD, SATE-style real/injected programs, and Semgrep/CodeQL/gosec-style rule tests
  • split quality into deterministic static smells and real bug/fix corpora such as Defects4J, BugsInPy/PyBugHive, QuixBugs, or Bears
  • evaluate agent review with SWE-bench-style real issues, hidden tests, clean negative tasks, and cost/runtime tracking
  • report TP/FP/FN/TN, precision, recall, F1, false-positive rate, skipped scanner/language pairs, raw tool output, tool versions, and locked configs

See BENCHMARK.md for commands, current numbers, caveats, and the independent benchmark plan. The checked-in benchmark numbers remain regression results; draft independent smoke results must not be used as public performance claims until the labels are frozen and reviewed.


Projects Using Skylos

If you use Skylos in a public repository, open an issue and add it here. This list is based on self-submissions, so it will stay small until more teams opt in publicly.

| Project | Description |
|---------|-------------|
| Skylos | Uses Skylos on itself for dead code, security, and CI gating |
| Your project here | Add yours |

Add your project →


How It Works

Skylos builds a reference graph of your entire codebase - who defines what, who calls what, across all files.

Parse all files -> Build definition map -> Track references -> Find orphans (zero refs = dead)
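
The four-step pipeline above can be shown in miniature with Python's ast module. This is a toy sketch, not Skylos's implementation: find_orphans is a hypothetical helper, references are matched by bare name only, and there is no confidence scoring, framework awareness, or import resolution.

```python
import ast


def find_orphans(modules):
    """modules maps a module name to its source; returns definitions with zero references."""
    definitions = {}   # qualified name -> simple name
    references = set() # every name that is read or accessed as an attribute

    for module_name, source in modules.items():
        # Steps 1 and 2: parse each file and record what it defines.
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                definitions[f"{module_name}.{node.name}"] = node.name
            # Step 3: track references across all files.
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                references.add(node.id)
            elif isinstance(node, ast.Attribute):
                references.add(node.attr)

    # Step 4: anything defined but never referenced is an orphan.
    return sorted(name for name, simple in definitions.items() if simple not in references)


MODULES = {
    "util": "def used():\n    return 1\n\ndef unused():\n    return 2\n",
    "main": "from util import used\n\nprint(used())\n",
}

print(find_orphans(MODULES))
```

Name-only matching is deliberately coarse: two unrelated functions that share a name keep each other alive, which errs toward missing dead code rather than flagging live code.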

High Precision & Confidence Scoring

Static analysis often struggles with Python's dynamic nature (e.g., getattr, pytest.fixture). Skylos minimizes false positives through:

  1. Confidence Scoring: Grades findings (High/Medium/Low) so you only see what matters.
  2. Hybrid Verification: Uses LLM reasoning to double-check static findings before reporting.
  3. Runtime Tracing: Optional --trace mode validates "dead" code against actual runtime execution.

| Confidence | Meaning | Action |
|------------|---------|--------|
| 100 | Definitely unused | Safe to delete |
| 60 | Probably unused (default threshold) | Review first |
| 40 | Maybe unused (framework helpers) | Likely false positive |
| 20 | Possibly unused (decorated/routes) | Almost certainly used |
| 0 | Show everything | Debug mode |

skylos . -c 60  # Default: high-confidence findings only
skylos . -c 30  # Include framework helpers  
skylos . -c 0  # Everything
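
One way to read the -c flag is as a minimum-confidence cutoff. The sketch below assumes a finding is shown when its confidence is greater than or equal to the threshold; the Finding class, the sample names, and the visible helper are hypothetical and only illustrate the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    name: str
    confidence: int  # 0 to 100, as in the table above


def visible(findings, threshold=60):
    """Keep findings at or above the threshold (assumed semantics of -c)."""
    return [finding for finding in findings if finding.confidence >= threshold]


FINDINGS = [
    Finding("orphan_helper", 100),
    Finding("framework_helper", 40),
    Finding("decorated_route", 20),
]

for cutoff in (60, 30, 0):
    print(cutoff, [finding.name for finding in visible(FINDINGS, cutoff)])
```

Lowering the cutoff only ever adds findings, so -c 0 is a superset of every other view.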

Framework Detection

When Skylos sees Flask, Django, FastAPI, Next.js, or React imports, it adjusts scoring automatically:

| Pattern | Handling |
|---------|----------|
| @app.route, @router.get | Entry point → marked as used |
| app.add_url_rule(...), app.add_api_route(...), app.add_route(...), app.register_listener(...), app.register_middleware(...) | Imperative route or lifecycle registration → marked as used |
| @pytest.fixture | Treated as a pytest entrypoint, but can be reported as unused if never referenced |
| @pytest.hookimpl, @hookimpl | Plugin hook implementation → marked as used |
| @celery.task | Entry point → marked as used |
| getattr(mod, "func") | Tracks dynamic reference |
| getattr(mod, f"handle_{x}") | Tracks pattern handle_* |
| Next.js page.tsx, layout.tsx, route.ts | Default/named exports → marked as used |
| Next.js getServerSideProps, generateMetadata | Config exports → marked as used |
| React.memo(), forwardRef() | Wrapped components → marked as used |
| Exported use* hooks | Custom hooks → marked as used |
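
The two getattr rows can be illustrated with a short AST walk. This is a sketch of the idea, not Skylos's resolver: dynamic_patterns and is_kept_alive are hypothetical helpers, and every interpolated part of an f-string is simply turned into a * wildcard.

```python
import ast
from fnmatch import fnmatchcase


def dynamic_patterns(source):
    """Collect name patterns from getattr(obj, "name") and getattr(obj, f"prefix_{x}")."""
    patterns = set()
    for node in ast.walk(ast.parse(source)):
        is_getattr = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "getattr"
            and len(node.args) >= 2
        )
        if not is_getattr:
            continue
        name_arg = node.args[1]
        if isinstance(name_arg, ast.Constant) and isinstance(name_arg.value, str):
            # Literal attribute name: an exact dynamic reference.
            patterns.add(name_arg.value)
        elif isinstance(name_arg, ast.JoinedStr):
            # f-string: keep literal pieces, replace each interpolation with a wildcard.
            pieces = [
                str(part.value) if isinstance(part, ast.Constant) else "*"
                for part in name_arg.values
            ]
            patterns.add("".join(pieces))
    return patterns


def is_kept_alive(name, patterns):
    """True when a definition name matches any tracked dynamic pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


SOURCE = 'getattr(mod, "func")\ngetattr(mod, f"handle_{x}")\n'
PATTERNS = dynamic_patterns(SOURCE)
print(sorted(PATTERNS))
```

With these patterns, a function named handle_login would be treated as referenced even though no call site names it directly.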

Test File Exclusion

Tests call code in weird ways that look like dead code. By default, Skylos excludes:

| Detected By | Examples |
|-------------|----------|
| Path | /tests/, /test/, *_test.py |
| Imports | pytest, unittest, mock |
| Decorators | @pytest.fixture, @patch |

# These are auto-excluded (confidence set to 0)
/project/tests/test_user.py
/project/test/helper.py  

# These are analyzed normally
/project/user.py
/project/test_data.py  # Doesn't end with _test.py

Want test files included? Use --include-folder tests.
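The path rule can be approximated in a few lines. This is a sketch of the heuristic as described in the table, not Skylos's actual implementation; looks_like_test_path is a hypothetical helper, and it covers only the path signal, not the import or decorator signals.

```python
from pathlib import PurePosixPath

TEST_DIR_NAMES = {"tests", "test"}


def looks_like_test_path(path: str) -> bool:
    """Path-only heuristic: a tests/ or test/ directory, or a *_test.py filename."""
    p = PurePosixPath(path)
    in_test_dir = any(part in TEST_DIR_NAMES for part in p.parent.parts)
    return in_test_dir or p.name.endswith("_test.py")


for candidate in (
    "/project/tests/test_user.py",
    "/project/test/helper.py",
    "/project/user.py",
    "/project/test_data.py",
):
    print(candidate, looks_like_test_path(candidate))
```

Note that /project/test_data.py is not matched: the filename starts with test_ but does not end with _test.py, and it does not sit in a test directory.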

Philosophy

When ambiguous, we'd rather miss dead code than flag live code as dead.

Framework endpoints are called externally (HTTP, signals). Name resolution handles aliases. When things get unclear, we err on the side of caution.

Precision Regression Guard

Skylos ships a curated corpus of small fixtures that encode framework contracts and important Python runtime patterns we must not regress on.

Run it locally when you change analysis behavior:

python3 scripts/corpus_ci.py --manifest corpus/manifest.json

In GitHub, keep the Corpus Guard workflow required in branch protection. When you fix a confirmed false positive, add a focused fixture and expectation to the corpus in the same change.

Unused Pytest Fixtures

Skylos can detect pytest fixtures that are defined but never used.

skylos . --pytest-fixtures

This includes fixtures inside conftest.py, since conftest.py is the standard place to store shared test fixtures.

Advanced Workflows

These commands are optional. Use them when you want LLM-assisted review, remediation, or AI defense on top of the core scanner and CI gate.

Skylos uses a hybrid architecture that combines static analysis with LLM reasoning.

Why Hybrid?

| Approach | Recall | Precision | Logic Bugs |
|----------|--------|-----------|------------|
| Static only | Low | High | ❌ |
| LLM only | High | Medium | ✅ |
| Hybrid | Highest | High | ✅ |

Research shows LLMs find vulnerabilities that static analysis misses, while static analysis validates LLM suggestions. However, LLMs are prone to false positives in dead code if they are asked to invent findings from raw source alone.

Skylos now splits agent workflows into a fast review lane and a slower verification lane.

For dead code, Skylos uses a stricter contract:

  • static analysis generates the candidate list
  • repo facts and graph evidence are gathered around each candidate
  • skylos agent verify is the dedicated dead-code adjudication pass
  • skylos agent scan --verify-dead-code adds that slower verifier back into the review pipeline when you explicitly want it
  • deterministic suppressors still exist, and in judge_all mode they are attached as evidence instead of silently deciding the outcome

Use --verification-mode production if you want the cheaper deterministic-first path for agent verify.
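One way to read the difference between the two modes is sketched below. It is an interpretation of the description above, not Skylos source code: the Verdict type, the adjudicate function, and the suppressor and judge callables are all hypothetical, and the assumption is that in production mode a deterministic suppressor hit settles the outcome without calling the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    dead: bool
    decided_by: str
    evidence: list[str] = field(default_factory=list)


# A suppressor returns True when the symbol looks live for a deterministic reason.
Suppressor = Callable[[str], bool]
# The judge stands in for the LLM call: (symbol, evidence) -> is it dead?
Judge = Callable[[str, list[str]], bool]


def adjudicate(
    symbol: str,
    suppressors: dict[str, Suppressor],
    judge: Judge,
    mode: str,
) -> Verdict:
    hits = [name for name, check in suppressors.items() if check(symbol)]
    if mode == "production" and hits:
        # Deterministic-first (assumed): no LLM call when a suppressor fires.
        return Verdict(dead=False, decided_by="suppressor", evidence=hits)
    # judge_all: suppressor hits travel along as evidence only.
    return Verdict(dead=judge(symbol, hits), decided_by="judge", evidence=hits)


suppressors = {"dunder_method": lambda s: s.startswith("__")}
calls: list[str] = []


def stub_judge(symbol: str, evidence: list[str]) -> bool:
    calls.append(symbol)
    return not evidence  # toy rule: any evidence of liveness means "not dead"


print(adjudicate("__repr__", suppressors, stub_judge, mode="production"))
print(adjudicate("__repr__", suppressors, stub_judge, mode="judge_all"))
```

Under this reading, the production path is cheaper because the judge is skipped whenever a suppressor matches.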

Agent Commands

| Command | Description |
|---------|-------------|
| skylos agent scan PATH | Fast hybrid review: static findings plus one-pass LLM security/quality review |
| skylos agent scan PATH --verify-dead-code | Same review path, plus the slower dead-code verification pass |
| skylos agent scan PATH --no-fixes | Same review pipeline, skip fix suggestions (faster) |
| skylos agent scan PATH --changed | Analyze only git-changed files |
| skylos agent scan PATH --security | Security-only taskflow audit with repo map, file facts, and verifier-backed evidence |
| skylos agent verify PATH | Dead-code-only verification pass over static findings |
| skylos agent verify PATH --fix --pr | Verify, generate removal patches, create branch and commit |
| skylos agent remediate PATH | End-to-end: scan, fix, test, and create PR |
| skylos agent remediate PATH --standards | LLM-guided cleanup with built-in standards (or --standards custom.md) |
| skylos agent triage suggest | Show auto-triage candidates from learned patterns |
| skylos agent triage dismiss ID | Dismiss a finding from the queue |

Provider Configuration

Skylos supports cloud and local LLM providers:

# Cloud - OpenAI (auto-detected from model name)
skylos agent scan . --model gpt-4.1

# Cloud - Anthropic (auto-detected from model name)
skylos agent scan . --model claude-sonnet-4-20250514

# Local - Ollama
skylos agent scan . \
  --provider openai \
  --base-url http://localhost:11434/v1 \
  --model qwen2.5-coder:7b

# Cheaper dead-code verification path
skylos agent verify . \
  --model claude-sonnet-4-20250514 \
  --verification-mode production

Note: Use the --model flag to choose a model. Skylos supports Gemini, Groq, Anthropic, OpenAI, and Mistral models.

Keys and configuration

Skylos can use API keys from (1) skylos key, or (2) environment variables.

Recommended (interactive)

skylos key
# opens a menu:
# - list keys
# - add key (openai / anthropic / google / groq / mistral / ...)
# - remove key

Environment Variables

Set defaults to avoid repeating flags:

# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Default to local Ollama
export SKYLOS_LLM_PROVIDER=openai
export SKYLOS_LLM_BASE_URL=http://localhost:11434/v1
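The intent of these defaults can be pictured as a simple fallback: an explicit flag wins, otherwise the environment variable is used. The sketch below illustrates that pattern only. resolve_llm_settings is a hypothetical helper, and the precedence shown is an assumption rather than documented Skylos behavior.

```python
import os
from typing import Mapping, Optional


def resolve_llm_settings(
    model: str,
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Prefer explicit flag values, then fall back to SKYLOS_LLM_* variables."""
    return {
        "model": model,
        "provider": provider or env.get("SKYLOS_LLM_PROVIDER"),
        "base_url": base_url or env.get("SKYLOS_LLM_BASE_URL"),
    }


ollama_env = {
    "SKYLOS_LLM_PROVIDER": "openai",
    "SKYLOS_LLM_BASE_URL": "http://localhost:11434/v1",
}

# No flags given: both values come from the environment.
print(resolve_llm_settings("qwen2.5-coder:7b", env=ollama_env))

# An explicit base URL overrides the environment default.
print(resolve_llm_settings("qwen2.5-coder:7b", base_url="http://gpu-box:11434/v1", env=ollama_env))
```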

LLM PR Review

skylos agent scan --changed analyzes git-changed files, runs static analysis, then uses the LLM for fast file review and code-level fix suggestions. Dead-code verification is optional and not on the critical path by default.

# Run LLM review and output JSON
skylos agent scan . --changed --model claude-sonnet-4-20250514 --format json -o llm-results.json

# Use with cicd review to post inline comments on PRs
skylos cicd review --input results.json --llm-input llm-results.json

The hybrid pipeline runs in stages:

  1. Static analysis: finds security, quality, and dead code issues
  2. LLM review: one-pass file or diff review for security, logic, quality, and performance issues static analysis may miss
  3. Optional dead-code verification: when requested, the LLM judges static dead-code candidates using graph evidence, repo facts, and surrounding context
  4. Code fix generation: for each reported finding, generates the problematic code snippet and a corrected version

Each PR comment shows the exact vulnerable lines and a drop-in replacement fix.

What LLM Analysis Detects

| Category | Examples |
|----------|----------|
| Hallucinations | Calls to functions that don't exist |
| Logic bugs | Off-by-one, incorrect conditions, missing edge cases |
| Business logic | Auth bypasses, broken access control |
| Context issues | Problems requiring understanding of intent |

Local LLM Setup (Ollama)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a code model
ollama pull qwen2.5-coder:7b

# Use with Skylos
skylos agent scan ./src \
  --provider openai \
  --base-url http://localhost:11434/v1 \
  --model qwen2.5-coder:7b

Remediation Agent

The remediation agent automates the full fix lifecycle. It scans your project, prioritizes findings, generates fixes via the LLM, validates each fix by running your test suite, and optionally opens a PR.

# Preview what would be fixed (safe, no changes)
skylos agent remediate . --dry-run

# Fix up to 5 critical/high issues, validate with tests
skylos agent remediate . --max-fixes 5 --severity high

# Full auto: fix, test, create PR
skylos agent remediate . --auto-pr --model gpt-4.1

# Use a custom test command
skylos agent remediate . --test-cmd "pytest test/ -x"

Safety guardrails:

  • Dry-run preview: use --dry-run to see planned fixes without touching files
  • Fixes that break tests are automatically reverted
  • Low-confidence fixes are skipped
  • After applying a fix, Skylos re-scans to confirm the finding is actually gone
  • --auto-pr always works on a new branch, never touches main
  • --max-fixes prevents runaway changes (default 10)

Recommended Models

| Model | Provider | Use Case |
|-------|----------|----------|
| gpt-4.1 | OpenAI | Best accuracy |
| claude-sonnet-4-20250514 | Anthropic | Best reasoning |
| qwen2.5-coder:7b | Ollama | Fast local analysis |
| codellama:13b | Ollama | Better local accuracy |

CI/CD

Run Skylos in your CI pipeline with quality gates, GitHub annotations, and PR review comments.

Quick Start (30 seconds)

# Auto-generate a GitHub Actions workflow
skylos cicd init

# Commit and activate
git add .github/workflows/skylos.yml && git commit -m "Add Skylos workflow" && git push

That's it! Your next PR will have:

  • Dead code detection
  • Security scanning (SQLi, SSRF, secrets)
  • Quality checks
  • Inline PR comments with clickable file:line links
  • Quality gate that fails builds on critical issues
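Conceptually, a quality gate maps a list of findings to a process exit code that CI can act on. The sketch below shows that mapping in isolation. The finding shape, the severity names, and gate_exit_code are illustrative assumptions and do not reflect the actual output format or options of skylos cicd gate.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def gate_exit_code(findings: list[dict], fail_on: str = "critical") -> int:
    """Return 1 if any finding is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return 1 if blocking else 0


findings = [
    {"rule": "unused-function", "severity": "low"},
    {"rule": "sql-injection", "severity": "critical"},
]

print(gate_exit_code(findings))      # a critical finding blocks the build
print(gate_exit_code(findings[:1]))  # only a low-severity finding: build passes
```

A non-zero exit code is what causes the CI job, and therefore the required check on the pull request, to fail.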

Want AI-powered code fixes on PRs?

skylos cicd init --llm --model claude-sonnet-4-20250514

This adds an LLM step that generates code-level fix suggestions, showing the vulnerable code and the corrected version inline on your PR.

Optional GitHub Secrets

For the default skylos cicd init workflow, you do not need any Skylos-specific secrets. Add these only if you enable the matching feature in GitHub Actions (Settings > Secrets and variables > Actions):

| Secret | When needed | Description |
|--------|-------------|-------------|
| ANTHROPIC_API_KEY | If using Claude models | Your Anthropic API key |
| OPENAI_API_KEY | If using GPT models | Your OpenAI API key |
| SKYLOS_API_KEY | For Skylos Cloud features | Get from skylos.dev |
| SKYLOS_TOKEN | If using --upload | Upload token from skylos.dev/dashboard/settings |

GH_TOKEN is automatically provided by GitHub Actions; no setup is needed for PR comments.

Release Automation

Skylos uses a split release workflow:

  • .github/workflows/release-please.yml updates CHANGELOG.md, bumps pyproject.toml, and opens or updates the release PR.
  • .github/workflows/publish.yml publishes from an immutable release tag, pushing both PyPI artifacts and the multi-arch GHCR image ghcr.io/duriantaco/skylos.
  • For protected repos, prefer a dedicated RELEASE_PLEASE_TOKEN secret so bot-authored release PRs can satisfy required PR checks.
  • On the first GHCR publish, make the skylos package public if anonymous pulls return 403 Forbidden.

First-time bootstrap (already configured in this repo)

Release Please is bootstrapped with:

  • tools/release/.release-please-manifest.json set to 4.2.1
  • tools/release/release-please-config.json set with bootstrap-sha at the commit that prepared 4.2.1 (a498b27b6902b34e469acfddac1068635aae8122)

This prevents backfilling old history and starts automated releases from the current baseline.

Normal release flow

  1. Merge conventional commits into main (for example: feat: ..., fix: ...).
  2. Release Please opens/updates the release PR.
  3. Merge the release PR to main.
  4. Release Please creates the GitHub release tag (vX.Y.Z).
  5. The tag push triggers .github/workflows/publish.yml.
  6. Skylos builds and publishes to PyPI from that tag using PYPI_TOKEN.
  7. The same workflow smoke-tests and pushes ghcr.io/duriantaco/skylos for linux/amd64 and linux/arm64.

Manual build/publish checks

If you need to validate packaging before a release:

python -m pip install --upgrade pip
python -m pip install "build>=1.2.2" "twine>=6.1.0"
python -m build --sdist --wheel --outdir dist
python -m twine check dist/*

If you need to validate the container locally before a release:

docker build -t skylos:local .
docker run --rm skylos:local --version
docker run --rm \
  -v "$PWD/benchmarks/quality/fixtures/argument_overload:/work" \
  -w /work \
  skylos:local \
  . --json --no-provenance

If you need to manually run the fallback publish workflow, use Actions -> Build and publish -> Run workflow and set ref to the exact release tag (for example vX.Y.Z). Do not use a branch name.

After the first successful GHCR publish, verify the package is publicly pullable:

docker pull ghcr.io/duriantaco/skylos:latest
docker run --rm ghcr.io/duriantaco/skylos:latest --version

If docker pull fails with 403 Forbidden, go to GitHub -> Packages -> skylos -> Package settings and change visibility to Public.

PR title types used for release semantics

Skylos validates semantic PR titles via .github/workflows/pr-title.yml with these allowed types:

  • feat
  • fix
  • docs
  • refactor
  • test
  • chore
  • perf
  • style
  • ci
  • infra
  • revert
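A title check of this kind can be sketched with a regular expression over the allowed types. This is an illustration in the Conventional Commits style, not the logic in pr-title.yml: the optional scope and the ! breaking-change marker are assumptions, and the real workflow may be stricter or looser.

```python
import re

ALLOWED_TYPES = (
    "feat", "fix", "docs", "refactor", "test", "chore",
    "perf", "style", "ci", "infra", "revert",
)

_types = "|".join(ALLOWED_TYPES)
TITLE_RE = re.compile(
    rf"^(?P<type>{_types})"      # one of the allowed types
    r"(\((?P<scope>[^()]+)\))?"  # optional (scope), assumed
    r"(?P<breaking>!)?"          # optional breaking-change marker, assumed
    r": (?P<subject>\S.*)$"      # colon, space, non-empty subject
)


def is_valid_title(title: str) -> bool:
    return TITLE_RE.match(title) is not None


for title in ("feat: add Go parser", "fix(cli)!: drop legacy flag", "feature: typo in type"):
    print(title, is_valid_title(title))
```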

For complete release ownership, guardrails, and recovery steps, see RELEASE_WORKFLOW.md.

Release Workflow Runbook

Release roles, prerequisites, branch protection guidance, semantic type policy, and incident recovery steps are documented in RELEASE_WORKFLOW.md.

Command Reference

Core Analysis

| Command | Description |
|---------|-------------|
| skylos suite <path> | Full local bundle: static analysis, debt, Python AI defense, and provenance summary |
| skylos <path> | Dead code, security, and quality analysis |
| skylos debt <path> | Technical debt hotspot analysis with baseline-aware prioritization |
| skylos discover <path> | Map LLM/AI integrations in your codebase |
| skylos defend <path> | Check LLM integrations for missing defenses |
| skylos sonar import <file> | Convert sonar-project.properties into a Skylos migration report/config |

AI Agent

| Command | Description |
|---------|-------------|
| skylos agent scan <path> | Fast hybrid static + LLM review |
| skylos agent verify <path> | LLM-verify dead code (100% accuracy) |
| skylos agent remediate <path> | Auto-fix issues and create PR |
| skylos agent watch <path> | Continuous repo monitoring with optional triage pattern learning |
| skylos agent pre-commit <path> | Staged local hook for security, secrets, and high-signal quality regressions |
| skylos agent triage | Manage finding triage (dismiss/snooze) |

CI/CD

| Command | Description |
|---------|-------------|
| skylos cicd init | Generate GitHub Actions workflow |
| skylos cicd gate | Quality gate (CI exit code) |
| skylos cicd annotate | Emit GitHub Actions annotations |
| skylos cicd review | Post inline PR review comments |

Account

| Command | Description |
|---------|-------------|
| skylos login | Connect to Skylos Cloud |
| skylos whoami | Show connected account info |
| skylos key | Manage API keys |
| skylos credits | Check credit balance |

Utility

| Command | Description |
|---------|-------------|
| skylos init | Initialize config in pyproject.toml |
| `skyl
