MCP Servers / OmniRoute
OmniRoute
Never stop coding. The free AI gateway — one endpoint, 160+ providers, zero downtime. Smart 4-tier auto-fallback (Subscription → API → Cheap → Free), prompt compression (save 15-75% tokens), 3-level proxy for geo-blocks, MCP Server (29 tools), A2A Protocol, 10 multi-modal APIs, and Desktop/Android/PWA apps.
Installation
claude mcp add omniroute -- npx -y omniroute
npx -y omniroute
npm: omniroute
Transport
Tools (20)
Page
Screenshot
Mode
Savings
Problem
OmniRoute Solution
Problem
Solution
Step
Action
Image
Tag
Description
latest
~250MB
Platform
Install
Level
Badge
Tier
Provider
Monthly
GitHub users
Cerebras
**FREE** (1M tok/day)
Groq
**FREE** (30 RPM)
None
Best price/quality reasoning
None
Fastest + tool calling, ultralow
None
Reasoning flagship from xAI
Mistral
Free trial + paid
OpenRouter
Pay-per-use
Pay-per-use
None
Documentation
🚀 OmniRoute — The Free AI Gateway
Never stop coding. Save 15-75% tokens with prompt compression + auto-fallback to FREE & low-cost AI models.
The most complete open-source AI proxy — one endpoint, 160+ providers, 13 routing strategies, zero downtime. Multi-platform: Web, Desktop (Electron), Mobile (PWA + Termux). Fully extensible via MCP Server (29 tools), A2A Protocol, and Memory/Skills systems. Available in 40+ languages.
Chat Completions • Responses API • Embeddings • Image Generation • Video • Music • Audio Speech/Transcription • Reranking • Moderations • Web Search • MCP Server • A2A Protocol • 4,600+ Tests • 100% TypeScript
🔥 Limited offer: Sign up at AgentRouter and get $100 in free AI credits Access GPT-5, Claude, Gemini, DeepSeek & 100+ models. No credit card required. Claim your credits →
🚀 Quick Start • 💡 Features • 🗜️ Compression • 💰 Pricing • 🎯 Use Cases • 🌍 Proxy • ❓ FAQ • 📖 Docs • 💬 WhatsApp
🌐 Available in: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino | 🇨🇿 Čeština
🖼️ Main Dashboard
📸 Dashboard Preview
| Page | Screenshot | | -------------- | ------------------------------------------------- | | Providers | | | Combos | | | Analytics | | | Health | | | Translator | | | Settings | | | CLI Tools | | | Usage Logs | | | Endpoints | |
🤖 Free AI Provider for your favorite coding agents
Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.
📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota
📺 OmniRoute in Action — Video Guides
🎬 Made a video about OmniRoute? We'd love to feature it here! Open an issue or discussion with the link and we'll add it to this showcase.
🤔 Why OmniRoute?
Stop wasting money, tokens and hitting limits:
❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers
❌ Each provider has a different API format
❌ AI providers blocked in your country
OmniRoute solves all of this:
✅ Prompt Compression — auto-compress prompts & tool outputs, save 15-75% tokens per request ✅ Maximize subscriptions — track quota, use every bit before reset ✅ Auto fallback — Subscription → API Key → Cheap → Free, zero downtime ✅ Multi-account — round-robin between accounts per provider ✅ Format translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API, any tool works ✅ 3-level proxy — bypass geo-blocks with global, per-provider, and per-key proxies ✅ 10 multi-modal APIs — chat, images, video, music, audio, search in one endpoint ✅ MCP + A2A — 29 MCP tools + agent-to-agent protocol, production-ready ✅ Universal — works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool
📧 Support
💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.
- Website: omniroute.online
- GitHub: github.com/diegosouzapw/OmniRoute
- Issues: github.com/diegosouzapw/OmniRoute/issues
- WhatsApp: Community Group
- Contributing: See CONTRIBUTING.md, open a PR, or pick a
good first issue - Original Project: 9router by decolua
🐛 Reporting a Bug?
When opening an issue, please run the system-info command and attach the generated file:
npm run system-info
This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.
🛠️ Supported CLI Tools
OmniRoute works seamlessly with 16+ AI coding tools — one config, all tools:
📖 Full setup for each tool: docs/CLI-TOOLS.md
🌐 Supported Providers — 160+
🔐 OAuth Providers
🆓 Free Providers (No Cost)
🔑 API Key Providers (120+)
Alibaba · Amazon Q · AssemblyAI · Baidu Qianfan · Baseten · Black Forest Labs · Blackbox · Brave Search · Bytez · CablyAI · Cartesia · ChatGPT Web · Chutes.ai · Clarifai · Codestral · CrofAI · DataRobot · Deepgram · ElevenLabs · Empower · Exa Search · Fal.ai · Featherless AI · FenayAI · FriendliAI · Galadriel · GigaChat · GitLab Duo · GLHF Chat · GoAPI · Heroku AI · Hyperbolic · IBM watsonx · Inference.net · Inworld · Jina AI · Kilo Gateway · Lambda AI · LaoZhang · Linkup Search · LlamaGate · Maritalk · Modal · Moonshot AI · Morph · Muse Spark · NanoBanana · NanoGPT · NLP Cloud · Nous Research · Novita AI · nScale · OCI · Ollama Cloud · OVHcloud · PiAPI · PlayHT · Poe · Predibase · PublicAI · Qwen Code · Recraft · Reka · Runway · SAP · Scaleway · SearchAPI · SearXNG · Serper · Stability AI · Synthetic · Tavily · TheB.AI · Topaz · Upstage · v0 (Vercel) · Vercel AI Gateway · Volcengine · Voyage AI · W&B Inference · Xiaomi MiMo · You.com · Z.AI · + OpenAI/Anthropic-compatible custom endpoints
🏠 Self-Hosted
🔄 How It Works
┌─────────────┐
│ Your CLI │ (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│ Tool │
└──────┬──────┘
│ http://localhost:20128/v1
↓
┌──────────────────────────────────────────────────┐
│ OmniRoute (Smart Router) │
│ • 🗜️ Prompt Compression (save 15-75% tokens) │
│ • Format translation (OpenAI ↔ Claude ↔ Gemini) │
│ • Quota tracking + Embeddings + Images │
│ • Auto token refresh + Rate limit management │
└──────┬───────────────────────────────────────────┘
│
├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
│ ↓ quota exhausted
├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
│ ↓ budget limit
├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
│ ↓ budget limit
└─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)
Result: Never stop coding, minimal cost + 15-75% token savings
🗜️ Prompt Compression — Save 15-75% Tokens Automatically
Why use many token when few token do trick? OmniRoute's built-in compression pipeline reduces token usage on every request — before it even reaches the provider. Inspired by Caveman (⭐ 51K+).
How It Works
Every request passes through the compression pipeline transparently — no client changes needed:
┌──────────────────┐ ┌─────────────────────────────┐ ┌──────────────┐
│ Client sends │────▶│ OmniRoute Compression │────▶│ Provider │
│ full prompt │ │ Pipeline (5 modes) │ │ receives │
│ (10,000 tok) │ │ │ │ compressed │
│ │ │ 🪶 Lite ........... ~15% │ │ (2,500 tok) │
│ │ │ 🪨 Standard ....... ~30% │ │ │
│ │ │ ⚡ Aggressive ..... ~50% │ │ 💰 75% saved │
│ │ │ 🔥 Ultra .......... ~75% │ │ │
└──────────────────┘ └─────────────────────────────┘ └──────────────┘
5 Compression Modes
| Mode | Savings | Technique | Best For | | ------------------------- | ------- | ----------------------------------------------------------------------------------------------- | -------------------------------------- | | Off | 0% | No compression | When you need exact prompts | | 🪶 Lite | ~15% | Whitespace collapse, dedup system prompts, image URL shortening | Always-on safe default | | 🪨 Standard (Caveman) | ~30% | 30+ regex rules: filler removal, context condensation, structural compression, multi-turn dedup | Daily coding with Claude/Codex | | ⚡ Aggressive | ~50% | All standard + progressive message aging + tool result summarization + LLM-based compression | Long sessions with many tool calls | | 🔥 Ultra | ~75% | All aggressive + heuristic token pruning + stopword removal + score-based filtering | Maximum savings when tokens are scarce |
Before & After (Standard/Caveman Mode)
🗣️ Before compression (69 tokens):
"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."
🪨 After compression (19 tokens):
"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."
Same answer. 72% less tokens. Zero accuracy loss.
Architecture
Request Body
│
├─ strategySelector.ts ─── Picks mode (config / combo override / auto-trigger)
│
├─ lite.ts ─────────────── Whitespace, dedup, image URLs, redundant content
├─ caveman.ts ──────────── 30+ regex rules via cavemanRules.ts
│ └─ preservation.ts ─── Protects code blocks, URLs, JSON from compression
├─ aggressive.ts ───────── Summarizer + tool result compressor + progressive aging
│ ├─ summarizer.ts ───── Rule-based message summarization
│ ├─ toolResultCompressor.ts ── file/grep/shell/JSON/error compression
│ └─ progressiveAging.ts ──── Older messages → shorter summaries
└─ ultra.ts ────────────── Heuristic token scoring + pruning
└─ ultraHeuristic.ts ─ Stopword detection, score thresholds, force-preserve
Configuration
Dashboard → Settings → Compression → Pick your mode
Or per-combo override:
{
"comboOverrides": {
"my-coding-combo": "standard",
"my-cheap-combo": "ultra"
}
}
Auto-trigger: set autoTriggerTokens to automatically enable compression when a request exceeds a token threshold.
🪨 Fun fact: The standard/caveman mode is inspired by Caveman — the viral project that proved "caveman speak" cuts 65% of tokens while keeping 100% technical accuracy. OmniRoute takes this further with a 5-mode pipeline that goes from gentle whitespace cleanup all the way to aggressive heuristic pruning.
📖 Full compression documentation: docs/COMPRESSION_GUIDE.md
🎯 What OmniRoute Solves
Every developer using AI tools faces these problems daily. OmniRoute solves them all.
| # | Problem | OmniRoute Solution | | --- | ---------------------------------------- | ----------------------------------------------------------------------------------------------- | | 💸 | Subscription quota expires mid-coding | Smart 4-Tier Fallback — auto-routes Subscription → API Key → Cheap → Free | | 🔌 | Each provider has a different API format | Format Translation — unified endpoint translates OpenAI ↔ Claude ↔ Gemini ↔ Responses | | 🌐 | AI providers block my country/region | 3-Level Proxy — global, per-provider, and per-key proxy with TLS fingerprint spoofing | | 🆓 | Can't afford AI subscriptions | 11 Free Providers — Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM... | | 🔒 | Gateway is exposed without protection | API Key Management — scoping, rotation, IP filtering, rate limiting, prompt injection guard | | 🛑 | Provider went down, lost coding flow | Circuit Breakers — auto-failover with cooldown, retry, anti-thundering herd | | 🔧 | Configuring each CLI tool is tedious | CLI Tools Dashboard — one-click setup for Claude Code, Codex, Cursor, OpenClaw, Kilo | | 🔑 | Managing OAuth tokens is hell | Auto Token Refresh — OAuth PKCE for 8 providers, multi-account, LAN/remote fix | | 📊 | Don't know how much I'm spending | Cost Analytics — per-token tracking, budget limits, usage stats per API key | | 🐛 | Can't diagnose errors in AI calls | Unified Logs — 4-tab dashboard (request, proxy, audit, console) + p50/p95/p99 telemetry |
| # | Problem | Solution |
| --- | --------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| 11 | Deploying/maintaining is complex | npm global, Docker multi-arch, Electron, Termux — deploy anywhere |
| 12 | Interface is English-only | 40+ languages with RTL support |
| 13 | Need more than chat (images, audio, video) | 10 multi-modal APIs: embeddings, images, video, music, TTS, STT, moderation, rerank, search, batch |
| 14 | No way to test/compare models | LLM Evals, Translator Playground, Chat Tester, Live Monitor |
| 15 | Need to scale without losing performance | Semantic cache, request dedup, rate limit detection, queue & pacing |
| 16 | Want to control model behavior globally | System prompt injection, thinking budget, wildcard routing |
| 17 | Need MCP tools as first-class features | 29 MCP tools, 3 transports (stdio/SSE/HTTP), 10 scopes, audit trail |
| 18 | Need A2A orchestration | JSON-RPC 2.0 + SSE streaming, task lifecycle, sync + stream paths |
| 19 | Need real MCP process health | Runtime heartbeat, PID tracking, UI status cards |
| 20 | Need auditable MCP execution | SQLite-backed audit with filters, pagination, stats |
| 21 | Need scoped MCP permissions | 10 granular scopes per integration |
| 22 | Need operational controls without redeploying | Combo switches, resilience tuning, breaker resets from dashboard |
| 23 | Need A2A task lifecycle visibility | Task listing/filtering, drill-down, cancellation |
| 24 | Need active stream metrics | Active stream counters, per-state counts, A2A dashboard cards |
| 25 | Need standard agent discovery | Agent Card at /.well-known/agent.json |
| 26 | Need protocol discoverability | Consolidated Endpoints page with Proxy, MCP, A2A, API tabs |
| 27 | Need E2E protocol validation | Real MCP SDK + A2A client flows in test:protocols:e2e |
| 28 | Need unified observability | Health + audit + telemetry across OpenAI, MCP, and A2A layers |
| 29 | Need one runtime for proxy + tools + agents | OpenAI proxy + MCP + A2A in one stack with shared auth/resilience |
| 30 | Need agentic workflows without glue-code | Unified endpoint, protocol UIs, production-ready foundations |
| 31 | Long sessions crash with context limits | Proactive context compression, structural integrity guards, multi-layer dropping |
📖 Deep dives: Resilience Guide • Proxy Guide • Setup Guide • Compression Guide
🆓 Start Free — Zero Configuration Cost
Setup AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.
| Step | Action | Providers Unlocked |
| ---- | -------------------------------------------------- | ------------------------------------------------------------------ |
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 — unlimited |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... — unlimited |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... — unlimited |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro — 180K/mo free |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |
Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.
Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).
⚡ Quick Start
1) Install and run
npm install -g omniroute
omniroute
Dashboard opens at http://localhost:20128 · API at http://localhost:20128/v1.
2) Connect providers
- Dashboard → Providers → connect at least one provider (OAuth or API key)
- Dashboard → Endpoints → create an API key
- Dashboard → Combos → set your fallback chain (optional)
3) Point your coding tool
Base URL: http://localhost:20128/v1
API Key: [copy from Endpoint page]
Model: if/kimi-k2-thinking (or any provider/model)
Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and any OpenAI-compatible tool.
Docker:
docker run -d --name omniroute --restart unless-stopped -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest
From source:
cp .env.example .env && npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev
pnpm: pnpm install -g omniroute && pnpm approve-builds -g && omniroute
Arch Linux (AUR): yay -S omniroute-bin && systemctl --user enable --now omniroute.service
MCP: omniroute --mcp (stdio transport)
CLI options: omniroute --port 3000, omniroute --no-open, omniroute --help
Split-port mode: PORT=20128 DASHBOARD_PORT=20129 omniroute
Uninstall: npm run uninstall (keeps data) or npm run uninstall:full (removes everything)
📖 Full details: Setup Guide · Docker · Void Linux template
🐳 Docker
OmniRoute is available as a public Docker image on Docker Hub.
Quick run:
docker run -d \
--name omniroute \
--restart unless-stopped \
--stop-timeout 40 \
-p 20128:20128 \
-v omniroute-data:/app/data \
diegosouzapw/omniroute:latest
With environment file:
# Copy and edit .env first
cp .env.example .env
docker run -d \
--name omniroute \
--restart unless-stopped \
--stop-timeout 40 \
--env-file .env \
-p 20128:20128 \
-v omniroute-data:/app/data \
diegosouzapw/omniroute:latest
Using Docker Compose:
# Base profile (no CLI tools)
docker compose --profile base up -d
# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d
Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL. Endpoint tunnel panels, including Cloudflare, Tailscale, and ngrok, can be shown or hidden from Settings → Appearance without changing active tunnel state.
Notes:
- Quick Tunnel URLs are temporary and change after every restart.
- Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
- Managed install currently supports Linux, macOS, and Windows on
x64/arm64. - Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set
CLOUDFLARED_PROTOCOL=quicorautoif you want a different transport. - Docker images bundle system CA roots and pass them to managed
cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container. - SQLite runs in WAL mode.
docker stopshould be allowed to finish so OmniRoute can checkpoint the latest changes back intostorage.sqlite. - The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep
--stop-timeout 40(or similar) so manual stops do not cut off shutdown cleanup. - Set
CLOUDFLARED_BIN=/absolute/path/to/cloudflaredif you want OmniRoute to use an existing binary instead of downloading one.
Using Docker Compose with Caddy (HTTPS Auto-TLS):
OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.
services:
omniroute:
image: diegosouzapw/omniroute:latest
container_name: omniroute
restart: unless-stopped
volumes:
- omniroute-data:/app/data
environment:
- PORT=20128
- NEXT_PUBLIC_BASE_URL=https://your-domain.com
caddy:
image: caddy:latest
container_name: caddy
restart: unless-stopped
ports:
- "80:80"
- "443:443"
command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128
volumes:
omniroute-data:
| Image | Tag | Size | Description |
| ------------------------ | -------- | ------ | --------------------- |
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 3.7.8 | ~250MB | Current version |
📖 Full Docker documentation: docs/DOCKER_GUIDE.md — Compose profiles, Caddy HTTPS, Cloudflare tunnels, and more.
📱 Multi-Platform — Run Anywhere
OmniRoute runs on Web, Desktop (Electron), Android (Termux), and as a Progressive Web App (PWA).
| Platform | Install | Highlights |
| -------------- | -------------------------------------------- | -------------------------------------------------------------------------- |
| 🖥️ Desktop | npm run electron:build | Native window, system tray, auto-start, offline mode — Windows/macOS/Linux |
| 📱 Android | pkg install nodejs-lts && npx -y omniroute | ARM native, no root, 24/7 via Termux:Boot — your phone is an AI server |
| 📲 PWA | "Add to Home Screen" in browser | Fullscreen, offline page, service worker caching — Android/iOS/Desktop |
- Native Electron app with system tray, auto-start, native notifications
- One-click install: NSIS (Windows), DMG (macOS), AppImage (Linux)
- Dev:
npm run electron:dev· Build:npm run electron:build - 📖 Full docs:
electron/README.md
pkg update && pkg install nodejs-lts python build-essential git
npx -y omniroute@latest
Access from any device on the same network: http://PHONE_IP:20128/v1
- 📖 Full guide:
docs/TERMUX_GUIDE.md
- Android (Chrome): ⋮ → "Add to Home screen"
- iOS (Safari): Share → "Add to Home Screen"
- Desktop (Chrome/Edge): Install icon in address bar
- 📖 Full docs:
docs/PWA_GUIDE.md
🌍 Bypass Geographic Blocks — Use AI From Any Country
🇷🇺 🇨🇳 🇮🇷 🇨🇺 🇹🇷 In Russia, China, Iran, or any blocked region? OmniRoute's 3-level proxy system solves this completely.
| Level | Badge | Configure In | Use Case | | ------------------ | ----- | ------------------ | ------------------------------- | | Global | 🟢 | Settings → Proxy | All traffic through one proxy | | Per-Provider | 🟡 | Provider → Proxy | Only specific providers proxied | | Per-Connection | 🔵 | Connection → Proxy | Each API key uses its own proxy |
What gets proxied: API requests ✅ • OAuth flows ✅ • Connection tests ✅ • Token refresh ✅ • Model sync ✅
Protocols: HTTP/HTTPS, SOCKS5 (ENABLE_SOCKS5_PROXY=true), Authenticated proxies
🆓 1proxy — Free Proxy Marketplace
No proxy? Use the built-in 1proxy integration for hundreds of free, validated proxies worldwide:
- One-click sync (up to 500 proxies) • Quality scores (0-100) • Country filter • Auto-rotation (quality/random/sequential) • Auto-degradation • Circuit breaker
Anti-Detection
- 🔒 TLS Fingerprint Spoofing — browser-like TLS via
wreq-js - 🔏 CLI Fingerprint Matching — matches native CLI binary signatures
- 🏠 Proxy IP Preservation — stealth + IP masking simultaneously
📖 Full proxy documentation: docs/PROXY_GUIDE.md
💰 Pricing at a Glance
| Tier | Provider | Cost | Quota Reset | Best For | | ------------------- | --------------------------- | ------------------------- | ---------------- | --------------------------------- | | 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed | | | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users | | | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! | | | GitHub Copilot | $10-19/mo | Monthly | GitHub users | | 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models | | | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest | | | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma | | | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning | | | xAI Grok-4 Fast | $0.20/$0.50 per 1M 🆕 | None | Fastest + tool calling, ultralow | | | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI | | | Mistral | Free trial + paid | Rate limited | European AI | | | OpenRouter | Pay-per-use | None | 100+ models aggr. | | | AgentRouter 🆕 | Pay-per-use | None | $200 free credits at signup | | 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship | | | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup | | | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks | | | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option | | | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access | | | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost | | 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited | | | Qwen | $0 | Unlimited | 4 models unlimited | | | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) | | | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth | | | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 | | | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge | | | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |
🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.
💡 See the full $0 Free Stack (11 providers) below.
💡 Understanding Dashboard Costs:
The "cost" displayed in the Usage Analytics page is for tracking and comparison purposes only. OmniRoute itself never charges you anything — it's free, open-source software running on your machine. If your dashboard shows "$290 total cost" while using free models, that's how much you saved compared to paid API pricing. Think of it as a savings tracker, not a bill.
🆓 Free Models — 11 Providers, $0 Forever
Combine all free providers into one unbreakable combo — OmniRoute auto-routes between them when quota runs out.
| Provider | Prefix | Free Models | Quota |
| ----------------- | ----------- | ------------------------------------------------------------- | -------------------- |
| Kiro | kr/ | Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 CREDITS per month |
| Qoder | if/ | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1 | ♾️ Unlimited |
| Qwen | qw/ | qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next | ♾️ Unlimited |
| Pollinations | pol/ | GPT-5, Claude, Gemini, DeepSeek, Llama 4, Mistral | No key needed |
| LongCat | lc/ | LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Gemini CLI | gc/ | gemini-3-flash, gemini-2.5-pro | 180K tok/mo |
| Cloudflare AI | cf/ | 50+ models (Llama, Gemma, Mistral, Whisper) | 10K Neurons/day |
| Groq | groq/ | Llama 3.3 70B, Qwen3 32B, Kimi K2 | 14.4K RPD |
| NVIDIA NIM | nvidia/ | 129 models (DeepSeek, Llama, GLM, Kimi) | ~40 RPM |
| Cerebras | cerebras/ | Qwen3 235B, GPT-OSS 120B, Llama 3.1 | 1M tok/day |
| Scaleway | scw/ | Qwen3 235B, Llama 70B, DeepSeek V3 | 1M tokens (EU) |
Also free (API Key required):
Mistral (1B tok/month) · OpenRouter (35+ :free models) · GitHub Models (GPT-5, 45+ models) ·
Cohere (1K calls/month) · Z.AI/GLM (permanent free Flash models) · SiliconFlow (1K RPM, 50K TPM) ·
Kilo Code (~200 req/hr auto-router) · HuggingFace ($0.10/mo credits) · Ollama Cloud (400+ models) ·
LLM7.io (30+ models) · Kluster AI · IBM watsonx (300K tok/month) · OpenCode Zen · Vercel AI Gateway ($5/mo)
Trial credits (one-time): Baseten ($30) · NLP Cloud ($15) · AI21 ($10) · Upstage ($10) · SambaNova ($5) · Modal ($5/mo) · Fireworks ($1) · Nebius ($1) · Inference.net ($1 + $25 survey) · Hyperbolic ($1) · Novita ($0.50)
China-based (free tiers): ModelScope · Tencent Hunyuan · Volcengine · ChatAnywhere · InternAI · Bigmodel
Combined capacity: ~31,000+ RPD · ~32B+ tokens/month · 500+ models · $0
📖 Complete free provider directory: docs/FREE_TIERS.md — 25+ providers, quotas, base URLs, model tables, and OmniRoute combo setup.
🎙️ Free Transcription Combo
Transcribe any audio/video for $0 — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.
| Provider | Free Credits | Best Model | Rate Limit |
| ----------------- | ---------------------- | -------------------------------------------- | ---------------------------- |
| 🟢 Deepgram | $200 free (signup) | nova-3 — best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 AssemblyAI | $50 free (signup) | universal-3-pro — chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 Groq | Free forever | whisper-large-v3 — OpenAI Whisper | 30 RPM (rate limited) |
Suggested combo in /dashboard/combos:
Name: free-transcription
Strategy: Priority
Nodes:
[1] deepgram/nova-3 → uses $200 free first
[2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
[3] groq/whisper-large-v3 → free forever, emergency fallback
Then in /dashboard/media → Transcription tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.
💡 Key Features
4,690+ automated tests across 517 test files. Not just a relay — a full operational platform.
| Feature | Why It Matters | | ---------------------------------------------------------------------------------------------------- | -------------------------------- | | 🧠 Smart 4-Tier Fallback — Subscription → API → Cheap → Free | Never stop coding, zero downtime | | 🔄 Format Translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API | Works with ANY