

OmniRoute

Never stop coding. The free AI gateway — one endpoint, 160+ providers, zero downtime. Smart 4-tier auto-fallback (Subscription → API → Cheap → Free), prompt compression (save 15-75% tokens), 3-level proxy for geo-blocks, MCP Server (29 tools), A2A Protocol, 10 multi-modal APIs, and Desktop/Android/PWA apps.

3,830 · by @diegosouzapw · MIT · GitHub →

Installation

Claude Code
claude mcp add omniroute -- npx -y omniroute
npx
npx -y omniroute

npm: omniroute

Transport

stdio · sse · http


Documentation

🚀 OmniRoute — The Free AI Gateway

Never stop coding. Save 15-75% tokens with prompt compression + auto-fallback to FREE & low-cost AI models.

The most complete open-source AI proxy — one endpoint, 160+ providers, 13 routing strategies, zero downtime. Multi-platform: Web, Desktop (Electron), Mobile (PWA + Termux). Fully extensible via MCP Server (29 tools), A2A Protocol, and Memory/Skills systems. Available in 40+ languages.

Chat Completions • Responses API • Embeddings • Image Generation • Video • Music • Audio Speech/Transcription • Reranking • Moderations • Web Search • MCP Server • A2A Protocol • 4,600+ Tests • 100% TypeScript

🔥 Limited offer: Sign up at AgentRouter and get $100 in free AI credits. Access GPT-5, Claude, Gemini, DeepSeek & 100+ models. No credit card required. Claim your credits →



🌐 Available in: 🇺🇸 English | 🇧🇷 Português (Brasil) | 🇪🇸 Español | 🇫🇷 Français | 🇮🇹 Italiano | 🇷🇺 Русский | 🇨🇳 中文 (简体) | 🇩🇪 Deutsch | 🇮🇳 हिन्दी | 🇹🇭 ไทย | 🇺🇦 Українська | 🇸🇦 العربية | 🇯🇵 日本語 | 🇻🇳 Tiếng Việt | 🇧🇬 Български | 🇩🇰 Dansk | 🇫🇮 Suomi | 🇮🇱 עברית | 🇭🇺 Magyar | 🇮🇩 Bahasa Indonesia | 🇰🇷 한국어 | 🇲🇾 Bahasa Melayu | 🇳🇱 Nederlands | 🇳🇴 Norsk | 🇵🇹 Português (Portugal) | 🇷🇴 Română | 🇵🇱 Polski | 🇸🇰 Slovenčina | 🇸🇪 Svenska | 🇵🇭 Filipino | 🇨🇿 Čeština


🖼️ Main Dashboard


📸 Dashboard Preview

Pages shown in the preview: Providers · Combos · Analytics · Health · Translator · Settings · CLI Tools · Usage Logs · Endpoints


🤖 Free AI Provider for your favorite coding agents

Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.

📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota


📺 OmniRoute in Action — Video Guides

🎬 Made a video about OmniRoute? We'd love to feature it here! Open an issue or discussion with the link and we'll add it to this showcase.


🤔 Why OmniRoute?

Stop wasting money and tokens, and stop hitting limits:

❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers
❌ Each provider has a different API format
❌ AI providers blocked in your country

OmniRoute solves all of this:

✅ Prompt Compression: auto-compress prompts & tool outputs, save 15-75% tokens per request
✅ Maximize subscriptions: track quota, use every bit before reset
✅ Auto fallback: Subscription → API Key → Cheap → Free, zero downtime
✅ Multi-account: round-robin between accounts per provider
✅ Format translation: OpenAI ↔ Claude ↔ Gemini ↔ Responses API, any tool works
✅ 3-level proxy: bypass geo-blocks with global, per-provider, and per-key proxies
✅ 10 multi-modal APIs: chat, images, video, music, audio, search in one endpoint
✅ MCP + A2A: 29 MCP tools + agent-to-agent protocol, production-ready
✅ Universal: works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool


📧 Support

💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.

🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

npm run system-info

This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.


🛠️ Supported CLI Tools

OmniRoute works seamlessly with 16+ AI coding tools — one config, all tools:

📖 Full setup for each tool: docs/CLI-TOOLS.md


🌐 Supported Providers — 160+

🔐 OAuth Providers

🆓 Free Providers (No Cost)

🔑 API Key Providers (120+)

Alibaba · Amazon Q · AssemblyAI · Baidu Qianfan · Baseten · Black Forest Labs · Blackbox · Brave Search · Bytez · CablyAI · Cartesia · ChatGPT Web · Chutes.ai · Clarifai · Codestral · CrofAI · DataRobot · Deepgram · ElevenLabs · Empower · Exa Search · Fal.ai · Featherless AI · FenayAI · FriendliAI · Galadriel · GigaChat · GitLab Duo · GLHF Chat · GoAPI · Heroku AI · Hyperbolic · IBM watsonx · Inference.net · Inworld · Jina AI · Kilo Gateway · Lambda AI · LaoZhang · Linkup Search · LlamaGate · Maritalk · Modal · Moonshot AI · Morph · Muse Spark · NanoBanana · NanoGPT · NLP Cloud · Nous Research · Novita AI · nScale · OCI · Ollama Cloud · OVHcloud · PiAPI · PlayHT · Poe · Predibase · PublicAI · Qwen Code · Recraft · Reka · Runway · SAP · Scaleway · SearchAPI · SearXNG · Serper · Stability AI · Synthetic · Tavily · TheB.AI · Topaz · Upstage · v0 (Vercel) · Vercel AI Gateway · Volcengine · Voyage AI · W&B Inference · Xiaomi MiMo · You.com · Z.AI · + OpenAI/Anthropic-compatible custom endpoints

🏠 Self-Hosted


🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌──────────────────────────────────────────────────┐
│              OmniRoute (Smart Router)             │
│  • 🗜️ Prompt Compression (save 15-75% tokens)    │
│  • Format translation (OpenAI ↔ Claude ↔ Gemini) │
│  • Quota tracking + Embeddings + Images          │
│  • Auto token refresh + Rate limit management    │
└──────┬───────────────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost + 15-75% token savings
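
The tier walk in the diagram above can be sketched in a few lines of TypeScript. This is an illustration only: the `Tier` and `TierState` types and the `pickTier` function are hypothetical names, not OmniRoute's internal API, and the sketch assumes each tier reports a single "available" flag that turns false once its quota is exhausted or its budget limit is hit.

```typescript
// Illustrative sketch only; these names are hypothetical, not OmniRoute's API.
type Tier = "subscription" | "apiKey" | "cheap" | "free";

interface TierState {
  tier: Tier;
  available: boolean; // false once quota is exhausted or the budget limit is hit
}

// Fallback order taken from the diagram above.
const TIER_ORDER: Tier[] = ["subscription", "apiKey", "cheap", "free"];

// Return the first tier, in fallback order, that can still serve requests.
function pickTier(states: TierState[]): Tier | undefined {
  for (const tier of TIER_ORDER) {
    const state = states.find((s) => s.tier === tier);
    if (state?.available) return tier;
  }
  return undefined; // every tier is unavailable
}

const chosen = pickTier([
  { tier: "subscription", available: false }, // quota exhausted
  { tier: "apiKey", available: false }, // budget limit reached
  { tier: "cheap", available: true },
  { tier: "free", available: true },
]);
console.log(chosen); // cheap
```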

🗜️ Prompt Compression — Save 15-75% Tokens Automatically

Why use many token when few token do trick? OmniRoute's built-in compression pipeline reduces token usage on every request — before it even reaches the provider. Inspired by Caveman (⭐ 51K+).

How It Works

Every request passes through the compression pipeline transparently — no client changes needed:

┌──────────────────┐     ┌─────────────────────────────┐     ┌──────────────┐
│   Client sends   │────▶│  OmniRoute Compression      │────▶│  Provider    │
│   full prompt    │     │  Pipeline (5 modes)          │     │  receives    │
│   (10,000 tok)   │     │                              │     │  compressed  │
│                  │     │  🪶 Lite ........... ~15%     │     │  (2,500 tok) │
│                  │     │  🪨 Standard ....... ~30%     │     │              │
│                  │     │  ⚡ Aggressive ..... ~50%     │     │  💰 75% saved │
│                  │     │  🔥 Ultra .......... ~75%     │     │              │
└──────────────────┘     └─────────────────────────────┘     └──────────────┘

5 Compression Modes

| Mode | Savings | Technique | Best For |
| --- | --- | --- | --- |
| Off | 0% | No compression | When you need exact prompts |
| 🪶 Lite | ~15% | Whitespace collapse, dedup system prompts, image URL shortening | Always-on safe default |
| 🪨 Standard (Caveman) | ~30% | 30+ regex rules: filler removal, context condensation, structural compression, multi-turn dedup | Daily coding with Claude/Codex |
| ⚡ Aggressive | ~50% | All standard + progressive message aging + tool result summarization + LLM-based compression | Long sessions with many tool calls |
| 🔥 Ultra | ~75% | All aggressive + heuristic token pruning + stopword removal + score-based filtering | Maximum savings when tokens are scarce |

Before & After (Standard/Caveman Mode)

🗣️ Before compression (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."

🪨 After compression (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same answer. 72% fewer tokens. Zero accuracy loss.
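
The percentage follows directly from the two token counts. A small sketch of the arithmetic (the helper name `savingsPercent` is made up for this example):

```typescript
// Percentage of tokens removed, rounded to the nearest whole percent.
function savingsPercent(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return Math.round((1 - after / before) * 100);
}

console.log(savingsPercent(69, 19)); // 72 (the Standard-mode example above)
console.log(savingsPercent(10_000, 2_500)); // 75 (the Ultra figure in the pipeline diagram)
```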

Architecture

Request Body
  │
  ├─ strategySelector.ts ─── Picks mode (config / combo override / auto-trigger)
  │
  ├─ lite.ts ─────────────── Whitespace, dedup, image URLs, redundant content
  ├─ caveman.ts ──────────── 30+ regex rules via cavemanRules.ts
  │   └─ preservation.ts ─── Protects code blocks, URLs, JSON from compression
  ├─ aggressive.ts ───────── Summarizer + tool result compressor + progressive aging
  │   ├─ summarizer.ts ───── Rule-based message summarization
  │   ├─ toolResultCompressor.ts ── file/grep/shell/JSON/error compression
  │   └─ progressiveAging.ts ──── Older messages → shorter summaries
  └─ ultra.ts ────────────── Heuristic token scoring + pruning
      └─ ultraHeuristic.ts ─ Stopword detection, score thresholds, force-preserve
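
To make the interplay between a Lite-style pass and the preservation step concrete, here is a minimal sketch. It is not the code in lite.ts or preservation.ts; it only assumes the behaviour described above: collapse redundant whitespace while leaving fenced code blocks untouched. The function name `collapseOutsideCode` is hypothetical.

```typescript
// Hypothetical sketch; not the code in lite.ts or preservation.ts.
const TICKS = "`".repeat(3); // fence marker for code blocks
const FENCED_BLOCK = new RegExp(`(${TICKS}[\\s\\S]*?${TICKS})`, "g");

function collapseOutsideCode(text: string): string {
  // split() with a capturing group keeps the fenced blocks at odd indices.
  return text
    .split(FENCED_BLOCK)
    .map((part, index) =>
      index % 2 === 1
        ? part // fenced code: preserved byte for byte
        : part
            .replace(/[ \t]+/g, " ") // runs of spaces/tabs become one space
            .replace(/\n{3,}/g, "\n\n") // 3+ newlines become one blank line
    )
    .join("");
}

const sample = "a   b\n\n\n\n" + TICKS + "\nx    y\n" + TICKS + "\nc \t d";
const collapsed = collapseOutsideCode(sample);
console.log(collapsed.length < sample.length); // true
```

Splitting on the protected spans first, then transforming only the gaps, keeps the preservation rule independent of whatever rewriting rules are applied afterwards.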

Configuration

Dashboard → Settings → Compression → Pick your mode

Or per-combo override:

{
  "comboOverrides": {
    "my-coding-combo": "standard",
    "my-cheap-combo": "ultra"
  }
}

Auto-trigger: set autoTriggerTokens to automatically enable compression when a request exceeds a token threshold.
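
How the three inputs (global mode, per-combo override, auto-trigger) might combine is sketched below. The precedence shown (override first, then auto-trigger when the global mode is off, then the global mode) is an assumption for illustration, as are the `CompressionConfig` shape, the lowercase mode identifiers other than `standard` and `ultra`, and the `autoTriggerMode` field. Only `comboOverrides` and `autoTriggerTokens` come from the text above.

```typescript
// Sketch under stated assumptions; not the logic in strategySelector.ts.
type Mode = "off" | "lite" | "standard" | "aggressive" | "ultra";

interface CompressionConfig {
  mode: Mode; // global mode picked in the dashboard
  comboOverrides?: Record<string, Mode>;
  autoTriggerTokens?: number; // threshold named in the text above
  autoTriggerMode?: Mode; // hypothetical: mode applied once the threshold is exceeded
}

function selectMode(
  config: CompressionConfig,
  combo: string | undefined,
  promptTokens: number
): Mode {
  // 1. A per-combo override wins (assumed precedence).
  const override = combo ? config.comboOverrides?.[combo] : undefined;
  if (override) return override;

  // 2. With compression off globally, a large request can still trigger it.
  if (
    config.mode === "off" &&
    config.autoTriggerTokens !== undefined &&
    promptTokens > config.autoTriggerTokens
  ) {
    return config.autoTriggerMode ?? "lite";
  }

  // 3. Otherwise use the global mode.
  return config.mode;
}

const config: CompressionConfig = {
  mode: "off",
  comboOverrides: { "my-coding-combo": "standard", "my-cheap-combo": "ultra" },
  autoTriggerTokens: 8000,
  autoTriggerMode: "lite",
};

console.log(selectMode(config, "my-cheap-combo", 100)); // ultra
console.log(selectMode(config, undefined, 12_000)); // lite
console.log(selectMode(config, undefined, 500)); // off
```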

🪨 Fun fact: The standard/caveman mode is inspired by Caveman — the viral project that proved "caveman speak" cuts 65% of tokens while keeping 100% technical accuracy. OmniRoute takes this further with a 5-mode pipeline that goes from gentle whitespace cleanup all the way to aggressive heuristic pruning.

📖 Full compression documentation: docs/COMPRESSION_GUIDE.md


🎯 What OmniRoute Solves

Every developer using AI tools faces these problems daily. OmniRoute solves them all.

| # | Problem | OmniRoute Solution |
| --- | --- | --- |
| 💸 | Subscription quota expires mid-coding | Smart 4-Tier Fallback: auto-routes Subscription → API Key → Cheap → Free |
| 🔌 | Each provider has a different API format | Format Translation: unified endpoint translates OpenAI ↔ Claude ↔ Gemini ↔ Responses |
| 🌐 | AI providers block my country/region | 3-Level Proxy: global, per-provider, and per-key proxy with TLS fingerprint spoofing |
| 🆓 | Can't afford AI subscriptions | 11 Free Providers: Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM... |
| 🔒 | Gateway is exposed without protection | API Key Management: scoping, rotation, IP filtering, rate limiting, prompt injection guard |
| 🛑 | Provider went down, lost coding flow | Circuit Breakers: auto-failover with cooldown, retry, anti-thundering herd |
| 🔧 | Configuring each CLI tool is tedious | CLI Tools Dashboard: one-click setup for Claude Code, Codex, Cursor, OpenClaw, Kilo |
| 🔑 | Managing OAuth tokens is hell | Auto Token Refresh: OAuth PKCE for 8 providers, multi-account, LAN/remote fix |
| 📊 | Don't know how much I'm spending | Cost Analytics: per-token tracking, budget limits, usage stats per API key |
| 🐛 | Can't diagnose errors in AI calls | Unified Logs: 4-tab dashboard (request, proxy, audit, console) + p50/p95/p99 telemetry |

| # | Problem | Solution |
| --- | --- | --- |
| 11 | Deploying/maintaining is complex | npm global, Docker multi-arch, Electron, Termux; deploy anywhere |
| 12 | Interface is English-only | 40+ languages with RTL support |
| 13 | Need more than chat (images, audio, video) | 10 multi-modal APIs: embeddings, images, video, music, TTS, STT, moderation, rerank, search, batch |
| 14 | No way to test/compare models | LLM Evals, Translator Playground, Chat Tester, Live Monitor |
| 15 | Need to scale without losing performance | Semantic cache, request dedup, rate limit detection, queue & pacing |
| 16 | Want to control model behavior globally | System prompt injection, thinking budget, wildcard routing |
| 17 | Need MCP tools as first-class features | 29 MCP tools, 3 transports (stdio/SSE/HTTP), 10 scopes, audit trail |
| 18 | Need A2A orchestration | JSON-RPC 2.0 + SSE streaming, task lifecycle, sync + stream paths |
| 19 | Need real MCP process health | Runtime heartbeat, PID tracking, UI status cards |
| 20 | Need auditable MCP execution | SQLite-backed audit with filters, pagination, stats |
| 21 | Need scoped MCP permissions | 10 granular scopes per integration |
| 22 | Need operational controls without redeploying | Combo switches, resilience tuning, breaker resets from dashboard |
| 23 | Need A2A task lifecycle visibility | Task listing/filtering, drill-down, cancellation |
| 24 | Need active stream metrics | Active stream counters, per-state counts, A2A dashboard cards |
| 25 | Need standard agent discovery | Agent Card at /.well-known/agent.json |
| 26 | Need protocol discoverability | Consolidated Endpoints page with Proxy, MCP, A2A, API tabs |
| 27 | Need E2E protocol validation | Real MCP SDK + A2A client flows in test:protocols:e2e |
| 28 | Need unified observability | Health + audit + telemetry across OpenAI, MCP, and A2A layers |
| 29 | Need one runtime for proxy + tools + agents | OpenAI proxy + MCP + A2A in one stack with shared auth/resilience |
| 30 | Need agentic workflows without glue-code | Unified endpoint, protocol UIs, production-ready foundations |
| 31 | Long sessions crash with context limits | Proactive context compression, structural integrity guards, multi-layer dropping |

📖 Deep dives: Resilience Guide · Proxy Guide · Setup Guide · Compression Guide


🆓 Start Free — Zero Configuration Cost

Set up AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.

| Step | Action | Providers Unlocked |
| --- | --- | --- |
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 (unlimited) |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... (unlimited) |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... (unlimited) |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro (180K/mo free) |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |

Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.

Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

⚡ Quick Start

1) Install and run

npm install -g omniroute
omniroute

Dashboard opens at http://localhost:20128 · API at http://localhost:20128/v1.

2) Connect providers

  1. Dashboard → Providers → connect at least one provider (OAuth or API key)
  2. Dashboard → Endpoints → create an API key
  3. Dashboard → Combos → set your fallback chain (optional)

3) Point your coding tool

Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model)

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and any OpenAI-compatible tool.
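
For a script rather than an IDE, the same three settings map onto an ordinary OpenAI-style request. The sketch below assumes the gateway serves the standard `/chat/completions` path under the base URL and accepts the key as a Bearer token, which is how OpenAI-compatible endpoints conventionally work; the helper names (`buildChatRequest`, `sendChat`) and the placeholder key are invented for this example.

```typescript
// Sketch assuming an OpenAI-compatible /chat/completions route under the base URL.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[]
): PreparedRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`, // tolerate a trailing slash
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Sends the prepared request; needs a running OmniRoute instance and Node 18+ (global fetch).
async function sendChat(request: PreparedRequest): Promise<unknown> {
  const { url, ...init } = request;
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`Gateway returned HTTP ${response.status}`);
  return response.json();
}

const request = buildChatRequest(
  "http://localhost:20128/v1",
  "your-endpoint-key", // placeholder: copy the real key from the Endpoints page
  "if/kimi-k2-thinking",
  [{ role: "user", content: "Say hello" }]
);
console.log(request.url); // http://localhost:20128/v1/chat/completions
```

With an instance running locally, `sendChat(request).then(console.log)` would print the JSON response.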

Docker:

docker run -d --name omniroute --restart unless-stopped -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest

From source:

cp .env.example .env && npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev

pnpm: pnpm install -g omniroute && pnpm approve-builds -g && omniroute

Arch Linux (AUR): yay -S omniroute-bin && systemctl --user enable --now omniroute.service

MCP: omniroute --mcp (stdio transport)

CLI options: omniroute --port 3000, omniroute --no-open, omniroute --help

Split-port mode: PORT=20128 DASHBOARD_PORT=20129 omniroute

Uninstall: npm run uninstall (keeps data) or npm run uninstall:full (removes everything)

📖 Full details: Setup Guide · Docker · Void Linux template


🐳 Docker

OmniRoute is available as a public Docker image on Docker Hub.

Quick run:

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

With environment file:

# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

Using Docker Compose:

# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d

Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL. Endpoint tunnel panels, including Cloudflare, Tailscale, and ngrok, can be shown or hidden from Settings → Appearance without changing active tunnel state.

Notes:

  • Quick Tunnel URLs are temporary and change after every restart.
  • Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
  • Managed install currently supports Linux, macOS, and Windows on x64 / arm64.
  • Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set CLOUDFLARED_PROTOCOL=quic or auto if you want a different transport.
  • Docker images bundle system CA roots and pass them to managed cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container.
  • SQLite runs in WAL mode. docker stop should be allowed to finish so OmniRoute can checkpoint the latest changes back into storage.sqlite.
  • The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep --stop-timeout 40 (or similar) so manual stops do not cut off shutdown cleanup.
  • Set CLOUDFLARED_BIN=/absolute/path/to/cloudflared if you want OmniRoute to use an existing binary instead of downloading one.

Using Docker Compose with Caddy (HTTPS Auto-TLS):

OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.

services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:

| Image | Tag | Size | Description |
| --- | --- | --- | --- |
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 3.7.8 | ~250MB | Current version |

📖 Full Docker documentation: docs/DOCKER_GUIDE.md — Compose profiles, Caddy HTTPS, Cloudflare tunnels, and more.


📱 Multi-Platform — Run Anywhere

OmniRoute runs on Web, Desktop (Electron), Android (Termux), and as a Progressive Web App (PWA).

| Platform | Install | Highlights |
| --- | --- | --- |
| 🖥️ Desktop | npm run electron:build | Native window, system tray, auto-start, offline mode (Windows/macOS/Linux) |
| 📱 Android | pkg install nodejs-lts && npx -y omniroute | ARM native, no root, 24/7 via Termux:Boot; your phone is an AI server |
| 📲 PWA | "Add to Home Screen" in browser | Fullscreen, offline page, service worker caching (Android/iOS/Desktop) |

🖥️ Desktop (Electron)

  • Native Electron app with system tray, auto-start, native notifications
  • One-click install: NSIS (Windows), DMG (macOS), AppImage (Linux)
  • Dev: npm run electron:dev · Build: npm run electron:build
  • 📖 Full docs: electron/README.md

📱 Android (Termux)

pkg update && pkg install nodejs-lts python build-essential git
npx -y omniroute@latest

Access from any device on the same network: http://PHONE_IP:20128/v1

📲 PWA

  • Android (Chrome): ⋮ → "Add to Home screen"
  • iOS (Safari): Share → "Add to Home Screen"
  • Desktop (Chrome/Edge): Install icon in address bar
  • 📖 Full docs: docs/PWA_GUIDE.md

🌍 Bypass Geographic Blocks — Use AI From Any Country

🇷🇺 🇨🇳 🇮🇷 🇨🇺 🇹🇷 In Russia, China, Iran, or any blocked region? OmniRoute's 3-level proxy system solves this completely.

| Level | Badge | Configure In | Use Case |
| --- | --- | --- | --- |
| Global | 🟢 | Settings → Proxy | All traffic through one proxy |
| Per-Provider | 🟡 | Provider → Proxy | Only specific providers proxied |
| Per-Connection | 🔵 | Connection → Proxy | Each API key uses its own proxy |

What gets proxied: API requests ✅ • OAuth flows ✅ • Connection tests ✅ • Token refresh ✅ • Model sync ✅

Protocols: HTTP/HTTPS, SOCKS5 (ENABLE_SOCKS5_PROXY=true), Authenticated proxies
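
One plausible way to combine the three levels is "most specific wins": a per-connection proxy overrides a per-provider one, which overrides the global one. The text above does not state the precedence, so treat that ordering as an assumption. The `ProxyLayers` shape, the `resolveProxy` function, and the proxy URLs are all hypothetical.

```typescript
// Sketch assuming "most specific level wins"; not OmniRoute's actual resolver.
interface ProxyLayers {
  global?: string;
  perProvider?: Record<string, string>; // keyed by provider id
  perConnection?: Record<string, string>; // keyed by connection (API key) id
}

function resolveProxy(
  layers: ProxyLayers,
  providerId: string,
  connectionId: string
): string | undefined {
  return (
    layers.perConnection?.[connectionId] ??
    layers.perProvider?.[providerId] ??
    layers.global // undefined means a direct connection
  );
}

const layers: ProxyLayers = {
  global: "http://proxy.example:8080",
  perProvider: { groq: "socks5://127.0.0.1:1080" },
  perConnection: { "groq-key-2": "http://user:pass@proxy2.example:3128" },
};

console.log(resolveProxy(layers, "groq", "groq-key-2")); // http://user:pass@proxy2.example:3128
console.log(resolveProxy(layers, "groq", "groq-key-1")); // socks5://127.0.0.1:1080
console.log(resolveProxy(layers, "mistral", "mistral-key-1")); // http://proxy.example:8080
```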

🆓 1proxy — Free Proxy Marketplace

Contributed by @oyi77 (#1847)

No proxy? Use the built-in 1proxy integration for hundreds of free, validated proxies worldwide:

  • One-click sync (up to 500 proxies)
  • Quality scores (0-100)
  • Country filter
  • Auto-rotation (quality/random/sequential)
  • Auto-degradation
  • Circuit breaker

Anti-Detection

  • 🔒 TLS Fingerprint Spoofing — browser-like TLS via wreq-js
  • 🔏 CLI Fingerprint Matching — matches native CLI binary signatures
  • 🏠 Proxy IP Preservation — stealth + IP masking simultaneously

📖 Full proxy documentation: docs/PROXY_GUIDE.md



💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
| --- | --- | --- | --- | --- |
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast | $0.20/$0.50 per 1M 🆕 | None | Fastest + tool calling, ultralow |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| | AgentRouter 🆕 | Pay-per-use | None | $200 free credits at signup |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |

🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.

💡 See the full $0 Free Stack (11 providers) below.

💡 Understanding Dashboard Costs:

The "cost" displayed in the Usage Analytics page is for tracking and comparison purposes only. OmniRoute itself never charges you anything — it's free, open-source software running on your machine. If your dashboard shows "$290 total cost" while using free models, that's how much you saved compared to paid API pricing. Think of it as a savings tracker, not a bill.
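
A figure like that can be reproduced by multiplying token counts by a reference price. The sketch below assumes the dashboard does roughly this, and that a price pair such as "$0.27/$1.10 per 1M" in the table above means input/output dollars per million tokens; both are assumptions. The `Pricing` type and `estimateCost` function are hypothetical.

```typescript
// Sketch of a reference-price estimate; not OmniRoute's cost analytics code.
interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

function estimateCost(
  pricing: Pricing,
  inputTokens: number,
  outputTokens: number
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  );
}

// Reference price read from the DeepSeek V3.2 row (assumed input/output split).
const deepseek: Pricing = { inputPerMillion: 0.27, outputPerMillion: 1.1 };

// 2M input tokens + 0.5M output tokens served by a free model would be
// displayed as this much "cost", that is, money not spent.
const tracked = estimateCost(deepseek, 2_000_000, 500_000);
console.log(tracked.toFixed(2)); // 1.09
```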


🆓 Free Models — 11 Providers, $0 Forever

Combine all free providers into one unbreakable combo — OmniRoute auto-routes between them when quota runs out.

| Provider | Prefix | Free Models | Quota |
| --- | --- | --- | --- |
| Kiro | kr/ | Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 CREDITS per month |
| Qoder | if/ | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1 | ♾️ Unlimited |
| Qwen | qw/ | qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next | ♾️ Unlimited |
| Pollinations | pol/ | GPT-5, Claude, Gemini, DeepSeek, Llama 4, Mistral | No key needed |
| LongCat | lc/ | LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Gemini CLI | gc/ | gemini-3-flash, gemini-2.5-pro | 180K tok/mo |
| Cloudflare AI | cf/ | 50+ models (Llama, Gemma, Mistral, Whisper) | 10K Neurons/day |
| Groq | groq/ | Llama 3.3 70B, Qwen3 32B, Kimi K2 | 14.4K RPD |
| NVIDIA NIM | nvidia/ | 129 models (DeepSeek, Llama, GLM, Kimi) | ~40 RPM |
| Cerebras | cerebras/ | Qwen3 235B, GPT-OSS 120B, Llama 3.1 | 1M tok/day |
| Scaleway | scw/ | Qwen3 235B, Llama 70B, DeepSeek V3 | 1M tokens (EU) |
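
The prefixes in the table combine with a model name to form IDs such as `if/kimi-k2-thinking`. A minimal sketch of splitting such an ID on the client side follows; the function name `parseModelId` and the lookup map are illustrative (the map copies a few rows from the table), and splitting at the first slash is an assumption so that model names containing slashes would survive.

```typescript
// Illustrative helper; prefix-to-provider pairs copied from the table above.
const PROVIDER_BY_PREFIX: Record<string, string> = {
  kr: "Kiro",
  if: "Qoder",
  qw: "Qwen",
  gc: "Gemini CLI",
  groq: "Groq",
};

interface ParsedModelId {
  prefix: string;
  model: string;
  provider: string; // "unknown" when the prefix is not in the map
}

function parseModelId(id: string): ParsedModelId {
  const slash = id.indexOf("/"); // split at the FIRST slash only
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "prefix/model", got "${id}"`);
  }
  const prefix = id.slice(0, slash);
  return {
    prefix,
    model: id.slice(slash + 1),
    provider: PROVIDER_BY_PREFIX[prefix] ?? "unknown",
  };
}

const parsed = parseModelId("if/kimi-k2-thinking");
console.log(parsed.provider, parsed.model); // Qoder kimi-k2-thinking
```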

Also free (API Key required): Mistral (1B tok/month) · OpenRouter (35+ :free models) · GitHub Models (GPT-5, 45+ models) · Cohere (1K calls/month) · Z.AI/GLM (permanent free Flash models) · SiliconFlow (1K RPM, 50K TPM) · Kilo Code (~200 req/hr auto-router) · HuggingFace ($0.10/mo credits) · Ollama Cloud (400+ models) · LLM7.io (30+ models) · Kluster AI · IBM watsonx (300K tok/month) · OpenCode Zen · Vercel AI Gateway ($5/mo)

Trial credits (one-time): Baseten ($30) · NLP Cloud ($15) · AI21 ($10) · Upstage ($10) · SambaNova ($5) · Modal ($5/mo) · Fireworks ($1) · Nebius ($1) · Inference.net ($1 + $25 survey) · Hyperbolic ($1) · Novita ($0.50)

China-based (free tiers): ModelScope · Tencent Hunyuan · Volcengine · ChatAnywhere · InternAI · Bigmodel

Combined capacity: ~31,000+ RPD · ~32B+ tokens/month · 500+ models · $0

📖 Complete free provider directory: docs/FREE_TIERS.md — 25+ providers, quotas, base URLs, model tables, and OmniRoute combo setup.


🎙️ Free Transcription Combo

Transcribe any audio/video for $0 — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.

| Provider | Free Credits | Best Model | Rate Limit |
| --- | --- | --- | --- |
| 🟢 Deepgram | $200 free (signup) | nova-3: best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 AssemblyAI | $50 free (signup) | universal-3-pro: chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 Groq | Free forever | whisper-large-v3: OpenAI Whisper | 30 RPM (rate limited) |


Suggested combo in /dashboard/combos:

Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3          → uses $200 free first
  [2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
  [3] groq/whisper-large-v3    → free forever, emergency fallback

Then in /dashboard/media → Transcription tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.
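
Conceptually, a Priority combo like the one above tries each node in order and moves on when one fails. The sketch below shows that pattern with stubbed transcribers. It is not OmniRoute's combo engine: `ComboNode`, `Transcriber`, and `transcribeWithPriority` are hypothetical names, and real nodes would call the providers' APIs instead of returning canned strings.

```typescript
// Sketch of priority-ordered fallback; node implementations are stubs.
type Transcriber = (audio: Uint8Array) => Promise<string>;

interface ComboNode {
  model: string;
  transcribe: Transcriber;
}

async function transcribeWithPriority(
  nodes: ComboNode[],
  audio: Uint8Array
): Promise<{ model: string; text: string }> {
  const failures: string[] = [];
  for (const node of nodes) {
    try {
      // The first node that succeeds wins; later nodes are never called.
      return { model: node.model, text: await node.transcribe(audio) };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${node.model}: ${reason}`);
    }
  }
  throw new Error(`All nodes failed: ${failures.join("; ")}`);
}

const nodes: ComboNode[] = [
  {
    model: "deepgram/nova-3",
    transcribe: async () => {
      throw new Error("credits exhausted"); // simulate the free credits running out
    },
  },
  { model: "assemblyai/universal-3-pro", transcribe: async () => "hello world" },
  { model: "groq/whisper-large-v3", transcribe: async () => "unused" },
];

transcribeWithPriority(nodes, new Uint8Array()).then((result) => {
  console.log(result.model); // assemblyai/universal-3-pro
});
```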

💡 Key Features

4,690+ automated tests across 517 test files. Not just a relay — a full operational platform.

| Feature | Why It Matters | | ---------------------------------------------------------------------------------------------------- | -------------------------------- | | 🧠 Smart 4-Tier Fallback — Subscription → API → Cheap → Free | Never stop coding, zero downtime | | 🔄 Format Translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API | Works with ANY