OmniRoute

Never stop coding. The free AI gateway — one endpoint, 160+ providers, zero downtime. Smart 4-tier auto-fallback (Subscription → API → Cheap → Free), prompt compression (save 15-75% tokens), 3-level proxy for geo-blocks, MCP Server (29 tools), A2A Protocol, 10 multi-modal APIs, and Desktop/Android/PWA apps.

★ 3,830 · by @diegosouzapw · MIT · GitHub

Installation

Claude Code

claude mcp add omniroute -- npx -y omniroute

npx

npx -y omniroute

npm: omniroute

Transport

stdio · sse · http

Documentation

🚀 OmniRoute — The Free AI Gateway

Never stop coding. Save 15-75% tokens with prompt compression + auto-fallback to FREE & low-cost AI models.

The most complete open-source AI proxy — one endpoint, 160+ providers, 13 routing strategies, zero downtime. Multi-platform: Web, Desktop (Electron), Mobile (PWA + Termux). Fully extensible via MCP Server (29 tools), A2A Protocol, and Memory/Skills systems. Available in 40+ languages.

Chat Completions • Responses API • Embeddings • Image Generation • Video • Music • Audio Speech/Transcription • Reranking • Moderations • Web Search • MCP Server • A2A Protocol • 4,600+ Tests • 100% TypeScript

🔥 Limited offer: Sign up at AgentRouter and get $100 in free AI credits Access GPT-5, Claude, Gemini, DeepSeek & 100+ models. No credit card required. Claim your credits →

🚀 Quick Start • 💡 Features • 🗜️ Compression • 💰 Pricing • 🎯 Use Cases • 🌍 Proxy • ❓ FAQ • 📖 Docs • 💬 WhatsApp

🖼️ Main Dashboard

📸 Dashboard Preview

Dashboard pages shown: Providers · Combos · Analytics · Health · Translator · Settings · CLI Tools · Usage Logs · Endpoints

🤖 Free AI Provider for your favorite coding agents

Connect any AI-powered IDE or CLI tool through OmniRoute — free API gateway for unlimited coding.

📡 All agents connect via http://localhost:20128/v1 or http://cloud.omniroute.online/v1 — one config, unlimited models and quota

📺 OmniRoute in Action — Video Guides

🎬 Made a video about OmniRoute? We'd love to feature it here! Open an issue or discussion with the link and we'll add it to this showcase.

🤔 Why OmniRoute?

Stop wasting money, tokens and hitting limits:

❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers
❌ Each provider has a different API format
❌ AI providers blocked in your country

OmniRoute solves all of this:

✅ Prompt Compression: auto-compress prompts & tool outputs, save 15-75% tokens per request
✅ Maximize subscriptions: track quota, use every bit before reset
✅ Auto fallback: Subscription → API Key → Cheap → Free, zero downtime
✅ Multi-account: round-robin between accounts per provider
✅ Format translation: OpenAI ↔ Claude ↔ Gemini ↔ Responses API, any tool works
✅ 3-level proxy: bypass geo-blocks with global, per-provider, and per-key proxies
✅ 10 multi-modal APIs: chat, images, video, music, audio, search in one endpoint
✅ MCP + A2A: 29 MCP tools + agent-to-agent protocol, production-ready
✅ Universal: works with Claude Code, Codex, Gemini CLI, Cursor, Cline, OpenClaw, any CLI tool

📧 Support

💬 Join our community! WhatsApp Group — Get help, share tips, and stay updated.

Website: omniroute.online
GitHub: github.com/diegosouzapw/OmniRoute
Issues: github.com/diegosouzapw/OmniRoute/issues
WhatsApp: Community Group
Contributing: See CONTRIBUTING.md, open a PR, or pick a good first issue
Original Project: 9router by decolua

🐛 Reporting a Bug?

When opening an issue, please run the system-info command and attach the generated file:

npm run system-info

This generates a system-info.txt with your Node.js version, OmniRoute version, OS details, installed CLI tools (qoder, gemini, claude, codex, antigravity, droid, etc.), Docker/PM2 status, and system packages — everything we need to reproduce your issue quickly. Attach the file directly to your GitHub issue.

🛠️ Supported CLI Tools

OmniRoute works seamlessly with 16+ AI coding tools — one config, all tools:

📖 Full setup for each tool: docs/CLI-TOOLS.md

🌐 Supported Providers — 160+

🔐 OAuth Providers

🆓 Free Providers (No Cost)

🔑 API Key Providers (120+)

Alibaba · Amazon Q · AssemblyAI · Baidu Qianfan · Baseten · Black Forest Labs · Blackbox · Brave Search · Bytez · CablyAI · Cartesia · ChatGPT Web · Chutes.ai · Clarifai · Codestral · CrofAI · DataRobot · Deepgram · ElevenLabs · Empower · Exa Search · Fal.ai · Featherless AI · FenayAI · FriendliAI · Galadriel · GigaChat · GitLab Duo · GLHF Chat · GoAPI · Heroku AI · Hyperbolic · IBM watsonx · Inference.net · Inworld · Jina AI · Kilo Gateway · Lambda AI · LaoZhang · Linkup Search · LlamaGate · Maritalk · Modal · Moonshot AI · Morph · Muse Spark · NanoBanana · NanoGPT · NLP Cloud · Nous Research · Novita AI · nScale · OCI · Ollama Cloud · OVHcloud · PiAPI · PlayHT · Poe · Predibase · PublicAI · Qwen Code · Recraft · Reka · Runway · SAP · Scaleway · SearchAPI · SearXNG · Serper · Stability AI · Synthetic · Tavily · TheB.AI · Topaz · Upstage · v0 (Vercel) · Vercel AI Gateway · Volcengine · Voyage AI · W&B Inference · Xiaomi MiMo · You.com · Z.AI · + OpenAI/Anthropic-compatible custom endpoints

🏠 Self-Hosted

🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, Gemini CLI, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌──────────────────────────────────────────────────┐
│              OmniRoute (Smart Router)             │
│  • 🗜️ Prompt Compression (save 15-75% tokens)    │
│  • Format translation (OpenAI ↔ Claude ↔ Gemini) │
│  • Quota tracking + Embeddings + Images          │
│  • Auto token refresh + Rate limit management    │
└──────┬───────────────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, Gemini CLI
       │   ↓ quota exhausted
       ├─→ [Tier 2: API KEY] DeepSeek, Groq, xAI, Mistral, NVIDIA NIM, etc.
       │   ↓ budget limit
       ├─→ [Tier 3: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 4: FREE] Qoder, Qwen, Kiro (unlimited)

Result: Never stop coding, minimal cost + 15-75% token savings
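The tier chain in the diagram can be sketched as a loop that tries each tier in order and moves on when one fails. This is a minimal illustration under stated assumptions, not OmniRoute's router: `Tier`, `routeWithFallback`, and the simulated providers are hypothetical names, and calls are synchronous only to keep the sketch short.

```typescript
// Hypothetical sketch; names are illustrative and not OmniRoute's actual API.
type TierName = "subscription" | "api-key" | "cheap" | "free";

interface Tier {
  name: TierName;
  // Returns a completion, or throws when quota or budget is exhausted.
  // Synchronous for brevity; real provider calls would be asynchronous.
  call: (prompt: string) => string;
}

interface RouteResult {
  tier: TierName;
  output: string;
  skipped: TierName[];
}

function routeWithFallback(tiers: Tier[], prompt: string): RouteResult {
  const skipped: TierName[] = [];
  for (const tier of tiers) {
    try {
      return { tier: tier.name, output: tier.call(prompt), skipped };
    } catch (err) {
      // Any failure (quota, budget, outage) falls through to the next tier.
      skipped.push(tier.name);
    }
  }
  throw new Error("All tiers failed: " + skipped.join(" -> "));
}

// Simulated providers: the first two tiers are exhausted, the third answers.
const exhausted = (): string => {
  throw new Error("quota exhausted");
};
const demoTiers: Tier[] = [
  { name: "subscription", call: exhausted },
  { name: "api-key", call: exhausted },
  { name: "cheap", call: (p) => "cheap-tier answer to: " + p },
  { name: "free", call: (p) => "free-tier answer to: " + p },
];

const routed = routeWithFallback(demoTiers, "hello");
```

Here `routed.tier` is the first tier that answered and `routed.skipped` records the tiers that were tried before it, which is the kind of information a usage log would need.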

🗜️ Prompt Compression — Save 15-75% Tokens Automatically

Why use many token when few token do trick? OmniRoute's built-in compression pipeline reduces token usage on every request — before it even reaches the provider. Inspired by Caveman (⭐ 51K+).

How It Works

Every request passes through the compression pipeline transparently — no client changes needed:

┌──────────────────┐     ┌─────────────────────────────┐     ┌──────────────┐
│   Client sends   │────▶│  OmniRoute Compression      │────▶│  Provider    │
│   full prompt    │     │  Pipeline (5 modes)          │     │  receives    │
│   (10,000 tok)   │     │                              │     │  compressed  │
│                  │     │  🪶 Lite ........... ~15%     │     │  (2,500 tok) │
│                  │     │  🪨 Standard ....... ~30%     │     │              │
│                  │     │  ⚡ Aggressive ..... ~50%     │     │  💰 75% saved │
│                  │     │  🔥 Ultra .......... ~75%     │     │              │
└──────────────────┘     └─────────────────────────────┘     └──────────────┘

5 Compression Modes

| Mode | Savings | Technique | Best For |
| --- | --- | --- | --- |
| Off | 0% | No compression | When you need exact prompts |
| 🪶 Lite | ~15% | Whitespace collapse, dedup system prompts, image URL shortening | Always-on safe default |
| 🪨 Standard (Caveman) | ~30% | 30+ regex rules: filler removal, context condensation, structural compression, multi-turn dedup | Daily coding with Claude/Codex |
| ⚡ Aggressive | ~50% | All standard + progressive message aging + tool result summarization + LLM-based compression | Long sessions with many tool calls |
| 🔥 Ultra | ~75% | All aggressive + heuristic token pruning + stopword removal + score-based filtering | Maximum savings when tokens are scarce |
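To make the Lite row concrete, here is a rough sketch of two of its techniques: whitespace collapse and system-prompt dedup. It is an assumption-laden illustration, not the code in lite.ts; `Msg` and `compressLite` are hypothetical names, and image URL shortening is left out.

```typescript
// Hypothetical sketch of "Lite"-style compression; not the project's lite.ts.
interface Msg {
  role: string;
  content: string;
}

function compressLite(messages: Msg[]): Msg[] {
  const seenSystem: string[] = [];
  const out: Msg[] = [];
  for (const m of messages) {
    // Collapse runs of spaces/tabs and squeeze 3+ newlines down to a blank line.
    const content = m.content
      .replace(/[ \t]+/g, " ")
      .replace(/\n{3,}/g, "\n\n")
      .trim();
    if (m.role === "system") {
      // Drop system prompts that repeat an earlier one after normalization.
      if (seenSystem.indexOf(content) !== -1) continue;
      seenSystem.push(content);
    }
    out.push({ role: m.role, content });
  }
  return out;
}

const liteDemo = compressLite([
  { role: "system", content: "You are  helpful." },
  { role: "system", content: "You are helpful." },
  { role: "user", content: "hi\n\n\n\nthere" },
]);
```

Because nothing here rewrites wording, this level of compression is safe to leave on by default, which matches the "Always-on safe default" positioning in the table.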

Before & After (Standard/Caveman Mode)

🗣️ Before compression (69 tokens):

"The reason your React component is re-rendering is likely because you're creating a new object reference on each render cycle. When you pass an inline object as a prop, React's shallow comparison sees it as a different object every time, which triggers a re-render. I would recommend using useMemo to memoize the object."

🪨 After compression (19 tokens):

"New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo."

Same answer. 72% fewer tokens. Zero accuracy loss.
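The percentage is plain arithmetic on the two token counts. A small helper (hypothetical name `savingsPercent`) shows the calculation for this example and for the 10,000 → 2,500 token case in the pipeline diagram:

```typescript
// Percentage of tokens removed by compression.
function savingsPercent(before: number, after: number): number {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) / before) * 100;
}

// Before/after example above: (69 - 19) / 69 ≈ 72.5%, reported as 72%.
const exampleSavings = Math.round(savingsPercent(69, 19));

// Pipeline diagram: (10000 - 2500) / 10000 = 75%.
const diagramSavings = savingsPercent(10000, 2500);
```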

Architecture

Request Body
  │
  ├─ strategySelector.ts ─── Picks mode (config / combo override / auto-trigger)
  │
  ├─ lite.ts ─────────────── Whitespace, dedup, image URLs, redundant content
  ├─ caveman.ts ──────────── 30+ regex rules via cavemanRules.ts
  │   └─ preservation.ts ─── Protects code blocks, URLs, JSON from compression
  ├─ aggressive.ts ───────── Summarizer + tool result compressor + progressive aging
  │   ├─ summarizer.ts ───── Rule-based message summarization
  │   ├─ toolResultCompressor.ts ── file/grep/shell/JSON/error compression
  │   └─ progressiveAging.ts ──── Older messages → shorter summaries
  └─ ultra.ts ────────────── Heuristic token scoring + pruning
      └─ ultraHeuristic.ts ─ Stopword detection, score thresholds, force-preserve
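The pairing of caveman.ts with preservation.ts suggests a protect, rewrite, restore pattern. The sketch below illustrates that pattern only: the three rules are invented examples (the real rule set in cavemanRules.ts is not reproduced here), and it protects just inline code spans and URLs rather than everything preservation.ts covers.

```typescript
// Illustrative rules; NOT the project's cavemanRules.ts.
const FILLER_RULES: Array<[RegExp, string]> = [
  [/\b(?:basically|actually|really|just|simply)\s+/gi, ""],
  [/\bin order to\b/gi, "to"],
  [/[ \t]{2,}/g, " "],
];

// Spans that must survive untouched: inline code and URLs (a simplification).
const PROTECTED = /`[^`]*`|https?:\/\/\S+/g;
const MARK = "\u0001";

function compressStandard(text: string): string {
  const saved: string[] = [];
  // 1. Swap protected spans for numbered placeholders.
  let work = text.replace(PROTECTED, (match: string) => {
    saved.push(match);
    return MARK + (saved.length - 1) + MARK;
  });
  // 2. Apply each rewrite rule in order.
  for (const [pattern, replacement] of FILLER_RULES) {
    work = work.replace(pattern, replacement);
  }
  // 3. Put the protected spans back.
  return work.replace(/\u0001(\d+)\u0001/g, (_m: string, i: string) => saved[Number(i)] || "");
}
```

Protecting spans before the rules run means a rule such as "in order to" → "to" cannot corrupt a code snippet or URL that happens to contain the same words.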

Configuration

Dashboard → Settings → Compression → Pick your mode

Or per-combo override:

{
  "comboOverrides": {
    "my-coding-combo": "standard",
    "my-cheap-combo": "ultra"
  }
}

Auto-trigger: set autoTriggerTokens to automatically enable compression when a request exceeds a token threshold.
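One way the three inputs (global mode, per-combo override, auto-trigger) could be combined is sketched below. The precedence shown, with a combo override winning and auto-trigger applying only when the global mode is off, is an assumption; so are the field names `defaultMode` and `autoTriggerMode` and the function `pickMode`. Only `comboOverrides` and `autoTriggerTokens` come from the text above.

```typescript
type Mode = "off" | "lite" | "standard" | "aggressive" | "ultra";

interface CompressionConfig {
  defaultMode: Mode; // assumed name for the dashboard setting
  comboOverrides: { [combo: string]: Mode }; // as in the JSON above
  autoTriggerTokens?: number; // threshold named in the docs
  autoTriggerMode?: Mode; // assumed: which mode auto-trigger enables
}

function pickMode(cfg: CompressionConfig, combo: string | undefined, estimatedTokens: number): Mode {
  // 1. A per-combo override wins (assumed precedence).
  if (combo !== undefined && Object.prototype.hasOwnProperty.call(cfg.comboOverrides, combo)) {
    return cfg.comboOverrides[combo];
  }
  // 2. Auto-trigger: large requests get compressed even if the default is "off".
  if (
    cfg.defaultMode === "off" &&
    cfg.autoTriggerTokens !== undefined &&
    estimatedTokens > cfg.autoTriggerTokens
  ) {
    return cfg.autoTriggerMode !== undefined ? cfg.autoTriggerMode : "lite";
  }
  // 3. Otherwise use the globally configured mode.
  return cfg.defaultMode;
}

const demoConfig: CompressionConfig = {
  defaultMode: "off",
  comboOverrides: { "my-coding-combo": "standard", "my-cheap-combo": "ultra" },
  autoTriggerTokens: 8000,
  autoTriggerMode: "aggressive",
};
```

With this config, a request routed through `my-cheap-combo` uses `ultra`, a 12,000-token request with no combo falls into `aggressive`, and everything else stays uncompressed.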

🪨 Fun fact: The standard/caveman mode is inspired by Caveman — the viral project that proved "caveman speak" cuts 65% of tokens while keeping 100% technical accuracy. OmniRoute takes this further with a 5-mode pipeline that goes from gentle whitespace cleanup all the way to aggressive heuristic pruning.

📖 Full compression documentation: docs/COMPRESSION_GUIDE.md

🎯 What OmniRoute Solves

Every developer using AI tools faces these problems daily. OmniRoute solves them all.

| # | Problem | OmniRoute Solution |
| --- | --- | --- |
| 💸 | Subscription quota expires mid-coding | Smart 4-Tier Fallback: auto-routes Subscription → API Key → Cheap → Free |
| 🔌 | Each provider has a different API format | Format Translation: unified endpoint translates OpenAI ↔ Claude ↔ Gemini ↔ Responses |
| 🌐 | AI providers block my country/region | 3-Level Proxy: global, per-provider, and per-key proxy with TLS fingerprint spoofing |
| 🆓 | Can't afford AI subscriptions | 11 Free Providers: Kiro, Qoder, Pollinations, LongCat, Cloudflare AI, NVIDIA NIM... |
| 🔒 | Gateway is exposed without protection | API Key Management: scoping, rotation, IP filtering, rate limiting, prompt injection guard |
| 🛑 | Provider went down, lost coding flow | Circuit Breakers: auto-failover with cooldown, retry, anti-thundering herd |
| 🔧 | Configuring each CLI tool is tedious | CLI Tools Dashboard: one-click setup for Claude Code, Codex, Cursor, OpenClaw, Kilo |
| 🔑 | Managing OAuth tokens is hell | Auto Token Refresh: OAuth PKCE for 8 providers, multi-account, LAN/remote fix |
| 📊 | Don't know how much I'm spending | Cost Analytics: per-token tracking, budget limits, usage stats per API key |
| 🐛 | Can't diagnose errors in AI calls | Unified Logs: 4-tab dashboard (request, proxy, audit, console) + p50/p95/p99 telemetry |

| # | Problem | Solution |
| --- | --- | --- |
| 11 | Deploying/maintaining is complex | npm global, Docker multi-arch, Electron, Termux (deploy anywhere) |
| 12 | Interface is English-only | 40+ languages with RTL support |
| 13 | Need more than chat (images, audio, video) | 10 multi-modal APIs: embeddings, images, video, music, TTS, STT, moderation, rerank, search, batch |
| 14 | No way to test/compare models | LLM Evals, Translator Playground, Chat Tester, Live Monitor |
| 15 | Need to scale without losing performance | Semantic cache, request dedup, rate limit detection, queue & pacing |
| 16 | Want to control model behavior globally | System prompt injection, thinking budget, wildcard routing |
| 17 | Need MCP tools as first-class features | 29 MCP tools, 3 transports (stdio/SSE/HTTP), 10 scopes, audit trail |
| 18 | Need A2A orchestration | JSON-RPC 2.0 + SSE streaming, task lifecycle, sync + stream paths |
| 19 | Need real MCP process health | Runtime heartbeat, PID tracking, UI status cards |
| 20 | Need auditable MCP execution | SQLite-backed audit with filters, pagination, stats |
| 21 | Need scoped MCP permissions | 10 granular scopes per integration |
| 22 | Need operational controls without redeploying | Combo switches, resilience tuning, breaker resets from dashboard |
| 23 | Need A2A task lifecycle visibility | Task listing/filtering, drill-down, cancellation |
| 24 | Need active stream metrics | Active stream counters, per-state counts, A2A dashboard cards |
| 25 | Need standard agent discovery | Agent Card at /.well-known/agent.json |
| 26 | Need protocol discoverability | Consolidated Endpoints page with Proxy, MCP, A2A, API tabs |
| 27 | Need E2E protocol validation | Real MCP SDK + A2A client flows in test:protocols:e2e |
| 28 | Need unified observability | Health + audit + telemetry across OpenAI, MCP, and A2A layers |
| 29 | Need one runtime for proxy + tools + agents | OpenAI proxy + MCP + A2A in one stack with shared auth/resilience |
| 30 | Need agentic workflows without glue-code | Unified endpoint, protocol UIs, production-ready foundations |
| 31 | Long sessions crash with context limits | Proactive context compression, structural integrity guards, multi-layer dropping |

📖 Deep dives: Resilience Guide • Proxy Guide • Setup Guide • Compression Guide
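The circuit-breaker row in the first table (auto-failover with cooldown) follows a well-known pattern that can be sketched generically. The class below is a textbook-style illustration, not OmniRoute's implementation; `CircuitBreaker`, its threshold, and its cooldown are hypothetical, and the clock is passed in so the behavior is easy to follow.

```typescript
// Generic circuit breaker sketch; names and defaults are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  // Closed: always allow. Open: reject until the cooldown has elapsed,
  // then allow a trial request (the "half-open" state).
  canRequest(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    // Trip (or re-trip) the breaker once consecutive failures reach the threshold.
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}

// Trips after 2 consecutive failures and stays open for 1000 ms.
const breaker = new CircuitBreaker(2, 1000);
```

While a provider's breaker is open, a router can skip it immediately and send traffic to the next provider instead of waiting on requests that are likely to fail.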

🆓 Start Free — Zero Configuration Cost

Set up AI coding in minutes at $0/month. Connect these free accounts and use the built-in Free Stack combo.

| Step | Action | Providers Unlocked |
| --- | --- | --- |
| 1 | Connect Kiro (AWS Builder ID OAuth) | Claude Sonnet 4.5, Haiku 4.5 (unlimited) |
| 2 | Connect Qoder (Google OAuth) | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1... (unlimited) |
| 3 | Connect Qwen (Device Code) | qwen3-coder-plus, qwen3-coder-flash... (unlimited) |
| 4 | Connect Gemini CLI (Google OAuth) | gemini-3-flash, gemini-2.5-pro (180K/mo free) |
| 5 | /dashboard/combos → Free Stack ($0) template | Round-robin all free providers automatically |

Point any IDE/CLI to: http://localhost:20128/v1 · API Key: any-string · Done.

Optional extra coverage (also free): Groq API key (30 RPM free), NVIDIA NIM (40 RPM free, 70+ models), Cerebras (1M tok/day), LongCat API key (50M tokens/day!), Cloudflare Workers AI (10K Neurons/day, 50+ models).

⚡ Quick Start

1) Install and run

npm install -g omniroute
omniroute

Dashboard opens at http://localhost:20128 · API at http://localhost:20128/v1.

2) Connect providers

Dashboard → Providers → connect at least one provider (OAuth or API key)
Dashboard → Endpoints → create an API key
Dashboard → Combos → set your fallback chain (optional)

3) Point your coding tool

Base URL: http://localhost:20128/v1
API Key:  [copy from Endpoint page]
Model:    if/kimi-k2-thinking (or any provider/model)

Works with Claude Code, Codex CLI, Gemini CLI, Cursor, Cline, OpenClaw, OpenCode, and any OpenAI-compatible tool.
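For tools you write yourself, the same three values map onto an ordinary OpenAI-style chat request. The sketch below only builds the request; it assumes the standard `/chat/completions` path under the base URL and Bearer-token auth, which is what "OpenAI-compatible" normally implies, and `buildChatRequest` is a hypothetical helper.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Builds an OpenAI-style chat completion request against the gateway.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[]
): PreparedRequest {
  return {
    // Assumed path: the usual OpenAI-compatible route under /v1.
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + apiKey,
    },
    body: JSON.stringify({ model, messages }),
  };
}

const chatRequest = buildChatRequest(
  "http://localhost:20128/v1",
  "YOUR_API_KEY", // placeholder: copy the real key from the Endpoints page
  "if/kimi-k2-thinking",
  [{ role: "user", content: "Explain useMemo in one sentence." }]
);
// To send it: fetch(chatRequest.url, { method, headers, body }) with the fields above.
```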

Docker:

docker run -d --name omniroute --restart unless-stopped -p 20128:20128 -v omniroute-data:/app/data diegosouzapw/omniroute:latest

From source:

cp .env.example .env && npm install
PORT=20128 DASHBOARD_PORT=20129 NEXT_PUBLIC_BASE_URL=http://localhost:20129 npm run dev

pnpm: pnpm install -g omniroute && pnpm approve-builds -g && omniroute

Arch Linux (AUR): yay -S omniroute-bin && systemctl --user enable --now omniroute.service

MCP: omniroute --mcp (stdio transport)

CLI options: omniroute --port 3000, omniroute --no-open, omniroute --help

Split-port mode: PORT=20128 DASHBOARD_PORT=20129 omniroute

Uninstall: npm run uninstall (keeps data) or npm run uninstall:full (removes everything)

📖 Full details: Setup Guide · Docker · Void Linux template

🐳 Docker

OmniRoute is available as a public Docker image on Docker Hub.

Quick run:

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

With environment file:

# Copy and edit .env first
cp .env.example .env

docker run -d \
  --name omniroute \
  --restart unless-stopped \
  --stop-timeout 40 \
  --env-file .env \
  -p 20128:20128 \
  -v omniroute-data:/app/data \
  diegosouzapw/omniroute:latest

Using Docker Compose:

# Base profile (no CLI tools)
docker compose --profile base up -d

# CLI profile (Claude Code, Codex, OpenClaw built-in)
docker compose --profile cli up -d

Dashboard support for Docker deployments now includes a one-click Cloudflare Quick Tunnel on Dashboard → Endpoints. The first enable downloads cloudflared only when needed, starts a temporary tunnel to your current /v1 endpoint, and shows the generated https://*.trycloudflare.com/v1 URL directly below your normal public URL. Endpoint tunnel panels, including Cloudflare, Tailscale, and ngrok, can be shown or hidden from Settings → Appearance without changing active tunnel state.

Notes:

Quick Tunnel URLs are temporary and change after every restart.
Quick Tunnels are not auto-restored after an OmniRoute or container restart. Re-enable them from the dashboard when needed.
Managed install currently supports Linux, macOS, and Windows on x64 / arm64.
Managed Quick Tunnels default to HTTP/2 transport to avoid noisy QUIC UDP buffer warnings in constrained container environments. Set CLOUDFLARED_PROTOCOL=quic or auto if you want a different transport.
Docker images bundle system CA roots and pass them to managed cloudflared, which avoids TLS trust failures when the tunnel bootstraps inside the container.
SQLite runs in WAL mode. docker stop should be allowed to finish so OmniRoute can checkpoint the latest changes back into storage.sqlite.
The bundled Compose files already set a 40s stop grace period. If you run the image directly, keep --stop-timeout 40 (or similar) so manual stops do not cut off shutdown cleanup.
Set CLOUDFLARED_BIN=/absolute/path/to/cloudflared if you want OmniRoute to use an existing binary instead of downloading one.

Using Docker Compose with Caddy (HTTPS Auto-TLS):

OmniRoute can be securely exposed using Caddy's automatic SSL provisioning. Ensure your domain's DNS A record points to your server's IP.

services:
  omniroute:
    image: diegosouzapw/omniroute:latest
    container_name: omniroute
    restart: unless-stopped
    volumes:
      - omniroute-data:/app/data
    environment:
      - PORT=20128
      - NEXT_PUBLIC_BASE_URL=https://your-domain.com

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    command: caddy reverse-proxy --from https://your-domain.com --to http://omniroute:20128

volumes:
  omniroute-data:

| Image | Tag | Size | Description |
| --- | --- | --- | --- |
| diegosouzapw/omniroute | latest | ~250MB | Latest stable release |
| diegosouzapw/omniroute | 3.7.8 | ~250MB | Current version |

📖 Full Docker documentation: docs/DOCKER_GUIDE.md — Compose profiles, Caddy HTTPS, Cloudflare tunnels, and more.

📱 Multi-Platform — Run Anywhere

OmniRoute runs on Web, Desktop (Electron), Android (Termux), and as a Progressive Web App (PWA).

| Platform | Install | Highlights |
| --- | --- | --- |
| 🖥️ Desktop | npm run electron:build | Native window, system tray, auto-start, offline mode (Windows/macOS/Linux) |
| 📱 Android | pkg install nodejs-lts && npx -y omniroute | ARM native, no root, 24/7 via Termux:Boot; your phone is an AI server |
| 📲 PWA | "Add to Home Screen" in browser | Fullscreen, offline page, service worker caching (Android/iOS/Desktop) |

Desktop (Electron):

Native Electron app with system tray, auto-start, native notifications
One-click install: NSIS (Windows), DMG (macOS), AppImage (Linux)
Dev: npm run electron:dev · Build: npm run electron:build
📖 Full docs: electron/README.md

Android (Termux):

pkg update && pkg install nodejs-lts python build-essential git
npx -y omniroute@latest

Access from any device on the same network: http://PHONE_IP:20128/v1

📖 Full guide: docs/TERMUX_GUIDE.md

PWA:

Android (Chrome): ⋮ → "Add to Home screen"
iOS (Safari): Share → "Add to Home Screen"
Desktop (Chrome/Edge): Install icon in address bar
📖 Full docs: docs/PWA_GUIDE.md

🌍 Bypass Geographic Blocks — Use AI From Any Country

🇷🇺 🇨🇳 🇮🇷 🇨🇺 🇹🇷 In Russia, China, Iran, or any blocked region? OmniRoute's 3-level proxy system solves this completely.

| Level | Badge | Configure In | Use Case |
| --- | --- | --- | --- |
| Global | 🟢 | Settings → Proxy | All traffic through one proxy |
| Per-Provider | 🟡 | Provider → Proxy | Only specific providers proxied |
| Per-Connection | 🔵 | Connection → Proxy | Each API key uses its own proxy |

What gets proxied: API requests ✅ • OAuth flows ✅ • Connection tests ✅ • Token refresh ✅ • Model sync ✅

Protocols: HTTP/HTTPS, SOCKS5 (ENABLE_SOCKS5_PROXY=true), Authenticated proxies
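A plausible reading of the three levels is that the most specific setting wins. The sketch below encodes that precedence (per-connection, then per-provider, then global) as an assumption; the `ProxySettings` shape, `resolveProxy`, and the example proxy URLs are all hypothetical.

```typescript
// Hypothetical shapes; OmniRoute's real settings schema may differ.
interface ProxySettings {
  global?: string;
  perProvider: { [provider: string]: string };
  perConnection: { [connectionId: string]: string };
}

interface ResolvedProxy {
  level: "connection" | "provider" | "global" | "none";
  url?: string;
}

// Assumed precedence: per-connection, then per-provider, then global.
function resolveProxy(s: ProxySettings, provider: string, connectionId: string): ResolvedProxy {
  const byConnection = s.perConnection[connectionId];
  if (byConnection) return { level: "connection", url: byConnection };
  const byProvider = s.perProvider[provider];
  if (byProvider) return { level: "provider", url: byProvider };
  if (s.global) return { level: "global", url: s.global };
  return { level: "none" };
}

const proxySettings: ProxySettings = {
  global: "http://proxy.example:8080",
  perProvider: { groq: "socks5://groq-proxy.example:1080" },
  perConnection: { "groq-key-2": "http://user:pass@key-proxy.example:3128" },
};
```

With these settings, connection `groq-key-2` uses its own proxy, any other Groq connection uses the provider proxy, and every other provider goes through the global one.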

🆓 1proxy — Free Proxy Marketplace

Contributed by @oyi77 — #1847

No proxy? Use the built-in 1proxy integration for hundreds of free, validated proxies worldwide:

One-click sync (up to 500 proxies) • Quality scores (0-100) • Country filter • Auto-rotation (quality/random/sequential) • Auto-degradation • Circuit breaker

Anti-Detection

🔒 TLS Fingerprint Spoofing — browser-like TLS via wreq-js
🔏 CLI Fingerprint Matching — matches native CLI binary signatures
🏠 Proxy IP Preservation — stealth + IP masking simultaneously

📖 Full proxy documentation: docs/PROXY_GUIDE.md

💰 Pricing at a Glance

| Tier | Provider | Cost | Quota Reset | Best For |
| --- | --- | --- | --- | --- |
| 💳 SUBSCRIPTION | Claude Code (Pro) | $20/mo | 5h + weekly | Already subscribed |
| | Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users |
| | Gemini CLI | FREE | 180K/mo + 1K/day | Everyone! |
| | GitHub Copilot | $10-19/mo | Monthly | GitHub users |
| 🔑 API KEY | NVIDIA NIM | FREE (dev forever) | ~40 RPM | 70+ open models |
| | Cerebras | FREE (1M tok/day) | 60K TPM / 30 RPM | World's fastest |
| | Groq | FREE (30 RPM) | 14.4K RPD | Ultra-fast Llama/Gemma |
| | DeepSeek V3.2 | $0.27/$1.10 per 1M | None | Best price/quality reasoning |
| | xAI Grok-4 Fast | $0.20/$0.50 per 1M 🆕 | None | Fastest + tool calling, ultralow |
| | xAI Grok-4 (standard) | $0.20/$1.50 per 1M 🆕 | None | Reasoning flagship from xAI |
| | Mistral | Free trial + paid | Rate limited | European AI |
| | OpenRouter | Pay-per-use | None | 100+ models aggr. |
| | AgentRouter 🆕 | Pay-per-use | None | $200 free credits at signup |
| 💰 CHEAP | GLM-5 (via Z.AI) 🆕 | $0.5/1M | Daily 10AM | 128K output, newest flagship |
| | GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| | MiniMax M2.5 🆕 | $0.3/1M input | 5-hour rolling | Reasoning + agentic tasks |
| | MiniMax M2.1 | $0.2/1M | 5-hour rolling | Cheapest option |
| | Kimi K2.5 (Moonshot API) 🆕 | Pay-per-use | None | Direct Moonshot API access |
| | Kimi K2 | $9/mo flat | 10M tokens/mo | Predictable cost |
| 🆓 FREE | Qoder | $0 | Unlimited | 5 models unlimited |
| | Qwen | $0 | Unlimited | 4 models unlimited |
| | Kiro | $0 | Unlimited | Claude Sonnet/Haiku (AWS Builder) |
| | LongCat Flash-Lite 🆕 | $0 (50M tok/day 🔥) | 1 RPS | Largest free quota on Earth |
| | Pollinations AI 🆕 | $0 (no key needed) | 1 req/15s | GPT-5, Claude, DeepSeek, Llama 4 |
| | Cloudflare Workers AI 🆕 | $0 (10K Neurons/day) | ~150 resp/day | 50+ models, global edge |
| | Scaleway AI 🆕 | $0 (1M tokens total) | Rate limited | EU/GDPR, Qwen3 235B, Llama 70B |

🆕 New models added (Mar 2026): Grok-4 Fast family at $0.20/$0.50/M (benchmarked at 1143ms — 30% faster than Gemini 2.5 Flash), GLM-5 via Z.AI with 128K output, MiniMax M2.5 reasoning, DeepSeek V3.2 updated pricing, Kimi K2.5 via Moonshot direct API.

💡 See the full $0 Free Stack (11 providers) below.

💡 Understanding Dashboard Costs:

The "cost" displayed in the Usage Analytics page is for tracking and comparison purposes only. OmniRoute itself never charges you anything — it's free, open-source software running on your machine. If your dashboard shows "$290 total cost" while using free models, that's how much you saved compared to paid API pricing. Think of it as a savings tracker, not a bill.
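The tracked figure is presumably token counts multiplied by a reference price. A minimal sketch of that arithmetic follows, assuming the paired prices in the table (for example DeepSeek V3.2 at $0.27/$1.10 per 1M) are input/output rates; `Price` and `estimateCostUsd` are hypothetical names and the dashboard's actual formula is not documented here.

```typescript
// USD per 1M tokens; the input/output split is an assumption about the table.
interface Price {
  inputPerM: number;
  outputPerM: number;
}

function estimateCostUsd(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * price.inputPerM + (outputTokens / 1e6) * price.outputPerM;
}

const deepseekV32: Price = { inputPerM: 0.27, outputPerM: 1.1 };

// 2M input + 0.5M output tokens: 2 * 0.27 + 0.5 * 1.10, about $1.09.
const referenceCost = estimateCostUsd(deepseekV32, 2000000, 500000);
```

If the same traffic was actually served by a free provider, that roughly $1.09 is the amount saved rather than an amount owed.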

🆓 Free Models — 11 Providers, $0 Forever

Combine all free providers into one unbreakable combo — OmniRoute auto-routes between them when quota runs out.

| Provider | Prefix | Free Models | Quota |
| --- | --- | --- | --- |
| Kiro | kr/ | Claude Sonnet 4.5, Haiku 4.5, Opus 4.6 | 50 CREDITS per month |
| Qoder | if/ | kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1 | ♾️ Unlimited |
| Qwen | qw/ | qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next | ♾️ Unlimited |
| Pollinations | pol/ | GPT-5, Claude, Gemini, DeepSeek, Llama 4, Mistral | No key needed |
| LongCat | lc/ | LongCat-Flash-Lite | 50M tokens/day 🔥 |
| Gemini CLI | gc/ | gemini-3-flash, gemini-2.5-pro | 180K tok/mo |
| Cloudflare AI | cf/ | 50+ models (Llama, Gemma, Mistral, Whisper) | 10K Neurons/day |
| Groq | groq/ | Llama 3.3 70B, Qwen3 32B, Kimi K2 | 14.4K RPD |
| NVIDIA NIM | nvidia/ | 129 models (DeepSeek, Llama, GLM, Kimi) | ~40 RPM |
| Cerebras | cerebras/ | Qwen3 235B, GPT-OSS 120B, Llama 3.1 | 1M tok/day |
| Scaleway | scw/ | Qwen3 235B, Llama 70B, DeepSeek V3 | 1M tokens (EU) |
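Model IDs take the form `prefix/model`, for example `if/kimi-k2-thinking`. The helper below sketches how such an ID could be split and mapped back to a provider using the prefixes from the table; `parseModelId` is a hypothetical name, and splitting at the first slash is an assumption about how IDs with further slashes would be handled.

```typescript
// Prefixes copied from the table above.
const FREE_PROVIDER_PREFIXES: { [prefix: string]: string } = {
  kr: "Kiro",
  "if": "Qoder",
  qw: "Qwen",
  pol: "Pollinations",
  lc: "LongCat",
  gc: "Gemini CLI",
  cf: "Cloudflare AI",
  groq: "Groq",
  nvidia: "NVIDIA NIM",
  cerebras: "Cerebras",
  scw: "Scaleway",
};

interface ParsedModelId {
  prefix: string;
  model: string;
  provider: string | undefined; // undefined when the prefix is not in the table
}

function parseModelId(id: string): ParsedModelId {
  // Split at the first slash so model names may themselves contain slashes.
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error('Expected "prefix/model", got: ' + id);
  }
  const prefix = id.slice(0, slash);
  const known = Object.prototype.hasOwnProperty.call(FREE_PROVIDER_PREFIXES, prefix);
  return {
    prefix,
    model: id.slice(slash + 1),
    provider: known ? FREE_PROVIDER_PREFIXES[prefix] : undefined,
  };
}

const parsedModel = parseModelId("if/kimi-k2-thinking");
```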

Also free (API Key required): Mistral (1B tok/month) · OpenRouter (35+ :free models) · GitHub Models (GPT-5, 45+ models) · Cohere (1K calls/month) · Z.AI/GLM (permanent free Flash models) · SiliconFlow (1K RPM, 50K TPM) · Kilo Code (~200 req/hr auto-router) · HuggingFace ($0.10/mo credits) · Ollama Cloud (400+ models) · LLM7.io (30+ models) · Kluster AI · IBM watsonx (300K tok/month) · OpenCode Zen · Vercel AI Gateway ($5/mo)

Trial credits (one-time): Baseten ($30) · NLP Cloud ($15) · AI21 ($10) · Upstage ($10) · SambaNova ($5) · Modal ($5/mo) · Fireworks ($1) · Nebius ($1) · Inference.net ($1 + $25 survey) · Hyperbolic ($1) · Novita ($0.50)

China-based (free tiers): ModelScope · Tencent Hunyuan · Volcengine · ChatAnywhere · InternAI · Bigmodel

Combined capacity: ~31,000+ RPD · ~32B+ tokens/month · 500+ models · $0

📖 Complete free provider directory: docs/FREE_TIERS.md — 25+ providers, quotas, base URLs, model tables, and OmniRoute combo setup.

🎙️ Free Transcription Combo

Transcribe any audio/video for $0 — Deepgram leads with $200 free, AssemblyAI $50 fallback, Groq Whisper as unlimited emergency backup.

| Provider | Free Credits | Best Model | Rate Limit |
| --- | --- | --- | --- |
| 🟢 Deepgram | $200 free (signup) | nova-3: best accuracy, 30+ languages | No RPM limit on free credits |
| 🔵 AssemblyAI | $50 free (signup) | universal-3-pro: chapters, sentiment, PII | No RPM limit on free credits |
| 🔴 Groq | Free forever | whisper-large-v3: OpenAI Whisper | 30 RPM (rate limited) |

Suggested combo in /dashboard/combos:

Name: free-transcription
Strategy: Priority
Nodes:
  [1] deepgram/nova-3          → uses $200 free first
  [2] assemblyai/universal-3-pro → fallback when Deepgram credits run out
  [3] groq/whisper-large-v3    → free forever, emergency fallback

Then in /dashboard/media → Transcription tab: upload any audio or video file → select your combo endpoint → get transcription in supported formats.

💡 Key Features

4,690+ automated tests across 517 test files. Not just a relay — a full operational platform.

| Feature | Why It Matters | | ---------------------------------------------------------------------------------------------------- | -------------------------------- | | 🧠 Smart 4-Tier Fallback — Subscription → API → Cheap → Free | Never stop coding, zero downtime | | 🔄 Format Translation — OpenAI ↔ Claude ↔ Gemini ↔ Responses API | Works with ANY