MCP Server / ask llm

ask llm

MCP server for AI-to-AI collaboration — bridge Claude with Gemini, Codex, and other LLMs for code review, second opinions, and plan debate

★ 6von @LykhoydaMITGitHub →

Installation

Claude Code

claude mcp add ask-llm -- npx -y ask-llm-mcp

npx

npx -y ask-llm-mcp

npm: ask-llm-mcp

Transport

stdio

Tools (17)

Package

Type

Downloads

Feature

Description

ask-gemini

ask-gemini-mcp

ask-gemini-edit

ask-gemini-mcp

fetch-chunk

ask-gemini-mcp

ask-codex

ask-codex-mcp

ask-ollama

ask-ollama-mcp

ask-llm

ask-llm-mcp

multi-llm

ask-llm-mcp

get-usage-stats

all

diagnose

ask-llm-mcp

ping

all

Provider

Default

Gemini

`gemini-3.1-pro-preview`

Codex

`gpt-5.5`

Ollama

`qwen2.5-coder:7b`

Dokumentation

Ask LLM

| Package | Type | Version | Downloads | |---------|------|---------|-----------| | ask-gemini-mcp | MCP Server | | | | ask-codex-mcp | MCP Server | | | | ask-ollama-mcp | MCP Server | | | | ask-llm-mcp | MCP Server | | | | @ask-llm/plugin | Claude Code Plugin | | /plugin install |

MCP servers + Claude Code plugin for AI-to-AI collaboration

MCP servers that bridge your AI client with multiple LLM providers for AI-to-AI collaboration. Works with Claude Code, Claude Desktop, Cursor, Warp, Copilot, and 40+ other MCP clients. Leverage Gemini's 1M+ token context, Codex's GPT-5.5, or local Ollama models — all via standard MCP.

Why?

Get a second opinion — Ask another AI to review your coding approach before committing
Debate plans — Send architecture proposals for critique and alternative suggestions
Review changes — Have multiple AIs analyze diffs to catch issues your primary AI might miss
Massive context — Gemini reads entire codebases (1M+ tokens) that would overflow other models
Local & private — Use Ollama for reviews where no data leaves your machine

Quick Start

Claude Code

# All-in-one — auto-detects installed providers
claude mcp add --scope user ask-llm -- npx -y ask-llm-mcp

claude mcp add --scope user gemini -- npx -y ask-gemini-mcp
claude mcp add --scope user codex -- npx -y ask-codex-mcp
claude mcp add --scope user ollama -- npx -y ask-ollama-mcp

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "ask-llm": {
      "command": "npx",
      "args": ["-y", "ask-llm-mcp"]
    }
  }
}

{
  "mcpServers": {
    "gemini": {
      "command": "npx",
      "args": ["-y", "ask-gemini-mcp"]
    },
    "codex": {
      "command": "npx",
      "args": ["-y", "ask-codex-mcp"]
    },
    "ollama": {
      "command": "npx",
      "args": ["-y", "ask-ollama-mcp"]
    }
  }
}

Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "ask-llm": { "command": "npx", "args": ["-y", "ask-llm-mcp"] }
  }
}

Codex CLI (~/.codex/config.toml):

[mcp_servers.ask-llm]
command = "npx"
args = ["-y", "ask-llm-mcp"]

Any MCP Client (STDIO transport):

{ "command": "npx", "args": ["-y", "ask-llm-mcp"] }

Replace ask-llm-mcp with ask-gemini-mcp, ask-codex-mcp, or ask-ollama-mcp for a single provider.

Claude Code Plugin

The Ask LLM plugin adds multi-provider code review, brainstorming, and automated hooks directly into Claude Code:

/plugin marketplace add Lykhoyda/ask-llm
/plugin install ask-llm@ask-llm-plugins

What You Get

| Feature | Description | |:---|:---| | /multi-review | Parallel Gemini + Codex review with 4-phase validation pipeline and consensus highlighting | | /gemini-review | Gemini-only review with confidence filtering | | /codex-review | Codex-only review with confidence filtering | | /ollama-review | Local review — no data leaves your machine | | /brainstorm | Multi-LLM brainstorm: Claude Opus researches the topic against real files in parallel with external providers (Gemini/Codex/Ollama), then synthesizes all findings with verified findings weighted higher | | /compare | Side-by-side raw responses from multiple providers, no synthesis — for when you want to see how each provider phrases the same answer | | Pre-commit hook | Reviews staged changes before git commit, warns about critical issues |

The review agents use a 4-phase pipeline inspired by Anthropic's code-review plugin: context gathering, prompt construction with explicit false-positive exclusions, synthesis, and source-level validation of each finding.

See the plugin docs for details.

Prerequisites

Node.js v20.0.0 or higher (LTS)
At least one provider:
- Gemini CLI — npm install -g @google/gemini-cli && gemini login
- Codex CLI — installed and authenticated
- Ollama — running locally with a model pulled (ollama pull qwen2.5-coder:7b)

MCP Tools

| Tool | Package | Purpose | |------|---------|---------| | ask-gemini | ask-gemini-mcp | Send prompts to Gemini CLI with @ file syntax. 1M+ token context. Live progressive output via stream-json | | ask-gemini-edit | ask-gemini-mcp | Get structured OLD/NEW code edit blocks from Gemini | | fetch-chunk | ask-gemini-mcp | Retrieve chunks from cached large responses | | ask-codex | ask-codex-mcp | Send prompts to Codex CLI. GPT-5.5 with mini fallback. Native session resume via sessionId | | ask-ollama | ask-ollama-mcp | Send prompts to local Ollama. Fully private, zero cost. Server-side conversation replay via sessionId | | ask-llm | ask-llm-mcp | Unified orchestrator — pick provider per call. Fan out to all installed providers | | multi-llm | ask-llm-mcp | Dispatch the same prompt to multiple providers in parallel; returns per-provider responses + usage in one call | | get-usage-stats | all | Per-session token totals, fallback counts, breakdowns by provider/model — all in-memory, no persistence | | diagnose | ask-llm-mcp | Self-diagnosis: Node version, PATH resolution, provider CLI presence + versions. Read-only | | ping | all | Connection test — verify MCP setup |

All ask-* tools accept an optional sessionId parameter for multi-turn conversations and now return a structured AskResponse (provider, response, model, sessionId, usage) via MCP outputSchema alongside the human-readable text. The orchestrator (ask-llm-mcp) also exposes usage://current-session as an MCP Resource for live JSON snapshots.

Usage Examples

ask gemini to review the changes in @src/auth.ts for security issues
ask codex to suggest a better algorithm for @src/sort.ts
ask ollama to explain @src/config.ts (runs locally, no data sent anywhere)
use gemini to summarize @. the current directory
use multi-llm to compare what gemini and codex think about this approach

CLI Subcommands

The orchestrator binary (ask-llm-mcp) supports two CLI modes alongside the default MCP server:

# Interactive multi-provider REPL — switch providers, persist sessions, see usage live
npx ask-llm-mcp repl

# Diagnose your setup — Node version, PATH, provider CLI versions, env vars
npx ask-llm-mcp doctor          # human-readable
npx ask-llm-mcp doctor --json   # machine-readable, exit 1 on error

The REPL ships sessions per provider (/provider gemini, /provider codex, /new, /sessions, /usage) and inherits all the executor behavior (quota fallback, stream-json output for Gemini, native session resume).

Models

| Provider | Default | Fallback | |----------|---------|----------| | Gemini | gemini-3.1-pro-preview | gemini-3-flash-preview (on quota) | | Codex | gpt-5.5 | gpt-5.5-mini (on quota) | | Ollama | qwen2.5-coder:7b | qwen2.5-coder:1.5b (if not found) |

All providers automatically fall back to a lighter model on errors.

Documentation

Docs site: lykhoyda.github.io/ask-llm
AI-readable: llms.txt | llms-full.txt

Contributing

Contributions are welcome! See open issues for things to work on.

License

MIT License. See LICENSE for details.

Disclaimer: This is an unofficial, third-party tool and is not affiliated with, endorsed, or sponsored by Google or OpenAI.