🍋 Lemonade: Refreshingly fast local AI
Lemonade is the local AI server that gives you the same capabilities as cloud APIs, except 100% free and private. Use the latest models for chat, coding, speech, and image generation on your own NPU and GPU.
Lemonade comes in two flavors:
- Lemonade Server installs a local service that hundreds of great apps can connect to over the standard OpenAI, Anthropic, and Ollama APIs.
- Embeddable Lemonade is a portable binary you can package into your own application to give it multi-modal local AI that auto-optimizes for your user’s PC.
This project is built by the community for every PC, with optimizations by AMD engineers to get the most from Ryzen AI, Radeon, and Strix Halo PCs.
Getting Started
- Install: Windows · Linux · macOS (beta) · Docker · Source
- Get Models: Browse and download with the Model Manager
- Generate: Try models with the built-in interfaces for chat, image gen, speech gen, and more
- Mobile: Take your lemonade to go: iOS · Android · Source
- Connect: Use Lemonade with your favorite apps
Supported Platforms
Lemonade supports Windows, Linux, and macOS (beta), with Docker images and source builds also available (per-platform build-status badges are shown in the repository).
Using the CLI
To run and chat with Gemma:
```bash
lemonade run Gemma-4-E2B-it-GGUF
```
To code with Lemonade models:
```bash
lemonade launch claude
```
Multi-modality:
```bash
# image gen
lemonade run SDXL-Turbo

# speech gen
lemonade run kokoro-v1

# transcription
lemonade run Whisper-Large-v3-Turbo
```
To see available models and download them:
```bash
lemonade list
lemonade pull Gemma-4-E2B-it-GGUF
```
To see the backends available on your PC:
```bash
lemonade backends
```
Model Library
Lemonade supports a wide variety of models across CPU, GPU, and NPU: LLMs (in GGUF, FLM, and ONNX formats), Whisper speech-to-text models, Stable Diffusion image models, and more.
Use `lemonade pull` or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.
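Since the server speaks the OpenAI API (as shown in the client example later in this document), downloaded models should also be discoverable programmatically via an OpenAI-style models listing. The sketch below is a hypothetical helper that filters such a payload for GGUF entries; the sample payload is illustrative, not real server output.

```python
def gguf_model_ids(models_payload):
    """Return the IDs of GGUF-format models from an OpenAI-style /models response."""
    return [m["id"] for m in models_payload.get("data", [])
            if m["id"].endswith("-GGUF")]

# Illustrative payload shaped like an OpenAI /models response (not real output)
sample = {"data": [{"id": "Gemma-4-E2B-it-GGUF"}, {"id": "SDXL-Turbo"}]}
print(gguf_model_ids(sample))  # ['Gemma-4-E2B-it-GGUF']
```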
Supported Configurations
Lemonade supports multiple inference engines for LLM, speech, TTS, and image generation, and each has its own backend and hardware requirements.
To check exactly which recipes/backends are supported on your own machine, run:
```bash
lemonade backends
```
Project Roadmap
| Under Development | Under Consideration | Recently Completed |
|---------------------------|-----------------------------|------------------------|
| Native multi-modal tool calling | vLLM support | Embeddable binary release |
| More whisper.cpp backends | Port app to Tauri | Image generation |
| More SD.cpp backends | | Speech-to-text |
| MLX support | | Text-to-speech |
| | | Apps marketplace |
Integrate Embeddable Lemonade in Your Application
Embeddable Lemonade is a binary version of Lemonade that you can bundle into your own app to give it a portable, auto-optimizing, multi-modal local AI stack. This lets users focus on your app, with zero Lemonade installers, branding, or telemetry.
Check out the Embeddable Lemonade guide.
Connect Lemonade Server to Your Application
You can use any OpenAI-compatible client library by configuring it to use http://localhost:13305/api/v1 as the base URL. The table below lists official and popular OpenAI clients for different languages.
Feel free to pick and choose your preferred language.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|--------|-----|------|----|---------|----|------|------|-----|
| openai-python | openai-cpp | openai-java | openai-dotnet | openai-node | go-openai | ruby-openai | async-openai | openai-php |
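Before pointing a client at the server, it can be useful to confirm that something is actually listening on the Lemonade port. A minimal sketch, assuming the default port 13305 from the base URL above (the helper name is our own, not part of Lemonade):

```python
import socket

def lemonade_is_running(host="localhost", port=13305, timeout=1.0):
    """Return True if a server is accepting connections on the given host/port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False, start Lemonade Server before running any of the client examples below.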
Python Client Example
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:13305/api/v1",
    api_key="lemonade",  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Gemma-4-E2B-it-GGUF",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# Print the response
print(completion.choices[0].message.content)
```
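For interactive apps you will usually want a streamed response instead: openai-python supports passing `stream=True` to `chat.completions.create` and iterating the result, and each chunk carries an incremental text delta. The helper below is a sketch of collecting those deltas; the stub chunks only mimic the shape of real streaming chunks so the example runs without a server.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the text deltas from an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            parts.append(delta)
    return "".join(parts)

# Stub chunks shaped like openai-python streaming chunks, for illustration only
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
    for text in ["The capital", " of France", " is Paris.", None]
]
print(collect_stream(fake_chunks))  # The capital of France is Paris.
```

With a live server you would replace `fake_chunks` with the iterator returned by `client.chat.completions.create(..., stream=True)`.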
For more detailed integration instructions, see the Integration Guide.
FAQ
To read our frequently asked questions, see our FAQ Guide.
Contributing
Lemonade is built by the local AI community! If you would like to contribute to this project, please check out our contribution guide.
Maintainers
This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @kenvandine @Geramy @ramkrishna2910 @sawansri @siavashhub @sofiageo @superm1 @vgodsoe, and sponsored by AMD. You can reach us by filing an issue, emailing [email protected], or joining our Discord.
Code Signing Policy
Free code signing provided by SignPath.io, certificate by SignPath Foundation.
- Committers and reviewers: Maintainers of this repo
- Approvers: Owners
Privacy policy: This program will not transfer any information to other networked systems unless specifically requested by the user or the person installing or operating it. When the user requests it, Lemonade downloads AI models from Hugging Face Hub (see their privacy policy).
License and Attribution
This project is:
- Built with C++ (server) and React (app) with ❤️ for the open source community,
- Standing on the shoulders of great tools from the open source ecosystem,
- Licensed under the Apache 2.0 License.
- Portions of the project are licensed as described in NOTICE.md.