Free LLM API — Unified Multi-Provider AI Gateway
Route requests to OpenAI, Anthropic Claude, Google Gemini, Groq, Together AI, and DeepSeek through a single REST API. OpenAI-compatible endpoint, response caching, automatic retries. 24+ models, one interface.
Why Use an LLM Router?
Building with LLMs means juggling multiple providers. OpenAI for GPT-4o, Anthropic for Claude, Google for Gemini, Groq for lightning-fast inference. Each has different authentication, request formats, and response schemas.
An LLM router gives you one unified API for all of them. Send the same request format to any model, get back a consistent response. Switch models by changing a single string — no code changes, no new SDKs, no integration work.
The Agent LLM Router is a free, self-hostable LLM gateway that routes to 6 providers and 24+ models through a single REST endpoint. It includes an OpenAI-compatible /v1/chat/completions endpoint, so any tool or library that speaks the OpenAI format works out of the box.
6 Providers
OpenAI, Anthropic, Google Gemini, Groq, Together AI, DeepSeek — all through one endpoint.
24+ Models
GPT-4o, Claude Opus, Gemini 2.0, Llama 3.3, Mixtral, DeepSeek Reasoner, and more.
OpenAI Compatible
Drop-in /v1/chat/completions endpoint. Use existing OpenAI SDKs and tools unchanged.
Response Caching
Automatic 5-minute cache for deterministic requests. Save money on repeated queries.
Auto Retries
Failed requests automatically retry once with a 1-second backoff. Built-in resilience.
Provider Key Storage
Store your provider API keys once, then omit them from subsequent requests.
Quick Start
Send a chat completion to any supported model in one request. You need your own provider API key (e.g., an OpenAI key for GPT models).
curl:

```shell
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "provider_key": "sk-your-openai-key"
  }'
```

Python:

```python
import requests

response = requests.post(
    "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat",
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in one sentence."}
        ],
        "provider_key": "sk-your-openai-key"
    }
)
data = response.json()
print(data["choices"][0]["message"]["content"])
print(f"Tokens used: {data['usage']['total_tokens']}")
```

JavaScript:

```javascript
const response = await fetch(
  "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum computing in one sentence." }
      ],
      provider_key: "sk-your-openai-key"
    })
  }
);
const data = await response.json();
console.log(data.choices[0].message.content);
console.log(`Tokens: ${data.usage.total_tokens}`);
```

Whichever provider handles the request, choices[0].message.content always holds the generated text and usage always carries token counts.
Supported Providers & Models
The router automatically detects the provider from the model name. Pass any model string and it routes to the right provider.
| Provider | Models | Notes |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini, o3-mini | 7 models. Full OpenAI API compatibility. |
| Anthropic | claude-opus-4-20250514, claude-sonnet-4-20250514, claude-haiku-4-5-20251001, claude-3-5-sonnet-20241022 | 4 models. Messages API format auto-converted. |
| Google Gemini | gemini-2.0-flash, gemini-2.0-flash-lite, gemini-1.5-pro, gemini-1.5-flash | 4 models. Gemini API format auto-converted. |
| Groq | llama-3.3-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768, gemma2-9b-it | 4 models. Ultra-fast inference. |
| Together AI | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo, meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo, mistralai/Mixtral-8x7B-Instruct-v0.1 | 3 models. Open-source model hosting. |
| DeepSeek | deepseek-chat, deepseek-reasoner | 2 models. Reasoning-optimized. |
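For intuition, name-based routing can be as simple as a longest-prefix match against a provider map. The sketch below is illustrative, not the router's actual source; the prefix table mirrors the models listed above.

```python
# Illustrative provider detection by model-name prefix.
# This is a sketch of the idea, not the router's real implementation.
PREFIXES = {
    "gpt-": "openai", "o1": "openai", "o3": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "meta-llama/": "together", "mistralai/": "together",
    "llama-": "groq", "mixtral-": "groq", "gemma2-": "groq",
    "deepseek-": "deepseek",
}

def detect_provider(model: str) -> str:
    for prefix, provider in PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model: {model}")
```

Note the ordering: meta-llama/ and mistralai/ are checked before the bare llama- and mixtral- prefixes so Together AI models don't get misrouted to Groq.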
curl https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/models — returns all available models with their provider info. No API key needed.
API Reference
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/chat | Key | Chat completion with any model |
| POST | /v1/chat/completions | Key | OpenAI-compatible drop-in endpoint |
| GET | /api/models | No | List all available models |
| GET | /api/providers | No | List supported providers |
| POST | /api/keys/provider | Key | Store a provider API key |
OpenAI-Compatible Endpoint
The /v1/chat/completions endpoint is a drop-in replacement for the OpenAI API. Any library, tool, or framework that uses the OpenAI format works without modification — just change the base URL.
Use with the OpenAI Python SDK
```python
from openai import OpenAI

# Point the SDK at the LLM router instead of OpenAI
client = OpenAI(
    base_url="https://agent-gateway-kappa.vercel.app/v1/agent-llm",
    api_key="YOUR_GATEWAY_KEY"  # your Agent Gateway key
)

# Use any supported model — even non-OpenAI ones
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a haiku about APIs"}
    ],
    extra_headers={"X-Provider-Key": "sk-your-openai-key"}
)
print(response.choices[0].message.content)
```

Use with the OpenAI Node.js SDK

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://agent-gateway-kappa.vercel.app/v1/agent-llm",
  apiKey: "YOUR_GATEWAY_KEY",
  defaultHeaders: { "X-Provider-Key": "sk-your-openai-key" }
});

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Write a haiku about APIs" }]
});
console.log(completion.choices[0].message.content);
```

To switch providers, change "gpt-4o-mini" to "claude-sonnet-4-20250514" and pass an Anthropic key via the X-Provider-Key header. Same code, different model.
Using Different Providers
The router normalizes all provider-specific formats. Here are examples for each major provider:
Anthropic Claude
```shell
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Compare REST and GraphQL in 3 bullets."}
    ],
    "provider_key": "sk-ant-your-anthropic-key",
    "max_tokens": 500
  }'
```

Google Gemini

```shell
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "provider_key": "your-google-api-key"
  }'
```

Groq (Ultra-Fast Inference)

```shell
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Explain recursion with a Python example."}
    ],
    "provider_key": "gsk_your-groq-key"
  }'
```

DeepSeek Reasoner

```shell
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {"role": "user", "content": "Solve: What is the derivative of x^3 * sin(x)?"}
    ],
    "provider_key": "sk-your-deepseek-key"
  }'
```

Response Caching
The router automatically caches responses for deterministic requests (temperature 0 or unset, non-streaming). Cache key is derived from the model, messages, temperature, and max_tokens.
- TTL: 5 minutes (300 seconds)
- Max entries: 1,000 (LRU eviction)
- Cache indicator: cached responses include "cached": true in the response body
- Bypass cache: set "temperature": 0.1 or higher to skip caching
```shell
# This request will be cached
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"What is 2+2?"}],"provider_key":"sk-..."}'

# Same request within 5 min — instant response from cache
# Response includes: "cached": true
```

Provider Key Storage
Instead of passing your provider key in every request, store it once and the router will use it automatically.
```shell
# Step 1: Store your provider key
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/keys/provider \
  -H "Authorization: Bearer YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "provider_key": "sk-your-openai-key"}'

# Step 2: Now make requests without provider_key
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Authorization: Bearer YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello!"}]}'
```

The same flow in Python:

```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm"
HEADERS = {"Authorization": "Bearer YOUR_GATEWAY_KEY"}

# Store your OpenAI key once
requests.post(f"{BASE}/api/keys/provider",
              headers=HEADERS,
              json={"provider": "openai", "provider_key": "sk-..."})

# Now chat without provider_key
resp = requests.post(f"{BASE}/api/chat",
                     headers=HEADERS,
                     json={"model": "gpt-4o-mini",
                           "messages": [{"role": "user", "content": "Hello!"}]})
print(resp.json()["choices"][0]["message"]["content"])
```

Use Cases
Model A/B Testing
Compare GPT-4o vs Claude Opus vs Gemini Pro on the same prompts without rewriting integration code. Switch models by changing one string.
AI Agent Orchestration
Use cheap/fast models (GPT-4o-mini, Groq Llama) for routine tasks and expensive/smart models (GPT-4o, Claude Opus) for complex reasoning — one API for both.
Fallback & Redundancy
If OpenAI is down, switch to Anthropic or Groq. The unified format means your error handling code works the same across all providers.
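A minimal fallback loop along these lines might look like the sketch below. The model choices and the placeholder keys are illustrative; swap in whichever providers you have keys for.

```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat"

# (model, provider_key) pairs to try in order; keys are placeholders.
FALLBACKS = [
    ("gpt-4o-mini", "sk-your-openai-key"),
    ("claude-3-5-sonnet-20241022", "sk-ant-your-anthropic-key"),
    ("llama-3.3-70b-versatile", "gsk_your-groq-key"),
]

def chat_with_fallback(messages):
    """Try each (model, key) pair until one returns a completion."""
    last_error = None
    for model, key in FALLBACKS:
        try:
            resp = requests.post(BASE, json={
                "model": model,
                "messages": messages,
                "provider_key": key,
            }, timeout=30)
            body = resp.json()
            if resp.ok and "error" not in body:
                return body["choices"][0]["message"]["content"]
            last_error = body.get("error", f"HTTP {resp.status_code}")
        except requests.RequestException as exc:
            last_error = str(exc)
    raise RuntimeError(f"All providers failed: {last_error}")
```

Because every provider returns the same response shape, the loop needs no per-provider parsing; only the model string and key change between attempts.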
Cost Optimization
Route simple queries to cheap models (gpt-3.5-turbo, llama-3.1-8b-instant) and complex ones to powerful models. Response caching saves money on repeated queries.
Multi-Model Pipeline Example
Build a pipeline that uses cheap models for classification and expensive models for generation:
```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat"

def llm_call(model, messages, provider_key):
    resp = requests.post(BASE, json={
        "model": model,
        "messages": messages,
        "provider_key": provider_key
    })
    return resp.json()["choices"][0]["message"]["content"]

user_question = "Explain the difference between TCP and UDP"

# Step 1: Classify complexity with a fast/cheap model
complexity = llm_call(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content":
        f"Rate this question's complexity as 'simple' or 'complex': {user_question}"}],
    provider_key="gsk_your-groq-key"
)

# Step 2: Route to the appropriate model
if "complex" in complexity.lower():
    answer = llm_call(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_question}],
        provider_key="sk-your-openai-key"
    )
else:
    answer = llm_call(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_question}],
        provider_key="sk-your-openai-key"
    )

print(f"Complexity: {complexity}")
print(f"Answer: {answer}")
```

Comparison: LLM Router vs Direct Provider APIs
| Feature | Agent LLM Router | OpenRouter | LiteLLM | Direct APIs |
|---|---|---|---|---|
| Providers | 6 (OpenAI, Anthropic, Google, Groq, Together, DeepSeek) | 50+ | 100+ | 1 per SDK |
| Setup | Zero — just call the URL | Account required | Self-host + pip install | SDK per provider |
| OpenAI Compatible | Yes (/v1/chat/completions) | Yes | Yes | OpenAI only |
| Response Caching | Built-in (5 min TTL) | No | Optional (Redis) | No |
| Auto Retries | Yes (2 attempts) | Yes | Yes | SDK-dependent |
| Cost | Free (proxy only) | Markup on tokens | Free (self-hosted) | Direct pricing |
| Your Provider Keys | Yes — BYOK | Uses their keys | Yes — BYOK | Yes |
| Key Storage | Yes (per gateway key) | N/A | Config file | Env vars |
| Self-Hostable | Yes (Node.js) | No | Yes (Python) | N/A |
Discover Available Models
Query the models and providers endpoints to see everything that's available:
```shell
# List all 24+ models with provider info
curl https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/models | jq .

# List providers and their model counts
curl https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/providers | jq .
```

Example response from /api/models:
```json
{
  "models": [
    {"model": "gpt-4o", "provider": "openai", "providerName": "OpenAI"},
    {"model": "gpt-4o-mini", "provider": "openai", "providerName": "OpenAI"},
    {"model": "claude-opus-4-20250514", "provider": "anthropic", "providerName": "Anthropic"},
    {"model": "gemini-2.0-flash", "provider": "google", "providerName": "Google (Gemini)"},
    {"model": "llama-3.3-70b-versatile", "provider": "groq", "providerName": "Groq"},
    {"model": "deepseek-chat", "provider": "deepseek", "providerName": "DeepSeek"}
  ],
  "total": 24
}
```

Error Handling
The router returns clear error messages with upstream provider details:
```
# Missing provider key
{"error": "Provider API key required. Pass as \"provider_key\" in body or set via POST /api/keys/provider"}

# Unknown model
{"error": "Unknown model: gpt-5. Use GET /api/models to see available models."}

# Upstream provider error
{"error": "OpenAI API error 429: Rate limit exceeded"}
```

When a request fails, the router refunds the credit automatically. You only pay for successful completions.
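A client-side pattern for surfacing these error shapes might look like the sketch below. The wrapper is illustrative, not an official client; it simply treats any response body containing an "error" field as a failure.

```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat"

def safe_chat(payload):
    """Send a chat request and raise on any router or upstream error."""
    resp = requests.post(BASE, json=payload, timeout=30)
    body = resp.json()
    # The router reports failures with an "error" field, so check that
    # rather than relying on HTTP status alone.
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["choices"][0]["message"]["content"]
```

Catching RuntimeError at the call site is then enough to handle missing keys, unknown models, and upstream rate limits uniformly, since the router normalizes them all into the same error shape.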
Frequently Asked Questions
Does the router support streaming?
Yes. The router accepts the stream parameter and forwards it to providers. Streaming disables response caching, and the response format follows the provider's streaming implementation.

How do I know a response came from the cache?
Cached responses include "cached": true.

Does it work with LangChain, LlamaIndex, and other frameworks?
Yes. The /v1/chat/completions endpoint is OpenAI-compatible. Set the base URL to the router endpoint and it works with LangChain, LlamaIndex, AutoGen, CrewAI, and any framework that supports OpenAI's API format.

Start Routing LLM Requests
One API, 6 providers, 24+ models. No signup, no credit card.
Get Your Free API Key

More Free APIs
The Agent LLM Router is part of a suite of 39+ free APIs for developers and AI agents:
- Free Web Scraping API — extract content from any URL
- Free Screenshot API — capture website screenshots
- Free Code Execution API — run Python, JavaScript, Bash
- Free AI Agent APIs — build autonomous agents
- Free Crypto Price API — real-time token prices
- Free Uptime Monitoring API — monitor URLs & get alerts
- Free Webhook Testing Tool — inspect HTTP callbacks