Free LLM API — Unified Multi-Provider AI Gateway

Route requests to OpenAI, Anthropic Claude, Google Gemini, Groq, Together AI, and DeepSeek through a single REST API. OpenAI-compatible endpoint, response caching, automatic retries. 24+ models, one interface.

LLM API · AI Gateway · OpenAI Compatible · Multi-Model · March 2026 · 12 min read

Why Use an LLM Router?

Building with LLMs means juggling multiple providers. OpenAI for GPT-4o, Anthropic for Claude, Google for Gemini, Groq for lightning-fast inference. Each has different authentication, request formats, and response schemas.

An LLM router gives you one unified API for all of them. Send the same request format to any model, get back a consistent response. Switch models by changing a single string — no code changes, no new SDKs, no integration work.

The Agent LLM Router is a free, self-hostable LLM gateway that routes to 6 providers and 24+ models through a single REST endpoint. It includes an OpenAI-compatible /v1/chat/completions endpoint, so any tool or library that speaks the OpenAI format works out of the box.

6 Providers

OpenAI, Anthropic, Google Gemini, Groq, Together AI, DeepSeek — all through one endpoint.

24+ Models

GPT-4o, Claude Opus, Gemini 2.0, Llama 3.3, Mixtral, DeepSeek Reasoner, and more.

OpenAI Compatible

Drop-in /v1/chat/completions endpoint. Use existing OpenAI SDKs and tools unchanged.

Response Caching

Automatic 5-minute cache for deterministic requests. Save money on repeated queries.

Auto Retries

Failed requests automatically retry once with a 1-second backoff. Built-in resilience.

Provider Key Storage

Store your provider API keys once, then omit them from subsequent requests.

Quick Start

Send a chat completion to any supported model in one request. You need your own provider API key (e.g., an OpenAI key for GPT models).

Chat with GPT-4o-mini (curl)
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "provider_key": "sk-your-openai-key"
  }'
Chat with GPT-4o-mini (Python)
import requests

response = requests.post(
    "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat",
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in one sentence."}
        ],
        "provider_key": "sk-your-openai-key"
    }
)

data = response.json()
print(data["choices"][0]["message"]["content"])
print(f"Tokens used: {data['usage']['total_tokens']}")
Chat with GPT-4o-mini (Node.js)
const response = await fetch(
  "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum computing in one sentence." }
      ],
      provider_key: "sk-your-openai-key"
    })
  }
);

const data = await response.json();
console.log(data.choices[0].message.content);
console.log(`Tokens: ${data.usage.total_tokens}`);
Response format: All responses are normalized to the OpenAI format — regardless of which provider you use. choices[0].message.content always has the text, usage always has token counts.
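Because every response shares this shape, one small helper can pull out the useful fields no matter which provider served the request. A minimal sketch (the `extract` helper is illustrative, not part of the router):

```python
def extract(response: dict) -> tuple[str, int]:
    """Pull the assistant text and total token count from a
    normalized (OpenAI-format) chat response."""
    text = response["choices"][0]["message"]["content"]
    tokens = response["usage"]["total_tokens"]
    return text, tokens
```

The same helper works on the JSON body returned by any of the examples above, whether the upstream model was GPT, Claude, Gemini, Llama, or DeepSeek.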

Supported Providers & Models

The router automatically detects the provider from the model name. Pass any model string and it routes to the right provider.

| Provider | Models | Notes |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, o1, o1-mini, o3-mini | 7 models. Full OpenAI API compatibility. |
| Anthropic | claude-opus-4-20250514, claude-sonnet-4-20250514, claude-haiku-4-5-20251001, claude-3-5-sonnet-20241022 | 4 models. Messages API format auto-converted. |
| Google | gemini-2.0-flash, gemini-2.0-flash-lite, gemini-1.5-pro, gemini-1.5-flash | 4 models. Gemini API format auto-converted. |
| Groq | llama-3.3-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768, gemma2-9b-it | 4 models. Ultra-fast inference. |
| Together AI | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo, meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo, mistralai/Mixtral-8x7B-Instruct-v0.1 | 3 models. Open-source model hosting. |
| DeepSeek | deepseek-chat, deepseek-reasoner | 2 models. Reasoning-optimized. |
Discover models programmatically: curl https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/models — returns all available models with their provider info. No API key needed.
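The detection logic itself is internal to the router, but conceptually it amounts to matching the model string against known name prefixes. A rough sketch (the prefix map below is illustrative, derived from the table above, not the router's actual code):

```python
# Hypothetical prefix map built from the provider table above; the
# router's real detection logic is internal and may differ.
PROVIDER_PREFIXES = {
    "gpt-": "openai",
    "o1": "openai",
    "o3": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "llama-": "groq",
    "mixtral-": "groq",
    "gemma2-": "groq",
    "meta-llama/": "together",
    "mistralai/": "together",
    "deepseek-": "deepseek",
}

def detect_provider(model: str) -> str:
    """Map a model name to its provider by prefix."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unknown model: {model}")
```

Note how the prefixes disambiguate similar names: bare `mixtral-8x7b-32768` routes to Groq, while the namespaced `mistralai/Mixtral-8x7B-Instruct-v0.1` routes to Together AI.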

API Reference

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| POST | /api/chat | Key | Chat completion with any model |
| POST | /v1/chat/completions | Key | OpenAI-compatible drop-in endpoint |
| GET | /api/models | No | List all available models |
| GET | /api/providers | No | List supported providers |
| POST | /api/keys/provider | Key | Store a provider API key |

OpenAI-Compatible Endpoint

The /v1/chat/completions endpoint is a drop-in replacement for the OpenAI API. Any library, tool, or framework that uses the OpenAI format works without modification — just change the base URL.

Use with the OpenAI Python SDK

OpenAI SDK — routed through Agent LLM
from openai import OpenAI

# Point the SDK at the LLM router instead of OpenAI
client = OpenAI(
    base_url="https://agent-gateway-kappa.vercel.app/v1/agent-llm",
    api_key="YOUR_GATEWAY_KEY"  # your Agent Gateway key
)

# Use any supported model — even non-OpenAI ones
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a haiku about APIs"}
    ],
    extra_headers={"X-Provider-Key": "sk-your-openai-key"}
)

print(response.choices[0].message.content)

Use with the OpenAI Node.js SDK

OpenAI SDK — Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://agent-gateway-kappa.vercel.app/v1/agent-llm",
  apiKey: "YOUR_GATEWAY_KEY",
  defaultHeaders: { "X-Provider-Key": "sk-your-openai-key" }
});

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Write a haiku about APIs" }]
});

console.log(completion.choices[0].message.content);
Switch models instantly: Change "gpt-4o-mini" to "claude-sonnet-4-20250514" and pass an Anthropic key via X-Provider-Key header. Same code, different model.

Using Different Providers

The router normalizes all provider-specific formats. Here are examples for each major provider:

Anthropic Claude

Chat with Claude Sonnet
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "system", "content": "You are a concise technical writer."},
      {"role": "user", "content": "Compare REST and GraphQL in 3 bullets."}
    ],
    "provider_key": "sk-ant-your-anthropic-key",
    "max_tokens": 500
  }'

Google Gemini

Chat with Gemini 2.0 Flash
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "provider_key": "your-google-api-key"
  }'

Groq (Ultra-Fast Inference)

Chat with Llama 3.3 70B via Groq
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Explain recursion with a Python example."}
    ],
    "provider_key": "gsk_your-groq-key"
  }'

DeepSeek Reasoner

Chat with DeepSeek Reasoner
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [
      {"role": "user", "content": "Solve: What is the derivative of x^3 * sin(x)?"}
    ],
    "provider_key": "sk-your-deepseek-key"
  }'

Response Caching

The router automatically caches responses for deterministic requests (temperature 0 or unset, non-streaming). Cache key is derived from the model, messages, temperature, and max_tokens.

First request — hits provider, cached
# This request will be cached
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"What is 2+2?"}],"provider_key":"sk-..."}'

# Same request within 5 min — instant response from cache
# Response includes: "cached": true

Provider Key Storage

Instead of passing your provider key in every request, store it once and the router will use it automatically.

Store your OpenAI key (curl)
# Step 1: Store your provider key
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/keys/provider \
  -H "Authorization: Bearer YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"provider": "openai", "provider_key": "sk-your-openai-key"}'

# Step 2: Now make requests without provider_key
curl -X POST https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat \
  -H "Authorization: Bearer YOUR_GATEWAY_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello!"}]}'
Store key & use without provider_key (Python)
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm"
HEADERS = {"Authorization": "Bearer YOUR_GATEWAY_KEY"}

# Store your OpenAI key once
requests.post(f"{BASE}/api/keys/provider",
    headers=HEADERS,
    json={"provider": "openai", "provider_key": "sk-..."})

# Now chat without provider_key
resp = requests.post(f"{BASE}/api/chat",
    headers=HEADERS,
    json={"model": "gpt-4o-mini",
          "messages": [{"role": "user", "content": "Hello!"}]})

print(resp.json()["choices"][0]["message"]["content"])
Provider keys are stored in memory. They persist as long as the server is running. If the server restarts, you'll need to re-store your keys.

Use Cases

Model A/B Testing

Compare GPT-4o vs Claude Opus vs Gemini Pro on the same prompts without rewriting integration code. Switch models by changing one string.
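Because the request shape is identical for every model, an A/B run is just a loop. A sketch, assuming the /api/chat endpoint from the Quick Start (the `ask` and `ab_test` helper names are illustrative):

```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat"

def ask(model, prompt, provider_key):
    """One chat completion through the router."""
    resp = requests.post(BASE, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "provider_key": provider_key,
    })
    return resp.json()["choices"][0]["message"]["content"]

def ab_test(prompt, candidates, send=ask):
    """Run one prompt against several (model, provider_key) pairs
    and collect the answers side by side."""
    return {model: send(model, prompt, key) for model, key in candidates}
```

For example, `ab_test("Summarize TCP vs UDP", [("gpt-4o", openai_key), ("claude-sonnet-4-20250514", anthropic_key)])` returns a dict of answers keyed by model name, ready to diff.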

AI Agent Orchestration

Use cheap/fast models (GPT-4o-mini, Groq Llama) for routine tasks and expensive/smart models (GPT-4o, Claude Opus) for complex reasoning — one API for both.

Fallback & Redundancy

If OpenAI is down, switch to Anthropic or Groq. The unified format means your error handling code works the same across all providers.
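Client-side fallback can be a short wrapper that walks an ordered list of candidates. A sketch, again assuming the /api/chat endpoint from the Quick Start (the `chat` and `chat_with_fallback` helper names are illustrative):

```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat"

def chat(model, messages, provider_key):
    """One attempt; raises on HTTP or router-level errors."""
    resp = requests.post(BASE, json={
        "model": model,
        "messages": messages,
        "provider_key": provider_key,
    }, timeout=60)
    resp.raise_for_status()
    data = resp.json()
    if "error" in data:
        raise RuntimeError(data["error"])
    return data["choices"][0]["message"]["content"]

def chat_with_fallback(messages, candidates, send=chat):
    """Try each (model, provider_key) pair in order; return the
    first successful completion."""
    last_err = None
    for model, provider_key in candidates:
        try:
            return send(model, messages, provider_key)
        except Exception as err:
            last_err = err
    raise RuntimeError(f"All candidates failed; last error: {last_err}")
```

Because responses are normalized, the fallback answer has the same shape no matter which provider ultimately served it.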

Cost Optimization

Route simple queries to cheap models (gpt-3.5-turbo, llama-3.1-8b-instant) and complex ones to powerful models. Response caching saves money on repeated queries.

Multi-Model Pipeline Example

Build a pipeline that uses cheap models for classification and expensive models for generation:

Python — Smart routing pipeline
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/chat"

def llm_call(model, messages, provider_key):
    resp = requests.post(BASE, json={
        "model": model,
        "messages": messages,
        "provider_key": provider_key
    })
    return resp.json()["choices"][0]["message"]["content"]

user_question = "Explain the difference between TCP and UDP"

# Step 1: Classify complexity with a fast/cheap model
complexity = llm_call(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content":
        f"Rate this question's complexity as 'simple' or 'complex': {user_question}"}],
    provider_key="gsk_your-groq-key"
)

# Step 2: Route to appropriate model
if "complex" in complexity.lower():
    answer = llm_call(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_question}],
        provider_key="sk-your-openai-key"
    )
else:
    answer = llm_call(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_question}],
        provider_key="sk-your-openai-key"
    )

print(f"Complexity: {complexity}")
print(f"Answer: {answer}")

Comparison: LLM Router vs Direct Provider APIs

| Feature | Agent LLM Router | OpenRouter | LiteLLM | Direct APIs |
|---|---|---|---|---|
| Providers | 6 (OpenAI, Anthropic, Google, Groq, Together, DeepSeek) | 50+ | 100+ | 1 per SDK |
| Setup | Zero — just call the URL | Account required | Self-host + pip install | SDK per provider |
| OpenAI Compatible | Yes (/v1/chat/completions) | Yes | Yes | OpenAI only |
| Response Caching | Built-in (5 min TTL) | No | Optional (Redis) | No |
| Auto Retries | Yes (2 attempts) | Yes | Yes | SDK-dependent |
| Cost | Free (proxy only) | Markup on tokens | Free (self-hosted) | Direct pricing |
| Your Provider Keys | Yes — BYOK | Uses their keys | Yes — BYOK | Yes |
| Key Storage | Yes (per gateway key) | N/A | Config file | Env vars |
| Self-Hostable | Yes (Node.js) | No | Yes (Python) | N/A |

Discover Available Models

Query the models and providers endpoints to see everything that's available:

List all models (no auth needed)
# List all 24+ models with provider info
curl https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/models | jq .

# List providers and their model counts
curl https://agent-gateway-kappa.vercel.app/v1/agent-llm/api/providers | jq .

Example response from /api/models:

Response
{
  "models": [
    {"model": "gpt-4o", "provider": "openai", "providerName": "OpenAI"},
    {"model": "gpt-4o-mini", "provider": "openai", "providerName": "OpenAI"},
    {"model": "claude-opus-4-20250514", "provider": "anthropic", "providerName": "Anthropic"},
    {"model": "gemini-2.0-flash", "provider": "google", "providerName": "Google (Gemini)"},
    {"model": "llama-3.3-70b-versatile", "provider": "groq", "providerName": "Groq"},
    {"model": "deepseek-chat", "provider": "deepseek", "providerName": "DeepSeek"}
  ],
  "total": 24
}
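In code, the catalog is easy to filter client-side, for example to list only one provider's models. A sketch, assuming the response shape shown above (`filter_models` and `models_by_provider` are illustrative helpers):

```python
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-llm"

def filter_models(catalog: dict, provider: str) -> list[str]:
    """Keep only one provider's model names from an /api/models payload."""
    return [m["model"] for m in catalog["models"] if m["provider"] == provider]

def models_by_provider(provider: str) -> list[str]:
    """Fetch the catalog (no auth needed) and filter client-side."""
    resp = requests.get(f"{BASE}/api/models", timeout=10)
    resp.raise_for_status()
    return filter_models(resp.json(), provider)
```

This is handy for building a model picker or for validating a model string before sending a chat request.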

Error Handling

The router returns clear error messages with upstream provider details:

Error response example
# Missing provider key
{"error": "Provider API key required. Pass as \"provider_key\" in body or set via POST /api/keys/provider"}

# Unknown model
{"error": "Unknown model: gpt-5. Use GET /api/models to see available models."}

# Upstream provider error
{"error": "OpenAI API error 429: Rate limit exceeded"}

When a request fails, the router refunds the credit automatically. You only pay for successful completions.

Frequently Asked Questions

Do I need my own provider API keys?
Yes. The LLM Router is a proxy/gateway — it routes your requests to providers using your keys. You need an API key from the provider whose models you want to use (e.g., an OpenAI key for GPT models). The router itself is free to use.
Does the router add latency?
Minimal. The router adds approximately 10-30ms of overhead for format transformation and routing. For cached responses, latency is under 5ms. The vast majority of response time is from the upstream provider.
Is streaming supported?
The router accepts the stream parameter and forwards it to providers. Streaming disables response caching. The response format follows the provider's streaming implementation.
How does response caching work?
When temperature is 0 or unset and streaming is off, responses are cached by a SHA-256 hash of (model, messages, temperature, max_tokens). Cached entries expire after 5 minutes. Up to 1,000 entries are stored with LRU eviction. Cached responses include "cached": true.
Can I use this with LangChain or other frameworks?
Yes. The /v1/chat/completions endpoint is OpenAI-compatible. Set the base URL to the router endpoint and it works with LangChain, LlamaIndex, AutoGen, CrewAI, and any framework that supports OpenAI's API format.
What happens if a provider is down?
The router automatically retries failed requests once with a 1-second delay. If both attempts fail, you get a clear error message with the upstream error details, and your credit is refunded. You can implement client-side fallback to a different model/provider.

Start Routing LLM Requests

One API, 6 providers, 24+ models. No signup, no credit card.

Get Your Free API Key

More Free APIs

The Agent LLM Router is part of a suite of 39+ free APIs for developers and AI agents: