Free Web Scraping API — Extract Content from Any URL
Need to scrape a website and get clean, structured content? Forget setting up Puppeteer, managing headless Chrome, or dealing with proxy rotation. Agent Scraper is a free API that fetches any URL and returns clean markdown or plain text — with metadata, links, and images extracted automatically.
Try it right now — no API key, no signup:
# Scrape any URL and get clean markdown
curl "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape?url=https://example.com"
That's it. One HTTP call. You get back structured JSON with the page content converted to markdown, plus metadata (title, description, OG tags), all outbound links, and image URLs.
What You Get Back
{
  "url": "https://example.com/",
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in documentation...",
  "meta": {
    "title": "Example Domain",
    "description": "",
    "author": "",
    "ogTitle": "",
    "ogDescription": "",
    "ogImage": "",
    "canonical": "",
    "language": "en"
  },
  "links": ["https://iana.org/domains/example"],
  "images": [],
  "contentLength": 167,
  "scrapedAt": "2026-03-03T17:42:42.267Z",
  "cached": false
}
Two Endpoints, Two Formats
GET /api/scrape
Returns markdown content with full structure preserved — headings, lists, links, code blocks. Best for AI/LLM consumption and content analysis.
POST /api/extract
Returns plain text only — all HTML stripped, just the readable content. Best for text analysis, NLP, and search indexing.
Code Examples
curl
# Scrape to markdown (GET)
curl "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape?url=https://news.ycombinator.com"
# Extract plain text (POST)
curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/extract" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com"}'
Python
import requests
# Scrape a URL and get markdown content
BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper"
# Method 1: Markdown with metadata
result = requests.get(f"{BASE}/api/scrape", params={"url": "https://news.ycombinator.com"}).json()
print(f"Title: {result['meta']['title']}")
print(f"Content length: {result['contentLength']} chars")
print(f"Links found: {len(result['links'])}")
print(result["content"][:500])
# Method 2: Plain text only
text = requests.post(
    f"{BASE}/api/extract",
    json={"url": "https://news.ycombinator.com"},
).json()
print(text["text"])
Node.js
const BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper";
// Scrape to markdown
const res = await fetch(`${BASE}/api/scrape?url=https://github.com`);
const data = await res.json();
console.log(data.meta.title); // "GitHub"
console.log(data.content); // Full page as markdown
console.log(data.links.length); // Number of outbound links
// Extract plain text
const text = await fetch(`${BASE}/api/extract`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://github.com" }),
}).then(r => r.json());
console.log(text.text);
Real-World Use Cases
1. Feed Content to an LLM
Scrape a page, feed the markdown to Claude or GPT for summarization, analysis, or Q&A. The markdown format preserves structure that LLMs can reason about.
import requests
# Scrape an article
page = requests.get(
    "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
    params={"url": "https://en.wikipedia.org/wiki/Web_scraping"},
).json()
# Send to an LLM for summarization
prompt = f"""Summarize this article in 3 bullet points:
{page['content'][:4000]}"""
# Use with any LLM API (Claude, GPT, Ollama, etc.)
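A scraped page can easily exceed the context budget you want to spend on it. One approach (a sketch, not part of the API) is to split the markdown on top-level headings and summarize chunk by chunk:

```python
def chunk_markdown(md: str, max_chars: int = 4000) -> list[str]:
    """Split markdown into chunks, preferring top-level heading boundaries."""
    sections, current = [], []
    for line in md.splitlines(keepends=True):
        # Start a new section at each top-level heading
        if line.startswith("# ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Merge small sections; hard-split any section over the budget
    chunks = []
    for sec in sections:
        while len(sec) > max_chars:
            chunks.append(sec[:max_chars])
            sec = sec[max_chars:]
        if chunks and len(chunks[-1]) + len(sec) <= max_chars:
            chunks[-1] += sec
        else:
            chunks.append(sec)
    return chunks
```

Feed each chunk through the summarization prompt separately, then summarize the summaries if needed.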
2. Monitor Competitor Pricing
Scrape competitor pages on a schedule, extract pricing data, and track changes over time.
import requests, json, time
competitors = [
    "https://example-saas.com/pricing",
    "https://another-tool.io/plans",
]

for url in competitors:
    result = requests.get(
        "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
        params={"url": url},
    ).json()
    print(f"\n--- {result['meta']['title']} ---")
    print(result["content"][:1000])
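The loop above only prints the raw markdown. To actually track changes over time, pull the numbers out and compare runs. A minimal sketch (the regex is a USD-only heuristic; real pricing pages often need per-site rules):

```python
import re

PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")

def extract_prices(markdown: str) -> list[str]:
    """Pull dollar-amount strings out of scraped markdown (heuristic)."""
    return sorted(set(PRICE_RE.findall(markdown)))

def diff_prices(old: list[str], new: list[str]) -> dict:
    """Report prices that appeared or disappeared between two runs."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
    }
```

Persist each run's list (a JSON file per competitor is enough) and alert whenever `diff_prices` returns anything non-empty.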
3. Build a Search Index
Extract clean text from a list of URLs and index it for full-text search.
const urls = [
  "https://docs.example.com/getting-started",
  "https://docs.example.com/api-reference",
  "https://docs.example.com/tutorials",
];

const documents = [];
for (const url of urls) {
  const res = await fetch(
    "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/extract",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    }
  );
  const data = await res.json();
  documents.push({ url, title: data.meta.title, text: data.text });
}
// Index `documents` into your search engine (Elasticsearch, Meilisearch, etc.)
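The indexing step itself can be as simple as an in-memory inverted index. A toy sketch (shown in Python for consistency with the other examples; swap in an Elasticsearch or Meilisearch client for production):

```python
import re
from collections import defaultdict

def build_index(documents: list[dict]) -> dict[str, set[str]]:
    """Map each lowercased token to the set of URLs containing it."""
    index = defaultdict(set)
    for doc in documents:
        for token in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            index[token].add(doc["url"])
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs containing every query token (AND semantics)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    results = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```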
4. Extract Metadata at Scale
Pull OG tags, titles, descriptions, and canonical URLs from a batch of pages.
import requests
urls = ["https://github.com", "https://news.ycombinator.com", "https://reddit.com"]
for url in urls:
    r = requests.get(
        "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
        params={"url": url},
    ).json()
    m = r["meta"]
    print(f"{m['title']} | OG: {m['ogTitle']} | Canonical: {m['canonical']}")
API Reference
GET /api/scrape
Scrape a URL and return content as markdown.
| Parameter | Type | Description |
|---|---|---|
| url | string (query) | URL to scrape (required) |
POST /api/extract
Extract plain text from a URL.
| Parameter | Type | Description |
|---|---|---|
| url | string (body JSON) | URL to extract text from (required) |
Response Fields
| Field | Type | Description |
|---|---|---|
| url | string | Final URL (after redirects) |
| content / text | string | Page content as markdown or plain text |
| meta | object | Title, description, OG tags, canonical, language |
| links | array | All outbound links found on the page |
| images | array | All image URLs found on the page |
| contentLength | number | Character count of extracted content |
| scrapedAt | string | ISO timestamp of when the scrape occurred |
| cached | boolean | Whether the result was served from cache |
Comparison: Agent Scraper vs Alternatives
| Feature | Agent Scraper | Firecrawl | ScrapingBee | Apify |
|---|---|---|---|---|
| Free tier | 50 credits | 500 credits | 1,000 credits | $5/mo free |
| Signup required | No | Yes | Yes | Yes |
| Markdown output | Yes | Yes | No | No |
| Plain text output | Yes | Yes | Yes | Yes |
| Metadata extraction | Yes (OG, meta, canonical) | Yes | Limited | Configurable |
| Link extraction | Yes | Yes | No | Configurable |
| JavaScript rendering | No | Yes | Yes | Yes |
| Starting price | Free / $0.002/req | $19/mo | $49/mo | $49/mo |
When to use Agent Scraper: You need fast, cheap scraping of static pages with clean markdown output — especially for feeding into LLMs or building search indexes. No signup friction.
When to use alternatives: You need JavaScript rendering (SPAs), anti-bot bypass, or enterprise-scale crawling with proxy rotation.
Rate Limits & Pricing
- Free tier: 30 requests/minute, no API key needed
- With API key: 120 requests/minute, 50 free credits
- Paid: $0.002 per request (top up with USDC on Base)
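If a batch job exceeds these limits, back off and retry rather than hammering the endpoint. A hedged sketch, assuming the gateway signals throttling with HTTP 429 (the `send` callable is injected, so it works with `requests` or any other client):

```python
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() (must return a response with .status_code), retrying on
    HTTP 429 with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        sleep(base_delay * (2 ** attempt))
    return send()  # final attempt, returned as-is even if still throttled
```

Usage with `requests`: `resp = request_with_backoff(lambda: requests.get(f"{BASE}/api/scrape", params={"url": url}))`.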
Get an API key for higher limits:
curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/keys/create"
# Response:
{ "apiKey": "sk-abc123...", "credits": 50, "rateLimit": "120/min" }
FAQ
Is it really free?
Yes. You get 30 requests/minute without any API key. Create a free API key for 50 credits and 120 req/min. After that, credits are $0.002 each.
Can it scrape JavaScript-rendered pages (SPAs)?
Agent Scraper fetches the initial HTML response, so it works great for server-rendered pages, blogs, docs, news sites, wikis, and most of the web. For SPAs that require JavaScript execution, you'd need a headless browser service (see our Screenshot API for visual rendering).
What about rate limiting and blocking?
Agent Scraper makes standard HTTP requests. Sites with aggressive bot protection (Cloudflare challenges, CAPTCHAs) may block requests. For most public content — docs, blogs, news, wikis — it works reliably.
Is the content cached?
Results are cached briefly to improve performance. The cached field in the response tells you whether you got a cached result. For real-time data, the cache duration is short (minutes).
Can I use this for AI/LLM applications?
Yes — this is one of the primary use cases. The markdown output preserves document structure (headings, lists, code blocks) that LLMs can reason about effectively. Many AI agent frameworks use web scraping as a core tool.
How does this compare to BeautifulSoup or Cheerio?
Those are parsing libraries — you still need to fetch the page, handle errors, manage sessions, and write extraction logic. Agent Scraper handles all of that and returns clean, ready-to-use content via a single API call.