Free Web Scraping API — Extract Content from Any URL
Need to scrape a website and get clean, structured content? Forget setting up Puppeteer, managing headless Chrome, or dealing with proxy rotation. Agent Scraper is a free API that fetches any URL and returns clean markdown or plain text — with metadata, links, and images extracted automatically.
Try it right now — no API key, no signup:
# Scrape any URL and get clean markdown
curl "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape?url=https://example.com"
That's it. One HTTP call. You get back structured JSON with the page content converted to markdown, plus metadata (title, description, OG tags), all outbound links, and image URLs.
What You Get Back
{
  "url": "https://example.com/",
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in documentation...",
  "meta": {
    "title": "Example Domain",
    "description": "",
    "author": "",
    "ogTitle": "",
    "ogDescription": "",
    "ogImage": "",
    "canonical": "",
    "language": "en"
  },
  "links": ["https://iana.org/domains/example"],
  "images": [],
  "contentLength": 167,
  "scrapedAt": "2026-03-03T17:42:42.267Z",
  "cached": false
}
Two Endpoints, Two Formats
GET /api/scrape
Returns markdown content with full structure preserved — headings, lists, links, code blocks. Best for AI/LLM consumption and content analysis.
POST /api/extract
Returns plain text only — all HTML stripped, just the readable content. Best for text analysis, NLP, and search indexing.
Code Examples
curl
# Scrape to markdown (GET)
curl "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape?url=https://news.ycombinator.com"
# Extract plain text (POST)
curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/extract" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com"}'
Python
import requests
# Scrape a URL and get markdown content
BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper"
# Method 1: Markdown with metadata
result = requests.get(f"{BASE}/api/scrape", params={"url": "https://news.ycombinator.com"}).json()
print(f"Title: {result['meta']['title']}")
print(f"Content length: {result['contentLength']} chars")
print(f"Links found: {len(result['links'])}")
print(result["content"][:500])
# Method 2: Plain text only
text = requests.post(
    f"{BASE}/api/extract",
    json={"url": "https://news.ycombinator.com"},
).json()
print(text["text"])
Node.js
const BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper";
// Scrape to markdown
const res = await fetch(`${BASE}/api/scrape?url=https://github.com`);
const data = await res.json();
console.log(data.meta.title); // "GitHub"
console.log(data.content); // Full page as markdown
console.log(data.links.length); // Number of outbound links
// Extract plain text
const text = await fetch(`${BASE}/api/extract`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://github.com" }),
}).then(r => r.json());
console.log(text.text);
Real-World Use Cases
1. Feed Content to an LLM
Scrape a page, feed the markdown to Claude or GPT for summarization, analysis, or Q&A. The markdown format preserves structure that LLMs can reason about.
import requests
# Scrape an article
page = requests.get(
    "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
    params={"url": "https://en.wikipedia.org/wiki/Web_scraping"},
).json()
# Send to an LLM for summarization
prompt = f"""Summarize this article in 3 bullet points:
{page['content'][:4000]}"""
# Use with any LLM API (Claude, GPT, Ollama, etc.)
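A scraped page can easily exceed the context budget you want to spend on it. One approach (a sketch, not part of the API) is to split the markdown on top-level headings and summarize chunk by chunk:

```python
def chunk_markdown(md: str, max_chars: int = 4000) -> list[str]:
    """Split markdown into chunks, preferring top-level heading boundaries."""
    sections, current = [], []
    for line in md.splitlines(keepends=True):
        # Start a new section at each top-level heading
        if line.startswith("# ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Merge small sections; hard-split any section over the budget
    chunks = []
    for sec in sections:
        while len(sec) > max_chars:
            chunks.append(sec[:max_chars])
            sec = sec[max_chars:]
        if chunks and len(chunks[-1]) + len(sec) <= max_chars:
            chunks[-1] += sec
        else:
            chunks.append(sec)
    return chunks
```

Feed each chunk through the summarization prompt separately, then summarize the summaries if needed.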
2. Monitor Competitor Pricing
Scrape competitor pages on a schedule, extract pricing data, and track changes over time.
import requests, json, time
competitors = [
    "https://example-saas.com/pricing",
    "https://another-tool.io/plans",
]

for url in competitors:
    result = requests.get(
        "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
        params={"url": url},
    ).json()
    print(f"\n--- {result['meta']['title']} ---")
    print(result["content"][:1000])
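The loop above only prints the raw markdown. To actually track changes over time, pull the numbers out and compare runs. A minimal sketch (the regex is a USD-only heuristic; real pricing pages often need per-site rules):

```python
import re

PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")

def extract_prices(markdown: str) -> list[str]:
    """Pull dollar-amount strings out of scraped markdown (heuristic)."""
    return sorted(set(PRICE_RE.findall(markdown)))

def diff_prices(old: list[str], new: list[str]) -> dict:
    """Report prices that appeared or disappeared between two runs."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
    }
```

Persist each run's list (a JSON file per competitor is enough) and alert whenever `diff_prices` returns anything non-empty.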
3. Build a Search Index
Extract clean text from a list of URLs and index it for full-text search.
const urls = [
  "https://docs.example.com/getting-started",
  "https://docs.example.com/api-reference",
  "https://docs.example.com/tutorials",
];

const documents = [];
for (const url of urls) {
  const res = await fetch(
    "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/extract",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    }
  );
  const data = await res.json();
  documents.push({ url, title: data.meta.title, text: data.text });
}
// Index `documents` into your search engine (Elasticsearch, Meilisearch, etc.)
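The indexing step itself can be as simple as an in-memory inverted index. A toy sketch (shown in Python for consistency with the other examples; swap in an Elasticsearch or Meilisearch client for production):

```python
import re
from collections import defaultdict

def build_index(documents: list[dict]) -> dict[str, set[str]]:
    """Map each lowercased token to the set of URLs containing it."""
    index = defaultdict(set)
    for doc in documents:
        for token in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            index[token].add(doc["url"])
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs containing every query token (AND semantics)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    results = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```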
4. Extract Metadata at Scale
Pull OG tags, titles, descriptions, and canonical URLs from a batch of pages.
import requests
urls = ["https://github.com", "https://news.ycombinator.com", "https://reddit.com"]
for url in urls:
    r = requests.get(
        "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
        params={"url": url},
    ).json()
    m = r["meta"]
    print(f"{m['title']} | OG: {m['ogTitle']} | Canonical: {m['canonical']}")
API Reference
GET /api/scrape
Scrape a URL and return content as markdown.
| Parameter | Type | Description |
|---|---|---|
| url | string (query) | URL to scrape (required) |
POST /api/extract
Extract plain text from a URL.
| Parameter | Type | Description |
|---|---|---|
| url | string (body JSON) | URL to extract text from (required) |
Response Fields
| Field | Type | Description |
|---|---|---|
| url | string | Final URL (after redirects) |
| content / text | string | Page content as markdown or plain text |
| meta | object | Title, description, OG tags, canonical, language |
| links | array | All outbound links found on the page |
| images | array | All image URLs found on the page |
| contentLength | number | Character count of extracted content |
| scrapedAt | string | ISO timestamp of when the scrape occurred |
| cached | boolean | Whether the result was served from cache |
Comparison: Agent Scraper vs Alternatives
| Feature | Agent Scraper | Firecrawl | ScrapingBee | Apify |
|---|---|---|---|---|
| Free tier | 50 credits | 500 credits | 1,000 credits | $5/mo free |
| Signup required | No | Yes | Yes | Yes |
| Markdown output | Yes | Yes | No | No |
| Plain text output | Yes | Yes | Yes | Yes |
| Metadata extraction | Yes (OG, meta, canonical) | Yes | Limited | Configurable |
| Link extraction | Yes | Yes | No | Configurable |
| JavaScript rendering | No | Yes | Yes | Yes |
| Starting price | Free / $0.002/req | $19/mo | $49/mo | $49/mo |
When to use Agent Scraper: You need fast, cheap scraping of static pages with clean markdown output — especially for feeding into LLMs or building search indexes. No signup friction.
When to use alternatives: You need JavaScript rendering (SPAs), anti-bot bypass, or enterprise-scale crawling with proxy rotation.
Rate Limits & Pricing
- Free tier: 30 requests/minute, no API key needed
- With API key: 120 requests/minute, 50 free credits
- Paid: $0.002 per request (top up with USDC on Base)
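If a batch job exceeds these limits, back off and retry rather than hammering the endpoint. A hedged sketch, assuming the gateway signals throttling with HTTP 429 (the `send` callable is injected, so it works with `requests` or any other client):

```python
import time

def request_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() (must return a response with .status_code), retrying on
    HTTP 429 with exponential backoff: 1s, 2s, 4s, ..."""
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        sleep(base_delay * (2 ** attempt))
    return send()  # final attempt, returned as-is even if still throttled
```

Usage with `requests`: `resp = request_with_backoff(lambda: requests.get(f"{BASE}/api/scrape", params={"url": url}))`.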
Get an API key for higher limits:
curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/keys/create"
# Response:
{ "apiKey": "sk-abc123...", "credits": 50, "rateLimit": "120/min" }
FAQ
Is it really free?
Yes. You get 30 requests/minute without any API key. Create a free API key for 50 credits and 120 req/min. After that, credits are $0.002 each.
Can it scrape JavaScript-rendered pages (SPAs)?
Agent Scraper fetches the initial HTML response, so it works great for server-rendered pages, blogs, docs, news sites, wikis, and most of the web. For SPAs that require JavaScript execution, you'd need a headless browser service (see our Screenshot API for visual rendering).
What about rate limiting and blocking?
Agent Scraper makes standard HTTP requests. Sites with aggressive bot protection (Cloudflare challenges, CAPTCHAs) may block requests. For most public content — docs, blogs, news, wikis — it works reliably.
Is the content cached?
Results are cached briefly to improve performance. The cached field in the response tells you whether you got a cached result. For real-time data, the cache duration is short (minutes).
Can I use this for AI/LLM applications?
Yes — this is one of the primary use cases. The markdown output preserves document structure (headings, lists, code blocks) that LLMs can reason about effectively. Many AI agent frameworks use web scraping as a core tool.
How does this compare to BeautifulSoup or Cheerio?
Those are parsing libraries — you still need to fetch the page, handle errors, manage sessions, and write extraction logic. Agent Scraper handles all of that and returns clean, ready-to-use content via a single API call.