Free Web Scraping API — Extract Content from Any URL

March 2026 · 8 min read · No signup required

Need to scrape a website and get clean, structured content? Forget setting up Puppeteer, managing headless Chrome, or dealing with proxy rotation. Agent Scraper is a free API that fetches any URL and returns clean markdown or plain text — with metadata, links, and images extracted automatically.

Try it right now — no API key, no signup:

# Scrape any URL and get clean markdown
curl "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape?url=https://example.com"

That's it. One HTTP call. You get back structured JSON with the page content converted to markdown, plus metadata (title, description, OG tags), all outbound links, and image URLs.

What You Get Back

{
  "url": "https://example.com/",
  "format": "markdown",
  "content": "# Example Domain\n\nThis domain is for use in documentation...",
  "meta": {
    "title": "Example Domain",
    "description": "",
    "author": "",
    "ogTitle": "",
    "ogDescription": "",
    "ogImage": "",
    "canonical": "",
    "language": "en"
  },
  "links": ["https://iana.org/domains/example"],
  "images": [],
  "contentLength": 167,
  "scrapedAt": "2026-03-03T17:42:42.267Z",
  "cached": false
}

Two Endpoints, Two Formats

GET /api/scrape

Returns markdown content with full structure preserved — headings, lists, links, code blocks. Best for AI/LLM consumption and content analysis.

POST /api/extract

Returns plain text only — all HTML stripped, just the readable content. Best for text analysis, NLP, and search indexing.

Code Examples

curl

# Scrape to markdown (GET)
curl "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape?url=https://news.ycombinator.com"

# Extract plain text (POST)
curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/extract" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com"}'

Python

import requests

# Scrape a URL and get markdown content
BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper"

# Method 1: Markdown with metadata
result = requests.get(f"{BASE}/api/scrape", params={"url": "https://news.ycombinator.com"}).json()

print(f"Title: {result['meta']['title']}")
print(f"Content length: {result['contentLength']} chars")
print(f"Links found: {len(result['links'])}")
print(result["content"][:500])

# Method 2: Plain text only
text = requests.post(f"{BASE}/api/extract",
    json={"url": "https://news.ycombinator.com"}).json()

print(text["text"])

Node.js

const BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper";

// Scrape to markdown
const res = await fetch(`${BASE}/api/scrape?url=https://github.com`);
const data = await res.json();
console.log(data.meta.title);   // "GitHub"
console.log(data.content);      // Full page as markdown
console.log(data.links.length); // Number of outbound links

// Extract plain text
const text = await fetch(`${BASE}/api/extract`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://github.com" })
}).then(r => r.json());
console.log(text.text);

Real-World Use Cases

1. Feed Content to an LLM

Scrape a page, feed the markdown to Claude or GPT for summarization, analysis, or Q&A. The markdown format preserves structure that LLMs can reason about.

import requests

# Scrape an article
page = requests.get(
    "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
    params={"url": "https://en.wikipedia.org/wiki/Web_scraping"}
).json()

# Send to an LLM for summarization
prompt = f"""Summarize this article in 3 bullet points:

{page['content'][:4000]}"""

# Use with any LLM API (Claude, GPT, Ollama, etc.)
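For pages longer than a model's context window, the `[:4000]` truncation above throws away content. A paragraph-aware chunking helper is a better fit — this is a sketch (the `chunk_markdown` function is illustrative, not part of the API):

```python
def chunk_markdown(content: str, max_chars: int = 4000) -> list[str]:
    """Split scraped markdown into chunks on paragraph boundaries.

    A single paragraph longer than max_chars becomes its own
    oversized chunk -- good enough for a sketch.
    """
    chunks, current = [], ""
    for para in content.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n\n{para}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Summarize each chunk separately, then summarize the summaries — the usual map-reduce pattern for long documents.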

2. Monitor Competitor Pricing

Scrape competitor pages on a schedule, extract pricing data, and track changes over time.

import requests, json, time

competitors = [
    "https://example-saas.com/pricing",
    "https://another-tool.io/plans",
]

for url in competitors:
    result = requests.get(
        "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
        params={"url": url}
    ).json()
    print(f"\n--- {result['meta']['title']} ---")
    print(result["content"][:1000])

3. Build a Search Index

Extract clean text from a list of URLs and index it for full-text search.

const urls = [
  "https://docs.example.com/getting-started",
  "https://docs.example.com/api-reference",
  "https://docs.example.com/tutorials",
];

const documents = [];
for (const url of urls) {
  const res = await fetch(
    `https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/extract`,
    { method: "POST", headers: {"Content-Type": "application/json"},
      body: JSON.stringify({ url }) }
  );
  const data = await res.json();
  documents.push({ url, title: data.meta.title, text: data.text });
}
// Index `documents` into your search engine (Elasticsearch, Meilisearch, etc.)
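For a small documentation set you can skip the search engine entirely. A minimal in-memory inverted index over the extracted text (in Python, to match the other examples; `build_index` and `search` are illustrative helpers, not API features):

```python
from collections import defaultdict

def build_index(documents: list[dict]) -> dict[str, set]:
    """Map each lowercase token to the set of document indices containing it."""
    index = defaultdict(set)
    for i, doc in enumerate(documents):
        for token in doc["text"].lower().split():
            index[token].add(i)
    return index

def search(index: dict, query: str) -> set:
    """AND-search: indices of documents containing every query token."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()
```

This breaks down past a few thousand documents (no stemming, ranking, or persistence) — that's when Elasticsearch or Meilisearch earns its keep.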

4. Extract Metadata at Scale

Pull OG tags, titles, descriptions, and canonical URLs from a batch of pages.

import requests

urls = ["https://github.com", "https://news.ycombinator.com", "https://reddit.com"]

for url in urls:
    r = requests.get(
        "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/scrape",
        params={"url": url}
    ).json()
    m = r["meta"]
    print(f"{m['title']} | OG: {m['ogTitle']} | Canonical: {m['canonical']}")

API Reference

GET /api/scrape

Scrape a URL and return content as markdown.

| Parameter | Type | Description |
|---|---|---|
| `url` | string (query) | URL to scrape (required) |

POST /api/extract

Extract plain text from a URL.

| Parameter | Type | Description |
|---|---|---|
| `url` | string (JSON body) | URL to extract text from (required) |

Response Fields

| Field | Type | Description |
|---|---|---|
| `url` | string | Final URL (after redirects) |
| `content` / `text` | string | Page content as markdown or plain text |
| `meta` | object | Title, description, OG tags, canonical, language |
| `links` | array | All outbound links found on the page |
| `images` | array | All image URLs found on the page |
| `contentLength` | number | Character count of extracted content |
| `scrapedAt` | string | ISO timestamp of when the scrape occurred |
| `cached` | boolean | Whether the result was served from cache |

Comparison: Agent Scraper vs Alternatives

| Feature | Agent Scraper | Firecrawl | ScrapingBee | Apify |
|---|---|---|---|---|
| Free tier | 50 credits | 500 credits | 1,000 credits | $5/mo free |
| Signup required | No | Yes | Yes | Yes |
| Markdown output | Yes | Yes | No | No |
| Plain text output | Yes | Yes | Yes | Yes |
| Metadata extraction | Yes (OG, meta, canonical) | Yes | Limited | Configurable |
| Link extraction | Yes | Yes | No | Configurable |
| JavaScript rendering | No | Yes | Yes | Yes |
| Starting price | Free / $0.002/req | $19/mo | $49/mo | $49/mo |

When to use Agent Scraper: You need fast, cheap scraping of static pages with clean markdown output — especially for feeding into LLMs or building search indexes. No signup friction.

When to use alternatives: You need JavaScript rendering (SPAs), anti-bot bypass, or enterprise-scale crawling with proxy rotation.

Rate Limits & Pricing

Get an API key for higher limits:

curl -X POST "https://agent-gateway-kappa.vercel.app/v1/agent-scraper/api/keys/create"

# Response:
{ "apiKey": "sk-abc123...", "credits": 50, "rateLimit": "120/min" }

Start Scraping for Free

No signup. No credit card. Just send a request.


Getting Started Guide · Swagger Docs

FAQ

Is it really free?

Yes. You get 30 requests/minute without any API key. Create a free API key for 50 credits and 120 req/min. After that, credits are $0.002 each.
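As a rough cost estimate, assuming one credit per request and the $0.002/credit figure above:

```python
def estimate_cost(request_count: int, free_credits: int = 50,
                  price_per_credit: float = 0.002) -> float:
    """Dollar cost after the free credits are used up."""
    billable = max(0, request_count - free_credits)
    return billable * price_per_credit

# 10,050 requests = 10,000 billable credits at $0.002 each
print(estimate_cost(10_050))
```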

Can it scrape JavaScript-rendered pages (SPAs)?

Agent Scraper fetches the initial HTML response, so it works great for server-rendered pages, blogs, docs, news sites, wikis, and most of the web. For SPAs that require JavaScript execution, you'd need a headless browser service (see our Screenshot API for visual rendering).

What about rate limiting and blocking?

Agent Scraper makes standard HTTP requests. Sites with aggressive bot protection (Cloudflare challenges, CAPTCHAs) may block requests. For most public content — docs, blogs, news, wikis — it works reliably.
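Agent Scraper's error response format isn't documented here, so treat this as a generic sketch: retry on 429 and transient 5xx status codes with exponential backoff, and fail fast on everything else.

```python
import time
import requests

BASE = "https://agent-gateway-kappa.vercel.app/v1/agent-scraper"

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))

def scrape_with_retry(url: str, max_attempts: int = 4):
    """Return the parsed scrape response, or None if all attempts fail."""
    for attempt in range(max_attempts):
        resp = requests.get(f"{BASE}/api/scrape", params={"url": url}, timeout=30)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code in (429, 500, 502, 503):
            time.sleep(backoff_delay(attempt))
            continue
        resp.raise_for_status()  # 4xx other than 429: retrying won't help
    return None
```

Adding jitter to the delay is a good idea if many workers share the same rate limit.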

Is the content cached?

Results are cached briefly to improve performance. The cached field in the response tells you whether you got a cached result. For real-time data, the cache duration is short (minutes).

Can I use this for AI/LLM applications?

Yes — this is one of the primary use cases. The markdown output preserves document structure (headings, lists, code blocks) that LLMs can reason about effectively. Many AI agent frameworks use web scraping as a core tool.

How does this compare to BeautifulSoup or Cheerio?

Those are parsing libraries — you still need to fetch the page, handle errors, manage sessions, and write extraction logic. Agent Scraper handles all of that and returns clean, ready-to-use content via a single API call.

Is there an OpenAPI spec?

Yes: agent-scraper/openapi.json