Extract clean, readable markdown or plain text from any URL. Pages are fully rendered in headless Chrome, so JavaScript runs before extraction. This works on React apps, single-page applications (SPAs), and dynamically loaded pages where simple HTTP scrapers fail.
```bash
curl "https://urlsnap.dev/api/extract?url=https://example.com/article&format=markdown" \
  -H "x-api-key: YOUR_API_KEY"
```
`GET https://urlsnap.dev/api/extract`
| Parameter | Type | Description |
|---|---|---|
| `url` (required) | string | The URL to extract content from. Must be a publicly reachable `http` or `https` URL; percent-encode it if it contains its own query string. |
| `format` (optional) | string | Output format: `markdown` (default) or `text` (plain text, no formatting). |
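Because `url` is itself a URL passed inside a query string, any `?`, `&` or `=` in the target would otherwise be read as extra API parameters. A minimal sketch using only the Python standard library; `build_extract_url` is a hypothetical helper, not part of any urlsnap SDK, and it checks only the scheme (the "public URL" requirement is enforced by the service).

```python
from urllib.parse import urlencode, urlparse

API_ENDPOINT = "https://urlsnap.dev/api/extract"
ALLOWED_FORMATS = {"markdown", "text"}


def build_extract_url(target: str, fmt: str = "markdown") -> str:
    """Return a GET URL for the extract endpoint with `url` percent-encoded.

    Hypothetical helper: mirrors only the documented parameters
    (`url`, and `format` = markdown | text).
    """
    scheme = urlparse(target).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"url must be http or https, got {scheme!r}")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    # urlencode percent-encodes ':', '/', '?', '&' and '=' inside the target,
    # so the target's own query string is not mistaken for API parameters.
    return f"{API_ENDPOINT}?{urlencode({'url': target, 'format': fmt})}"


print(build_extract_url("https://example.com/search?q=a&page=2"))
```

In the printed URL the target's `&page=2` appears as `%26page%3D2`, so the API sees it as part of `url` rather than as a separate parameter.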
Pass your API key in the `x-api-key` header or as the `api_key` query parameter.
```bash
# Header (recommended)
curl -H "x-api-key: YOUR_KEY" "https://urlsnap.dev/api/extract?url=..."

# Query parameter
curl "https://urlsnap.dev/api/extract?url=...&api_key=YOUR_KEY"
```
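The two authentication styles differ only in where the key is placed. This sketch builds (but does not send) a standard-library `urllib.request.Request` either way; `make_request` is a hypothetical name used for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://urlsnap.dev/api/extract"


def make_request(target: str, api_key: str, use_header: bool = True) -> Request:
    """Build (but do not send) an authenticated GET request.

    use_header=True  -> key in the x-api-key header (recommended)
    use_header=False -> key in the api_key query parameter
    """
    params = {"url": target}
    headers = {}
    if use_header:
        headers["x-api-key"] = api_key
    else:
        params["api_key"] = api_key
    return Request(f"{API_ENDPOINT}?{urlencode(params)}", headers=headers)


req = make_request("https://example.com", "YOUR_KEY")
print(req.full_url)
print(req.header_items())
```

Pass the result to `urllib.request.urlopen(req)` to actually send it. The header form is generally preferable because query strings are commonly written to server, proxy and browser-history logs, which would expose the key.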
| Field | Type | Description |
|---|---|---|
| `title` | string | The extracted article title. |
| `markdown` | string | Clean GitHub-flavored markdown. Present when `format=markdown` (the default). |
| `text` | string | Plain text content. Present when `format=text`. |
| `wordCount` | number | Number of words in the extracted content. |
| `tokenEstimate` | number | Estimated token count (~1.3 tokens per word). Useful for LLM context planning. |
| `duration_ms` | number | Total processing time in milliseconds. |
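Since the content arrives under either `markdown` or `text` depending on the requested format, it can help to normalize the response into one typed object. A minimal sketch; `ExtractResult`, `parse_extract_response` and `local_token_estimate` are hypothetical names, and the local estimate simply applies the ~1.3 tokens/word ratio from the table. The service's exact formula and rounding are not documented, so prefer the returned `tokenEstimate` when you have it.

```python
from dataclasses import dataclass


@dataclass
class ExtractResult:
    title: str
    content: str          # taken from "markdown" or "text", per the requested format
    word_count: int
    token_estimate: int
    duration_ms: float


def parse_extract_response(data: dict, fmt: str = "markdown") -> ExtractResult:
    """Map the documented JSON fields onto a typed result."""
    key = "markdown" if fmt == "markdown" else "text"
    return ExtractResult(
        title=data["title"],
        content=data[key],
        word_count=data["wordCount"],
        token_estimate=data["tokenEstimate"],
        duration_ms=data["duration_ms"],
    )


def local_token_estimate(word_count: int, tokens_per_word: float = 1.3) -> int:
    """Rough client-side estimate from a word count (approximate by design)."""
    return round(word_count * tokens_per_word)
```

For example, `local_token_estimate(1000)` gives 1300, which is handy for budgeting before you have called the API for a page.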
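```javascript
// Top-level await requires an ES module (e.g. a .mjs file or <script type="module">).
const params = new URLSearchParams({ url: 'https://example.com/article' });
const response = await fetch(`https://urlsnap.dev/api/extract?${params}`, {
  headers: { 'x-api-key': 'YOUR_API_KEY' },
});
if (!response.ok) {
  throw new Error(`Extract failed with HTTP ${response.status}`);
}
const { title, markdown, wordCount, tokenEstimate } = await response.json();
console.log(`Extracted "${title}", ${wordCount} words, ~${tokenEstimate} tokens`);
console.log(markdown);
```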
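```python
import requests

resp = requests.get(
    'https://urlsnap.dev/api/extract',
    params={'url': 'https://example.com/article', 'format': 'markdown'},
    headers={'x-api-key': 'YOUR_API_KEY'},
    timeout=60,  # seconds; requests waits indefinitely unless a timeout is set
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = resp.json()

print(f"Title: {data['title']}")
print(f"Words: {data['wordCount']}, Tokens: {data['tokenEstimate']}")
print(data['markdown'])
```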
```python
import requests
from openai import OpenAI

# 1. Extract the web page content
resp = requests.get(
    'https://urlsnap.dev/api/extract',
    params={'url': 'https://techcrunch.com/some-article', 'format': 'markdown'},
    headers={'x-api-key': 'YOUR_API_KEY'},
)
resp.raise_for_status()
page = resp.json()

# 2. Use it as context for an LLM (OpenAI() reads OPENAI_API_KEY from the environment)
client = OpenAI()
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': f"Summarize this article:\n\n{page['markdown']}"},
    ],
)
print(response.choices[0].message.content)
```
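Very long pages may not fit in a model's context window in one request. One common approach, sketched here as an assumption rather than a feature of the API, is to split `page['markdown']` into pieces that stay within a token budget, using the same ~1.3 tokens/word ratio as `tokenEstimate`. `chunk_markdown` is a hypothetical helper; it splits naively on blank lines, so a fenced code block containing blank lines may be broken across chunks.

```python
import re


def chunk_markdown(markdown: str, max_tokens: int, tokens_per_word: float = 1.3) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks whose
    estimated size (words * tokens_per_word) stays within max_tokens.

    A single paragraph larger than the budget becomes its own oversized
    chunk rather than being split mid-paragraph.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0.0
    for para in paragraphs:
        para_tokens = len(para.split()) * tokens_per_word
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0.0
        current.append(para)
        current_tokens += para_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

You could compare `page['tokenEstimate']` with your model's limit first and only chunk when needed, for example summarizing each chunk separately and then summarizing the partial summaries.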
```bash
# -r prints the raw string instead of a JSON-quoted value with \n escapes
curl "https://urlsnap.dev/api/extract?url=https://example.com&format=text" \
  -H "x-api-key: YOUR_API_KEY" \
  | jq -r '.text'
```
- Feed clean web content to LLM agents without parsing raw HTML or managing Puppeteer.
- Build knowledge bases from live web sources. The `tokenEstimate` field helps you plan how to chunk content.
- Extract full article text from publications without writing per-site parsing rules.
- Extract and process content from hundreds of URLs automatically as part of your data pipeline.
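For pipelines that process many URLs, bounded concurrency keeps throughput up without flooding the service, and collecting failures per URL stops one bad page (for example a 422) from aborting the batch. A minimal standard-library sketch; `extract_many` is a hypothetical helper, and `fetch` is any function you supply that calls the API for one URL and returns the parsed JSON (e.g. `requests.get(...)` followed by `raise_for_status()` and `.json()`).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def extract_many(
    urls: list[str],
    fetch: Callable[[str], dict],
    max_workers: int = 4,
) -> tuple[dict[str, dict], dict[str, Exception]]:
    """Run fetch(url) for every URL with at most max_workers calls in flight.

    Returns (results, errors), both keyed by URL.
    """
    results: dict[str, dict] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors


if __name__ == "__main__":
    # Offline stand-in for a real API call, for demonstration only.
    def fake_fetch(url: str) -> dict:
        if "bad" in url:
            raise RuntimeError("HTTP 422")
        return {"title": url, "wordCount": 10}

    ok, failed = extract_many(["https://a.example", "https://bad.example"], fake_fetch)
    print(sorted(ok), sorted(failed))
```

Keep `max_workers` modest: every call counts toward your plan's daily request limit, and each page is rendered in a headless browser, so requests are heavier than a plain HTTP fetch.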
| Status | Error | Meaning |
|---|---|---|
| 401 | Unauthorized | Missing or invalid API key. |
| 403 | Forbidden | Your current plan doesn't include Extract. Upgrade to the Starter plan to use it. |
| 422 | Unprocessable | The page loaded but no readable article content was found (e.g. login page, dashboard). |
| 429 | Rate Limited | Daily request limit reached for your plan. |
| 500 | Server Error | Extraction failed on the server, e.g. the page timed out or navigation errored. Retry with backoff. |
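The table implies different client behavior per status: 401, 403 and 422 will fail the same way again, and 429 is a *daily* limit, so an immediate retry will not help either. Only 500 is described as worth retrying with backoff. A minimal sketch of that policy; `ExtractError` and `call_with_retry` are hypothetical names, and `call` is any function you provide that performs one request and raises `ExtractError(status)` on a non-2xx response. The `sleep` parameter exists so the delay can be swapped out in tests.

```python
import time
from typing import Callable


class ExtractError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


# Only 500 is treated as transient; 429 reflects a daily quota.
RETRYABLE = {500}


def call_with_retry(
    call: Callable[[], dict],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Invoke call(); on a retryable ExtractError wait base_delay * 2**attempt
    seconds and try again, up to max_attempts total attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ExtractError as err:
            if err.status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
```

With `requests`, the inner call might check `resp.ok` and raise `ExtractError(resp.status_code, resp.text)` otherwise. Adding random jitter to the delay is a common refinement when many workers retry at once.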
Get your API key in 30 seconds. A free tier is available, and no credit card is required.