New Endpoint

Web Page → Markdown API

Extract clean, readable markdown or plain text from any public URL. Pages are fully rendered in headless Chrome with JavaScript enabled, so extraction works on React apps, SPAs, and dynamically loaded pages where simple HTTP scrapers fail.

⚡ Requires Starter or Pro plan — upgrade from $9/mo

Quick Start

curl

curl -G "https://urlsnap.dev/api/extract" \
  --data-urlencode "url=https://example.com/article" \
  --data-urlencode "format=markdown" \
  -H "x-api-key: YOUR_API_KEY"

Response

{
  "title": "Example Article Title",
  "markdown": "# Example Article Title\n\nThis is the clean extracted content...",
  "wordCount": 842,
  "tokenEstimate": 1095,
  "duration_ms": 2341
}

Endpoint

GET https://urlsnap.dev/api/extract

Parameters

Parameter	Type	Description
url (required)	string	The URL to extract content from. Must be a public http/https URL.
format (optional)	string	Output format: `markdown` (default) or `text` (plain text, no formatting).
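
Because the target page is itself passed as a query parameter, it should be percent-encoded so that its own `?` and `&` characters are not read as part of the Extract request. A minimal sketch using only the Python standard library; the helper name `build_extract_url` is hypothetical and not part of any URLSnap SDK:

```python
from urllib.parse import urlencode

EXTRACT_ENDPOINT = "https://urlsnap.dev/api/extract"


def build_extract_url(target_url: str, fmt: str = "markdown") -> str:
    """Return the full request URL for the Extract endpoint.

    Hypothetical helper: it uses only the two documented parameters,
    `url` and `format`.
    """
    if fmt not in ("markdown", "text"):
        raise ValueError("format must be 'markdown' or 'text'")
    # urlencode percent-encodes ':', '/', '?' and '&' inside the target URL
    query = urlencode({"url": target_url, "format": fmt})
    return f"{EXTRACT_ENDPOINT}?{query}"


# A target URL that carries its own query string
print(build_extract_url("https://example.com/search?q=a&page=2"))
```

Without the encoding step, `page=2` in this example would be parsed as a separate parameter of the Extract request rather than as part of the target URL.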

Authentication

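Pass your API key in the x-api-key header or as the api_key query parameter. The header is preferred because query strings are commonly recorded in server logs and browser history.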

Examples

# Header (recommended)
curl -H "x-api-key: YOUR_KEY" "https://urlsnap.dev/api/extract?url=..."

# Query parameter
curl "https://urlsnap.dev/api/extract?url=...&api_key=YOUR_KEY"

Response Fields

Field	Type	Description
title	string	The extracted article title.
markdown	string	Clean GitHub-flavored markdown. Present when `format=markdown` (default).
text	string	Plain text content. Present when `format=text`.
wordCount	number	Number of words in the extracted content.
tokenEstimate	number	Estimated token count (~1.3 tokens/word). Useful for LLM context planning.
duration_ms	number	Total processing time in milliseconds.
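
The `tokenEstimate` field can be used to decide whether a page fits into a model's context window before any LLM call is made. The sketch below parses the sample response from Quick Start; `ExtractResult`, `parse_extract_response`, `fits_budget`, and the token budgets are illustrative assumptions, not part of the API:

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractResult:
    title: str
    content: str          # markdown or plain text, depending on `format`
    word_count: int
    token_estimate: int
    duration_ms: int


def parse_extract_response(body: str) -> ExtractResult:
    """Map the documented JSON fields onto a typed object."""
    data = json.loads(body)
    # `markdown` is present for format=markdown, `text` for format=text
    content = data["markdown"] if "markdown" in data else data["text"]
    return ExtractResult(
        title=data["title"],
        content=content,
        word_count=data["wordCount"],
        token_estimate=data["tokenEstimate"],
        duration_ms=data["duration_ms"],
    )


def fits_budget(result: ExtractResult, budget_tokens: int,
                reserved_for_reply: int = 1000) -> bool:
    """True if the content plus a reply allowance fits the budget (assumed policy)."""
    return result.token_estimate + reserved_for_reply <= budget_tokens


# Sample values copied from the Quick Start response
sample_body = json.dumps({
    "title": "Example Article Title",
    "markdown": "# Example Article Title\n\nThis is the clean extracted content...",
    "wordCount": 842,
    "tokenEstimate": 1095,
    "duration_ms": 2341,
})
result = parse_extract_response(sample_body)
print(result.title, result.token_estimate, fits_budget(result, budget_tokens=8000))
```

Since the estimate is approximate, leaving some headroom in the budget is a reasonable precaution.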

Code Examples

JavaScript / Node.js

javascript

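// Encode the target URL so its own query string survives intact
const target = encodeURIComponent('https://example.com/article');
const response = await fetch(
  `https://urlsnap.dev/api/extract?url=${target}`,
  { headers: { 'x-api-key': 'YOUR_API_KEY' } }
);

// Non-2xx statuses (401, 403, 422, 429, 500) are listed under Error Codes
if (!response.ok) throw new Error(`Extract failed: HTTP ${response.status}`);

const { title, markdown, wordCount, tokenEstimate } = await response.json();
console.log(`Extracted "${title}": ${wordCount} words, ~${tokenEstimate} tokens`);
console.log(markdown);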

Python

python

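import requests

resp = requests.get(
    'https://urlsnap.dev/api/extract',
    # requests percent-encodes the query parameters for you
    params={'url': 'https://example.com/article', 'format': 'markdown'},
    headers={'x-api-key': 'YOUR_API_KEY'},
    timeout=60,  # client timeout kept above the 30-second server-side limit
)
resp.raise_for_status()  # raises on 401, 403, 422, 429, 500
data = resp.json()
print(f"Title: {data['title']}")
print(f"Words: {data['wordCount']}, Tokens: {data['tokenEstimate']}")
print(data['markdown'])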

Using with OpenAI / Claude

python — RAG pipeline example

import requests
from openai import OpenAI

# 1. Extract the web page content
resp = requests.get(
    'https://urlsnap.dev/api/extract',
    params={'url': 'https://techcrunch.com/some-article', 'format': 'markdown'},
    headers={'x-api-key': 'YOUR_API_KEY'},
    timeout=60,
)
resp.raise_for_status()  # stop before the LLM call if extraction failed
page = resp.json()

# 2. Use it as context for an LLM
client = OpenAI()
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': f"Summarize this article:\n\n{page['markdown']}"}
    ]
)
print(response.choices[0].message.content)

cURL — Plain Text Output

curl

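# -s hides the progress meter; jq -r prints the text without JSON quoting
curl -sG "https://urlsnap.dev/api/extract" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode "format=text" \
  -H "x-api-key: YOUR_API_KEY" \
  | jq -r '.text'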

Why Use URLSnap Extract?

URLSnap Extract

Full JavaScript rendering (headless Chrome)
Works on React / Vue / Angular SPAs
Readability algorithm removes ads & nav
Returns token estimate for LLM budgeting
Simple REST API — one call, clean output
No IP blocking (rotating user-agent)

Simple HTTP scrapers

Can't execute JavaScript
Returns raw HTML with boilerplate
Breaks on dynamic content
Must parse & clean HTML yourself
Often IP-blocked by sites
Requires your own infrastructure

Built For

🤖

AI Agents

Feed clean web content to LLM agents without parsing raw HTML or managing Puppeteer.

📚

RAG Pipelines

Build knowledge bases from live web sources. Token estimates help you chunk efficiently.

📰

News Aggregators

Extract full article text from any publication — no per-site parsing rules needed.

🔍

Research Tools

Automatically extract and process content from hundreds of URLs in your pipeline.
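
For the RAG case, the ~1.3 tokens/word ratio listed under Response Fields can drive a simple chunker that splits extracted markdown on blank lines and packs paragraphs up to a token budget. A rough sketch; the ratio is only the documented approximation, and `estimate_tokens`, `chunk_markdown`, and the default budget are assumptions:

```python
from typing import List

TOKENS_PER_WORD = 1.3  # approximation quoted for tokenEstimate


def estimate_tokens(text: str) -> int:
    """Word-count based estimate; real tokenizers will differ."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def chunk_markdown(markdown: str, max_tokens: int = 500) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    A single paragraph larger than max_tokens is kept whole as its own
    oversized chunk rather than being split mid-paragraph.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for para in markdown.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        tokens = estimate_tokens(para)
        # Close the current chunk if adding this paragraph would overflow it
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Five 10-word paragraphs, roughly 13 estimated tokens each
doc = "\n\n".join(["word " * 10] * 5)
for i, chunk in enumerate(chunk_markdown(doc, max_tokens=30)):
    print(i, estimate_tokens(chunk))
```

Splitting on paragraph boundaries keeps headings and their text together more often than fixed-size character windows would.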

Error Codes

Status	Code	Meaning
401	Unauthorized	Missing or invalid API key.
403	Forbidden	Your current plan doesn't include Extract. Upgrade to Starter or Pro.
422	Unprocessable	The page loaded but no readable article content was found (e.g. login page, dashboard).
429	Rate Limited	Daily request limit reached for your plan.
500	Server Error	Something went wrong (timeout, navigation error, etc.). Retry with backoff.

Limits & Notes

Timeout: 30 seconds per request (covers JS-heavy pages)
Private/internal IP addresses are blocked for security
Each extract request counts as 1 request toward your daily limit
Average response time: 2–5 seconds depending on page complexity
Content is not cached — each request fetches live data
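
Since only public http/https URLs are accepted and private addresses are blocked, a cheap client-side check can avoid spending a request on a URL that is bound to fail. This is a sketch of one possible pre-check, not the service's actual validation logic; `is_probably_extractable` is a hypothetical name, and hostnames that only resolve to a private address through DNS are not caught:

```python
import ipaddress
from urllib.parse import urlsplit


def is_probably_extractable(url: str) -> bool:
    """Best-effort check that a URL is http/https and not an internal address."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. unbalanced IPv6 brackets
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: it cannot be judged without resolving it, so allow it
        return True
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)


for candidate in ("https://example.com/article", "http://192.168.1.10/admin"):
    print(candidate, is_probably_extractable(candidate))
```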

Ready to Extract?

Get your API key in 30 seconds. A free tier is available and no credit card is required; the Extract endpoint itself needs a Starter or Pro plan.
