
Web Page → Markdown API

Extract clean, readable markdown or plain text from any public web page. Pages are rendered with full JavaScript in headless Chrome, so extraction works on React apps, SPAs, and dynamically loaded pages where simple HTTP scrapers fail.

⚡ Requires a Starter or Pro plan. Upgrades start at $9/mo.

Quick Start

curl
curl "https://urlsnap.dev/api/extract?url=https://example.com/article&format=markdown" \
  -H "x-api-key: YOUR_API_KEY"

Response

{
  "title": "Example Article Title",
  "markdown": "# Example Article Title\n\nThis is the clean extracted content...",
  "wordCount": 842,
  "tokenEstimate": 1095,
  "duration_ms": 2341
}

Endpoint

GET https://urlsnap.dev/api/extract

Parameters

Parameter | Required | Type   | Description
--------- | -------- | ------ | -----------
url       | required | string | The URL to extract content from. Must be a public http/https URL.
format    | optional | string | Output format: markdown (default) or text (plain text, no formatting).
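Because the url value is itself a URL, any `?`, `&`, or `=` inside it must be percent-encoded; otherwise a target such as https://example.com/search?q=a&page=2 would leak `page=2` into the extract request as a separate parameter. Below is a minimal standard-library sketch. `build_extract_url` is a hypothetical helper, and its client-side checks only mirror the parameter table; the service still does its own validation (including the "public" requirement, which this sketch does not check).

```python
from urllib.parse import urlencode, urlsplit

EXTRACT_ENDPOINT = "https://urlsnap.dev/api/extract"


def build_extract_url(target: str, fmt: str = "markdown") -> str:
    """Return a GET URL for /api/extract with the target URL percent-encoded.

    Hypothetical helper; the checks below mirror the documented parameters only.
    """
    if urlsplit(target).scheme not in ("http", "https"):
        raise ValueError("url must be an http or https URL")
    if fmt not in ("markdown", "text"):
        raise ValueError("format must be 'markdown' or 'text'")
    # urlencode escapes ':', '/', '?', '&' and '=' inside the target, so they
    # cannot be mistaken for parameters of the extract request itself.
    return f"{EXTRACT_ENDPOINT}?{urlencode({'url': target, 'format': fmt})}"
```

With requests, passing the same values through `params=` (as in the Python examples) performs this encoding for you.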

Authentication

Pass your API key in the x-api-key header or as the api_key query parameter.

Examples
# Header (recommended)
curl -H "x-api-key: YOUR_KEY" "https://urlsnap.dev/api/extract?url=..."

# Query parameter
curl "https://urlsnap.dev/api/extract?url=...&api_key=YOUR_KEY"
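Hardcoding YOUR_KEY is fine for a quick test, but real code would typically read the key from the environment and send it in the header, which keeps it out of URLs that tend to end up in server, proxy, and shell-history logs. A minimal sketch; the variable name `URLSNAP_API_KEY` is an assumption made for this example, not something URLSnap defines.

```python
import os


def auth_headers(env_var: str = "URLSNAP_API_KEY") -> dict:
    """Build the x-api-key header from an environment variable.

    URLSNAP_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail early instead of sending a request that would return 401
        raise RuntimeError(f"set {env_var} to your URLSnap API key")
    return {"x-api-key": key}
```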

Response Fields

Field         | Type   | Description
------------- | ------ | -----------
title         | string | The extracted article title.
markdown      | string | Clean GitHub-flavored markdown. Present when format=markdown (default).
text          | string | Plain text content. Present when format=text.
wordCount     | number | Number of words in the extracted content.
tokenEstimate | number | Estimated token count (~1.3 tokens/word). Useful for LLM context planning.
duration_ms   | number | Total processing time in milliseconds.
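Which content field to read depends on format, and tokenEstimate can decide whether a page fits a model's context window before you send it. A small sketch over the parsed JSON; the helper names are hypothetical. The sample response is consistent with rounding wordCount × 1.3 (842 × 1.3 = 1094.6, reported as 1095), but whether the service always rounds this way is an assumption, and a real model tokenizer will differ from the heuristic.

```python
def content_of(data: dict) -> str:
    """Return the extracted body: 'markdown' for format=markdown, 'text' for format=text."""
    if "markdown" in data:
        return data["markdown"]
    if "text" in data:
        return data["text"]
    raise KeyError("response has neither 'markdown' nor 'text'")


def estimate_tokens(word_count: int, tokens_per_word: float = 1.3) -> int:
    """Approximate a token count locally with the documented ~1.3 tokens/word ratio."""
    return round(word_count * tokens_per_word)


def fits_context(data: dict, budget_tokens: int) -> bool:
    """True if the page's tokenEstimate is within budget_tokens."""
    return data["tokenEstimate"] <= budget_tokens
```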

Code Examples

JavaScript / Node.js

javascript
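// Top-level await needs an ES module (e.g. a .mjs file in Node 18+)
const params = new URLSearchParams({
  url: 'https://example.com/article', // URLSearchParams percent-encodes the target URL
  format: 'markdown',
});

const response = await fetch(`https://urlsnap.dev/api/extract?${params}`, {
  headers: { 'x-api-key': 'YOUR_API_KEY' },
});
if (!response.ok) throw new Error(`Extract failed: HTTP ${response.status}`);

const { title, markdown, wordCount, tokenEstimate } = await response.json();
console.log(`Extracted "${title}": ${wordCount} words, ~${tokenEstimate} tokens`);
console.log(markdown);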

Python

python
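import requests

resp = requests.get(
    'https://urlsnap.dev/api/extract',
    params={'url': 'https://example.com/article', 'format': 'markdown'},  # requests percent-encodes these
    headers={'x-api-key': 'YOUR_API_KEY'},
    timeout=60,  # leave time for headless-browser rendering
)
resp.raise_for_status()  # raise on 401/403/422/429/500 instead of parsing an error body
data = resp.json()
print(f"Title: {data['title']}")
print(f"Words: {data['wordCount']}, Tokens: {data['tokenEstimate']}")
print(data['markdown'])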

Using with OpenAI / Claude

python (RAG pipeline example)
import requests
from openai import OpenAI

# 1. Extract the web page content
resp = requests.get(
    'https://urlsnap.dev/api/extract',
    params={'url': 'https://techcrunch.com/some-article', 'format': 'markdown'},
    headers={'x-api-key': 'YOUR_API_KEY'},
    timeout=60,
)
resp.raise_for_status()  # stop before calling the LLM if extraction failed
page = resp.json()

# 2. Use it as context for an LLM (OpenAI() reads OPENAI_API_KEY from the environment)
client = OpenAI()
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': f"Summarize this article:\n\n{page['markdown']}"},
    ],
)
print(response.choices[0].message.content)
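Long pages can exceed a model's context window, in which case page['markdown'] in the RAG example would need splitting before it is sent. A minimal sketch that packs blank-line-separated paragraphs into chunks using the same ~1.3 tokens/word heuristic as tokenEstimate. `chunk_markdown` is a hypothetical helper; because it splits on blank lines it can cut through a fenced code block or table that contains one, and a single oversized paragraph is split on word boundaries with its internal whitespace collapsed.

```python
def chunk_markdown(markdown: str, max_tokens: int, tokens_per_word: float = 1.3) -> list[str]:
    """Split markdown into chunks whose estimated token count stays within max_tokens."""
    max_words = max(1, int(max_tokens / tokens_per_word))
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        words = para.split()
        if len(words) > max_words:
            # Oversized paragraph: flush pending text, then emit word-sized slices
            if current:
                chunks.append("\n\n".join(current))
                current, current_words = [], 0
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if current_words + len(words) > max_words:
            # Adding this paragraph would overflow the budget: start a new chunk
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be summarized or embedded on its own, for example one chat completion per chunk followed by a final summary of the partial summaries.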

cURL: Plain Text Output

curl
curl -s "https://urlsnap.dev/api/extract?url=https://example.com&format=text" \
  -H "x-api-key: YOUR_API_KEY" \
  | jq -r '.text'

Why Use URLSnap Extract?

URLSnap Extract

  • Full JavaScript rendering (headless Chrome)
  • Works on React / Vue / Angular SPAs
  • Readability algorithm removes ads & nav
  • Returns token estimate for LLM budgeting
  • Simple REST API — one call, clean output
  • Pages are fetched from URLSnap's servers, not your IP (rotating user-agent)

Simple HTTP scrapers

  • Can't execute JavaScript
  • Returns raw HTML with boilerplate
  • Breaks on dynamic content
  • Must parse & clean HTML yourself
  • Often IP-blocked by sites
  • Requires your own infrastructure

Built For

🤖

AI Agents

Feed clean web content to LLM agents without parsing raw HTML or managing Puppeteer.

📚

RAG Pipelines

Build knowledge bases from live web sources. Token estimates help you chunk efficiently.

📰

News Aggregators

Extract full article text from news sites without writing per-site parsing rules.

🔍

Research Tools

Automatically extract and process content from hundreds of URLs in your pipeline.
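For batches of URLs, a bounded thread pool keeps a few requests in flight and records failures per URL instead of aborting on the first one. A sketch under stated assumptions: `extract` is a hypothetical callable you supply (for example, a function that calls the endpoint and raises on any non-200 status), and `max_workers=4` is an arbitrary choice because no concurrency limit is documented here. Each URL presumably counts against your plan's daily request limit, so a large batch can run into 429 errors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Tuple


def extract_many(
    urls: List[str],
    extract: Callable[[str], Any],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run extract(url) over many URLs with at most max_workers calls in flight.

    Returns (results, errors), both keyed by URL, so one failing page
    (say, a 422 on a login wall) does not abort the rest of the batch.
    """

    def safe(url: str) -> Tuple[str, Any, Optional[Exception]]:
        try:
            return url, extract(url), None
        except Exception as exc:  # record the failure and keep going
            return url, None, exc

    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields outcomes in input order
        for url, data, exc in pool.map(safe, urls):
            if exc is None:
                results[url] = data
            else:
                errors[url] = exc
    return results, errors
```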

Error Codes

Status | Code          | Meaning
------ | ------------- | -------
401    | Unauthorized  | Missing or invalid API key.
403    | Forbidden     | Your current plan doesn't include Extract. Upgrade to Starter or Pro.
422    | Unprocessable | The page loaded, but no readable article content was found (e.g. a login page or dashboard).
429    | Rate Limited  | Daily request limit reached for your plan.
500    | Server Error  | Something went wrong (timeout, navigation error, etc.). Retry with backoff.
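Of these statuses, only 500 is worth retrying automatically: 401 and 403 need a key or plan change, 422 means the page has no readable article, and 429 is a daily limit that a short backoff will not clear. A sketch of exponential backoff under those assumptions. `fetch` is a hypothetical callable you supply that returns the status code and the parsed body (for example, a wrapper around requests.get), and the attempt count and delays are arbitrary defaults, not documented service limits.

```python
import time
from typing import Any, Callable, Optional, Tuple

# Statuses from the Error Codes table that a retry will not fix.
NON_RETRYABLE = {
    401: "missing or invalid API key",
    403: "current plan doesn't include Extract",
    422: "no readable article content found",
    429: "daily request limit reached",
}


class ExtractError(Exception):
    def __init__(self, status: int, reason: str) -> None:
        super().__init__(f"HTTP {status}: {reason}")
        self.status = status


def extract_with_retry(
    fetch: Callable[[], Tuple[int, Optional[Any]]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it returns 200, retrying only 500s with exponential backoff."""
    failures = 0
    while True:
        status, body = fetch()
        if status == 200:
            return body
        if status in NON_RETRYABLE:
            raise ExtractError(status, NON_RETRYABLE[status])
        failures += 1
        if status != 500:
            raise ExtractError(status, "unexpected status")
        if failures >= max_attempts:
            raise ExtractError(status, f"server error after {failures} attempts")
        # Delays grow 1s, 2s, 4s, ... with the default base_delay
        sleep(base_delay * 2 ** (failures - 1))
```

Adding random jitter to each delay is a common refinement when many clients may retry at the same moment.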

Limits & Notes

Ready to Extract?

Get your API key in 30 seconds. A free tier is available with no credit card required; Extract itself needs a Starter or Pro plan.
