extract-content - Web Scraping API

🎮 Interactive Playground

🔖 Bookmarklet:

Drag this link to your bookmarks bar: 📌 Extract Content

💡 How to use:
1. Drag the link above to your bookmarks bar
2. Visit any webpage
3. Click the bookmark to fetch and display content

📖 Resources

Interactive Demo, Tutorial Series, Example Pen, GitHub Repo

📡 API Endpoints

GET / - Extract Text Content

Extracts text content from HTML elements using CSS selectors.

Parameter	Description	Required
`from`	URL to fetch content from	Yes
`extract`	JSON object mapping names to selectors	Yes
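
A request to this endpoint could be assembled roughly as follows. This is a minimal sketch, not the service's own client: `API_BASE` is a placeholder host (the real one is not stated here), `buildExtractUrl` is a hypothetical helper name, and the code assumes the `extract` object travels as a JSON string in the query string.

```javascript
// Placeholder host (assumption): replace with wherever the API is deployed.
const API_BASE = "https://api.example.com";

// Hypothetical helper: builds a GET / request URL.
// Assumes `extract` is sent as a JSON-encoded query parameter.
function buildExtractUrl(from, extract, base = API_BASE) {
  const url = new URL("/", base);
  // searchParams.set() percent-encodes the values for us.
  url.searchParams.set("from", from);
  url.searchParams.set("extract", JSON.stringify(extract));
  return url.toString();
}

const requestUrl = buildExtractUrl("https://news.ycombinator.com", {
  headlines: ".titleline > a",
});
console.log(requestUrl);
```

Building the URL through `URLSearchParams` rather than string concatenation keeps characters such as `>`, `"` and `#` in selectors from breaking the query string.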

GET /html - Extract HTML Content

Returns the HTML markup of matched elements instead of their text. `extract` is optional here; without it, the endpoint returns the page's raw HTML.

Parameter	Description	Required
`from`	URL to fetch content from	Yes
`extract`	JSON object mapping names to selectors	No
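
Because `extract` is optional on this endpoint, a client helper may want to leave the parameter out entirely when no selectors are given. The sketch below illustrates that; `buildHtmlUrl` and the `https://api.example.com` host are hypothetical, and JSON-encoding `extract` in the query string is an assumption.

```javascript
// Hypothetical helper for GET /html; `base` is a placeholder for the real host.
function buildHtmlUrl(base, from, extract) {
  const url = new URL("/html", base);
  url.searchParams.set("from", from);
  // Only send `extract` when selectors were actually provided.
  if (extract !== undefined) {
    url.searchParams.set("extract", JSON.stringify(extract));
  }
  return url.toString();
}

const page = "https://en.wikipedia.org/wiki/Web_scraping";
// Without selectors: requests the whole page.
console.log(buildHtmlUrl("https://api.example.com", page));
// With selectors: requests only the matched markup.
console.log(buildHtmlUrl("https://api.example.com", page, { infobox: ".infobox" }));
```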

GET /raw - Raw Proxy

Returns the raw HTML from the target URL (acts as a simple proxy).

Parameter	Description	Required
`from`	URL to fetch content from	Yes
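
Calling the proxy from a script might look like the sketch below. `fetchRawHtml` is a hypothetical helper, the host passed as `base` is a placeholder, and the `fetchImpl` parameter exists only so the function can be exercised without a network connection; none of these names come from the API itself.

```javascript
// Hypothetical helper for GET /raw.
// `fetchImpl` defaults to the global fetch (browsers, Deno, Node 18+).
async function fetchRawHtml(base, from, fetchImpl = fetch) {
  const url = new URL("/raw", base);
  url.searchParams.set("from", from);

  const response = await fetchImpl(url.toString());
  if (!response.ok) {
    // Surface HTTP errors instead of returning an error page as content.
    throw new Error(`Request failed with status ${response.status}`);
  }
  // The endpoint returns raw HTML, so read the body as text.
  return response.text();
}
```

A call such as `fetchRawHtml("https://api.example.com", "https://example.com").then(console.log)` would then print the fetched markup, assuming the placeholder host is replaced with a real deployment.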

💡 Examples

Example 1: Wikipedia Article with Links

URL: https://en.wikipedia.org/wiki/Deno_(software)

const extract = {
  "title": "h1",
  "title_link": "h1 a@href",  // Extract link using @href
  "intro": ".mw-parser-output > p"
};
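
The `selector@attribute` form used for `title_link` above can be thought of as a CSS selector plus an attribute name. How the service parses it is not shown here; the following is only one plausible way to split such a string, with `parseSelectorSpec` as a hypothetical name and the "last `@` wins" rule as an assumption.

```javascript
// Hypothetical parser for the "selector@attribute" convention.
// Assumes the text after the last "@" names the attribute to read.
function parseSelectorSpec(spec) {
  const at = spec.lastIndexOf("@");
  if (at === -1) {
    // No "@": a plain selector, so the element's text would be used.
    return { selector: spec.trim(), attribute: null };
  }
  return {
    selector: spec.slice(0, at).trim(),
    attribute: spec.slice(at + 1).trim(),
  };
}

console.log(parseSelectorSpec("h1 a@href")); // selector "h1 a", attribute "href"
console.log(parseSelectorSpec("h1"));        // selector "h1", attribute null
```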

Example 2: GitHub Repository

URL: https://github.com/denoland/deno

const extract = {
  "repoName": "h1 strong a",
  "description": "p.f4",
  "stars": "#repo-stars-counter-star"
};

Example 3: Dev.to Articles

URL: https://dev.to

const extract = {
  "headlines": "h2.crayons-story__title a",
  "authors": ".crayons-story__secondary a"
};

Example 4: Hacker News

URL: https://news.ycombinator.com

const extract = {
  "headlines": ".titleline > a",
  "scores": ".score"
};

Example 5: Reddit Posts

URL: https://old.reddit.com/r/programming

const extract = {
  "titles": ".title > a",
  "domain": ".domain"
};

Example 6: Extract HTML Markup

URL: https://en.wikipedia.org/wiki/Web_scraping

Use the /html endpoint to get the HTML markup of the matched elements instead of their text:

const extract = {
  "infobox": ".infobox",
  "firstPara": ".mw-parser-output > p"
};

Example 7: Bold.dk News with Links 🇩🇰

URL: https://bold.dk

Extract Danish news headlines with clickable links, using the `_link` naming convention:

const extract = {
  "overskrifter": ".article-headline",
  "overskrifter_link": ".thumb.article_list_item@href",  // _link suffix creates clickable links
  "kategorier": ".ArticleListItem__tag"
};
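
On the client side, a field and its `_link` sibling can be zipped together to render anchors. The sketch below assumes both fields come back in the same order and with the same length, which this page does not guarantee; `pairWithLinks` is a hypothetical helper and the sample object is made-up data, not real output from bold.dk.

```javascript
// Hypothetical helper: pairs "<name>" with "<name>_link" by position.
function pairWithLinks(result, name) {
  // [].concat wraps a lone string in an array and leaves arrays as they are.
  const texts = [].concat(result[name]);
  const links = [].concat(result[`${name}_link`]);
  // Missing links become null rather than undefined.
  return texts.map((text, i) => ({ text, href: links[i] ?? null }));
}

// Made-up sample data for illustration only.
const sample = {
  overskrifter: ["Headline A", "Headline B"],
  overskrifter_link: ["/a", "/b"],
};
console.log(pairWithLinks(sample, "overskrifter"));
```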

🔧 Response Format

Single Element

If a selector matches one element, returns a string:

{
  "title": "Deno - A modern runtime for JavaScript and TypeScript"
}

Multiple Elements

If a selector matches multiple elements, returns an array:

{
  "headlines": [
    "First headline",
    "Second headline",
    "Third headline"
  ]
}

No Match

If a selector matches no elements, returns an empty string:

{
  "missing": ""
}
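
Since a field can be a string, an array, or an empty string depending on how many elements matched, consuming code often benefits from normalizing every field to an array first. A minimal sketch, with `asList` as a hypothetical helper name:

```javascript
// Normalizes a response field to an array, following the three cases above:
// array -> unchanged, "" (no match) -> [], single string -> [string].
function asList(value) {
  if (Array.isArray(value)) return value;
  return value === "" ? [] : [value];
}

console.log(asList(["First headline", "Second headline"]).length); // 2
console.log(asList("Only headline").length);                       // 1
console.log(asList("").length);                                    // 0
```

One caveat under this scheme: a single matched element whose text is empty is indistinguishable from no match, so `asList` treats both as an empty list.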