🚀 extract-content

A lightweight web scraping API, powered by Deno, that extracts data from web pages using CSS selectors.

🎮 Interactive Playground

🔖 Bookmarklet:
Drag this link to your bookmarks bar: 📌 Extract Content
💡 How to use:
1. Drag the link above to your bookmarks bar
2. Visit any webpage
3. Click the bookmark to fetch and display content

📖 Resources

Interactive Demo · Tutorial Series · Example Pen · GitHub Repo

📡 API Endpoints

GET / - Extract Text Content

Extracts text content from HTML elements using CSS selectors.

Parameter | Description                            | Required
from      | URL to fetch content from              | Yes
extract   | JSON object mapping names to selectors | Yes
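
As a minimal sketch of calling this endpoint, the helper below builds a request URL from the two parameters. It assumes both are sent as query-string values and that `extract` is passed as a JSON string. The base URL `https://extract.example.com` and the helper name `buildExtractUrl` are placeholders for illustration, not part of the documented API.

```javascript
// Placeholder base URL (assumption); substitute the real deployment URL.
const EXAMPLE_BASE_URL = "https://extract.example.com";

// Build a GET / request URL.
// `from` is the page to scrape; `extract` maps result names to CSS selectors.
function buildExtractUrl(from, extract, base = EXAMPLE_BASE_URL) {
  const url = new URL("/", base);
  // searchParams.set percent-encodes the values for us.
  url.searchParams.set("from", from);
  url.searchParams.set("extract", JSON.stringify(extract));
  return url.toString();
}

const exampleRequestUrl = buildExtractUrl("https://news.ycombinator.com", {
  headlines: ".titleline > a",
});
console.log(exampleRequestUrl);
```

The resulting URL can then be handed to `fetch`; the Response Format section suggests the body is JSON, so `response.json()` should be the natural way to read it.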

GET /html - Extract HTML Content

Extracts the HTML markup of elements matched by CSS selectors, or returns the page's raw HTML when extract is omitted.

Parameter | Description                            | Required
from      | URL to fetch content from              | Yes
extract   | JSON object mapping names to selectors | No

GET /raw - Raw Proxy

Returns the raw HTML from the target URL (acts as a simple proxy).

Parameter | Description               | Required
from      | URL to fetch content from | Yes
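
The three parameter tables can be summed up in a small client-side check. This is only a sketch: `REQUIRED_PARAMS` and `missingParams` are hypothetical names, and the check mirrors the "Required" column above rather than anything the service itself is known to enforce.

```javascript
// Required parameters per endpoint, taken from the tables above.
const REQUIRED_PARAMS = new Map([
  ["/", ["from", "extract"]],
  ["/html", ["from"]], // extract is optional on /html
  ["/raw", ["from"]],
]);

// Return the names of required parameters that are absent or empty.
function missingParams(path, params) {
  const required = REQUIRED_PARAMS.get(path);
  if (required === undefined) {
    throw new Error(`Unknown endpoint: ${path}`);
  }
  return required.filter(
    (name) => params[name] === undefined || params[name] === ""
  );
}

console.log(missingParams("/", { from: "https://dev.to" })); // one entry: "extract"
console.log(missingParams("/html", { from: "https://dev.to" })); // empty array
```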

💡 Examples

Example 1: Wikipedia Article with Links

URL: https://en.wikipedia.org/wiki/Deno_(software)

const extract = {
  "title": "h1",
  "title_link": "h1 a@href",  // Extract link using @href
  "intro": ".mw-parser-output > p"
};
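
The `@href` suffix asks for an attribute value instead of the element's text. How the service parses this syntax is not shown here, so the following is only one plausible reading: a hypothetical `parseSelector` helper that treats a trailing `@name` as the attribute to read and leaves everything before it as the CSS selector.

```javascript
// Split "selector@attr" into its parts (hypothetical helper, not the service's code).
// Only a trailing @name made of word characters or hyphens counts as an attribute,
// so an "@" inside a quoted attribute selector is left alone.
function parseSelector(spec) {
  const match = /^(.*)@([\w-]+)$/.exec(spec);
  if (!match) {
    return { selector: spec.trim(), attribute: null };
  }
  return { selector: match[1].trim(), attribute: match[2] };
}

console.log(parseSelector("h1 a@href")); // selector "h1 a", attribute "href"
console.log(parseSelector("h1"));        // selector "h1", attribute null
```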

Example 2: GitHub Repository

URL: https://github.com/denoland/deno

const extract = {
  "repoName": "h1 strong a",
  "description": "p.f4",
  "stars": "#repo-stars-counter-star"
};

Example 3: Dev.to Articles

URL: https://dev.to

const extract = {
  "headlines": "h2.crayons-story__title a",
  "authors": ".crayons-story__secondary a"
};

Example 4: Hacker News

URL: https://news.ycombinator.com

const extract = {
  "headlines": ".titleline > a",
  "scores": ".score"
};
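
Each key comes back as its own list, so pairing a headline with its score is left to the client. A small sketch of that step follows; `zipFields` is a hypothetical helper and the sample response is made-up data. It assumes both values are arrays (each selector matched several elements) and pairs them by index, which is only reliable when every story has both a headline and a score.

```javascript
// Made-up sample data in the shape the API returns for this example.
const sampleStoryResponse = {
  headlines: ["Story A", "Story B", "Story C"],
  scores: ["120 points", "45 points", "9 points"],
};

// Combine parallel arrays into one record per index.
// Stops at the shortest array so no record has an undefined field.
function zipFields(response, keys) {
  if (keys.length === 0) return [];
  const length = Math.min(...keys.map((key) => response[key].length));
  return Array.from({ length }, (_, i) =>
    Object.fromEntries(keys.map((key) => [key, response[key][i]]))
  );
}

const stories = zipFields(sampleStoryResponse, ["headlines", "scores"]);
console.log(stories.length); // 3
```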

Example 5: Reddit Posts

URL: https://old.reddit.com/r/programming

const extract = {
  "titles": ".title > a",
  "domain": ".domain"
};

Example 6: Extract HTML Markup

URL: https://en.wikipedia.org/wiki/Web_scraping

Call the /html endpoint with these selectors to get the HTML markup instead of text:

const extract = {
  "infobox": ".infobox",
  "firstPara": ".mw-parser-output > p"
};

Example 7: Bold.dk News with Links 🇩🇰

URL: https://bold.dk

Extract Danish news headlines with clickable links, using the _link suffix naming convention:

const extract = {
  "overskrifter": ".article-headline",
  "overskrifter_link": ".thumb.article_list_item@href",  // _link suffix creates clickable links
  "kategorier": ".ArticleListItem__tag"
};
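
The source only states that a `_link` suffix creates clickable links; how the pairing is done is not shown. The sketch below illustrates one way a client could apply the same convention: a hypothetical `attachLinks` helper that joins each key with its `_link` companion by index. The sample response uses made-up headlines and placeholder URLs.

```javascript
const LINK_SUFFIX = "_link";

// Pair every key that has a "<key>_link" companion into { text, href } records.
// Keys without a companion are passed through unchanged.
function attachLinks(response) {
  const out = {};
  for (const [key, value] of Object.entries(response)) {
    const isLinkKey = key.endsWith(LINK_SUFFIX);
    const baseKey = isLinkKey ? key.slice(0, -LINK_SUFFIX.length) : key;
    // Skip link keys whose base key exists; they are consumed below.
    if (isLinkKey && response[baseKey] !== undefined) continue;
    const links = response[key + LINK_SUFFIX];
    if (links === undefined) {
      out[key] = value;
      continue;
    }
    const texts = [].concat(value); // tolerate a single string
    const hrefs = [].concat(links);
    out[key] = texts.map((text, i) => ({ text, href: hrefs[i] ?? null }));
  }
  return out;
}

// Made-up sample data (assumption): not real output from bold.dk.
const sampleNewsResponse = {
  overskrifter: ["Headline one", "Headline two"],
  overskrifter_link: ["https://example.com/1", "https://example.com/2"],
  kategorier: ["Football", "Transfers"],
};
const linkedNews = attachLinks(sampleNewsResponse);
console.log(linkedNews.overskrifter[0].href); // https://example.com/1
```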

🔧 Response Format

Single Element

If a selector matches exactly one element, the value is a string:

{
  "title": "Deno - A modern runtime for JavaScript and TypeScript"
}

Multiple Elements

If a selector matches multiple elements, the value is an array of strings:

{
  "headlines": [
    "First headline",
    "Second headline",
    "Third headline"
  ]
}

No Match

If a selector matches no elements, the value is an empty string:

{
  "missing": ""
}
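
Because a value can be a string, an array, or an empty string depending on the match count, client code often benefits from normalizing everything to arrays first. The following is a minimal sketch of that idea; `toArray` and `normalizeResponse` are hypothetical client-side helpers. Note that it treats every empty string as "no match", so a single matched element with empty text would be indistinguishable from a miss.

```javascript
// Turn one response value into an array, whatever shape it arrived in.
function toArray(value) {
  if (Array.isArray(value)) return value; // multiple matches
  if (value === "") return [];            // no match
  return [value];                         // single match
}

// Apply toArray to every key of a response object.
function normalizeResponse(response) {
  return Object.fromEntries(
    Object.entries(response).map(([key, value]) => [key, toArray(value)])
  );
}

const normalized = normalizeResponse({
  title: "Deno - A modern runtime for JavaScript and TypeScript",
  headlines: ["First headline", "Second headline", "Third headline"],
  missing: "",
});
console.log(normalized.title.length, normalized.headlines.length, normalized.missing.length); // 1 3 0
```

With every value an array, loops and `map` calls no longer need a type check per key.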