
ScrapeNinja

Consume the ScrapeNinja Web Scraping API. See the full documentation at https://scrapeninja.net/docs/

Overview

The Extract Primary Content operation of the ScrapeNinja n8n node extracts the main content from a block of HTML. Use it to isolate the core article, blog post, or other readable section of a web page while discarding navigation, ads, and other non-essential elements. Common use cases include:

  • Summarizing articles for newsletters.
  • Preparing clean text for further processing (e.g., feeding into an LLM).
  • Converting web content into Markdown for documentation or note-taking.
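This documentation does not say which algorithm the node uses to locate the main content. As a rough illustration of the idea only, the Python sketch below (standard library only) assumes a simple heuristic: text inside <article> is preferred, then <main>, then <body>, and anything inside navigation, header, footer, aside, script, style, or noscript elements is dropped. The names MainContentParser and extract_primary_text are hypothetical and are not part of ScrapeNinja.

```python
from html.parser import HTMLParser

# Assumed heuristic for illustration only; not ScrapeNinja's documented algorithm.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}
# Candidate containers for the primary content, most specific first.
CONTAINER_TAGS = ("article", "main", "body")


class MainContentParser(HTMLParser):
    """Collects visible text per candidate container, ignoring page chrome."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self.open_containers: list[str] = []  # stack of open candidate containers
        self.text: dict[str, list[str]] = {tag: [] for tag in CONTAINER_TAGS}

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in CONTAINER_TAGS:
            self.open_containers.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.open_containers:
            # Pop back to the matching container; tolerates sloppy nesting.
            while self.open_containers.pop() != tag:
                pass

    def handle_data(self, data):
        chunk = data.strip()
        if self.skip_depth == 0 and chunk:
            # Credit the text to every open container so outer ones keep it too.
            for tag in set(self.open_containers):
                self.text[tag].append(chunk)


def extract_primary_text(html: str) -> str:
    """Return plain text from the most specific container that has any text."""
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    for tag in CONTAINER_TAGS:
        if parser.text[tag]:
            return "\n".join(parser.text[tag])
    return ""  # nothing found: comparable to the "Empty Output" case


if __name__ == "__main__":
    page = (
        "<html><body><nav>Home | About</nav>"
        "<article><h1>Title</h1><p>Body text.</p></article>"
        "<footer>(c) 2024</footer></body></html>"
    )
    print(repr(extract_primary_text(page)))  # 'Title\nBody text.'
```

Real content extractors typically score many candidate blocks (text density, link density, class names) rather than trusting tag names alone; this sketch only shows the basic "keep the main container, drop the chrome" idea.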

Properties

Display Name          Type      Description
HTML                  String    HTML content to extract main content from. Required.
Output as Markdown    Boolean   Whether to convert the extracted HTML content to Markdown format.

Details

  • HTML: Paste or provide the raw HTML source from which you want to extract the primary content.
  • Output as Markdown: If enabled (true), the extracted content will be converted from HTML to Markdown format.
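The documentation does not name the HTML-to-Markdown converter the node uses. To show conceptually what the Output as Markdown option does, here is a deliberately small Python converter (standard library only) that handles headings, paragraphs, bold, italics, links, and list items. It is a hypothetical illustration, not the node's implementation; production converters cover far more cases, which is also why complex HTML structures may not convert perfectly.

```python
import re
from html.parser import HTMLParser

HEADING_LEVELS = {f"h{n}": n for n in range(1, 7)}


class MiniMarkdown(HTMLParser):
    """Hypothetical, minimal HTML-to-Markdown converter for a few common tags."""

    def __init__(self) -> None:
        super().__init__()
        self.out: list[str] = []
        self.hrefs: list[str] = []  # stack of pending link targets

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self.out.append("\n\n" + "#" * HEADING_LEVELS[tag] + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")  # ordered lists are simplified to bullets
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        # Collapse whitespace runs to single spaces, as a browser would render them.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    # Trim stray spaces at line edges, then keep at most one blank line between blocks.
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    sample = '<h2>Notes</h2><p>See <a href="https://scrapeninja.net/docs/">the docs</a>.</p>'
    print(repr(html_to_markdown(sample)))  # '## Notes\n\nSee [the docs](https://scrapeninja.net/docs/).'
```

Tables, images, nested lists, and code blocks are not handled here at all; a sketch like this is only meant to make the HTML-to-Markdown mapping concrete.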

Output

The node outputs a JSON object with the following structure:

{
  "content": "<string>"
}
  • content: The extracted main content as a string. If "Output as Markdown" is enabled, this will be in Markdown format; otherwise, it will be in cleaned-up HTML.
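If the node's result is consumed outside n8n (for example, returned to an external caller or saved to a file), the documented {"content": ...} shape is easy to parse. The Python sketch below assumes exactly that shape and treats a blank string as the Empty Output case; the helper name read_extracted_content is hypothetical.

```python
import json


def read_extracted_content(node_output: str) -> str:
    """Parse the node's JSON output and return `content`, rejecting empty results.

    Assumes the documented shape {"content": "<string>"}; this helper's
    name is hypothetical.
    """
    data = json.loads(node_output)
    content = data.get("content") if isinstance(data, dict) else None
    if not isinstance(content, str):
        raise ValueError('expected an object with a string "content" field')
    if not content.strip():
        # Corresponds to the "Empty Output" case: no main content was detected.
        raise ValueError("extraction returned empty content")
    return content


if __name__ == "__main__":
    print(read_extracted_content('{"content": "# Title\\n\\nBody text."}'))
```

Raising on empty content makes a failed extraction visible immediately instead of letting an empty string flow into later steps such as summarization.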

Dependencies

  • No external API keys or credentials are required for the "Extract Primary Content" operation.
  • All processing is done within the node; no external services are called for this operation.

Troubleshooting

Common Issues:

  • Malformed HTML: If the provided HTML is incomplete or malformed, extraction quality may suffer, resulting in empty or partial content.
  • Empty Output: If the main content cannot be detected, the output may be empty. Ensure that the input HTML contains a clear main section (e.g., <article>, <main>, or similar).
  • Markdown Conversion Issues: If "Output as Markdown" is enabled, some complex HTML structures may not convert perfectly to Markdown.
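Before passing HTML to the node, a quick pre-flight check can catch the first two issues above. The sketch below is a hypothetical Python helper, not part of the node: it reports when the input is blank, when no <article> or <main> element is present, and when <article>, <main>, <section>, or <div> tags are unbalanced, a cheap signal of truncated or malformed markup. Those four elements are counted because HTML requires explicit end tags for them; elements such as <p> and <li> may legally omit their closing tags and are therefore ignored.

```python
from html.parser import HTMLParser

# Elements whose end tags are mandatory in HTML, so an imbalance suggests broken markup.
STRUCTURAL = {"article", "main", "section", "div"}


class PreflightCheck(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.has_main_section = False
        self.open_counts = {tag: 0 for tag in STRUCTURAL}  # opens minus closes

    def handle_starttag(self, tag, attrs):
        if tag in ("article", "main"):
            self.has_main_section = True
        if tag in STRUCTURAL:
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in STRUCTURAL:
            self.open_counts[tag] -= 1


def preflight(html: str) -> list[str]:
    """Return human-readable warnings; an empty list means no obvious problems."""
    checker = PreflightCheck()
    checker.feed(html)
    checker.close()
    warnings: list[str] = []
    if not html.strip():
        warnings.append("HTML input is empty")
    if not checker.has_main_section:
        warnings.append("no <article> or <main> element found")
    for tag, count in sorted(checker.open_counts.items()):
        if count > 0:
            warnings.append(f"{count} unclosed <{tag}> element(s)")
        elif count < 0:
            warnings.append(f"{-count} stray </{tag}> closing tag(s)")
    return warnings


if __name__ == "__main__":
    print(preflight("<div><p>truncated"))
```

A missing <article> or <main> does not guarantee an empty result, since extractors can still find content elsewhere; the warnings are hints about where extraction quality may drop.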

Error Messages:

  • "error": "Cannot read property '...' of undefined": Indicates missing or incorrect input. Make sure the "HTML" property is populated.
  • "No additional details available": A generic error message when more specific information isn't available. Double-check your input data.
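Because the "Cannot read property" error points to a missing or incorrect HTML value, it can help to check incoming items before they reach the node. Assuming the items follow n8n's usual shape, where each item carries its data under a json key, the hypothetical Python helper below lists the items whose HTML field is missing, not a string, or blank. The field name "html" is an assumption; substitute whichever field your workflow maps to the HTML property.

```python
from typing import Any


def find_items_missing_html(items: list[dict[str, Any]], field: str = "html") -> list[int]:
    """Return indexes of items whose json[field] is absent, not a string, or blank.

    Assumes n8n-style items ({"json": {...}}); `field` is a hypothetical name.
    """
    bad: list[int] = []
    for index, item in enumerate(items):
        payload = item.get("json")
        value = payload.get(field) if isinstance(payload, dict) else None
        if not isinstance(value, str) or not value.strip():
            bad.append(index)
    return bad


if __name__ == "__main__":
    items = [
        {"json": {"html": "<article><p>ok</p></article>"}},
        {"json": {"url": "https://example.com"}},  # no "html" field
        {"json": {"html": "   "}},  # blank string
    ]
    print(find_items_missing_html(items))  # [1, 2]
```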
