Overview
The Extract Primary Content operation of the ScrapeNinja n8n node extracts the main content from a block of HTML. It is useful when you want to isolate the core article, blog post, or readable section of a web page and remove navigation, ads, and other non-essential elements. Common use cases include:
- Summarizing articles for newsletters.
- Preparing clean text for further processing (e.g., feeding into an LLM).
- Converting web content into Markdown for documentation or note-taking.
Properties
| Display Name | Type | Description |
|---|---|---|
| HTML | String | HTML content to extract main content from. Required. |
| Output as Markdown | Boolean | Whether to convert the extracted HTML content to Markdown format. |
Details
- HTML: Paste or provide the raw HTML source from which you want to extract the primary content.
- Output as Markdown: If enabled (`true`), the extracted content is converted from HTML to Markdown format.
Output
The node outputs a JSON object with the following structure:
{
  "content": "<string>"
}
- content: The extracted main content as a string. If "Output as Markdown" is enabled, this will be in Markdown format; otherwise, it will be in cleaned-up HTML.
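As a rough illustration of consuming this output downstream, the sketch below parses a JSON item shaped like the structure above and returns its `content` field. The function name `read_content` and the sample payload are assumptions made for this example; they are not part of the node.

```python
import json


def read_content(raw_item: str) -> str:
    """Return the `content` string from a JSON item shaped like the node's output."""
    item = json.loads(raw_item)
    if not isinstance(item, dict):
        raise ValueError("expected a JSON object")
    content = item.get("content")
    if not isinstance(content, str):
        raise ValueError("expected a string under the 'content' key")
    return content


# Hypothetical sample payload for illustration, not real node output.
sample = '{"content": "# Title\\n\\nBody text."}'
print(read_content(sample))
```

Checking the type of `content` before using it keeps later steps (for example, passing the text to an LLM) from failing on an unexpected item shape.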
Dependencies
- No external API keys or credentials are required for the "Extract Primary Content" operation.
- All processing is done within the node; no external services are called for this operation.
Troubleshooting
Common Issues:
- Malformed HTML: If the provided HTML is incomplete or malformed, extraction quality may suffer, resulting in empty or partial content.
- Empty Output: If the main content cannot be detected, the output may be empty. Ensure that the input HTML contains a clear main section (e.g., `<article>`, `<main>`, or similar).
- Markdown Conversion Issues: If "Output as Markdown" is enabled, some complex HTML structures may not convert perfectly to Markdown.
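One way to diagnose empty output is to check, before running the operation, whether the input HTML contains an `<article>` or `<main>` element at all. The sketch below does this with Python's standard `html.parser`. It is only a heuristic pre-check: the names `ContainerFinder` and `has_main_container` are hypothetical, and this is not a description of how the node itself detects the main content.

```python
from html.parser import HTMLParser


class ContainerFinder(HTMLParser):
    """Records whether an <article> or <main> start tag appears in the input."""

    def __init__(self) -> None:
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in ("article", "main"):
            self.found.add(tag)


def has_main_container(html: str) -> bool:
    """Return True if the HTML contains at least one <article> or <main> tag."""
    finder = ContainerFinder()
    finder.feed(html)
    finder.close()
    return bool(finder.found)


print(has_main_container("<body><article><p>Text</p></article></body>"))
print(has_main_container("<div><p>Text</p></div>"))
```

A `False` result does not guarantee that extraction will fail, since pages can mark their main section in other ways; it only flags input that lacks these two common containers.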
Error Messages:
- "error": "Cannot read property '...' of undefined": Indicates missing or incorrect input. Make sure the "HTML" property is populated.
- "No additional details available": A generic error message when more specific information isn't available. Double-check your input data.
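Since both messages point back to the input, a small guard in an upstream step can catch an unpopulated "HTML" value before it reaches the operation. The sketch below is a minimal example under that assumption; `require_html` is a hypothetical helper, not part of the node.

```python
def require_html(value: object) -> str:
    """Return the value unchanged if it is a non-blank string, else raise ValueError."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError("the HTML property is empty or not a string")
    return value


# Passes through valid input unchanged.
print(require_html("<article><p>Hello</p></article>"))
```

Failing early with a specific message is usually easier to debug than a generic error raised later in the workflow.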