Overview
The ScrapeNinja node's "Extract Custom" operation allows users to extract specific data from HTML content using a custom JavaScript extraction function. This is particularly useful when you need to parse and retrieve tailored information from web pages, such as headlines, product details, or any structured data embedded in HTML. Common scenarios include scraping article titles, prices, or metadata from web pages where standard extraction methods are insufficient.
Practical Example:
Suppose you want to extract the main headline from a news article's HTML. You can provide the HTML content and a custom extraction function that targets the <h1> tag, returning its text content.
Properties
| Display Name | Type | Description |
|---|---|---|
| HTML | String | HTML content to extract data from. This is the raw HTML source you want to process. (Required) |
| Extraction Function | String | JavaScript function executed to extract data. Receives HTML content and a Cheerio instance as arguments. (Required) |
Details
- HTML: Paste or supply the HTML code you wish to analyze.
- Extraction Function: Write a JavaScript function (as a string) that defines how to extract your desired data. The function signature should be:

      function extract(html, cheerioInstance) {
        // Your extraction logic here
        return { ... };
      }

  - `html`: The HTML content provided.
  - `cheerioInstance`: A Cheerio library instance for jQuery-like HTML parsing.
Output
The output is a JSON object containing the data returned by your custom extraction function.
The structure of the output depends entirely on what your extraction function returns.
For example, if your function is:

    function extract(html, cheerioInstance) {
      const $ = cheerioInstance.load(html);
      return { title: $("h1").text().trim() };
    }

The output will be:

    { "title": "Extracted Headline" }

Binary Data: This operation does not output binary data; it only outputs JSON.
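Because the output mirrors whatever the function returns, the function can make missing data explicit instead of returning an empty string. The following is a minimal sketch of that idea; the `found` field is a hypothetical addition, not something the node requires.

```javascript
// Defensive variant of the headline example: it reports missing data
// explicitly. The `found` flag is an illustrative assumption.
function extract(html, cheerioInstance) {
  // Guard against empty or non-string input before parsing.
  if (typeof html !== "string" || html.trim() === "") {
    return { title: null, found: false };
  }

  const $ = cheerioInstance.load(html);
  const heading = $("h1").first();

  // .length is 0 when the selector matched nothing.
  if (heading.length === 0) {
    return { title: null, found: false };
  }
  return { title: heading.text().trim(), found: true };
}
```

A downstream node could then branch on `found` rather than checking for an empty `title`.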
Dependencies
- External Libraries: The extraction function uses the Cheerio library for HTML parsing.
- No API Key Required: This operation does not require external API credentials.
- n8n Configuration: No special configuration is needed for this operation.
Troubleshooting
Common Issues
- Syntax Errors in Extraction Function: If your JavaScript extraction function contains errors, the node will fail with a message indicating the problem. Double-check your function syntax.
- Invalid HTML Input: Supplying malformed HTML may result in incomplete or incorrect extraction results.
- Cheerio Usage Errors: Ensure you use the Cheerio instance correctly (e.g., `cheerioInstance.load(html)`).
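One way to catch syntax errors before running the node is to check the function string locally. The harness below is a hypothetical sketch, not the node's actual implementation: `runExtraction` is an invented name, and the choice to put the error's name in `details` is an assumption that only loosely mirrors the `error` and `details` fields described under Error Messages.

```javascript
// Hypothetical local harness: compiles an extraction function supplied as a
// string and reports failures in an { error, details } shape.
// This is NOT the node's actual implementation.
function runExtraction(source, html, cheerioInstance) {
  try {
    // new Function throws a SyntaxError if the source does not parse.
    const run = new Function(
      "html",
      "cheerioInstance",
      `${source}\nreturn extract(html, cheerioInstance);`
    );
    return run(html, cheerioInstance);
  } catch (err) {
    // Assumption: the message goes in `error`, the error type in `details`.
    return { error: err.message, details: err.name };
  }
}

// Example calls; neither function uses Cheerio, so null is passed for it.
const ok = runExtraction(
  "function extract(html) { return { length: html.length }; }",
  "<h1>Hi</h1>",
  null
);
const broken = runExtraction(
  "function extract(html) { return {",
  "<h1>Hi</h1>",
  null
);
console.log(ok); // { length: 11 }
console.log(broken.details); // SyntaxError
```

Since `new Function` executes whatever code it is given, a harness like this should only be run on function strings you trust.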
Error Messages
- `error` and `details` fields: If an error occurs, the output will contain an `error` field with the message and a `details` field with more information. Review these messages to identify issues in your extraction function or input data.