N8N Tools - Web Scraper

Scrape data from websites using the N8N Tools platform

Overview

The "N8N Tools - Web Scraper" node scrapes data from websites by sending requests to the N8N Tools platform's web scraping API. It supports three operations:

  • Scrape Single Page: Extract data from a single webpage using specified CSS selectors.
  • Scrape Multiple Pages: Extract data from multiple webpages by providing a list of URLs.
  • Monitor Page Changes: Monitor a webpage for changes over time.

Typical use cases include gathering structured data from product pages, news articles, or any website where automated extraction of specific elements is needed. For example, you could scrape product prices and descriptions from an e-commerce site or monitor a blog page for new posts.

Properties

Name: Meaning

Operation: The scraping operation to perform:
  - Scrape Single Page
  - Scrape Multiple Pages
  - Monitor Page Changes

URL: The URL of the webpage to scrape (used in the "Scrape Single Page" and "Monitor Page Changes" operations).

URLs: Multiple URLs to scrape, one per line (used in the "Scrape Multiple Pages" operation).

Selectors: CSS selectors to extract data. Each selector includes:
  - Name: Field name in the output.
  - CSS Selector: The CSS query used to select elements.
  - Attribute: Which attribute to extract (e.g., text, href, src).
  - Multiple: Whether to extract every element matching the selector.

Options: Additional options for scraping:
  - Wait for Selector: CSS selector to wait for before scraping.
  - Wait Time (seconds): Delay before scraping.
  - User Agent: Custom user agent string.
  - Enable JavaScript: Enable or disable JavaScript execution on the page.
  - Screenshot: Take a screenshot of the page.
  - Follow Redirects: Follow HTTP redirects during scraping.
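
To make the shape of these properties concrete, here is a minimal sketch of how they might be collected into one parameter object. The key names (selectors, attribute, urls, enableJavaScript, waitTime) and the operation identifiers "scrapeMultiple" and "monitorChanges" are assumptions for illustration; only "scrapePage" appears in this page's example output, and the node's real internal names may differ.

```python
from dataclasses import dataclass, asdict


@dataclass
class Selector:
    name: str                # "Name": field name in the output
    selector: str            # "CSS Selector": query used to select elements
    attribute: str = "text"  # "Attribute": text, href, src, ...
    multiple: bool = False   # "Multiple": collect every match, not only the first


def parse_urls(raw: str) -> list[str]:
    """Split the one-URL-per-line "URLs" property, dropping blank lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def build_parameters(operation, selectors, url=None, urls_raw=None, options=None):
    """Assemble the node properties into one dict (key names are illustrative)."""
    params = {
        "operation": operation,
        "selectors": [asdict(s) for s in selectors],
        "options": dict(options or {}),
    }
    if operation == "scrapeMultiple":  # hypothetical identifier
        params["urls"] = parse_urls(urls_raw or "")
    else:
        params["url"] = url
    return params


# Example: one single-value field and one multi-value field.
params = build_parameters(
    "scrapePage",
    [Selector("title", "h1"), Selector("links", "a.product", "href", True)],
    url="https://example.com/products",
    options={"enableJavaScript": True, "waitTime": 2},
)
```

With "Multiple" enabled, a field such as links would be expected to come back as a list, which matches the array-valued field in the example output.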

Output

The node outputs JSON data containing the scraped results combined with metadata about the operation. The structure depends on the operation but generally includes:

  • Extracted fields as defined by the selectors, with their corresponding values.
  • An operation field indicating which operation was performed.
  • A success boolean indicating if the scraping succeeded.
  • If the request fails and "Continue On Fail" is enabled, an error message and success: false.

If the "Screenshot" option is enabled, the response may also include binary data representing the screenshot image. The node's code does not spell out this format, so check it against an actual response before relying on it.

Example output JSON snippet:

{
  "operation": "scrapePage",
  "success": true,
  "fieldName1": "extracted value",
  "fieldName2": ["value1", "value2"],
  ...
}
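
Because the extracted fields appear to sit at the same level as the metadata, a downstream step may want to separate the two. The following is a minimal sketch under that assumption; the "error" key name is a guess, since this page only says that an error message is included on failure.

```python
# Keys treated as metadata; "error" is an assumed key name.
METADATA_KEYS = {"operation", "success", "error"}


def split_result(item: dict) -> tuple[dict, dict]:
    """Return (metadata, extracted_fields) for one output item."""
    meta = {k: v for k, v in item.items() if k in METADATA_KEYS}
    fields = {k: v for k, v in item.items() if k not in METADATA_KEYS}
    return meta, fields


item = {
    "operation": "scrapePage",
    "success": True,
    "fieldName1": "extracted value",
    "fieldName2": ["value1", "value2"],
}
meta, fields = split_result(item)
# meta   -> {"operation": "scrapePage", "success": True}
# fields -> {"fieldName1": "extracted value", "fieldName2": ["value1", "value2"]}
```

If fields really are merged at the top level as the example suggests, a selector named operation or success could collide with the metadata, so it seems safer to avoid those names.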

Dependencies

  • Requires an API key credential for the N8N Tools platform to authenticate requests.
  • The node sends HTTP POST requests to the N8N Tools API endpoints (/api/v1/scraper/single, /multiple, or /monitor) depending on the operation.
  • No other external dependencies are required within n8n.
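
As a rough illustration of the dispatch described above, the sketch below maps an operation to an endpoint and builds (but does not send) a POST request. The base URL is a placeholder, the Bearer authorization scheme is an assumption, the /multiple and /monitor paths are assumed to share the /api/v1/scraper prefix, and the operation identifiers other than "scrapePage" are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder, not the real host

# Operation identifier -> endpoint path (identifiers partly hypothetical).
ENDPOINTS = {
    "scrapePage": "/api/v1/scraper/single",
    "scrapeMultiple": "/api/v1/scraper/multiple",
    "monitorChanges": "/api/v1/scraper/monitor",
}


def build_request(operation: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for the given operation without sending it."""
    try:
        path = ENDPOINTS[operation]
    except KeyError:
        raise ValueError(f"Unknown operation: {operation}") from None
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_request("scrapePage", {"url": "https://example.com"}, "my-api-key")
```

Keeping the mapping in one table means an unsupported operation fails early with the same "Unknown operation" wording that the Troubleshooting section lists.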

Troubleshooting

  • Common issues:

    • Invalid or missing API key credential will cause authentication failures.
    • Incorrect CSS selectors may result in empty or incomplete data extraction.
    • Network issues or unreachable URLs can cause request failures.
    • If JavaScript execution is disabled on pages that require it, data might not load properly.
  • Error messages:

    • "Unknown operation: <operation>": Occurs if an unsupported operation is selected; ensure the operation is one of the supported options.
    • "Web scraping failed: <message>": Generic error when scraping fails; check the error message for details such as network errors or invalid parameters.
  • To handle errors gracefully, enable "Continue On Fail" in the node settings to receive error details in the output instead of stopping workflow execution.
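
The "Continue On Fail" behavior can be pictured with a small sketch. This is not the node's actual code: run_items, the scrape callable, and the "error" key are illustrative names, and the exact error text the node writes to the output is an assumption.

```python
def run_items(items, scrape, continue_on_fail=False):
    """Process items one by one, mimicking the two failure modes."""
    results = []
    for item in items:
        try:
            results.append({**scrape(item), "success": True})
        except Exception as exc:
            if not continue_on_fail:
                # Default: stop the workflow with the generic error.
                raise RuntimeError(f"Web scraping failed: {exc}") from exc
            # Continue On Fail: record the error and move on.
            results.append({"error": str(exc), "success": False})
    return results


def fake_scrape(item):
    """Stand-in for the real API call, used only for this example."""
    if not item.get("url"):
        raise ValueError("URL is required")
    return {"operation": "scrapePage", "title": "Example"}


out = run_items([{"url": "https://example.com"}, {}], fake_scrape, continue_on_fail=True)
# out[0] -> {"operation": "scrapePage", "title": "Example", "success": True}
# out[1] -> {"error": "URL is required", "success": False}
```

Downstream nodes can then branch on the success field instead of relying on the workflow halting.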
