
N8N Tools - Web Scraper

Scrape data from websites using the N8N Tools platform

Overview

The N8N Tools - Web Scraper node extracts data from websites using CSS selectors and configurable options. It supports three operations: scraping a single page, scraping multiple pages, and monitoring a page for changes. The node does not scrape pages itself. Instead, it sends requests to an external web scraping API service, which performs the scraping based on the provided parameters.

Common scenarios where this node is beneficial include:

  • Extracting structured data (e.g., product details, prices, headlines) from one or more webpages.
  • Monitoring a webpage for content changes and triggering workflows accordingly.
  • Collecting data from multiple URLs in batch mode without manual intervention.

Practical examples:

  • Scraping product names and prices from an e-commerce site by providing CSS selectors for those elements.
  • Monitoring a news website's homepage for new articles by watching specific HTML elements.
  • Gathering metadata like image URLs and links from a list of blog posts.
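For the e-commerce example, the Selectors property might be filled in roughly as follows. This is a minimal TypeScript sketch, not the node's own code: the `SelectorConfig` type, the attribute value `"text"`, and the CSS selectors (`.product-card .title`, `.product-card .price`, `.product-card a`) are hypothetical placeholders that would have to match the target site's markup. The `findSelectorProblems` helper is an illustrative pre-flight check for empty or duplicate entries.

```typescript
// Hypothetical shape mirroring the node's Selectors fields (Name, CSS Selector, Attribute, Multiple).
interface SelectorConfig {
  name: string;        // field name used in the output
  cssSelector: string; // CSS query string
  attribute: string;   // e.g. "text", "href", "src" (assumed value names)
  multiple: boolean;   // extract every matching element
}

// Placeholder selectors for scraping product names, prices, and links.
const productSelectors: SelectorConfig[] = [
  { name: "productName", cssSelector: ".product-card .title", attribute: "text", multiple: true },
  { name: "price", cssSelector: ".product-card .price", attribute: "text", multiple: true },
  { name: "productLink", cssSelector: ".product-card a", attribute: "href", multiple: true },
];

// Illustrative check: flags empty names, empty selectors, and duplicate output field names.
function findSelectorProblems(selectors: SelectorConfig[]): string[] {
  const problems: string[] = [];
  const seenNames: string[] = [];
  for (const s of selectors) {
    if (s.name.trim() === "") problems.push("A selector has an empty name");
    if (s.cssSelector.trim() === "") problems.push(`Field "${s.name}" has an empty CSS selector`);
    if (seenNames.indexOf(s.name) !== -1) problems.push(`Field name "${s.name}" is used more than once`);
    seenNames.push(s.name);
  }
  return problems;
}

console.log(findSelectorProblems(productSelectors)); // []
```

Duplicate names are worth catching because each Name becomes an output field, so two selectors with the same name would presumably collide in the result.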

Properties

  • Selectors: A collection of CSS selectors used to extract data from the webpage. Each selector includes:

    • Name: Field name used in the output.
    • CSS Selector: The CSS query string.
    • Attribute: Which attribute to extract (e.g., text content, href, src).
    • Multiple: Whether to extract multiple matching elements.
  • Options: Additional settings that control scraping behavior:

    • Wait for Selector: CSS selector to wait for before scraping.
    • Wait Time (seconds): Delay before scraping starts.
    • User Agent: Custom user agent string.
    • Enable JavaScript: Whether to execute JavaScript on the page.
    • Screenshot: Whether to take a screenshot of the page.
    • Follow Redirects: Whether to follow HTTP redirects.
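The node forwards these properties to the external scraping API, but the request format is not documented here. The sketch below only illustrates one plausible mapping from the Selectors and Options properties to a request body. Every payload field name (`url`, `selectors`, `options`, `waitForSelector`, `waitTimeMs`, `userAgent`, `javascript`, `screenshot`, `followRedirects`) and the seconds-to-milliseconds conversion are assumptions, as is `buildScrapeRequest` itself.

```typescript
// Hypothetical option names mirroring the node's Options collection.
interface ScrapeOptions {
  waitForSelector?: string;
  waitTimeSeconds?: number;
  userAgent?: string;
  enableJavaScript?: boolean;
  screenshot?: boolean;
  followRedirects?: boolean;
}

interface SelectorConfig {
  name: string;
  cssSelector: string;
  attribute: string;
  multiple: boolean;
}

// Assumed request body for the external API; all field names are illustrative.
interface ScrapeRequestBody {
  url: string;
  selectors: SelectorConfig[];
  options: { [key: string]: string | number | boolean };
}

function buildScrapeRequest(
  url: string,
  selectors: SelectorConfig[],
  options: ScrapeOptions,
): ScrapeRequestBody {
  if (options.waitTimeSeconds !== undefined && options.waitTimeSeconds < 0) {
    throw new Error("Wait Time (seconds) must not be negative");
  }
  const mapped: { [key: string]: string | number | boolean } = {};
  // Forward only options the user set; unset ones are left to the service's (assumed) defaults.
  if (options.waitForSelector) mapped["waitForSelector"] = options.waitForSelector;
  if (options.waitTimeSeconds !== undefined) mapped["waitTimeMs"] = options.waitTimeSeconds * 1000;
  if (options.userAgent) mapped["userAgent"] = options.userAgent;
  if (options.enableJavaScript !== undefined) mapped["javascript"] = options.enableJavaScript;
  if (options.screenshot !== undefined) mapped["screenshot"] = options.screenshot;
  if (options.followRedirects !== undefined) mapped["followRedirects"] = options.followRedirects;
  return { url, selectors, options: mapped };
}

const body = buildScrapeRequest(
  "https://example.com/products",
  [{ name: "price", cssSelector: ".price", attribute: "text", multiple: true }],
  { waitForSelector: ".price", waitTimeSeconds: 2, enableJavaScript: true },
);
console.log(JSON.stringify(body.options)); // {"waitForSelector":".price","waitTimeMs":2000,"javascript":true}
```

Omitting unset options, rather than sending explicit defaults, keeps the payload small and avoids overriding whatever defaults the service applies.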

Output

The node outputs JSON objects containing the scraped data fields as specified by the selectors. The structure depends on the operation:

  • For single page scraping, the output contains the extracted fields from that page.
  • For multiple pages, the output contains results for each URL, either as an array or as separate entries.
  • For monitoring, the output reflects the current state of the monitored page.

Each output JSON also includes:

  • operation: The type of scraping performed (scrapePage, scrapeMultiple, or monitorPage).
  • success: Boolean indicating if the scraping was successful.
  • error: Present only on failure and contains the error message; in that case success is false.
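When errors are handled gracefully (for example with Continue On Fail), failed items carry success: false and an error message alongside the successful ones. The sketch below splits a batch of output items on that flag. The `ScrapeResult` type is an assumption built only from the fields listed above, with the scraped data fields left open, and `partitionResults` is a hypothetical helper; the sample items are made up for illustration.

```typescript
type Operation = "scrapePage" | "scrapeMultiple" | "monitorPage";

// Assumed item shape: the documented operation/success/error fields plus arbitrary scraped fields.
interface ScrapeResult {
  operation: Operation;
  success: boolean;
  error?: string;
  [field: string]: unknown;
}

// Splits items into successes and failures; anything not explicitly successful counts as failed.
function partitionResults(items: ScrapeResult[]): { ok: ScrapeResult[]; failed: ScrapeResult[] } {
  const ok: ScrapeResult[] = [];
  const failed: ScrapeResult[] = [];
  for (const item of items) {
    if (item.success === true) ok.push(item);
    else failed.push(item);
  }
  return { ok, failed };
}

// Sample items (illustrative data, not real node output).
const results: ScrapeResult[] = [
  { operation: "scrapePage", success: true, title: "Example Domain" },
  { operation: "scrapePage", success: false, error: "Web scraping failed: timeout" },
];
const { ok, failed } = partitionResults(results);
console.log(ok.length, failed.length); // 1 1
```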

If the Screenshot option is enabled, the node may also return binary data containing the page screenshot; the exact format of this data is not documented.
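If the monitor operation only reports the page's current state, deciding whether anything actually changed may fall to the workflow. One possible approach, assuming each run's scraped fields are available as a flat object and the previous run's output is stored somewhere (for example in n8n's workflow static data), is to compare field values and continue only when the list of changed fields is non-empty. `diffScrapedFields` and the `ScrapedFields` type are hypothetical, not part of the node.

```typescript
// Assumed snapshot shape: field name -> single value, list of values, or null.
type ScrapedValue = string | string[] | null;
type ScrapedFields = { [field: string]: ScrapedValue };

// Value comparison; arrays are compared element by element, in order.
function sameValue(a: ScrapedValue | undefined, b: ScrapedValue | undefined): boolean {
  if (Array.isArray(a) && Array.isArray(b)) {
    if (a.length !== b.length) return false;
    for (let i = 0; i < a.length; i++) {
      if (a[i] !== b[i]) return false;
    }
    return true;
  }
  return a === b;
}

// Returns the names of fields that were added, removed, or changed between two snapshots.
function diffScrapedFields(previous: ScrapedFields, current: ScrapedFields): string[] {
  const names = Object.keys(previous);
  for (const key of Object.keys(current)) {
    if (names.indexOf(key) === -1) names.push(key);
  }
  return names.filter((key) => !sameValue(previous[key], current[key]));
}

const before: ScrapedFields = { headline: "Old story", links: ["/a", "/b"] };
const after: ScrapedFields = { headline: "New story", links: ["/a", "/b"] };
console.log(diffScrapedFields(before, after)); // [ 'headline' ]
```

Comparing arrays in order means a reordered list of articles counts as a change; a set-based comparison would be the alternative if order does not matter.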

Dependencies

  • Requires an external web scraping API service accessible via an API URL and authenticated using an API key credential.
  • These credentials must be configured in n8n as a generic API key credential.
  • No other external dependencies are required within the node itself.
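How the API key is transmitted to the scraping service is not documented here. The sketch below assumes a bearer token in the Authorization header and a JSON POST body, both of which are assumptions; the `/scrape` path, the `HttpRequest` type, and `buildAuthenticatedRequest` are hypothetical. It only builds the request so it can be inspected without network access.

```typescript
// Minimal request description; could be passed to fetch() by the caller.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// Builds, but does not send, an authenticated request to the scraping API.
// Assumptions: Bearer auth in the Authorization header, JSON body, hypothetical /scrape path.
function buildAuthenticatedRequest(apiUrl: string, apiKey: string, payload: object): HttpRequest {
  if (!apiKey) {
    // Fail early instead of sending an unauthenticated request.
    throw new Error("API key credential is missing");
  }
  const base = apiUrl.replace(/\/+$/, ""); // drop trailing slashes from the configured API URL
  return {
    url: `${base}/scrape`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(payload),
  };
}

const req = buildAuthenticatedRequest("https://api.example.com/", "test-key", { url: "https://example.com" });
console.log(req.url); // https://api.example.com/scrape
```

In a runtime that provides fetch (Node.js 18 or later), the request could then be sent with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).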

Troubleshooting

  • Common issues:

    • Invalid or missing API key credential will cause authentication failures.
    • Incorrect CSS selectors may result in empty or incomplete data extraction.
    • Network issues or unreachable target URLs can cause request failures.
    • If JavaScript execution is disabled but the page renders its content with JavaScript, the data may not load.
  • Error messages:

    • "Unknown operation: ..." indicates an unsupported operation value; ensure the operation property is correctly set.
    • "Web scraping failed: ..." wraps errors from the API or network; check connectivity and API status.
  • Resolutions:

    • Verify API key and endpoint configuration.
    • Test CSS selectors independently in the browser's developer tools, for example with document.querySelectorAll() in the console.
    • Adjust wait times or enable JavaScript if dynamic content is involved.
    • Enable "Continue On Fail" in the node settings to handle errors gracefully.
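The two documented error prefixes can be mapped to the resolutions listed above. A small sketch, assuming the error text is available as a string (for example from the error field of a failed item); `suggestFix` and the hint wording are illustrative, and the check for HTTP status codes assumes the wrapped API error mentions them, which is not documented.

```typescript
// Maps an error message from the node to a troubleshooting hint.
// Only the two prefixes come from the documentation; the rest is a heuristic fallback.
function suggestFix(message: string): string {
  if (message.indexOf("Unknown operation:") === 0) {
    return "Check the operation property; expected scrapePage, scrapeMultiple, or monitorPage.";
  }
  if (message.indexOf("Web scraping failed:") === 0) {
    const lower = message.toLowerCase();
    // Assumption: an authentication failure may surface as 401/403 or "unauthorized" in the text.
    if (lower.indexOf("401") !== -1 || lower.indexOf("403") !== -1 || lower.indexOf("unauthorized") !== -1) {
      return "Verify the API key credential and the API endpoint configuration.";
    }
    return "Check network connectivity, the target URL, and the scraping API status.";
  }
  return "No documented match; test the CSS selectors, adjust wait times, or enable JavaScript.";
}

console.log(suggestFix("Unknown operation: scrapeAll"));
```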
