N8N Tools - Web Scraper

Scrape data from websites using the N8N Tools platform.

Overview

The "N8N Tools - Web Scraper" node scrapes and monitors web pages by calling an external web scraping API. It supports three operations:

Scrape Single Page: Extract data from a single webpage using specified CSS selectors.
Scrape Multiple Pages: Extract data from multiple webpages, each defined by a URL.
Monitor Page Changes: Monitor a webpage for changes over time.

This node is useful in scenarios such as price tracking, content aggregation, competitive analysis, or monitoring website updates. For example, you can scrape product details from an e-commerce page or monitor a news site for new articles.
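To make "monitoring for changes" concrete, here is a minimal sketch that compares two snapshots of a page by hashing their whitespace-normalized text. It is an illustration only: how the N8N Tools platform actually detects changes is not described here, and `fingerprint` and `has_changed` are hypothetical names.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash the text after collapsing whitespace, so layout-only edits are ignored."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(old_snapshot: str, new_snapshot: str) -> bool:
    """Report whether two snapshots differ in anything other than whitespace."""
    return fingerprint(old_snapshot) != fingerprint(new_snapshot)
```

Storing only the fingerprint between runs keeps the comparison cheap, at the cost of not knowing what changed.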

Properties

Name	Meaning
URL	The URL of the webpage to scrape or monitor (used in "Scrape Single Page" and "Monitor Page Changes").
URLs	Multiple URLs to scrape, one per line (used in "Scrape Multiple Pages").
Selectors	CSS selectors to extract data. Each selector includes: Name (field name in the output); CSS Selector (CSS query string); Attribute (attribute to extract, e.g., text, href, src); Multiple (whether to extract multiple elements matching the selector).
Options	Additional options for scraping: Wait for Selector (CSS selector to wait for before scraping); Wait Time (seconds) (delay before scraping); User Agent (custom user agent string); Enable JavaScript (enable or disable JavaScript execution); Screenshot (take a screenshot of the page); Follow Redirects (follow HTTP redirects during requests).
Operation	The action to perform: Scrape Single Page, Scrape Multiple Pages, or Monitor Page Changes.
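The properties above can be pictured as one structured input. The sketch below is a hedged illustration, not the node's real internal format: the dictionary keys (`url`, `urls`, `selectors`, `options`), the `Selector` class, the default attribute `text`, and the helper names are all assumptions. Only the operation display names and the "one URL per line" rule come from the table.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional

# Operation display names as listed in the Properties table.
SINGLE = "Scrape Single Page"
MULTIPLE = "Scrape Multiple Pages"
MONITOR = "Monitor Page Changes"


@dataclass
class Selector:
    name: str                # field name in the output
    css_selector: str        # CSS query string
    attribute: str = "text"  # attribute to extract (default is an assumption)
    multiple: bool = False   # extract multiple matching elements


def parse_urls(raw: str) -> list[str]:
    """Split the one-URL-per-line "URLs" value, skipping blank lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def build_parameters(operation: str, target: str, selectors: list[Selector],
                     options: Optional[dict] = None) -> dict:
    """Assemble the inputs for one operation; key names are hypothetical."""
    if operation == MULTIPLE:
        location = {"urls": parse_urls(target)}
    elif operation in (SINGLE, MONITOR):
        location = {"url": target.strip()}
    else:
        raise ValueError(f"Unknown operation: {operation}")
    return {
        "operation": operation,
        **location,
        "selectors": [asdict(s) for s in selectors],
        "options": options or {},
    }
```

For example, `build_parameters(MULTIPLE, "https://a.example\nhttps://b.example", [Selector("title", "h1")])` yields a `urls` list with two entries and one selector that reads the text of the first `h1`.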

Output

The node outputs JSON data containing the results returned by the external scraping API. The structure varies depending on the operation but generally includes:

Extracted data fields as defined by the selectors.
Metadata such as operation type and success status.
In case of errors, an error message and a failure flag.

If the screenshot option is enabled, the output may also include binary data for the captured screenshot, although the node's code does not document this explicitly.

Example output snippet:

{
  "operation": "monitorPage",
  "success": true,
  "dataField1": "...",
  "dataField2": "..."
}
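A downstream step usually wants the extracted fields separately from the metadata. The following sketch shows one way to split an output item shaped like the example above. The `operation` and `success` keys come from that example; the `error` key name and the helper names are assumptions, since the exact error shape is not specified here.

```python
# Keys treated as metadata; "error" is an assumed name for the error message.
METADATA_KEYS = {"operation", "success", "error"}


def split_result(item: dict) -> tuple:
    """Return (metadata, extracted_fields) for one output item."""
    meta = {k: v for k, v in item.items() if k in METADATA_KEYS}
    fields = {k: v for k, v in item.items() if k not in METADATA_KEYS}
    return meta, fields


def extracted_fields(item: dict) -> dict:
    """Return only the selector fields, raising if the item reports failure."""
    meta, fields = split_result(item)
    if not meta.get("success", False):
        raise RuntimeError(meta.get("error", "scrape failed without an error message"))
    return fields
```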

Dependencies

Requires an API key credential for the external N8N Tools web scraping platform.
The node makes HTTP POST requests to the platform's API endpoints.
No other external dependencies are required.
The API URL and API key credential must both be configured correctly in n8n.
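For orientation, the sketch below shows what an authenticated JSON POST of this kind can look like using only the Python standard library. It builds the request without sending it. The endpoint URL, the `X-API-Key` header name, and the payload keys are placeholders chosen for illustration; the platform's real endpoints and authentication scheme are not documented here.

```python
import json
import urllib.request


def build_request(api_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying an API key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
    )


# Placeholder values only; nothing is sent over the network here.
request = build_request(
    "https://example.invalid/scrape",
    "my-secret-key",
    {"url": "https://example.com"},
)
```

Passing the prepared object to `urllib.request.urlopen` would send it; keeping construction separate makes the request easy to inspect when debugging authentication failures.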

Troubleshooting

Common Issues:
- An invalid or missing API key credential causes authentication failures.
- Incorrect CSS selectors may return empty or incomplete data.
- Network issues or unreachable URLs can cause request failures.
- If JavaScript execution is disabled for a page that requires it, data may not load properly.

Error Messages:
- "Unknown operation: <operation>": An unsupported operation was selected; verify the Operation property.
- "Web scraping failed: <error message>": A general failure during scraping; check network connectivity, API key validity, and input parameters.

Resolutions:
- Ensure the API key credential is correctly set up and valid.
- Validate CSS selectors using browser developer tools before use.
- Confirm URLs are reachable and correct.
- Enable JavaScript or increase the wait time if pages load slowly or rely heavily on scripts.
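When these errors surface in workflow logs, a small lookup can turn the message prefix into the matching advice. This is a minimal sketch: the two prefixes are the ones listed under Error Messages, while `HINTS` and `troubleshooting_hint` are hypothetical names.

```python
# Message prefixes from the Error Messages list, paired with the suggested check.
HINTS = [
    ("Unknown operation:", "Verify the Operation property."),
    ("Web scraping failed:",
     "Check network connectivity, API key validity, and input parameters."),
]


def troubleshooting_hint(message: str) -> str:
    """Return the documented advice for a known error prefix."""
    for prefix, hint in HINTS:
        if message.startswith(prefix):
            return hint
    return "No documented hint for this message."
```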