N8N Tools - Web Scraper

Scrape data from websites using the N8N Tools platform

Overview

This node enables scraping data from multiple webpages by sending requests to the N8N Tools web scraping platform. It is designed to extract structured information from a list of URLs using specified CSS selectors. The node supports advanced options such as waiting for specific elements to load, enabling JavaScript execution on pages, customizing user agent strings, taking screenshots, and handling HTTP redirects.

Common scenarios:

- Extracting product details from multiple e-commerce pages.
- Gathering news headlines or article metadata from several news sites.
- Collecting contact information or links from a list of company websites.

Practical example:
You provide a list of URLs (one per line) pointing to different blog posts. You specify CSS selectors to extract the post title, author name, and publication date. The node scrapes each page, waits for the main content to load if needed, and returns the extracted data in a structured JSON format.
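
The blog post example can be written down as plain data. This is only a sketch: the `SelectorRule` shape, its property names, the placeholder URLs, and the CSS selectors are assumptions made for illustration, not the node's confirmed internal format.

```typescript
// Hypothetical shape mirroring the Selectors property
// (Name, CSS Selector, Attribute, Multiple).
interface SelectorRule {
  name: string;
  cssSelector: string;
  attribute: string; // "text" is the documented default
  multiple: boolean;
}

// Placeholder inputs for the blog post scenario.
const blogExample = {
  // One URL per line, as the URLs property expects.
  urls: [
    "https://example.com/blog/first-post",
    "https://example.com/blog/second-post",
  ].join("\n"),
  selectors: [
    { name: "title", cssSelector: "article h1", attribute: "text", multiple: false },
    { name: "author", cssSelector: ".post-author", attribute: "text", multiple: false },
    { name: "publishedAt", cssSelector: "time[datetime]", attribute: "datetime", multiple: false },
  ] as SelectorRule[],
  // Wait for the main content before extraction starts.
  waitForSelector: "article",
};
```

Each `name` becomes a field in the output, so short, stable names make the result easier to use in later workflow steps.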

Properties

Name	Meaning
URLs	List of webpage URLs to scrape, entered one URL per line.
Selectors	One or more CSS selectors defining what data to extract from each page. Each selector includes:
	- Name: Field name for the output data.
	- CSS Selector: The CSS query to locate elements.
	- Attribute: Which attribute to extract (e.g., text content, href, src). Defaults to text.
	- Multiple: Whether to extract multiple matching elements (true/false).
Options	Additional settings to control scraping behavior:
	- Wait for Selector: CSS selector to wait for before starting extraction.
	- Wait Time (seconds): Delay before scraping starts (default 5 seconds).
	- User Agent: Custom user agent string to use for requests.
	- Enable JavaScript: Whether to execute JavaScript on the page (default true).
	- Screenshot: Whether to take a screenshot of the page during scraping (default false).
	- Follow Redirects: Whether to follow HTTP redirects (default true).
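
To make the documented defaults concrete, here is a minimal sketch of how the URLs field and the Options could be normalized before a request is sent. The names `ScrapeOptions`, `parseUrlList`, and `resolveOptions` are hypothetical; only the default values (5 seconds, JavaScript on, screenshot off, redirects followed) come from the table above.

```typescript
// Hypothetical option shape; property names are assumptions.
interface ScrapeOptions {
  waitForSelector?: string;
  waitTime: number; // seconds
  userAgent?: string;
  enableJavaScript: boolean;
  screenshot: boolean;
  followRedirects: boolean;
}

// Defaults as listed in the Properties table.
const DEFAULT_OPTIONS: ScrapeOptions = {
  waitTime: 5,
  enableJavaScript: true,
  screenshot: false,
  followRedirects: true,
};

// Split the "one URL per line" input, dropping blank lines and stray whitespace.
function parseUrlList(raw: string): string[] {
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Overlay user-supplied options on top of the documented defaults.
function resolveOptions(overrides: Partial<ScrapeOptions> = {}): ScrapeOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```

For example, `resolveOptions({ screenshot: true })` keeps the 5 second wait and only switches screenshots on.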

Output

The node outputs an array with one item containing a json object that includes:

- The scraped data fields as defined by the selectors, with their extracted values.
- An operation field indicating the operation performed (scrapeMultiple).
- A success boolean indicating whether the scraping succeeded.

If the Screenshot option is enabled, the output may also include binary data for the captured image. This is implied by the option rather than confirmed by the node's code.

In case of failure (and if "Continue On Fail" is enabled), the output will contain an error message and success: false.
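
A downstream step usually wants the scraped fields without the bookkeeping fields. The sketch below assumes the scraped fields sit at the top level of the json object next to operation and success, and that the error message is stored under a key named `error`; both are assumptions, and `extractFields` is a hypothetical helper.

```typescript
// Assumed output shape: metadata plus arbitrary scraped fields.
interface ScrapeResultJson {
  operation?: string;
  success: boolean;
  error?: string; // key name is an assumption
  [field: string]: unknown;
}

const META_KEYS = new Set(["operation", "success", "error"]);

// Return only the scraped fields, or throw if the item reports a failure.
function extractFields(json: ScrapeResultJson): Record<string, unknown> {
  if (!json.success) {
    throw new Error(`Scrape failed: ${json.error ?? "unknown error"}`);
  }
  const fields: Record<string, unknown> = {};
  for (const key of Object.keys(json)) {
    if (!META_KEYS.has(key)) {
      fields[key] = json[key];
    }
  }
  return fields;
}
```

Checking success first matters when "Continue On Fail" is enabled, because failed items then pass through the workflow instead of stopping it.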

Dependencies

- Requires an API key credential for the N8N Tools web scraping platform.
- The node sends HTTP POST requests to the platform's API endpoints.
- No other external dependencies are required.
- Ensure the API URL and key are correctly configured in the node credentials.
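
A quick sanity check on the credentials can catch configuration mistakes before any request is made. This is a sketch only: `ScraperCredentials`, its `apiUrl` and `apiKey` fields, and `checkCredentials` are hypothetical names, and the check validates form, not whether the platform accepts the key.

```typescript
// Hypothetical credential shape; the real credential field names may differ.
interface ScraperCredentials {
  apiUrl: string;
  apiKey: string;
}

// Return a list of human-readable problems; an empty list means the values look plausible.
function checkCredentials(creds: ScraperCredentials): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\/\S+$/i.test(creds.apiUrl.trim())) {
    problems.push("API URL must be an absolute http(s) URL");
  }
  if (creds.apiKey.trim() === "") {
    problems.push("API key is empty");
  }
  return problems;
}
```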

Troubleshooting

Common issues:
- Invalid or missing API key will cause authentication failures.
- Incorrect CSS selectors may result in empty or incomplete data.
- Pages heavily reliant on JavaScript might require enabling the "Enable JavaScript" option.
- Network issues or blocked requests can cause timeouts or errors.

Error messages:
- "Web scraping failed: <error message>" indicates a failure during the scraping request. Check network connectivity, API key validity, and input parameters.
- "Unknown operation: <operation>" means an unsupported operation was selected; ensure "Scrape Multiple Pages" is chosen.

If no data is returned, verify that the URLs are correct and that the selectors match the target page structure. To resolve other errors, verify all inputs, enable debugging logs if available, and test selectors independently in browser developer tools.
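
When a run succeeds but some values are missing, it helps to list which fields came back empty, since those usually point at selectors that did not match. The helper below is a hypothetical sketch that treats null, undefined, blank strings, and empty arrays as empty; the node itself may represent missing values differently.

```typescript
// List the names of scraped fields whose values look empty.
function findEmptyFields(fields: Record<string, unknown>): string[] {
  return Object.keys(fields).filter((name) => {
    const value = fields[name];
    if (value === null || value === undefined) return true;
    if (typeof value === "string") return value.trim() === "";
    if (Array.isArray(value)) return value.length === 0;
    return false;
  });
}
```

Each name it returns is a candidate for re-testing in browser developer tools, for instance with `document.querySelectorAll` and the same CSS selector.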

Links and References

N8N Documentation
CSS Selectors Reference
N8N Tools Web Scraper Platform (for API details and usage)