Overview
This node integrates with the FetchFox AI Scraper service to crawl web pages and find URLs matching a specified URL pattern. It is particularly useful for scenarios where you want to discover multiple URLs under a certain directory or path on a website, such as finding all product pages under a category or all blog posts under a specific tag.
For example, if you want to find all URLs under https://www.example.com/directory/ that match a pattern like https://www.example.com/directory/*, this node will crawl the site up to a maximum number of pages and return the matching URLs.
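To make the wildcard idea concrete, here is a minimal local sketch of how a pattern with `*` could be checked against candidate URLs. The real matching is done by the FetchFox service, and its exact wildcard semantics are not described here; this sketch assumes that `*` matches any run of characters, including `/`. The helper names `pattern_to_regex` and `filter_matching` are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a URL pattern with * wildcards into a compiled regex.

    Assumption: * matches any run of characters, including '/'.
    """
    # Escape every literal segment, then join the segments with '.*'.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def filter_matching(urls, pattern):
    """Return only the URLs that match the whole pattern."""
    regex = pattern_to_regex(pattern)
    return [url for url in urls if regex.fullmatch(url)]


candidates = [
    "https://www.example.com/directory/page-1",
    "https://www.example.com/directory/sub/page-2",
    "https://www.example.com/about",
]
matches = filter_matching(candidates, "https://www.example.com/directory/*")
```

Under that assumption, the two URLs below `/directory/` are kept and the `/about` page is dropped.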
Properties
| Name | Meaning |
|---|---|
| URL Pattern to Find. Include at Least One * Wildcard | The URL pattern to search for, which must include at least one * wildcard. For example: https://www.example.com/directory/*. This tells the crawler which URLs to look for. |
| Max Visits | The maximum number of pages the crawler will visit during the operation. Defaults to 50. |
| Proxy | The type of proxy to use when loading pages. Options are: None ($0.01 per GB); Datacenter ($0.01 per GB); Residential ($8.00 per GB); Residential, Load Images, Fonts, Etc ($8.50 per GB). |
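
Because proxy pricing is per GB, a rough cost estimate for a crawl follows from the number of pages visited and the average page size. The sketch below uses the rates from the table; the dictionary keys, the function name `estimate_proxy_cost`, the average page size, and the convention of 1 GB = 1000 MB are all assumptions made for illustration, not values taken from FetchFox.

```python
# Per-GB rates copied from the Proxy row above; the keys are hypothetical labels.
PROXY_RATES_USD_PER_GB = {
    "none": 0.01,
    "datacenter": 0.01,
    "residential": 8.00,
    "residential_with_assets": 8.50,
}


def estimate_proxy_cost(proxy, pages, avg_page_mb):
    """Estimate the transfer cost in USD for a crawl.

    Assumptions: every visited page transfers avg_page_mb megabytes,
    and 1 GB is counted as 1000 MB.
    """
    gigabytes = pages * avg_page_mb / 1000
    return gigabytes * PROXY_RATES_USD_PER_GB[proxy]


# Default Max Visits (50) with an assumed 2 MB per page.
residential_cost = estimate_proxy_cost("residential", 50, 2.0)
datacenter_cost = estimate_proxy_cost("datacenter", 50, 2.0)
```

With these assumed numbers the crawl transfers about 0.1 GB, so the residential option costs roughly 800 times more than the datacenter option for the same pages.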
Output
The output is an array of JSON objects, each representing a URL found by the crawler that matches the specified pattern. Each object has the following structure:
{
"url": "https://matched-url.com/page"
}
The first item in the output array also includes a _metrics field with metrics about the crawl operation (such as performance data). It is intended mainly for informational and debugging use.
No binary data is output by this operation.
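A downstream step typically wants the plain list of URLs and may want to set the metrics aside. The following is a minimal sketch of that split, based only on the output shape described above. The function name `split_crawl_output` and the sample data, including the contents of `_metrics`, are made up for illustration.

```python
def split_crawl_output(items):
    """Separate matched URLs from the optional _metrics on the first item."""
    urls = [item["url"] for item in items if "url" in item]
    # Only the first item is described as carrying _metrics.
    metrics = items[0].get("_metrics") if items else None
    return urls, metrics


# Made-up sample in the documented shape; the real _metrics content is not specified here.
sample_output = [
    {"url": "https://www.example.com/directory/a", "_metrics": {"note": "placeholder"}},
    {"url": "https://www.example.com/directory/b"},
]
urls, metrics = split_crawl_output(sample_output)
```

The function tolerates an empty result list, in which case it returns an empty URL list and `None` for the metrics.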
Dependencies
- Requires an API key credential for the FetchFox AI Scraper service.
- The node makes authenticated HTTP POST requests to the FetchFox API endpoint at https://api.fetchfox.ai/api/crawl.
- Proxy usage depends on the selected option and may incur additional costs.
Troubleshooting
Common Issues:
- If the URL pattern does not contain at least one * wildcard, the node will likely fail or return no results because the pattern is invalid.
- Reaching the Max Visits limit before the crawl is complete may result in incomplete URL discovery.
- Using proxies incorrectly or without proper configuration might cause request failures or increased latency.
Error Messages:
- Authentication errors indicate missing or invalid API credentials; ensure the API key is correctly configured.
- Network or timeout errors may occur if the target website is unreachable or slow; consider adjusting proxy settings or max visits.
- Invalid pattern errors occur if the pattern format is incorrect; verify that the pattern includes at least one *.
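Several of the problems listed under Troubleshooting can be caught before the node runs. The sketch below is a hypothetical pre-flight check, not part of the node itself: the function name `validate_crawl_params` and the specific rules (a `*` must be present, the pattern should start with `http://` or `https://`, Max Visits must be at least 1) are assumptions derived from the descriptions above.

```python
from typing import List


def validate_crawl_params(pattern: str, max_visits: int = 50) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if "*" not in pattern:
        problems.append("URL pattern must include at least one * wildcard")
    if not pattern.startswith(("http://", "https://")):
        problems.append("URL pattern should start with http:// or https://")
    if max_visits < 1:
        problems.append("Max Visits must be at least 1")
    return problems


ok = validate_crawl_params("https://www.example.com/directory/*")
bad = validate_crawl_params("https://www.example.com/directory/", max_visits=0)
```

Returning a list of messages rather than raising on the first failure lets a workflow report every configuration problem at once.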
Links and References
- FetchFox AI Scraper Documentation (general reference for the API)
- n8n Documentation on Creating Custom Nodes