FetchFox AI Scraper

Scrape public web data with FetchFox

Actions (3)

Crawl Actions
- Find URLs Matching a URL Pattern
Extract Actions
- Extract a Single Item per URL
- Extract Multiple Items per URL

Overview

This node integrates with the FetchFox AI Scraper service to crawl web pages and find URLs matching a specified URL pattern. It is particularly useful for scenarios where you want to discover multiple URLs under a certain directory or path on a website, such as finding all product pages under a category or all blog posts under a specific tag.

For example, if you want to find all URLs under https://www.example.com/directory/ that match a pattern like https://www.example.com/directory/*, this node will crawl the site up to a maximum number of pages and return the matching URLs.
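To make the wildcard idea concrete, here is a minimal local sketch of how a pattern with `*` could be tested against candidate URLs. It assumes `*` stands for any run of characters, including `/`; the FetchFox service may interpret wildcards differently, and the helper names `pattern_to_regex` and `filter_matching` are hypothetical, not part of the node.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a URL pattern with * wildcards into a regular expression."""
    # Escape each literal segment, then join the segments with ".*" so that
    # every * in the pattern matches any run of characters (an assumption).
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def filter_matching(pattern: str, urls: list) -> list:
    """Return the URLs that match the whole pattern, preserving input order."""
    regex = pattern_to_regex(pattern)
    return [url for url in urls if regex.fullmatch(url)]


candidates = [
    "https://www.example.com/directory/page-1",
    "https://www.example.com/directory/sub/page-2",
    "https://www.example.com/about",
]
matched = filter_matching("https://www.example.com/directory/*", candidates)
```

Under that assumption, `matched` holds the two URLs below `/directory/` and omits the `/about` page.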

Properties

Name	Meaning
URL Pattern to Find. Include at Least One * Wildcard	The URL pattern to search for, which must include at least one `*` wildcard. For example: `https://www.example.com/directory/*`. This tells the crawler which URLs to look for.
Max Visits	The maximum number of pages the crawler will visit during the operation. Defaults to 50.
Proxy	The type of proxy to use when loading pages. Options: None ($0.01 per GB); Datacenter ($0.01 per GB); Residential ($8.00 per GB); Residential, Load Images, Fonts, Etc ($8.50 per GB).
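
Because the proxy options are priced per gigabyte, a rough cost estimate is simple arithmetic. The sketch below copies the per-GB prices from the table above; the function name `estimate_proxy_cost` is hypothetical, and the actual billing rules of the service (rounding, minimum charges) are not described here, so treat the result as an approximation only.

```python
# Per-GB prices in USD, copied from the Proxy property description.
PROXY_PRICE_PER_GB = {
    "None": 0.01,
    "Datacenter": 0.01,
    "Residential": 8.00,
    "Residential, Load Images, Fonts, Etc": 8.50,
}


def estimate_proxy_cost(proxy: str, gigabytes: float) -> float:
    """Estimate the transfer cost in USD for a proxy option and data volume."""
    if proxy not in PROXY_PRICE_PER_GB:
        raise ValueError(f"Unknown proxy option: {proxy!r}")
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    # Round to four decimals to hide floating-point noise in the product.
    return round(PROXY_PRICE_PER_GB[proxy] * gigabytes, 4)


residential_cost = estimate_proxy_cost("Residential", 2.5)
datacenter_cost = estimate_proxy_cost("Datacenter", 2.5)
```

For the same 2.5 GB of traffic, the Residential option comes out several hundred times more expensive than Datacenter, which is worth weighing before raising Max Visits on a large crawl.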

Output

The output is an array of JSON objects, each representing a URL found by the crawler that matches the specified pattern. Each object has the following structure:

{
  "url": "https://matched-url.com/page"
}

The first item in the output array also carries a `_metrics` field with metrics about the crawl operation (such as performance data). It is intended mainly for information and debugging.

No binary data is output by this operation.
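
A downstream step that consumes this output might separate the URL list from the metrics roughly as follows. This is a sketch: the helper name `split_crawl_output` is hypothetical, and the contents of `_metrics` in the sample data are a made-up placeholder, since the real fields are not documented here.

```python
def split_crawl_output(items: list) -> tuple:
    """Return (urls, metrics) from the node's output items."""
    # Every item is expected to carry a "url" key; skip any item that does not.
    urls = [item["url"] for item in items if "url" in item]
    # Only the first item is described as carrying "_metrics".
    metrics = items[0].get("_metrics") if items else None
    return urls, metrics


sample_output = [
    # The "_metrics" payload below is a placeholder, not the real schema.
    {"url": "https://matched-url.com/page", "_metrics": {"placeholder": True}},
    {"url": "https://matched-url.com/other-page"},
]
urls, metrics = split_crawl_output(sample_output)
```

Keeping the metrics apart from the URL list avoids passing debugging data on to later nodes that only expect a `url` field.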

Dependencies

Requires an API key credential for the FetchFox AI Scraper service.
The node makes authenticated HTTP POST requests to the FetchFox API endpoint at https://api.fetchfox.ai/api/crawl.
Proxy usage depends on the selected option and may incur additional costs.
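
For orientation, an authenticated POST to the crawl endpoint could be assembled as in the sketch below. Only the endpoint URL and the use of POST with an API key come from this page. The bearer-token header and the JSON field names (`pattern`, `max_visits`) are assumptions made for illustration, so check the FetchFox API documentation for the real request format. The code builds the request without sending it.

```python
import json
import urllib.request

# Endpoint as stated in the Dependencies section.
CRAWL_ENDPOINT = "https://api.fetchfox.ai/api/crawl"


def build_crawl_request(api_key: str, pattern: str, max_visits: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST request for a crawl."""
    # ASSUMPTION: the body field names are illustrative, not confirmed.
    body = json.dumps({"pattern": pattern, "max_visits": max_visits}).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            # ASSUMPTION: bearer-token authentication is a guess.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_crawl_request("YOUR_API_KEY", "https://www.example.com/directory/*")
# Sending would be: urllib.request.urlopen(request, timeout=30)
```

Inside n8n the node performs this call itself using the configured credential, so a hand-built request like this is only useful for debugging outside a workflow.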

Troubleshooting

Common Issues:
- If the URL pattern does not contain at least one * wildcard, the node will likely fail or return no results because the pattern is invalid.
- If the crawl reaches the Max Visits limit before it has covered the site, URL discovery may be incomplete; raise the limit if expected URLs are missing.
- Using proxies incorrectly or without proper configuration might cause request failures or increased latency.
Error Messages:
- Authentication errors indicate missing or invalid API credentials; ensure the API key is correctly configured.
- Network or timeout errors may occur if the target website is unreachable or slow; consider adjusting the Proxy setting or Max Visits.
- Invalid pattern errors occur when the pattern format is incorrect; verify that the pattern includes at least one *.
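
Several of the issues above can be caught before the node runs. The following pre-flight check is a sketch: the function name `check_crawl_inputs` is hypothetical, and the requirement that Max Visits be at least 1 is an assumption rather than a documented rule.

```python
def check_crawl_inputs(pattern: str, max_visits: int) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    # The URL pattern must include at least one * wildcard.
    if "*" not in pattern:
        problems.append("URL pattern must include at least one * wildcard")
    # ASSUMPTION: a crawl needs to visit at least one page to be useful.
    if max_visits < 1:
        problems.append("Max Visits must be at least 1")
    return problems


ok = check_crawl_inputs("https://www.example.com/directory/*", 50)
bad = check_crawl_inputs("https://www.example.com/directory/", 0)
```

Returning a list of problems rather than raising on the first one lets a workflow report every misconfiguration in a single pass.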

Links and References

FetchFox AI Scraper Documentation (general reference for the API)
n8n Documentation on Creating Custom Nodes