FetchFox

Scrape data with FetchFox

Overview

This node integrates with the FetchFox API to crawl web pages and extract data from them. With the "Crawl" resource and the "Find URLs Matching a URL Pattern" operation, it finds URLs that match a specified pattern containing at least one wildcard (*). This is useful for discovering related pages that share a common URL structure, such as all product pages in a category or all blog posts under a directory.

Common scenarios:

  • Automatically finding all URLs under a website section for further processing.
  • Gathering links matching a pattern before scraping their content.
  • Monitoring new pages added to a site that follow a known URL pattern.

Example:
To find all URLs under https://www.example.com/directory/, specify the pattern https://www.example.com/directory/*. The node returns every matching URL that FetchFox finds.
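
To make the wildcard behavior concrete, here is a minimal local sketch of pattern matching. It assumes that * matches any run of characters, including "/"; the page does not document FetchFox's exact matching rules, and the helper names pattern_to_regex and filter_urls are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a URL pattern with * wildcards into a regex.

    Assumption: * matches any run of characters, including "/".
    FetchFox's real matching rules may differ.
    """
    if "*" not in pattern:
        raise ValueError("pattern must contain at least one * wildcard")
    # Escape each literal piece, then join the pieces with ".*".
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def filter_urls(pattern: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that match the whole pattern."""
    regex = pattern_to_regex(pattern)
    return [url for url in urls if regex.fullmatch(url)]


candidates = [
    "https://www.example.com/directory/page-1",
    "https://www.example.com/other/page-2",
]
matched = filter_urls("https://www.example.com/directory/*", candidates)
```

Here matched holds only the first candidate, because the second URL does not start with the /directory/ path.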

Properties

Name | Meaning
URL Pattern to Find. Include at Least One * Wildcard | The URL pattern to search for. It must include at least one * wildcard, for example https://www.example.com/directory/*. FetchFox will find URLs matching this pattern.
Proxy | Which proxy to use when loading pages. Options: None ($0.01 per GB); Datacenter ($0.01 per GB); Residential ($8.00 per GB); Residential, Load Images, Fonts, Etc ($8.50 per GB)
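
Since the proxy options are priced per GB, a rough cost estimate is simple arithmetic. The sketch below uses the prices listed above; the function name estimate_proxy_cost and the traffic figures are illustrative assumptions, and actual billing may differ.

```python
# Per-GB prices copied from the proxy options listed above.
PRICE_PER_GB = {
    "None": 0.01,
    "Datacenter": 0.01,
    "Residential": 8.00,
    "Residential, Load Images, Fonts, Etc": 8.50,
}


def estimate_proxy_cost(proxy: str, gigabytes: float) -> float:
    """Return an estimated cost in USD for the given traffic volume."""
    if proxy not in PRICE_PER_GB:
        raise ValueError(f"unknown proxy option: {proxy!r}")
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return round(PRICE_PER_GB[proxy] * gigabytes, 2)


# Hypothetical workload: 2 GB of page loads through each proxy type.
residential_cost = estimate_proxy_cost("Residential", 2)
datacenter_cost = estimate_proxy_cost("Datacenter", 2)
```

At these rates the same 2 GB costs 16.00 USD over the residential proxy and 0.02 USD over the datacenter proxy, so the cheaper option is usually worth trying first.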

Output

The output is an array of JSON objects, each representing a URL found matching the specified pattern. Each object has the following structure:

{
  "url": "https://matched-url.com/page",
  "_metrics": {
    // Optional metrics about the crawl request (only present on the first item)
  }
}
  • The url field contains the matched URL string.
  • The first item may include a _metrics field with metadata about the crawl operation (such as performance or usage statistics).
  • No binary data is output by this operation.
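
A downstream step often needs the plain list of URLs separately from the metrics. The following sketch splits the output described above; the function name split_crawl_output is hypothetical, and the contents of _metrics are treated as opaque because this page does not specify them.

```python
from typing import Any


def split_crawl_output(
    items: list[dict[str, Any]],
) -> tuple[list[str], dict[str, Any]]:
    """Separate matched URLs from the optional _metrics object.

    Follows the structure described above: every item has a "url"
    field, and only the first item may carry "_metrics".
    """
    urls = [item["url"] for item in items if "url" in item]
    metrics = items[0].get("_metrics", {}) if items else {}
    return urls, metrics


# Sample items shaped like the node output; the metrics are left empty
# because their real keys are not documented here.
sample = [
    {"url": "https://www.example.com/directory/a", "_metrics": {}},
    {"url": "https://www.example.com/directory/b"},
]
urls, metrics = split_crawl_output(sample)
```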

Dependencies

  • Requires an API key credential for authenticating with the FetchFox API.
  • The node makes HTTP POST requests to https://dev.api.fetchfox.ai/api/crawl.
  • Proxy options affect how requests are routed and may incur different costs.
  • Proper network access to the FetchFox API endpoint is required.
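
For orientation, the sketch below builds (but does not send) a POST request to the endpoint named above. Only the URL and the HTTP method come from this page. The JSON field names pattern and proxy and the Bearer authorization header are assumptions for illustration, not the documented FetchFox API; check the official API reference before relying on them.

```python
import json
import urllib.request
from typing import Optional

CRAWL_ENDPOINT = "https://dev.api.fetchfox.ai/api/crawl"


def build_crawl_request(
    api_key: str, pattern: str, proxy: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST request for the crawl endpoint without sending it."""
    # ASSUMPTION: "pattern" and "proxy" are hypothetical field names.
    payload = {"pattern": pattern}
    if proxy is not None:
        payload["proxy"] = proxy
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: the auth scheme is not documented on this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_crawl_request("YOUR_API_KEY", "https://www.example.com/directory/*")
```

Passing the request to urllib.request.urlopen would send it; inside n8n the node performs this call itself using the configured credential.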

Troubleshooting

  • Missing or invalid API credentials: The node requires a valid API key credential. Ensure the credential is configured correctly in n8n.
  • Invalid URL pattern: The pattern must contain at least one * wildcard. Omitting it will likely cause an error or return no results.
  • No URLs found: If the pattern is too restrictive or incorrect, no URLs may be returned.
  • Proxy issues: Selecting an unsupported or misconfigured proxy type may cause requests to fail.
  • API errors: Network issues or API rate limits may cause errors. Check the node's error messages and ensure your API quota is sufficient.
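
Several of the issues above can be caught before the node runs. The sketch below is a hypothetical pre-flight check for the URL pattern. The wildcard rule comes from this page; requiring an http or https scheme and a host is an added assumption about what a usable pattern looks like.

```python
from urllib.parse import urlsplit


def check_url_pattern(pattern: str) -> list[str]:
    """Return a list of problems found in the pattern (empty if none)."""
    problems = []
    if "*" not in pattern:
        problems.append("pattern has no * wildcard")
    parts = urlsplit(pattern)
    # ASSUMPTION: a usable pattern is an absolute http(s) URL.
    if parts.scheme not in ("http", "https"):
        problems.append("pattern does not start with http:// or https://")
    if not parts.netloc:
        problems.append("pattern has no host name")
    return problems


ok = check_url_pattern("https://www.example.com/directory/*")
missing_wildcard = check_url_pattern("https://www.example.com/directory/")
```

An empty list means the pattern passed these basic checks; it does not guarantee that FetchFox will find any matching URLs.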
