Overview
This node integrates with the FetchFox API to crawl and extract data from web pages. Specifically, for the "Crawl" resource with the "Find URLs Matching a URL Pattern" operation, it allows users to find all URLs on the web that match a specified pattern containing at least one wildcard (*). This is useful for discovering multiple related pages under a common URL structure, such as all product pages in a category or all blog posts under a directory.
Common scenarios:
- Automatically finding all URLs under a website section for further processing.
- Gathering links matching a pattern before scraping their content.
- Monitoring new pages added to a site that follow a known URL pattern.
Example:
To find every URL under https://www.example.com/directory/, specify the pattern https://www.example.com/directory/*. The node returns all matching URLs that FetchFox finds.
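To make the wildcard behavior concrete, here is a minimal local sketch of how such a pattern could be checked against a list of URLs. It assumes that `*` matches any run of characters, including `/`; FetchFox's actual matching rules are not described here, and `pattern_to_regex` and `filter_matching` are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a URL pattern in which each * matches any run of characters.

    Assumption: * may also match '/'. The real FetchFox semantics may differ.
    """
    # Escape the literal pieces so '.', '?', etc. are matched verbatim,
    # then join the pieces with '.*' in place of each wildcard.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))


def filter_matching(pattern: str, urls: list[str]) -> list[str]:
    """Return only the URLs that match the whole pattern."""
    regex = pattern_to_regex(pattern)
    return [url for url in urls if regex.fullmatch(url)]


candidates = [
    "https://www.example.com/directory/page-1",
    "https://www.example.com/directory/sub/page-2",
    "https://www.example.com/about",
]
matches = filter_matching("https://www.example.com/directory/*", candidates)
```

With these inputs, `matches` holds the two URLs under `/directory/` and leaves out `/about`.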
Properties
| Name | Meaning |
|---|---|
| URL Pattern to Find. Include at Least One * Wildcard | The URL pattern to search for. It must include at least one * wildcard, for example https://www.example.com/directory/*. FetchFox finds URLs matching this pattern. |
| Proxy | Which proxy to use when loading pages. Options: None ($0.01 per GB); Datacenter ($0.01 per GB); Residential ($8.00 per GB); Residential, Load Images, Fonts, Etc ($8.50 per GB). |
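Because the proxy options are priced per GB, a rough cost estimate is simple multiplication. The sketch below uses the prices listed in the table; the function name `estimate_proxy_cost` and the traffic figures are illustrative assumptions, and actual billing may be computed differently.

```python
from decimal import Decimal

# Per-GB prices copied from the Proxy options in the table above.
PROXY_PRICE_PER_GB = {
    "None": Decimal("0.01"),
    "Datacenter": Decimal("0.01"),
    "Residential": Decimal("8.00"),
    "Residential, Load Images, Fonts, Etc": Decimal("8.50"),
}


def estimate_proxy_cost(proxy: str, gigabytes: float) -> Decimal:
    """Estimate the proxy cost in USD for a given amount of traffic."""
    if proxy not in PROXY_PRICE_PER_GB:
        raise ValueError(f"Unknown proxy option: {proxy!r}")
    # Convert via str() so the float is not expanded to its binary value.
    cost = PROXY_PRICE_PER_GB[proxy] * Decimal(str(gigabytes))
    return cost.quantize(Decimal("0.01"))


residential_cost = estimate_proxy_cost("Residential", 2.5)
datacenter_cost = estimate_proxy_cost("Datacenter", 10)
```

For example, 2.5 GB through the Residential proxy comes to $20.00, while 10 GB through the Datacenter proxy comes to $0.10.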
Output
The output is an array of JSON objects, each representing a URL found matching the specified pattern. Each object has the following structure:
    {
      "url": "https://matched-url.com/page",
      "_metrics": {
        // Optional metrics about the crawl request (only present on the first item)
      }
    }
- The url field contains the matched URL string.
- The first item may include a _metrics field with metadata about the crawl operation (such as performance or usage statistics).
- No binary data is output by this operation.
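Downstream code usually wants the URLs as a plain list and the metrics separately. The sketch below shows one way to split the items described above. `split_results` is a hypothetical helper, and the sample `_metrics` value is left empty because its contents are not specified here.

```python
from typing import Any, Optional


def split_results(
    items: list[dict[str, Any]],
) -> tuple[list[str], Optional[dict[str, Any]]]:
    """Separate the matched URLs from the optional crawl metrics."""
    # Every item is expected to carry a url field; skip any that do not.
    urls = [item["url"] for item in items if "url" in item]
    # _metrics, when present, is attached to the first item only.
    metrics = items[0].get("_metrics") if items else None
    return urls, metrics


sample_items = [
    {"url": "https://www.example.com/directory/a", "_metrics": {}},
    {"url": "https://www.example.com/directory/b"},
]
urls, metrics = split_results(sample_items)
```

An empty result list yields an empty URL list and `None` for the metrics, so callers can handle the "no URLs found" case without special checks.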
Dependencies
- Requires an API key credential for authenticating with the FetchFox API.
- The node makes HTTP POST requests to https://dev.api.fetchfox.ai/api/crawl.
- Proxy options affect how requests are routed and may incur different costs.
- Proper network access to the FetchFox API endpoint is required.
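For orientation, the request the node sends could look roughly like the sketch below, which builds the POST request without sending it. Only the endpoint and the HTTP method come from this page. The JSON field names `pattern` and `proxy`, the proxy value `"none"`, and the `Authorization: Bearer` header are assumptions made for illustration, so check the FetchFox API documentation for the real request shape.

```python
import json
import urllib.request

CRAWL_ENDPOINT = "https://dev.api.fetchfox.ai/api/crawl"


def build_crawl_request(
    api_key: str, pattern: str, proxy: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the crawl endpoint."""
    # Assumption: the body is JSON with hypothetical 'pattern' and 'proxy' keys.
    body = json.dumps({"pattern": pattern, "proxy": proxy}).encode("utf-8")
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_crawl_request(
    "YOUR_API_KEY", "https://www.example.com/directory/*", "none"
)
```

Passing `request` to `urllib.request.urlopen` would send it. That step is left out so the sketch runs without network access or a valid key.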
Troubleshooting
- Missing or invalid API credentials: The node requires a valid API key credential. Ensure the credential is configured correctly in n8n.
- Invalid URL pattern: The pattern must contain at least one * wildcard. Omitting this will likely cause an error or no results.
- No URLs found: If the pattern is too restrictive or incorrect, no URLs may be returned.
- Proxy issues: Selecting a proxy type that is not supported or misconfigured may cause request failures.
- API errors: Network issues or API rate limits may cause errors. Check the node's error messages and ensure your API quota is sufficient.
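Some of these problems can be caught before the request is made. The following pre-flight check is a minimal sketch: the wildcard rule comes from this page, while the requirement for an absolute http(s) URL is an extra assumption based on the example pattern. `validate_pattern` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def validate_pattern(pattern: str) -> str:
    """Return the trimmed pattern, or raise ValueError if it looks invalid."""
    cleaned = pattern.strip()
    # Rule from the node description: at least one * wildcard is required.
    if "*" not in cleaned:
        raise ValueError("URL pattern must contain at least one * wildcard")
    # Assumption: patterns are absolute http(s) URLs, as in the example.
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("URL pattern must be an absolute http(s) URL")
    return cleaned


checked = validate_pattern("  https://www.example.com/directory/*  ")
```

Running this check in a preceding step turns a missing wildcard into a clear local error instead of an API failure or an empty result.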