Overview
This node integrates with the FetchFox API to scrape and extract data from web pages. With the Extract resource and the "Extract a Single Item per URL" operation, it returns one structured data item from a specified URL, based on user-defined fields.
Typical use cases include:
- Extracting product details (e.g., title, price, description) from an e-commerce page.
- Scraping article metadata (e.g., headline, author, publish date) from a news site.
- Gathering contact information or other structured data from directory listings.
By defining custom fields and descriptions, users can tailor the extraction to their specific needs without manual parsing.
Properties
| Name | Meaning |
|---|---|
| Target URL for Extraction | The exact URL of the webpage from which you want to scrape data. Example: https://www.example.com/directory/page-1. |
| Proxy | Selects the proxy type used to load the page. Options: None ($0.01 per GB); Datacenter ($0.01 per GB); Residential ($8.00 per GB); Residential with assets such as images and fonts ($8.50 per GB). |
| Content Transformation | Defines how the page content is transformed before extraction, which affects data size and AI cost. Options: Text Only; Text and Basic HTML (keeps links and image URLs only); Full HTML; AI Automatically Selects. |
| Data to Extract | A collection of named fields specifying what data to extract from the page. Each field requires a Field Name, the identifier for the extracted data (e.g., "title"), and a Field Description, instructions describing the data. |
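
The per-GB prices in the Proxy row make it possible to estimate proxy cost before a run. The following is a minimal sketch, not part of the node: the dictionary keys, the `estimate_proxy_cost` helper, and the decimal-gigabyte conversion are assumptions for illustration, and only the prices come from the table above.

```python
# Per-GB proxy prices in USD, copied from the Proxy row of the table above.
# The dictionary keys are hypothetical labels, not FetchFox option values.
PROXY_PRICE_PER_GB = {
    "none": 0.01,
    "datacenter": 0.01,
    "residential": 8.00,
    "residential_with_assets": 8.50,
}

BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes


def estimate_proxy_cost(proxy: str, page_bytes: int, pages: int = 1) -> float:
    """Return the estimated proxy cost in USD for loading `pages` pages."""
    if proxy not in PROXY_PRICE_PER_GB:
        raise ValueError(f"unknown proxy type: {proxy!r}")
    if page_bytes < 0 or pages < 0:
        raise ValueError("page_bytes and pages must be non-negative")
    total_gb = page_bytes * pages / BYTES_PER_GB
    return total_gb * PROXY_PRICE_PER_GB[proxy]


if __name__ == "__main__":
    # Compare all proxy types for 1,000 pages of roughly 2 MB each.
    for proxy in PROXY_PRICE_PER_GB:
        cost = estimate_proxy_cost(proxy, 2_000_000, 1000)
        print(f"{proxy}: ${cost:.2f}")
```

Actual billing may be measured differently (for example, after content transformation), so treat the result as a rough upper-level estimate only.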
Output
The node outputs an array of JSON objects representing the extracted data items. For this operation, it returns a single item corresponding to the target URL, with keys matching the user-defined field names and values containing the extracted content.
Example output structure:
    [
      {
        "title": "Example Page Title",
        "author": "John Doe",
        "publishDate": "2024-06-01",
        "_metrics": {
          "someMetricKey": "metricValue"
        }
      }
    ]
- The `_metrics` property contains metadata about the extraction process (such as performance metrics) and is included in the first item if available.
- If binary data were involved, it would be summarized here, but this node focuses on JSON data extraction.
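
Downstream steps often want the extracted fields without the metadata. The sketch below shows one way to separate `_metrics` from the user-defined fields in the output shape described above; `split_metrics` is a hypothetical helper, not something the node provides, and the sample data mirrors the example output.

```python
from __future__ import annotations

import json
from typing import Any


def split_metrics(items: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Separate extracted fields from the optional `_metrics` metadata.

    Returns the items without `_metrics`, plus the first non-empty
    `_metrics` object found (or an empty dict if there is none).
    """
    metrics: dict[str, Any] = {}
    cleaned: list[dict[str, Any]] = []
    for item in items:
        fields = dict(item)  # shallow copy so the input is left untouched
        found = fields.pop("_metrics", None)
        if found and not metrics:
            metrics = found
        cleaned.append(fields)
    return cleaned, metrics


# Sample data matching the example output structure in this section.
sample_output = json.loads("""
[
  {
    "title": "Example Page Title",
    "author": "John Doe",
    "publishDate": "2024-06-01",
    "_metrics": {"someMetricKey": "metricValue"}
  }
]
""")

if __name__ == "__main__":
    data, metrics = split_metrics(sample_output)
    print(data)
    print(metrics)
```

Because `_metrics` is only present "if available", the helper treats it as optional rather than failing when the key is missing.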
Dependencies
- Requires an active connection to the FetchFox API service.
- Needs an API authentication token configured in n8n credentials (referred to generically as "an API key credential").
- Network access to the target URLs, optionally routed through selected proxies.
- No additional environment variables are required beyond standard n8n credential setup.
Troubleshooting
Common Issues:
- Invalid or unreachable target URL: Ensure the URL is correct and accessible from the network where n8n runs.
- Insufficient permissions or invalid API key: Verify that the API key credential is valid and has access to FetchFox services.
- Proxy misconfiguration: Selecting a proxy type incompatible with your network may cause failures; try switching to "None" or another proxy option.
- Incorrect field definitions: Missing or vague field descriptions may lead to incomplete or inaccurate extraction results.
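
Two of the issues above, a malformed URL and missing or vague field descriptions, can be caught before the node runs. The following is a minimal pre-flight sketch under stated assumptions: `check_inputs` and the three-word description threshold are hypothetical choices for illustration, and the check does not contact FetchFox or test whether the URL is reachable.

```python
from __future__ import annotations

from urllib.parse import urlparse

MIN_DESCRIPTION_WORDS = 3  # arbitrary threshold chosen for illustration


def check_inputs(url: str, fields: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found.

    `fields` maps each Field Name to its Field Description.
    """
    problems: list[str] = []

    # A target URL should be absolute and use http or https.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"URL is not an absolute http(s) URL: {url!r}")

    if not fields:
        problems.append("no fields defined")

    for name, description in fields.items():
        if not name.strip():
            problems.append("a field has an empty name")
        if len(description.split()) < MIN_DESCRIPTION_WORDS:
            problems.append(f"field {name!r} has a missing or very short description")

    return problems


if __name__ == "__main__":
    issues = check_inputs(
        "https://www.example.com/directory/page-1",
        {"title": "The main headline of the page", "price": ""},
    )
    for issue in issues:
        print(issue)
```

A word count is only a crude proxy for vagueness; a description can be long and still ambiguous, so the check complements rather than replaces reviewing the extracted results.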
Error Messages:
- Authentication errors typically indicate issues with the API key credential; reconfigure or update the credential.
- HTTP request failures may point to network issues or invalid URLs.
- API response errors might occur if the FetchFox service rejects the request due to malformed parameters; double-check the input properties.
Links and References
- FetchFox API Documentation (base URL referenced in code)
- n8n Documentation on Creating Custom Nodes
- General Web Scraping Best Practices and Legal Considerations: https://en.wikipedia.org/wiki/Web_scraping