FetchFox

Scrape data with FetchFox

Overview

This node integrates with the FetchFox API to scrape and extract data from web pages. With the Extract resource and its "Extract a Single Item per URL" operation, the node fetches one structured data item from a specified URL, based on user-defined fields.

Typical use cases include:

  • Extracting product details (e.g., title, price, description) from an e-commerce page.
  • Scraping article metadata (e.g., headline, author, publish date) from a news site.
  • Gathering contact information or other structured data from directory listings.

By defining custom fields and descriptions, users can tailor the extraction to their specific needs without manual parsing.

Properties

Target URL for Extraction
  The exact URL of the webpage from which you want to scrape data. Example: https://www.example.com/directory/page-1.

Proxy
  Selects the proxy type used to load the page. Options are:
  • None ($0.01 per GB)
  • Datacenter ($0.01 per GB)
  • Residential ($8.00 per GB)
  • Residential with assets like images and fonts ($8.50 per GB)

Content Transformation
  Defines how the page content is transformed before extraction, which affects data size and AI cost:
  • Text Only
  • Text and Basic HTML (keeps links and image URLs only)
  • Full HTML
  • AI Automatically Selects

Data to Extract
  A collection of named fields specifying what data to extract from the page. Each field requires:
  • Field Name: Identifier for the extracted data (e.g., "title")
  • Field Description: Instructions describing the data
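
Because the proxy options differ in price by several orders of magnitude, a rough cost estimate can help when choosing one. The following is a minimal sketch in Python using the per-GB prices listed for the Proxy property. The function name, the dictionary keys, and the 1 GB = 1000 MB convention are assumptions made for illustration; they are not part of the node or the FetchFox API, and actual billing may differ.

```python
from decimal import Decimal

# Per-GB prices in USD, copied from the Proxy options listed above.
# The keys are illustrative labels, not values used by the node or the API.
PROXY_PRICE_PER_GB = {
    "none": Decimal("0.01"),
    "datacenter": Decimal("0.01"),
    "residential": Decimal("8.00"),
    "residential_with_assets": Decimal("8.50"),
}


def estimate_proxy_cost(proxy: str, megabytes: float) -> Decimal:
    """Return an estimated proxy cost in USD for the given transfer size."""
    if proxy not in PROXY_PRICE_PER_GB:
        raise ValueError(f"unknown proxy option: {proxy!r}")
    if megabytes < 0:
        raise ValueError("megabytes must be non-negative")
    # Assumption: 1 GB = 1000 MB. The documentation does not state the convention.
    gigabytes = Decimal(str(megabytes)) / Decimal(1000)
    return (PROXY_PRICE_PER_GB[proxy] * gigabytes).quantize(Decimal("0.0001"))
```

Under these assumptions, loading 250 MB through a residential proxy comes to about $2.00, while the same traffic through a datacenter proxy costs a fraction of a cent.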

Output

The node outputs an array of JSON objects representing the extracted data items. For this operation, it returns a single item corresponding to the target URL, with keys matching the user-defined field names and values containing the extracted content.

Example output structure:

[
  {
    "title": "Example Page Title",
    "author": "John Doe",
    "publishDate": "2024-06-01",
    "_metrics": {
      "someMetricKey": "metricValue"
    }
  }
]

  • The _metrics property holds metadata about the extraction process, such as performance metrics. When available, it is attached to the first item only.
  • The node does not produce binary data; its output is JSON only.
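
Downstream steps often want only the extracted fields, without the _metrics metadata. The sketch below shows one way to separate the two in Python, for example in a later processing step. The helper name split_metrics is hypothetical, and the sample data simply mirrors the example output structure above.

```python
def split_metrics(items):
    """Separate extracted fields from the optional _metrics metadata.

    Returns a tuple (clean_items, metrics). metrics is None when no item
    carries a _metrics key. The input list is not modified.
    """
    metrics = None
    clean_items = []
    for item in items:
        fields = dict(item)  # shallow copy so the original item stays intact
        found = fields.pop("_metrics", None)
        if metrics is None and found is not None:
            metrics = found
        clean_items.append(fields)
    return clean_items, metrics


# Sample data shaped like the example output structure.
sample_output = [
    {
        "title": "Example Page Title",
        "author": "John Doe",
        "publishDate": "2024-06-01",
        "_metrics": {"someMetricKey": "metricValue"},
    }
]

records, metrics = split_metrics(sample_output)
```

Copying each item before removing the key keeps the original node output untouched, which is useful if other branches of a workflow still need the metrics.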

Dependencies

  • Requires an active connection to the FetchFox API service.
  • Needs a FetchFox API key, stored in n8n as an API key credential.
  • Network access to the target URLs, optionally routed through selected proxies.
  • No additional environment variables are required beyond standard n8n credential setup.

Troubleshooting

  • Common Issues:

    • Invalid or unreachable target URL: Ensure the URL is correct and accessible from the network where n8n runs.
    • Insufficient permissions or invalid API key: Verify that the API key credential is valid and has access to FetchFox services.
    • Proxy misconfiguration: Selecting a proxy type incompatible with your network may cause failures; try switching to "None" or another proxy option.
    • Incorrect field definitions: Missing or vague field descriptions may lead to incomplete or inaccurate extraction results.
  • Error Messages:

    • Authentication errors typically indicate issues with the API key credential; reconfigure or update the credential.
    • HTTP request failures may point to network issues or invalid URLs.
    • API response errors might occur if the FetchFox service rejects the request due to malformed parameters; double-check the input properties.
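
Several of the issues above (an invalid URL, missing field names, empty descriptions) can be caught before the node runs. The following Python sketch is one possible pre-flight check. The function name check_inputs and the "name"/"description" dictionary keys are assumptions chosen for illustration; they do not reflect the node's internal parameter names.

```python
from urllib.parse import urlparse


def check_inputs(url, fields):
    """Return a list of human-readable problems; an empty list means none found.

    url    -- the Target URL for Extraction
    fields -- list of dicts with hypothetical keys "name" and "description"
    """
    problems = []

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"URL is not an absolute http(s) URL: {url!r}")

    if not fields:
        problems.append("no fields defined in Data to Extract")

    seen = set()
    for index, field in enumerate(fields, start=1):
        name = (field.get("name") or "").strip()
        description = (field.get("description") or "").strip()
        label = name if name else f"#{index}"
        if not name:
            problems.append(f"field {label} has no name")
        elif name in seen:
            problems.append(f"duplicate field name: {name}")
        seen.add(name)
        if not description:
            problems.append(f"field {label} has no description")

    return problems
```

A check like this only catches structural mistakes. It cannot tell whether the URL is reachable from the n8n host or whether a description is specific enough to produce accurate results.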
