FetchFox

Scrape data with FetchFox

Overview

This node integrates with the FetchFox API to scrape and extract structured data from web pages. The Extract Multiple Items per URL operation extracts multiple data items from a single target URL, based on user-defined fields. Use it to gather lists or collections of similar entries from one page, such as product listings, article summaries, or directory entries.

Typical use cases include:

  • Extracting multiple product details from an e-commerce category page.
  • Scraping multiple job postings from a job board page.
  • Collecting multiple event details from an event listing page.

By defining the fields to extract and providing the target URL, users can automate data collection at scale without manual scraping.

Properties

| Name | Meaning |
| --- | --- |
| Target URL for Extraction | The URL of the webpage from which data will be scraped. Example: https://www.example.com/directory/page-1 |
| Proxy | The proxy type used to load the page. Options: None ($0.01 per GB); Datacenter ($0.01 per GB); Residential ($8.00 per GB); Residential with assets (images, fonts, etc.) ($8.50 per GB) |
| Content Transformation | How the page content is transformed before extraction, which affects data size and AI cost. Options: Text Only; Text and Basic HTML (keeps links and image URLs only); Full HTML; AI Automatically Selects |
| Data to Extract | A collection of fields specifying what data to extract from the page. Each field requires a Field Name, the identifier for the extracted data (e.g., "title"), and a Field Description, an explanation of the data to extract (e.g., "Title of the post") |
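
Because the proxy options differ in price by several hundred times, it can help to estimate traffic cost before choosing one. The following is a minimal sketch that uses only the per-GB prices listed under Proxy. The `ProxyType` identifiers, the `estimateProxyCostUsd` helper, and the 1 GB = 10^9 bytes conversion are assumptions made for illustration; they are not part of the node or the FetchFox API.

```typescript
// Hypothetical identifiers for the four Proxy options; the node's internal
// option values are not documented here.
type ProxyType = "none" | "datacenter" | "residential" | "residential_with_assets";

// Per-GB prices in USD, copied from the Proxy options listed above.
const PRICE_PER_GB_USD: Record<ProxyType, number> = {
  none: 0.01,
  datacenter: 0.01,
  residential: 8.0,
  residential_with_assets: 8.5,
};

// Assumption: 1 GB = 1e9 bytes. The billing unit is not specified in this document.
const BYTES_PER_GB = 1e9;

function estimateProxyCostUsd(proxy: ProxyType, bytesTransferred: number): number {
  if (!Number.isFinite(bytesTransferred) || bytesTransferred < 0) {
    throw new RangeError("bytesTransferred must be a non-negative finite number");
  }
  return (bytesTransferred / BYTES_PER_GB) * PRICE_PER_GB_USD[proxy];
}

// Example: 500 pages at roughly 2 MB each is 1 GB of traffic.
const exampleBytes = 500 * 2e6;
console.log(estimateProxyCostUsd("datacenter", exampleBytes).toFixed(2));  // "0.01"
console.log(estimateProxyCostUsd("residential", exampleBytes).toFixed(2)); // "8.00"
```

The estimate covers proxy traffic only; AI cost, which the Content Transformation setting influences, is not modeled here.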

Output

The node outputs an array of JSON objects, each representing one extracted item from the target URL. Each object contains key-value pairs where keys correspond to the user-defined field names and values are the extracted data.

Additionally, the first item in the output array may include a _metrics property containing metadata about the extraction process (such as performance metrics).

No binary data output is produced by this node.

Example output structure:

[
  {
    "title": "Example Product 1",
    "price": "$19.99",
    "description": "A great product.",
    "_metrics": {
      "someMetric": 123,
      "anotherMetric": 456
    }
  },
  {
    "title": "Example Product 2",
    "price": "$29.99",
    "description": "Another great product."
  }
]
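
Since `_metrics` may ride along on the first item only, downstream steps that expect uniform records may want to separate it out first. The following is a minimal sketch of one way to do that in a later processing step. The `splitMetrics` helper and its types are hypothetical, and the shape of `_metrics` is treated as unknown because it is not documented here.

```typescript
type ExtractedItem = Record<string, unknown>;

interface SplitResult {
  items: ExtractedItem[]; // extracted records without _metrics
  metrics: unknown;       // undefined when the first item carried no _metrics
}

// Remove _metrics from the first item (if present) and return it separately.
function splitMetrics(output: ExtractedItem[]): SplitResult {
  let metrics: unknown = undefined;
  const items = output.map((item, index) => {
    if (index === 0 && "_metrics" in item) {
      const { _metrics, ...rest } = item;
      metrics = _metrics;
      return rest;
    }
    return item;
  });
  return { items, metrics };
}

// Usage with the example output structure shown above.
const sampleOutput: ExtractedItem[] = [
  {
    title: "Example Product 1",
    price: "$19.99",
    description: "A great product.",
    _metrics: { someMetric: 123, anotherMetric: 456 },
  },
  { title: "Example Product 2", price: "$29.99", description: "Another great product." },
];

const split = splitMetrics(sampleOutput);
console.log(split.items.length); // 2
console.log(split.metrics);
```

The input array is not mutated, so the original node output remains available if another branch of the workflow needs it.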

Dependencies

  • Requires an API key credential for the FetchFox service to authenticate requests.
  • Uses the FetchFox API endpoint at https://dev.api.fetchfox.ai/api/extract.
  • Supports optional proxy usage to route requests through different proxy types.
  • No additional environment variables are required beyond the API authentication setup.

Troubleshooting

  • Common Issues:

    • Invalid or missing API credentials will cause authentication failures.
    • Incorrectly specified target URLs or unreachable pages will result in empty or error responses.
    • Misconfigured field definitions (missing names or descriptions) may lead to incomplete extraction results.
    • Selecting a residential proxy unintentionally raises costs sharply: $8.00 to $8.50 per GB, compared with $0.01 per GB for the None and Datacenter options.
  • Error Messages:

    • Authentication errors typically indicate invalid API keys; verify and update credentials.
    • Network errors suggest connectivity issues or blocked access; check URL validity and proxy settings.
    • API response errors may occur if the request body is malformed; ensure all required properties are correctly set.
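
Several of the issues above (malformed target URLs, fields with a missing name or description) can be caught before any request is sent. The following is a minimal pre-flight check, offered as a sketch rather than the node's actual validation logic. The `FieldDefinition` type and `validateExtractInput` helper are hypothetical, and the rule that at least one field is required is an assumption.

```typescript
interface FieldDefinition {
  name: string;        // Field Name, e.g. "title"
  description: string; // Field Description, e.g. "Title of the post"
}

// Return a list of human-readable problems; an empty list means the input looks usable.
function validateExtractInput(targetUrl: string, fields: FieldDefinition[]): string[] {
  const problems: string[] = [];

  try {
    const parsed = new URL(targetUrl); // throws on anything that is not an absolute URL
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      problems.push(`Target URL must use http or https, got "${parsed.protocol}"`);
    }
  } catch {
    problems.push(`Target URL is not a valid absolute URL: "${targetUrl}"`);
  }

  // Assumption: an extraction with no fields is never intended.
  if (fields.length === 0) {
    problems.push("At least one field is required");
  }

  fields.forEach((field, i) => {
    if (field.name.trim() === "") {
      problems.push(`Field ${i + 1} has no name`);
    }
    if (field.description.trim() === "") {
      problems.push(`Field ${i + 1} ("${field.name}") has no description`);
    }
  });

  return problems;
}

// Usage: report problems before running the extraction.
const issues = validateExtractInput("https://www.example.com/directory/page-1", [
  { name: "title", description: "Title of the post" },
]);
console.log(issues.length === 0 ? "input looks valid" : issues.join("\n"));
```

A check like this cannot detect unreachable pages or invalid credentials; those still surface as network or authentication errors at request time.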
