FetchFox AI Scraper

Scrape public web data with FetchFox

Overview

This node integrates with the FetchFox AI Scraper service to extract structured data from web pages. Specifically, the "Extract Multiple Items per URL" operation allows users to scrape multiple data items from a single target URL by defining custom fields for extraction. This is useful for scenarios such as scraping product listings, article summaries, or any repeated data elements on a webpage.

Practical examples include:

  • Extracting all product names, prices, and descriptions from an e-commerce category page.
  • Collecting multiple news headlines and their publication dates from a news portal's front page.
  • Gathering multiple event details (title, date, location) from an event listing page.

Properties

  • Target URL for Extraction: The URL of the webpage from which data will be scraped.
  • Proxy: The proxy type used to load pages: None, Datacenter, Residential, or Residential with assets. The choice affects cost and how the page is loaded.
  • Content Transformation: How the page content is transformed before extraction to reduce data size and AI costs: Text Only, Text and Basic HTML (links and image URLs only), Full HTML, or an automatic mode in which AI selects the best option.
  • Data to Extract: A collection of named fields specifying what to extract from the page. Each field has:
      - Field Name: Identifier for the extracted data.
      - Field Description: Instructions describing the data to extract (used by AI).
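
The Data to Extract collection can be pictured as a list of name/description pairs. The sketch below is illustrative only: the `ExtractField` type and the `toFieldMap` helper are hypothetical names, not part of the node, and the validation rules (no blank or duplicate names) are an assumption about what makes a sensible field list.

```typescript
// One entry of the "Data to Extract" collection (hypothetical type name).
interface ExtractField {
  name: string;        // Field Name: key used in each output item
  description: string; // Field Description: instruction read by the AI
}

// Hypothetical helper: turns the field entries into a name -> description map,
// rejecting blank or duplicate names before any request is made.
function toFieldMap(fields: ExtractField[]): Record<string, string> {
  const map: Record<string, string> = {};
  for (const field of fields) {
    const fieldName = field.name.trim();
    if (fieldName === "") {
      throw new Error("Field name must not be empty");
    }
    if (Object.prototype.hasOwnProperty.call(map, fieldName)) {
      throw new Error(`Duplicate field name: ${fieldName}`);
    }
    map[fieldName] = field.description.trim();
  }
  return map;
}

// Example field list for a product listing page.
const fieldMap = toFieldMap([
  { name: "title", description: "The product name as shown in the listing" },
  { name: "price", description: "The listed price, including the currency symbol" },
]);
```

Specific descriptions such as "the listed price, including the currency symbol" tend to give the AI less room for misinterpretation than a bare "price".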

Output

The node outputs an array of JSON objects, each representing one extracted item from the target URL. Each object contains key-value pairs where keys correspond to the user-defined field names and values are the extracted data.

Additionally, the first item in the output array may include a _metrics property containing metadata about the extraction process (such as performance metrics).

No binary data output is produced by this node.

Example output structure:

[
  {
    "title": "Example Product 1",
    "price": "$19.99",
    "description": "A great product.",
    "_metrics": { /* extraction metadata */ }
  },
  {
    "title": "Example Product 2",
    "price": "$29.99",
    "description": "Another great product."
  }
]
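
Because `_metrics` may ride along on the first item, downstream steps that expect uniform rows may want to separate it first. The following is a minimal sketch of that, assuming the output shape shown above; `splitMetrics` and the type names are hypothetical, and the contents of `_metrics` are treated as opaque.

```typescript
type ExtractedItem = Record<string, unknown>;

interface SplitResult {
  items: ExtractedItem[]; // extracted rows without the _metrics key
  metrics: unknown;       // undefined when the first item carries no _metrics
}

// Hypothetical helper: removes _metrics from the first item (if present)
// and returns it separately, leaving all other items untouched.
function splitMetrics(output: ExtractedItem[]): SplitResult {
  let metrics: unknown = undefined;
  const items = output.map((item, index) => {
    if (index === 0 && "_metrics" in item) {
      const { _metrics, ...rest } = item;
      metrics = _metrics;
      return rest;
    }
    return item;
  });
  return { items, metrics };
}

// Usage with placeholder data; the empty object stands in for real metadata.
const split = splitMetrics([
  { title: "Example Product 1", price: "$19.99", _metrics: {} },
  { title: "Example Product 2", price: "$29.99" },
]);
```

After the call, `split.items` holds two rows with identical keys and `split.metrics` holds whatever the first item carried.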

Dependencies

  • Requires an API key credential for the FetchFox AI Scraper service.
  • The node makes authenticated HTTP POST requests to https://api.fetchfox.ai/api/extract.
  • Proxy options affect how pages are loaded and may incur different costs.
  • No additional environment variables are required beyond the API credential.
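
For orientation, a request of the kind described above might be prepared as follows. Only the endpoint URL and the use of an authenticated POST come from this page. The body field names (`url`, `proxy`, `transform`, `template`), the option value strings, and the `Bearer` authorization scheme are assumptions made for illustration; consult the FetchFox API documentation for the actual contract.

```typescript
// ASSUMPTION: field names and body layout are illustrative, not the real API schema.
interface ExtractParams {
  url: string;                      // Target URL for Extraction
  proxy: string;                    // Proxy selection
  transform: string;                // Content Transformation selection
  template: Record<string, string>; // Data to Extract: field name -> description
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Endpoint as stated in the Dependencies section.
const EXTRACT_ENDPOINT = "https://api.fetchfox.ai/api/extract";

// Hypothetical helper: builds the request description without sending it.
function prepareExtractRequest(apiKey: string, params: ExtractParams): PreparedRequest {
  if (apiKey.trim() === "") {
    throw new Error("API key is required");
  }
  return {
    url: EXTRACT_ENDPOINT,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // ASSUMPTION: Bearer scheme; the real header format may differ.
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(params),
  };
}

// Usage with a placeholder key and made-up option values.
const prepared = prepareExtractRequest("YOUR_API_KEY", {
  url: "https://example.com/products",
  proxy: "none",
  transform: "text_only",
  template: { title: "The product name", price: "The listed price" },
});
```

Building the request separately from sending it keeps the sketch free of network calls; the result could then be passed on as `fetch(prepared.url, { method: prepared.method, headers: prepared.headers, body: prepared.body })`.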

Troubleshooting

  • Invalid URL or unreachable page: Ensure the target URL is correct and publicly accessible. Network issues or incorrect proxy settings can cause failures.
  • Empty or incomplete extraction results: Verify that the field descriptions clearly instruct what data to extract. Ambiguous or vague descriptions may lead to poor extraction quality.
  • API authentication errors: Confirm that the API key credential is valid and properly configured in n8n.
  • High costs due to proxy usage: Using residential proxies or loading assets increases cost; choose proxy settings according to your budget.
  • Unexpected response format: If the API changes or returns errors, check the FetchFox service status and update the node accordingly.
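
Some invalid-URL failures can be caught before the node runs. The sketch below uses the standard `URL` constructor to check that a value is an absolute http(s) URL; `checkTargetUrl` is a hypothetical helper, and it cannot tell whether the page is actually reachable or publicly accessible.

```typescript
// Hypothetical pre-flight check: returns null when the URL looks usable,
// otherwise a short reason string.
function checkTargetUrl(raw: string): string | null {
  let parsed: URL;
  try {
    // Throws for relative or malformed input such as "example.com".
    parsed = new URL(raw.trim());
  } catch (e) {
    return "not a valid absolute URL";
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return `unsupported scheme ${parsed.protocol}`;
  }
  return null;
}

const okCheck = checkTargetUrl("https://example.com/products");
const missingSchemeCheck = checkTargetUrl("example.com/products");
const ftpCheck = checkTargetUrl("ftp://example.com/file");
```

A common cause of the first failure is a pasted address without its scheme, which this check reports instead of letting the request fail later.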
