FetchFox AI Scraper

Scrape public web data with FetchFox

Overview

This node integrates with the FetchFox AI Scraper service to extract structured data from web pages. With the "Extract" resource and the "Extract a Single Item per URL" operation, you specify a single target URL and define the custom fields you want to scrape from that page. The node sends these instructions to the FetchFox API, which returns the extracted data.

Common scenarios include:

  • Extracting product details (e.g., title, price, description) from an e-commerce page.
  • Scraping article metadata (e.g., headline, author, publish date) from a news site.
  • Gathering contact information or event details from directory listings.

Practical example:
You want to scrape the title and summary of a blog post from a specific URL. You provide the URL and define two fields: "title" and "summary". The node sends this info to FetchFox, which returns the extracted text for those fields.
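As a rough sketch of the practical example above, the inputs can be thought of as a URL plus a list of named fields with descriptions. The function name `build_extract_input`, the key names `url`, `fields`, `name`, and `description`, and the example URL are all hypothetical; the actual structure the node sends to FetchFox is not documented here.

```python
import json


def build_extract_input(url: str, fields: dict[str, str]) -> dict:
    """Bundle a target URL with the fields to extract.

    The returned shape is an illustrative assumption, not the real
    FetchFox request format.
    """
    if not fields:
        raise ValueError("define at least one field to extract")
    return {
        "url": url,
        "fields": [
            {"name": name, "description": description}
            for name, description in fields.items()
        ],
    }


# The blog post example: two fields, "title" and "summary".
example = build_extract_input(
    "https://example.com/blog/my-post",  # placeholder URL
    {
        "title": "The title of the blog post",
        "summary": "A short summary of what the post is about",
    },
)
print(json.dumps(example, indent=2))
```

Each field pairs a name (which becomes a key in the output) with a description that tells the AI what to look for.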

Properties

Name | Meaning
Target URL for Extraction | The exact URL of the webpage you want to scrape data from.
Proxy | Selects the proxy type used to load the page. Options: None, Datacenter, Residential, Residential with assets (images, fonts, etc.). Proxy choice affects cost and how the page is loaded.
Content Transformation | Defines how the page content is transformed before extraction to reduce data size and AI costs. Options: Text Only, Text and Basic HTML (keeps links and image URLs), Full HTML, AI Automatically Selects.
Data to Extract | A collection of named fields specifying what data to extract from the page. Each field has a name and a description telling the AI what to look for.
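The properties above can be modeled as a small settings object that rejects option values not listed in the table. This is only a sketch: the class name `ExtractSettings` and its attribute names are hypothetical, and the option strings are the display labels from the table, which may differ from the values the node uses internally.

```python
from dataclasses import dataclass, field

# Display labels copied from the property table; internal values may differ.
PROXY_OPTIONS = ("None", "Datacenter", "Residential", "Residential with assets")
TRANSFORM_OPTIONS = (
    "Text Only",
    "Text and Basic HTML",
    "Full HTML",
    "AI Automatically Selects",
)


@dataclass
class ExtractSettings:
    """Hypothetical container mirroring the four node properties."""

    target_url: str
    proxy: str
    content_transformation: str
    data_to_extract: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.proxy not in PROXY_OPTIONS:
            raise ValueError(f"unknown proxy option: {self.proxy!r}")
        if self.content_transformation not in TRANSFORM_OPTIONS:
            raise ValueError(
                f"unknown content transformation: {self.content_transformation!r}"
            )


settings = ExtractSettings(
    target_url="https://example.com/product/42",  # placeholder URL
    proxy="Datacenter",
    content_transformation="Text and Basic HTML",
    data_to_extract={"price": "The listed price of the product"},
)
```

Validating the two option fields up front catches typos before any request is made.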

Output

The node outputs an array of JSON objects, each representing one extracted item from the target URL. For the "Extract a Single Item per URL" operation, there is typically one object per input URL.

Each output object contains key-value pairs where keys correspond to the user-defined field names, and values are the extracted data matching the descriptions provided.

Additionally, the first item in the output may contain a special _metrics property with performance or usage metrics returned by the FetchFox API.

No binary data output is produced by this node.

Example output snippet:

[
  {
    "title": "Example Page Title",
    "summary": "This is a summary extracted from the page.",
    "_metrics": {
      "processingTimeMs": 123,
      "tokensUsed": 456
    }
  }
]

Dependencies

  • Requires an active FetchFox API key credential configured in n8n.
  • Uses the FetchFox API endpoint at https://api.fetchfox.ai/api/extract.
  • Requires outbound internet access so that n8n can reach the FetchFox service.
  • Optional proxy selection affects how requests are routed and billed.
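For orientation, the sketch below prepares (but does not send) an HTTP request to the endpoint listed above. Only the URL comes from this page. The POST method, the JSON body, the `Authorization: Bearer` header scheme, and the `build_request` helper are assumptions made for illustration; consult the FetchFox API documentation for the real request format.

```python
import json
import urllib.request

FETCHFOX_EXTRACT_URL = "https://api.fetchfox.ai/api/extract"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare a request object without sending it.

    Assumed, not confirmed: POST with a JSON body and a Bearer token.
    """
    if not api_key:
        raise ValueError("a FetchFox API key is required")
    return urllib.request.Request(
        FETCHFOX_EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical payload; the real body shape is not documented on this page.
request = build_request("test-key", {"url": "https://example.com"})
print(request.get_method(), request.full_url)
```

In n8n itself the node handles this step using the configured credential, so a manual request like this is mainly useful for debugging connectivity.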

Troubleshooting

  • Invalid URL or unreachable page: If the target URL is incorrect or the page cannot be loaded, the node may return an error or empty results. Verify the URL and network connectivity.
  • Missing or invalid API credentials: Ensure the FetchFox API key is correctly set up in n8n credentials; otherwise, authentication errors will occur.
  • Incorrect field definitions: If field names or descriptions are missing or unclear, the AI may fail to extract meaningful data. Provide clear, descriptive field instructions.
  • Proxy-related issues: Selecting a proxy type that your FetchFox subscription or quota does not cover may cause request failures or increased latency.
  • API rate limits or usage caps: Exceeding FetchFox API limits can result in errors; monitor usage and adjust accordingly.
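Two of the problems above, a malformed URL and missing field definitions, can be caught before a request is ever made. The following is a minimal pre-flight check under that assumption; `check_inputs` is a hypothetical helper, not part of the node.

```python
from urllib.parse import urlparse


def check_inputs(url: str, fields: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"target URL is not a valid http(s) URL: {url!r}")

    if not fields:
        problems.append("no fields defined in Data to Extract")
    for name, description in fields.items():
        if not name.strip():
            problems.append("a field has an empty name")
        if not description.strip():
            problems.append(f"field {name!r} has an empty description")

    return problems


print(check_inputs("https://example.com/post", {"title": "The post title"}))
print(check_inputs("example.com", {}))
```

A check like this cannot tell whether a page is reachable or whether a description is clear enough for the AI, so it complements rather than replaces the manual steps listed above.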
