AI Scraper

Scrape data from websites using the Parsera API

Overview

The "Extract From URL" operation of the AI Scraper node extracts structured data from a webpage given its URL. It calls the Parsera API, which analyzes the page content and returns the data fields defined by the user. This automates data collection from websites without manually parsing HTML or writing custom scrapers.

Common scenarios include:

  • Extracting product details (name, price, availability) from e-commerce pages.
  • Gathering news headlines and summaries from media sites.
  • Collecting contact information or event details from business or event pages.

For example, a user can specify a product page URL and define attributes like "productName" (string), "price" (number), and "inStock" (boolean). The node will then return these extracted values in JSON format.

Properties

URL
    The webpage URL to extract data from. Required.

Attributes Input Mode
    How to define the data fields to extract:
    • Fields: define individual fields with name, type, and description.
    • JSON: define all attributes as a single JSON object, suitable for complex schemas or AI integration.

Attributes (Fields)
    When using "Fields" mode: a list of attribute definitions. Each includes:
    • Field Name: key for output JSON.
    • Type: data type (Any, Boolean, Integer, List, Number, Object, String).
    • Description: natural language instruction on what to extract.

Attributes (JSON)
    When using "JSON" mode: a JSON object where each key is a field name and each value is an object with "description" and "type". Allowed types are any, string, integer, number, bool, list, object.

Mode
    Extraction mode:
    • Standard: balanced speed and accuracy.
    • Precision: better for data hidden deeper inside HTML structures.

Proxy Country
    Optional proxy country to route the request through, enabling access to geo-specific content. Includes many countries and options like "Default" or "Random Country".

Cookies
    Optional cookies as a JSON array of objects (e.g., [{"name": "session", "value": "abc", "domain": ".example.com"}]) to send with the request, useful for authenticated or session-based scraping.
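
The two attribute input modes describe the same information, so a Fields-style list can be turned into the JSON-mode object mechanically. The following is a minimal sketch of that conversion, not the node's own code. The dictionary keys "name", "type", and "description" and the function name are assumptions chosen for illustration; the type names come from the two lists above.

```python
# Maps the Fields-mode type labels to the JSON-mode type names listed above.
FIELD_TYPE_TO_JSON_TYPE = {
    "Any": "any",
    "Boolean": "bool",
    "Integer": "integer",
    "List": "list",
    "Number": "number",
    "Object": "object",
    "String": "string",
}


def fields_to_attributes_json(fields):
    """Convert Fields-mode definitions into a JSON-mode attributes object.

    Each field is a dict with the (hypothetical) keys "name", "type",
    and "description".
    """
    attributes = {}
    for field in fields:
        name = field["name"]
        if name in attributes:
            raise ValueError(f"duplicate field name: {name!r}")
        attributes[name] = {
            "description": field["description"],
            "type": FIELD_TYPE_TO_JSON_TYPE[field["type"]],
        }
    return attributes


# Illustrative field definitions matching the product example in the Overview.
fields = [
    {"name": "productName", "type": "String", "description": "Name of the product"},
    {"name": "price", "type": "Number", "description": "Current price"},
    {"name": "inStock", "type": "Boolean", "description": "Whether the product is in stock"},
]
attributes = fields_to_attributes_json(fields)
```

The resulting object can be serialized with json.dumps and pasted into the Attributes (JSON) property.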

Output

The node outputs an array of items, each containing a json property with the extracted data fields as keys and their corresponding values. The structure directly reflects the attribute names defined in the input properties.

Example output item:

{
  "json": {
    "productName": "Wireless Mouse",
    "price": 29.99,
    "inStock": true
  }
}

No binary data output is produced by this operation.
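
As a rough illustration of consuming this output downstream (for example in a script that receives the items as a list), the sketch below reads the extracted fields from each item's json property. The second item and the helper name are made up for the example.

```python
# Items shaped like the example output above; the second one is illustrative.
items = [
    {"json": {"productName": "Wireless Mouse", "price": 29.99, "inStock": True}},
    {"json": {"productName": "USB Keyboard", "price": 45.0, "inStock": False}},
]


def in_stock_names(items):
    """Return the productName of every item whose inStock field is truthy."""
    return [
        item["json"]["productName"]
        for item in items
        if item["json"].get("inStock")
    ]


names = in_stock_names(items)
```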

Dependencies

  • Requires an active connection to the external Parsera API service at https://api.parsera.org/v1.
  • Requires an API authentication token credential configured in n8n for accessing the Parsera API.
  • Optionally uses proxy routing based on the selected country to access geo-restricted content.
  • Supports sending cookies for session or login-required pages.
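
To make the dependencies above concrete, the sketch below assembles (but does not send) a request description combining the base URL, the API token, and the optional proxy and cookie settings. Only the base URL comes from this page. The "/extract" path, the "X-API-KEY" header name, and the body key names are assumptions for illustration and should be checked against the Parsera API documentation; the node may build its request differently.

```python
import json

PARSERA_BASE_URL = "https://api.parsera.org/v1"  # taken from the list above


def build_extract_request(api_key, url, attributes, mode="standard",
                          proxy_country=None, cookies=None):
    """Assemble a request description without performing any network call.

    ASSUMPTION: the endpoint path, header name, and body keys below are
    illustrative guesses, not confirmed by this page.
    """
    body = {"url": url, "attributes": attributes, "mode": mode}
    if proxy_country is not None:
        body["proxy_country"] = proxy_country  # optional geo routing
    if cookies:
        body["cookies"] = cookies  # optional session cookies
    return {
        "method": "POST",
        "url": f"{PARSERA_BASE_URL}/extract",
        "headers": {"X-API-KEY": api_key, "Content-Type": "application/json"},
        "body": json.dumps(body),
    }


request = build_extract_request(
    api_key="YOUR_API_TOKEN",  # placeholder, never hard-code a real token
    url="https://example.com/product/123",
    attributes={"price": {"description": "Current price", "type": "number"}},
    cookies=[{"name": "session", "value": "abc", "domain": ".example.com"}],
)
```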

Troubleshooting

  • Missing or invalid URL: The node throws an error if the URL parameter is empty or not a valid string. Ensure the URL is correctly provided.
  • Attributes definition errors: Errors occur if attribute fields or JSON are malformed, missing required properties (name, type, description), or contain invalid JSON. Validate attribute inputs carefully.
  • Invalid cookies JSON: If cookies are provided but not a valid JSON array, an error is thrown. Make sure cookies are formatted correctly.
  • API errors: Network issues, invalid API credentials, or rate limits from the Parsera API may cause failures. Verify API key validity and network connectivity.
  • Proxy issues: Selecting a proxy country that is unavailable or blocked may result in failed requests. Try a different proxy country or switch back to "Default".
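
Several of the input problems above can be caught before the node runs. The following is a minimal pre-check sketch under stated assumptions: it mirrors the URL, attributes, and cookies checks described in this list, but it is not the node's actual validation logic, and the function names and messages are hypothetical.

```python
import json
from urllib.parse import urlparse

# JSON-mode type names as listed under Properties.
ALLOWED_TYPES = {"any", "string", "integer", "number", "bool", "list", "object"}


def _is_http_url(url):
    """True if url is a string with an http(s) scheme and a host part."""
    if not isinstance(url, str):
        return False
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_inputs(url, attributes_json, cookies_json=""):
    """Return a list of problem descriptions; an empty list means no issue found."""
    problems = []
    if not _is_http_url(url):
        problems.append("URL is missing or not a valid http(s) URL")

    try:
        attributes = json.loads(attributes_json)
    except (TypeError, ValueError):
        attributes = None
        problems.append("attributes are not valid JSON")
    if attributes is not None:
        if not isinstance(attributes, dict) or not attributes:
            problems.append("attributes must be a non-empty JSON object")
        else:
            for name, spec in attributes.items():
                if not isinstance(spec, dict) or "description" not in spec:
                    problems.append(f"attribute {name!r} needs a description")
                elif spec.get("type") not in ALLOWED_TYPES:
                    problems.append(f"attribute {name!r} has a missing or unsupported type")

    if cookies_json:
        try:
            cookies = json.loads(cookies_json)
        except (TypeError, ValueError):
            cookies = None
            problems.append("cookies are not valid JSON")
        if cookies is not None and not (
            isinstance(cookies, list) and all(isinstance(c, dict) for c in cookies)
        ):
            problems.append("cookies must be a JSON array of objects")
    return problems


ok = validate_inputs(
    "https://example.com/item",
    '{"price": {"description": "Current price", "type": "number"}}',
    '[{"name": "session", "value": "abc", "domain": ".example.com"}]',
)
bad = validate_inputs("", "{not json", '{"name": "session"}')
```

Here ok is an empty list, while bad reports one problem each for the URL, the attributes, and the cookies.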
