AI Scraper

Scrape data from websites using the Parsera API

Actions (3)

Extractor Actions
- Extract From URL
- Parse HTML
Agent Scrape Actions
- Agent Scrape

Overview

The "Agent Scrape" operation of the AI Scraper node scrapes data from a webpage using a pre-configured scraping agent through the Parsera API. The request specifies the agent's name and the target URL. It can optionally be routed through a proxy in a selected country and can include cookies for session or authentication purposes.

This operation is useful when you want to extract structured data from websites that need a scraping agent configured for a particular site or use case. It also helps with geo-specific content, which can be reached by routing requests through proxies in different countries, and with sites that rely on cookies to keep a session.

Practical examples:

- Scraping product details from an e-commerce site using a dedicated agent.
- Extracting real estate listings from a website that requires location-based access.
- Gathering news articles behind geo-restrictions by selecting appropriate proxy countries.
- Maintaining login sessions on websites by providing cookies during scraping.

Properties

Name	Meaning
Agent Name	The name of the pre-configured scraping agent to use for extracting data from the webpage.
URL	The full URL of the webpage from which to scrape data.
Proxy Country	Route the scraping request through a proxy located in the selected country to access geo-specific content. Options include "Default", "Random Country", and a list of individual countries (for example Afghanistan, Australia, Brazil, Canada, and United States).
Cookies	Optional JSON array of cookie objects to send with the scraping request, e.g., `[{"name": "session", "value": "abc", "domain": ".example.com"}]`. Useful for maintaining sessions or authentication.
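
As an illustration of how these four properties might be combined into one request body, here is a minimal Python sketch. The payload field names (`name`, `url`, `proxy_country`, `cookies`) and the choice to leave the proxy field out when "Default" is selected are assumptions made for this example; they are not confirmed by this page, so check them against the Parsera API documentation. `build_scrape_payload` is a hypothetical helper, not part of the node.

```python
import json
from typing import Any, Dict, Optional


def build_scrape_payload(
    agent_name: str,
    url: str,
    proxy_country: str = "Default",
    cookies_json: Optional[str] = None,
) -> Dict[str, Any]:
    """Combine the node properties into a single request body (hypothetical field names)."""
    payload: Dict[str, Any] = {"name": agent_name, "url": url}

    # Assumption: "Default" means no explicit proxy country is sent.
    if proxy_country != "Default":
        payload["proxy_country"] = proxy_country

    # The Cookies property is entered as a JSON string; parse it into a list.
    if cookies_json:
        payload["cookies"] = json.loads(cookies_json)

    return payload


example = build_scrape_payload(
    "my-agent",
    "https://example.com/products",
    proxy_country="Canada",
    cookies_json='[{"name": "session", "value": "abc", "domain": ".example.com"}]',
)
print(json.dumps(example, indent=2))
```

Leaving optional fields out of the body, instead of sending empty values, keeps the request minimal when only Agent Name and URL are set.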

Output

The node outputs the scraped data in the `json` field of each item. The structure depends on the response from the scraping API, but it generally contains the extracted data fields as key-value pairs.

- If the API returns an array of data objects, each object is output as a separate item with its JSON content.
- If the API returns a single data object, it is output as one item.
- This operation does not appear to output binary data.
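
The array-versus-object rule described above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the node's actual source; `to_items` is a hypothetical name.

```python
from typing import Any, Dict, List


def to_items(response_data: Any) -> List[Dict[str, Any]]:
    """Wrap API data in n8n-style items, one item per data object."""
    if isinstance(response_data, list):
        # An array of objects becomes one item per object.
        return [{"json": entry} for entry in response_data]
    # A single object becomes exactly one item.
    return [{"json": response_data}]


many = to_items([{"title": "A"}, {"title": "B"}])
one = to_items({"title": "A"})
print(len(many), len(one))
```

Splitting arrays into separate items lets downstream nodes process each scraped record individually.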

Dependencies

- Requires an API key credential for authenticating with the external scraping service (Parsera API).
- Requires network access to the scraping API endpoint at https://agents.parsera.org/v1/scrape.
- Proxy routing is optional and is handled internally by the API based on the selected proxy country.
- Cookies for session management are optional and are provided in JSON format.
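
For orientation, a request to the endpoint above could be prepared roughly as follows with Python's standard library. The endpoint URL comes from this page. The `X-API-KEY` header name, the use of `POST` with a JSON body, and the payload field names are assumptions for this sketch and should be verified against the Parsera API documentation. The example only builds the request object and does not send it.

```python
import json
import urllib.request
from typing import Any, Dict

ENDPOINT = "https://agents.parsera.org/v1/scrape"  # endpoint listed under Dependencies


def prepare_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-KEY": api_key,  # assumed header name, not confirmed by this page
        },
        method="POST",
    )


# Hypothetical payload; the field names are illustrative.
request = prepare_request("YOUR_API_KEY", {"name": "my-agent", "url": "https://example.com"})
print(request.get_method(), request.full_url)
```

To actually send it, pass the object to `urllib.request.urlopen(request)` and handle `urllib.error.HTTPError` for authentication failures and `urllib.error.URLError` for connectivity problems.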

Troubleshooting

- Missing or invalid Agent Name: The node will throw an error if the Agent Name is empty or not provided. Ensure the agent name matches a valid pre-configured agent on the scraping service.
- Invalid or missing URL: The node requires a non-empty URL string. Verify the URL is correct and accessible.
- Malformed Cookies JSON: If cookies are provided, they must be a valid JSON array of objects. Invalid JSON or an incorrect structure will cause errors.
- Proxy country issues: Selecting an unsupported or misspelled proxy country value might lead to unexpected results or errors.
- API authentication errors: Ensure the API key credential is correctly set up and has permissions to use the scraping API.
- Network or service downtime: Connectivity issues or service unavailability can cause request failures.
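
Several of the input problems listed above can be caught before a request is made. The following Python sketch shows one way to check them. `validate_inputs` is a hypothetical helper, and the `http`/`https` scheme check is an extra assumption that goes beyond the non-empty check described here.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse


def validate_inputs(
    agent_name: str, url: str, cookies_json: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Raise ValueError on bad input; return the parsed cookie list otherwise."""
    if not agent_name or not agent_name.strip():
        raise ValueError("Agent Name is required")
    if not url or not url.strip():
        raise ValueError("URL is required")

    # Assumption: a "full URL" means an http(s) scheme plus a host.
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be a full http(s) URL")

    # Cookies are optional; an empty value means no cookies.
    if not cookies_json or not cookies_json.strip():
        return []
    try:
        cookies = json.loads(cookies_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Cookies must be valid JSON: {exc}") from exc
    if not isinstance(cookies, list) or not all(isinstance(c, dict) for c in cookies):
        raise ValueError("Cookies must be a JSON array of objects")
    return cookies


print(validate_inputs("my-agent", "https://example.com", '[{"name": "session", "value": "abc"}]'))
```

Checks like these cannot confirm that the agent exists on the service or that the API key is valid; those errors only surface in the API response.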

Links and References

- Parsera API Documentation (for detailed API usage and agent configuration)
- n8n documentation on Using Credentials
- General web scraping best practices and legal considerations