Overview
The "Agent Scrape" operation of the AI Scraper node enables scraping data from webpages using a pre-configured scraping agent via an external scraping API. This operation sends a request specifying the agent's name and target URL, optionally routing through a proxy in a selected country and including cookies for session or authentication purposes.
This operation is useful when you need structured data from websites for which a scraping agent has already been configured for that particular site or use case. It also helps with geo-specific content, by routing requests through proxies in different countries, and with session-dependent pages, by sending cookies along with the request.
Practical examples:
- Scraping product details from an e-commerce site using a dedicated agent.
- Extracting real estate listings from a website that requires location-based access.
- Gathering news articles behind geo-restrictions by selecting appropriate proxy countries.
- Maintaining login sessions on websites by providing cookies during scraping.
Properties
| Name | Meaning |
|---|---|
| Agent Name | The name of the pre-configured scraping agent to use for extracting data from the webpage. |
| URL | The full URL of the webpage from which to scrape data. |
| Proxy Country | Route the scraping request through a proxy located in the selected country to access geo-specific content. Options include "Default", "Random Country", and a list of individual countries (for example Afghanistan, Australia, Brazil, Canada, and United States). |
| Cookies | Optional JSON array of cookie objects to send with the scraping request, e.g., [{"name": "session", "value": "abc", "domain": ".example.com"}]. Useful for maintaining sessions or authentication. |
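
As a rough illustration of how the four properties above could combine into one request body, here is a minimal sketch. The function name `build_agent_scrape_payload` and the field names `name`, `url`, `proxy_country`, and `cookies` are assumptions made for this example, not confirmed details of the node or the scraping API, and so is the choice to omit the proxy field when "Default" is selected.

```python
import json


def build_agent_scrape_payload(agent_name, url, proxy_country="Default", cookies_json=""):
    """Assemble a hypothetical request body from the four node properties."""
    # Field names are assumptions for illustration only.
    payload = {"name": agent_name, "url": url}

    # Assumption: "Default" means no explicit proxy country is sent.
    if proxy_country and proxy_country != "Default":
        payload["proxy_country"] = proxy_country

    # Cookies are optional; when present they arrive as a JSON array string.
    if cookies_json.strip():
        payload["cookies"] = json.loads(cookies_json)

    return payload


payload = build_agent_scrape_payload(
    "product-agent",
    "https://example.com/item/1",
    proxy_country="Canada",
    cookies_json='[{"name": "session", "value": "abc", "domain": ".example.com"}]',
)
print(json.dumps(payload, indent=2))
```

Leaving Cookies empty and Proxy Country on "Default" yields a body containing only the agent name and URL in this sketch.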
Output
The node outputs the scraped data in the json field of each item. The structure depends on the response from the scraping API but generally contains the extracted data fields as key-value pairs.
- If the API returns an array of data objects, each object is output as a separate item with its JSON content.
- If the API returns a single data object, it is output as one item.
- This operation does not produce binary data.
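The array-versus-object rule above can be sketched as follows. This is only an illustration of the described behavior, not the node's actual source; the helper name `to_items` is hypothetical, and the sample data is invented for the example.

```python
def to_items(response_data):
    """Wrap API response data into n8n-style items, each with a "json" field."""
    if isinstance(response_data, list):
        # An array of objects becomes one item per object.
        return [{"json": entry} for entry in response_data]
    # A single object becomes exactly one item.
    return [{"json": response_data}]


many = to_items([{"title": "A", "price": 10}, {"title": "B", "price": 12}])
one = to_items({"title": "A", "price": 10})
print(len(many), len(one))
```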
Dependencies
- Requires an API key credential for authenticating with the external scraping service (Parsera API).
- Network access to the scraping API endpoint at https://agents.parsera.org/v1/scrape.
- Optional proxy configuration handled internally by the API based on the selected proxy country.
- Optional provision of cookies in JSON format for session management.
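For orientation, a request to the endpoint listed above might be prepared as in the sketch below. The header name `X-API-KEY`, the body field names, and the helper name `prepare_request` are assumptions for illustration; consult the Parsera API documentation for the actual contract. The sketch only builds the request object and does not send it.

```python
import json
import urllib.request

ENDPOINT = "https://agents.parsera.org/v1/scrape"


def prepare_request(api_key, body):
    """Build (but do not send) a JSON POST request carrying the API key."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-KEY": api_key,  # assumed header name
        },
        method="POST",
    )


req = prepare_request("test-key", {"name": "product-agent", "url": "https://example.com"})
print(req.get_method(), req.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would perform the call; that step is left out here so the example runs without network access or a real key.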
Troubleshooting
- Missing or invalid Agent Name: The node will throw an error if the Agent Name is empty or not provided. Ensure the agent name matches a valid pre-configured agent on the scraping service.
- Invalid or missing URL: The node requires a non-empty URL string. Verify the URL is correct and accessible.
- Malformed Cookies JSON: If cookies are provided, they must be valid JSON arrays of objects. Invalid JSON or incorrect structure will cause errors.
- Proxy country issues: Selecting an unsupported or misspelled proxy country value might lead to unexpected results or errors.
- API authentication errors: Ensure the API key credential is correctly set up and has permissions to use the scraping API.
- Network or service downtime: Connectivity issues or service unavailability can cause request failures.
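The first three checks in this list can be reproduced before running the node, for example in a preceding step. The following is a minimal sketch under the assumption that inputs arrive as plain strings; `validate_inputs` and its error messages are hypothetical and do not mirror the node's own wording.

```python
import json


def validate_inputs(agent_name, url, cookies_json=""):
    """Return a list of problems found in the inputs (empty list means OK)."""
    errors = []
    if not agent_name.strip():
        errors.append("Agent Name is required")
    if not url.strip():
        errors.append("URL is required")

    if cookies_json.strip():
        try:
            cookies = json.loads(cookies_json)
        except json.JSONDecodeError:
            errors.append("Cookies must be valid JSON")
        else:
            # Valid JSON is not enough: it must be an array of objects.
            if not isinstance(cookies, list) or not all(isinstance(c, dict) for c in cookies):
                errors.append("Cookies must be a JSON array of objects")
    return errors


print(validate_inputs("", "https://example.com", '{"name": "session"}'))
```

This only checks shape. Whether the agent name exists on the scraping service, or whether the URL is reachable, can be confirmed only by the API itself.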
Links and References
- Parsera API Documentation (for detailed API usage and agent configuration)
- n8n documentation on Using Credentials
- General web scraping best practices and legal considerations.