Overview
This node integrates with ScrapeOps and exposes three APIs: the Proxy API, the Parser API, and the Data API. Use it to scrape web pages through ScrapeOps proxy servers, parse raw HTML from supported domains into structured data, or retrieve structured data from supported sources (currently Amazon).
Common scenarios include:
- Using the Proxy API to scrape websites that implement anti-bot protections by routing requests through specialized proxies with options for bypassing bot defenses, geo-targeting, and rendering JavaScript.
- Using the Parser API to extract structured information from the raw HTML of supported e-commerce, job listing, and real estate sites (Amazon, eBay, Indeed, Redfin, Walmart).
- Using the Data API to fetch detailed product or search data from Amazon, supporting both direct product lookups and keyword-based searches.
Practical examples:
- Scraping a product page on Amazon using the Proxy API with Cloudflare bypass and US geo-targeting.
- Parsing an Indeed job listing HTML snippet to extract job details.
- Retrieving Amazon product data by ASIN or URL via the Data API.
Properties
| Name | Meaning |
|---|---|
| API | Select which ScrapeOps API to use: Proxy API, Parser API, or Data API. |
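The API selector decides which of the property groups below is shown. The mapping can be modelled as a lookup table that rejects unknown selections, which is the situation behind the "Unsupported API type" error listed under Troubleshooting. This is a rough TypeScript sketch. The names ApiName, VISIBLE_PROPERTIES, and visibleProperties are hypothetical and are not taken from the node's source. In the Data API list, some fields only appear for particular Input Type values.

```typescript
// Hypothetical model of which properties each API selection exposes,
// copied from the property tables in this document (not the node's source).
type ApiName = "Proxy API" | "Parser API" | "Data API";

const API_NAMES: readonly ApiName[] = ["Proxy API", "Parser API", "Data API"];

const VISIBLE_PROPERTIES: Record<ApiName, readonly string[]> = {
  "Proxy API": ["URL", "Method", "Advanced Options", "Return Type"],
  "Parser API": ["Domain", "URL", "HTML Content"],
  // ASIN, Product URL, Search Query, and Search URL depend on Input Type.
  "Data API": [
    "Domain", "Amazon API Type", "Input Type", "ASIN", "Product URL",
    "Search Query", "Search URL", "Amazon API Options",
  ],
};

// Type guard: only the three known selections are accepted
// (an "in" check would wrongly accept prototype keys such as "toString").
function isApiName(value: string): value is ApiName {
  return (API_NAMES as readonly string[]).indexOf(value) !== -1;
}

function visibleProperties(api: string): readonly string[] {
  if (!isApiName(api)) {
    // Corresponds to the "Unsupported API type" error described under Troubleshooting.
    throw new Error(`Unsupported API type: ${api}`);
  }
  return VISIBLE_PROPERTIES[api];
}

console.log(visibleProperties("Parser API").join(", ")); // Domain, URL, HTML Content
```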
Proxy API Properties (shown when API = Proxy API)
| Name | Meaning |
|---|---|
| URL | The target URL to scrape. |
| Method | HTTP method to use: GET or POST. |
| Advanced Options | Collection of optional settings that control how the request is proxied and rendered (see the table below). |
| Return Type | Format of the returned response: Target Server Response (default) or JSON Response. |
Advanced Options (Proxy API)
| Name | Meaning |
|---|---|
| Bypass | Anti-bot bypass level (Cloudflare, DataDome, Incapsula, etc.). |
| Country | Geo-targeting country code. |
| Custom Cookies / Custom Headers | Cookies and headers to send, as JSON objects. |
| Device Type | Desktop or Mobile user agent. |
| File Type | Expected file type (e.g., PDF, CSV). |
| Final Status Code / Initial Status Code | Whether to return the final or initial status code in the response headers. |
| Follow Redirects | Enable or disable following redirects. |
| JS Scenario | JSON array of headless browser steps to run before the response is returned. |
| Keep Headers | Send the custom headers with the request. |
| LLM Data Schema | Page type used to optimize extraction (e.g., product_page, job_page). |
| LLM Extract | Enable intelligent content parsing. |
| LLM Extract Response Type | Format of the extracted data: JSON or Markdown. |
| Max Request Cost | Maximum number of API credits a single request may consume. |
| Mobile Proxies | Route the request through mobile proxies. |
| Optimize Request | Let ScrapeOps optimize the request settings. |
| Premium | Proxy performance level. |
| Render JavaScript | Render the page with JavaScript enabled. |
| Residential Proxies | Route the request through residential proxies. |
| Screenshot | Return a base64-encoded screenshot (requires Render JavaScript). |
| Scroll | Number of pixels to scroll before the response is returned. |
| Session Number | Reuse the same proxy IP across requests (sticky session). |
| Wait For | CSS selector to wait for before the response is returned. |
| Wait Time | Milliseconds to wait before collecting data. |
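The first practical example in the Overview fetches an Amazon product page with a Cloudflare bypass and US geo-targeting. That roughly corresponds to a single proxied GET request. The sketch below makes several assumptions:

- The request goes to https://proxy.scrapeops.io/v1/, with the target URL and the options passed as query parameters.
- Options use snake_case names such as country and render_js.
- The bypass value cloudflare_level_1 is a valid value.
- The product URL is only a placeholder.

Verify all of these against the ScrapeOps Proxy API documentation. buildProxyUrl is a hypothetical helper.

```typescript
// Sketch only: the endpoint and every query-parameter name are assumptions,
// not taken from the node's source.
const PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/";

type OptionValue = string | number | boolean | object;

function buildProxyUrl(
  apiKey: string,
  targetUrl: string,
  options: Record<string, OptionValue>,
): string {
  const pairs: string[] = [
    `api_key=${encodeURIComponent(apiKey)}`,
    `url=${encodeURIComponent(targetUrl)}`,
  ];
  for (const key of Object.keys(options)) {
    const value = options[key];
    // JSON-valued options (e.g. a JS Scenario array) are serialized first.
    const text = typeof value === "object" ? JSON.stringify(value) : String(value);
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(text)}`);
  }
  return `${PROXY_ENDPOINT}?${pairs.join("&")}`;
}

// Placeholder ASIN in the product URL; bypass value name is assumed.
const requestUrl = buildProxyUrl("YOUR_API_KEY", "https://www.amazon.com/dp/B000000000", {
  bypass: "cloudflare_level_1",
  country: "us",
  render_js: true,
});

console.log(requestUrl);
```

Object-valued options are serialized with JSON.stringify before URL-encoding. That is one way to carry fields such as JS Scenario, which this section describes as a JSON array. This document does not say whether the node transmits them this way.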
Parser API Properties (shown when API = Parser API)
| Name | Meaning |
|---|---|
| Domain | Domain to parse (Amazon, eBay, Indeed, Redfin, Walmart). |
| URL | URL of the page being parsed. |
| HTML Content | Raw HTML content string to parse. |
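A Parser API call pairs the selected Domain with the page URL and its raw HTML. The sketch below uses the Indeed example from the Overview. It validates those inputs and assembles a request. The endpoint pattern https://parser.scrapeops.io/v2/<domain> and the { url, html } body shape are assumptions; verify them against the ScrapeOps Parser API documentation. buildParserRequest is a hypothetical helper, and the job URL is a placeholder.

```typescript
// Domains listed in the Parser API property table, lower-cased.
const PARSER_DOMAINS = ["amazon", "ebay", "indeed", "redfin", "walmart"];

interface ParserRequest {
  endpoint: string;
  body: { url: string; html: string };
}

// Hypothetical helper: validates inputs, then builds an assumed request shape.
function buildParserRequest(domain: string, pageUrl: string, html: string): ParserRequest {
  const key = domain.toLowerCase();
  if (PARSER_DOMAINS.indexOf(key) === -1) {
    throw new Error(`Unsupported parser domain: ${domain}`);
  }
  if (html.trim() === "") {
    throw new Error("HTML Content is required");
  }
  return {
    endpoint: `https://parser.scrapeops.io/v2/${key}`, // assumed URL pattern
    body: { url: pageUrl, html },
  };
}

const req = buildParserRequest(
  "Indeed",
  "https://www.indeed.com/viewjob?jk=abc123", // placeholder job URL
  "<html><body>job details</body></html>",
);
console.log(req.endpoint); // https://parser.scrapeops.io/v2/indeed
```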
Data API Properties (shown when API = Data API)
| Name | Meaning |
|---|---|
| Domain | Domain to retrieve data from (currently only Amazon). |
| Amazon API Type | Type of Amazon API: Product API (by ASIN or URL) or Product Search API (by query or URL). |
| Input Type | For Product API: ASIN or URL. For Search API: Query or URL. |
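| ASIN | Amazon Standard Identification Number (required when Amazon API Type is Product API and Input Type is ASIN). |
| Product URL | Full Amazon product page URL (required when Amazon API Type is Product API and Input Type is URL). |
| Search Query | Search keywords for Amazon products (required when Amazon API Type is Product Search API and Input Type is Query). |
| Search URL | Full Amazon search results page URL (required when Amazon API Type is Product Search API and Input Type is URL). |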
| Amazon API Options | Additional options such as country code and top-level domain (TLD) for Amazon scraping. |
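Amazon API Type and Input Type together determine which single field must be filled in. A minimal sketch of that rule follows. requiredDataParam and the returned parameter names (asin, url, query) are hypothetical. They were chosen for illustration and are not taken from the node or the ScrapeOps API.

```typescript
// Hypothetical encoding of the Data API rules stated in the property table.
type AmazonApiType = "product" | "search";
type InputType = "asin" | "url" | "query";

interface DataApiInput {
  apiType: AmazonApiType;
  inputType: InputType;
  asin?: string;
  productUrl?: string;
  searchQuery?: string;
  searchUrl?: string;
}

// Returns [parameterName, value] for the one field the combination requires.
function requiredDataParam(input: DataApiInput): [string, string] {
  let name: string;
  let value: string | undefined;
  if (input.apiType === "product" && input.inputType === "asin") {
    name = "asin";
    value = input.asin;
  } else if (input.apiType === "product" && input.inputType === "url") {
    name = "url";
    value = input.productUrl;
  } else if (input.apiType === "search" && input.inputType === "query") {
    name = "query";
    value = input.searchQuery;
  } else if (input.apiType === "search" && input.inputType === "url") {
    name = "url";
    value = input.searchUrl;
  } else {
    // Product API accepts ASIN or URL; Search API accepts Query or URL.
    throw new Error(`Input Type "${input.inputType}" is not valid for the ${input.apiType} API`);
  }
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required parameter: ${name}`);
  }
  return [name, value];
}

// Placeholder ASIN, for illustration only.
console.log(requiredDataParam({ apiType: "product", inputType: "asin", asin: "B000000000" }));
```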
Output
The node outputs an array of items where each item contains a json field with the response data from the selected ScrapeOps API.
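- For the Proxy API, the output is either the raw target server response or a JSON-formatted response, depending on the "Return Type" property. If the Screenshot option is enabled, the response also includes a base64-encoded screenshot. If the Initial or Final Status Code options are enabled, those status codes are returned as well.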
- For Parser API, the output JSON contains the parsed structured data extracted from the provided HTML content.
- For Data API, the output JSON includes detailed product or search results data from Amazon.
No binary output is documented; screenshots are returned as base64-encoded strings inside the JSON.
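If a workflow needs the screenshot as a file rather than a string, the base64 value must first be decoded to bytes. In Node.js, Buffer.from(value, "base64") does this in one call. The dependency-free sketch below spells out the same 6-bit-to-8-bit conversion. This document does not name the JSON field that holds the screenshot, so the sketch does not show reading it from the response. decodeBase64 is a hypothetical helper.

```typescript
// Standard base64 alphabet (assumes the screenshot is not URL-safe base64).
const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function decodeBase64(input: string): Uint8Array {
  // Drop padding, whitespace, and any other non-alphabet characters.
  const clean = input.replace(/[^A-Za-z0-9+/]/g, "");
  // Every 4 characters carry 3 bytes; leftover bits beyond a full byte are discarded.
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let acc = 0;  // bit accumulator
  let bits = 0; // number of meaningful bits currently in acc
  let pos = 0;
  for (let i = 0; i < clean.length; i++) {
    acc = ((acc << 6) | B64_ALPHABET.indexOf(clean.charAt(i))) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[pos++] = (acc >> bits) & 0xff; // emit the top 8 pending bits
    }
  }
  return out;
}

const bytes = decodeBase64("TWFu");
console.log(bytes[0], bytes[1], bytes[2]); // 77 97 110
```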
Dependencies
- Requires a valid ScrapeOps API key credential configured in n8n.
- Network access to ScrapeOps endpoints.
- No other external dependencies are indicated.
Troubleshooting
- Missing or invalid API key: The node throws an error if the API key is missing or invalid. Ensure the ScrapeOps API key credential is correctly set up.
- Unsupported API type: Selecting an unsupported API type will cause an error; verify the API selection.
- Parameter validation: Required parameters like URLs, ASINs, or HTML content must be provided according to the selected API and operation.
- Request failures due to anti-bot protections: Use appropriate bypass options and proxy types to improve success rates.
- Timeouts or incomplete responses: Increase Wait Time, set Wait For to a CSS selector that marks the content you need, enable Render JavaScript, or add JS Scenario steps.
- Error handling: When "Continue On Fail" is enabled, errors are returned as JSON with suggestions to check credentials and parameters.
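The Continue On Fail behaviour follows the usual n8n pattern of handling errors item by item. The synchronous sketch below is a simplification. callApi stands in for whichever ScrapeOps request the node makes. The error and suggestion field names are hypothetical, since this document only says that the error JSON includes suggestions.

```typescript
// Shape of an n8n output item: the response data lives under "json".
interface OutputItem {
  json: Record<string, unknown>;
}

type ApiCall = (params: Record<string, unknown>) => Record<string, unknown>;

// Processes each input item; on failure either rethrows (default behaviour)
// or records the error as an output item when Continue On Fail is enabled.
function runItems(
  inputs: Array<Record<string, unknown>>,
  callApi: ApiCall,
  continueOnFail: boolean,
): OutputItem[] {
  const results: OutputItem[] = [];
  for (const params of inputs) {
    try {
      results.push({ json: callApi(params) });
    } catch (err) {
      if (!continueOnFail) {
        throw err; // without Continue On Fail, the first error stops the run
      }
      const message = err instanceof Error ? err.message : String(err);
      results.push({
        json: {
          error: message, // hypothetical field name
          suggestion: "Check the ScrapeOps API key credential and the node parameters.",
        },
      });
    }
  }
  return results;
}
```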