ScrapeOps

Interact with ScrapeOps Proxy, Parser and Data APIs

Overview

This node integrates with ScrapeOps services, providing three main API functionalities: Proxy API, Parser API, and Data API. It enables users to scrape web pages through proxy servers, parse HTML content from various domains, or retrieve structured data from supported sources like Amazon.

Common scenarios include:

Using the Proxy API to scrape websites with anti-bot protections by routing requests through specialized proxies, with options for bypassing bot defenses, geo-targeting, and JavaScript rendering.
Using the Parser API to extract structured information from raw HTML content of popular e-commerce and job listing sites.
Using the Data API to fetch detailed product or search data from Amazon, supporting both direct product lookups and keyword-based searches.

Practical examples:

Scraping a product page on Amazon using the Proxy API with Cloudflare bypass and US geo-targeting.
Parsing an Indeed job listing HTML snippet to extract job details.
Retrieving Amazon product data by ASIN or URL via the Data API.

Properties

Name	Meaning
API	Select which ScrapeOps API to use: Proxy API, Parser API, or Data API.

Proxy API Properties (shown when API = Proxy API)

Name	Meaning
URL	The target URL to scrape.
Method	HTTP method to use: GET or POST.
Advanced Options	Collection of optional advanced settings, described under "Proxy API Advanced Options" below.
Return Type	Format of the returned response: Target Server Response (default) or JSON Response.

Proxy API Advanced Options

Name	Meaning
Bypass	Anti-bot bypass level (Cloudflare, DataDome, Incapsula, etc.).
Country	Geo-targeting country code.
Custom Cookies/Headers	Custom cookies and headers, given as JSON objects.
Device Type	Desktop or Mobile user-agent.
File Type	Expected file type (e.g., PDF, CSV).
Final/Initial Status Code	Whether to return these status codes in the response headers.
Follow Redirects	Enable or disable following redirects.
JS Scenario	JSON array of headless browser steps to run before the response is returned.
Keep Headers	Use custom headers in the request.
LLM Data Schema	Page type for optimized extraction (e.g., product_page, job_page).
LLM Extract	Enable intelligent content parsing.
LLM Extract Response Type	JSON or Markdown.
Max Request Cost	Limit on API credits per request.
Mobile Proxies	Use mobile proxies.
Optimize Request	Let ScrapeOps optimize request settings.
Premium	Proxy performance level.
Render JavaScript	Enable JS rendering.
Residential Proxies	Use residential proxies.
Screenshot	Return a base64 screenshot (requires JS rendering).
Scroll	Pixels to scroll before the response is returned.
Session Number	Sticky session number for IP reuse.
Wait For	CSS selector to wait for before the response is returned.
Wait Time	Milliseconds to wait before collecting data.
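
To make the Proxy API properties above more concrete, here is a minimal sketch of how a request URL could be assembled outside n8n. The endpoint `https://proxy.scrapeops.io/v1/`, the query parameter names (`api_key`, `url`, `bypass`, `country`, `render_js`) and the value `cloudflare_level_1` are assumptions for illustration and should be checked against the ScrapeOps documentation. The helper `build_proxy_url` is hypothetical and is not part of the node; the ASIN in the example is a placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the ScrapeOps Proxy API documentation.
PROXY_ENDPOINT = "https://proxy.scrapeops.io/v1/"


def build_proxy_url(api_key, target_url, **options):
    """Return a Proxy API request URL for target_url (hypothetical helper).

    Boolean options are sent as lowercase 'true'/'false' strings and
    options set to None are left out of the query string.
    """
    params = {"api_key": api_key, "url": target_url}
    for name, value in options.items():
        if value is None:
            continue  # option not set, so do not send it
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    # urlencode percent-encodes the target URL so it survives as one parameter.
    return PROXY_ENDPOINT + "?" + urlencode(params)


# Amazon product page with Cloudflare bypass, US geo-targeting and JS rendering.
# Option names and values here are assumed, not confirmed by this page.
example_url = build_proxy_url(
    "YOUR_API_KEY",
    "https://www.amazon.com/dp/B000000000",  # placeholder ASIN
    bypass="cloudflare_level_1",
    country="us",
    render_js=True,
)
```

The sketch only builds the URL and sends nothing; the resulting string can be passed to any HTTP client with the method chosen in the Method property.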

Parser API Properties (shown when API = Parser API)

Name	Meaning
Domain	Domain to parse (Amazon, eBay, Indeed, Redfin, Walmart).
URL	URL of the page being parsed.
HTML Content	Raw HTML content string to parse.
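
As an illustration of how the three Parser API properties fit together, the sketch below assembles a request description from a domain, a page URL and raw HTML. The endpoint pattern `https://parser.scrapeops.io/v2/{domain}` and the JSON body keys `url` and `html` are assumptions, and `build_parser_request` is a hypothetical helper, not the node's implementation. The supported domain list is taken from the Domain property above.

```python
import json

# Assumed endpoint pattern; confirm against the ScrapeOps Parser API docs.
PARSER_ENDPOINT = "https://parser.scrapeops.io/v2/{domain}"

# Domains listed for the Domain property.
SUPPORTED_DOMAINS = {"amazon", "ebay", "indeed", "redfin", "walmart"}


def build_parser_request(api_key, domain, page_url, html):
    """Return a dict describing a Parser API request (hypothetical helper)."""
    domain = domain.lower()
    if domain not in SUPPORTED_DOMAINS:
        raise ValueError(f"Unsupported domain: {domain}")
    if not html.strip():
        raise ValueError("HTML content must not be empty")
    return {
        "url": PARSER_ENDPOINT.format(domain=domain),
        "params": {"api_key": api_key},
        # Body keys are assumed: the page URL plus the raw HTML to parse.
        "body": json.dumps({"url": page_url, "html": html}),
    }


request = build_parser_request(
    "YOUR_API_KEY",
    "Indeed",
    "https://www.indeed.com/viewjob?jk=example",  # placeholder URL
    "<html><body><h1>Example job</h1></body></html>",
)
```

Validating the domain and the HTML before sending mirrors the parameter checks described under Troubleshooting and avoids spending a request on input that cannot be parsed.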

Data API Properties (shown when API = Data API)

Name	Meaning
Domain	Domain to retrieve data from (currently only Amazon).
Amazon API Type	Type of Amazon API: Product API (by ASIN or URL) or Product Search API (by query or URL).
Input Type	For the Product API: ASIN or URL. For the Product Search API: Query or URL.
ASIN	Amazon Standard Identification Number (required if the input type is ASIN).
Product URL	Full Amazon product URL (required if the Product API input type is URL).
Search Query	Search keywords for Amazon products (required if the input type is Query).
Search URL	Full Amazon search page URL (required if the Product Search API input type is URL).
Amazon API Options	Additional options such as country code and top-level domain (TLD) for Amazon scraping.
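
The dependency between Amazon API Type, Input Type and the required field can be sketched as a small lookup. This is an illustrative reading of the table above, not the node's actual validation code. The keys `product` and `search`, the field names (`asin`, `product_url`, `search_query`, `search_url`) and the function `resolve_data_api_input` are hypothetical names chosen for the example.

```python
# Which field is required for each (API type, input type) combination,
# following the Data API properties table. All names are illustrative.
REQUIRED_FIELD = {
    ("product", "asin"): "asin",
    ("product", "url"): "product_url",
    ("search", "query"): "search_query",
    ("search", "url"): "search_url",
}


def resolve_data_api_input(api_type, input_type, fields):
    """Return (field_name, value) for the required input, or raise ValueError."""
    key = (api_type.lower(), input_type.lower())
    if key not in REQUIRED_FIELD:
        raise ValueError(f"Unsupported combination: {api_type} / {input_type}")
    name = REQUIRED_FIELD[key]
    value = (fields.get(name) or "").strip()
    if not value:
        raise ValueError(f"'{name}' is required for {api_type} / {input_type}")
    return name, value


# Product lookup by ASIN (placeholder value).
selected = resolve_data_api_input("product", "ASIN", {"asin": "B000000000"})
```

Keeping the combinations in one table makes it easy to see that URL means a product URL for the Product API and a search page URL for the Product Search API.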

Output

The node outputs an array of items where each item contains a json field with the response data from the selected ScrapeOps API.

For Proxy API, the output is either the raw target server response or a JSON-formatted response, depending on the "Return Type" property. When the corresponding advanced options are enabled, it may also include a base64-encoded screenshot and the initial and final status codes.
For Parser API, the output JSON contains the parsed structured data extracted from the provided HTML content.
For Data API, the output JSON includes detailed product or search results data from Amazon.

No binary data output is documented for this node; screenshots are returned as base64-encoded strings within the JSON.
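
Because screenshots arrive as base64 text rather than binary data, a downstream step has to decode them before saving or displaying the image. The sketch below shows one way to do that in Python. The field name `screenshot` is an assumption, since this page does not state where the string sits in the response, and `extract_screenshot` is a hypothetical helper.

```python
import base64


def extract_screenshot(item, field="screenshot"):
    """Decode a base64 screenshot from an output item.

    `item` is a dict shaped like a node output item ({"json": {...}}).
    The default field name is assumed, not taken from the ScrapeOps docs.
    Returns the image bytes, or None when no screenshot is present.
    """
    encoded = item.get("json", {}).get(field)
    if not encoded:
        return None
    return base64.b64decode(encoded)


# Round-trip with stand-in bytes in place of a real screenshot.
sample_item = {"json": {"screenshot": base64.b64encode(b"\x89PNG").decode("ascii")}}
image_bytes = extract_screenshot(sample_item)
```

The returned bytes can be written to a file opened in binary mode, for example `open("page.png", "wb")`.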

Dependencies

Requires a valid ScrapeOps API key credential configured in n8n.
Network access to ScrapeOps endpoints.
No other external dependencies are indicated.

Troubleshooting

Missing or invalid API key: The node throws an error if the API key is missing or invalid. Ensure the ScrapeOps API key credential is correctly set up.
Unsupported API type: Selecting an unsupported API type will cause an error; verify the API selection.
Parameter validation: Required parameters like URLs, ASINs, or HTML content must be provided according to the selected API and operation.
Request failures due to anti-bot protections: Use appropriate bypass options and proxy types to improve success rates.
Timeouts or incomplete responses: Adjust wait times, enable JS rendering, or use headless browser scenarios if needed.
Error handling: When "Continue On Fail" is enabled, errors are returned as JSON with suggestions to check credentials and parameters.