ScrapingDog

Get data from ScrapingDog API

Actions (6)

Scrape URL Actions
- Get
Google Search Actions
- Search
Bing Search Actions
- Search
LinkedIn Profile Actions
- Search
LinkedIn Job Actions
- Search
Amazon Search Actions
- Search

Overview

This node performs web scraping by sending requests to the ScrapingDog API. Its Scrape URL action fetches a specified URL, with options for JavaScript rendering, proxy usage, and output formatting. It is useful when you want to extract HTML content or AI-optimized data from web pages programmatically, without handling HTTP requests or parsing yourself.

Common scenarios include:

- Extracting raw HTML or markdown-formatted content from any webpage.
- Scraping dynamic websites that require JavaScript execution to fully load content.
- Using premium or super proxies to avoid IP blocking or geo-restrictions.
- Leveraging AI extraction rules to parse and extract structured data from complex pages.

Practical examples:

- You want to scrape product details from an e-commerce site that loads content dynamically via JavaScript. Enabling JS rendering and setting a wait time ensures the page fully loads before it is scraped.
- You need to scrape content from a region-restricted website, so you enable the premium proxy and select the target country to bypass geo-blocks.
- You want the scraped HTML converted into markdown format for easier processing downstream.
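
The three scenarios above can be written down as property settings. This is only an illustrative sketch: the dictionary keys reuse the property labels from the Properties table, while the example URLs, the 5000 ms wait, the "de" country value, and the `check_settings` helper are all hypothetical and not part of the node.

```python
# Hypothetical settings for the three scenarios, keyed by the node's property labels.
# The URLs, wait time, and country value are made-up illustration values.
SCENARIOS = {
    "dynamic_product_page": {
        "URL to Scrape": "https://shop.example.com/product/123",
        "Javascript Rendering": True,
        "Wait (in ms)": 5000,
    },
    "region_restricted_page": {
        "URL to Scrape": "https://news.example.com/article",
        "Premium": True,
        "Select Country": "de",
    },
    "markdown_output": {
        "URL to Scrape": "https://blog.example.com/post",
        "Markdown": True,
    },
}


def check_settings(settings: dict) -> list:
    """Return warnings for option combinations that would have no effect."""
    warnings = []
    if not settings.get("URL to Scrape"):
        warnings.append("URL to Scrape is required")
    # The wait time is described as applying only with JavaScript rendering.
    if settings.get("Wait (in ms)") and not settings.get("Javascript Rendering"):
        warnings.append("Wait (in ms) only applies when Javascript Rendering is enabled")
    # Country selection is described as applying only with the premium proxy.
    if settings.get("Select Country") and not settings.get("Premium"):
        warnings.append("Select Country only applies when Premium is enabled")
    return warnings
```

Running `check_settings` over each entry in `SCENARIOS` returns an empty list, since each scenario only sets options that depend on one another consistently.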

Properties

Name	Meaning
URL to Scrape	The target URL of the webpage you want to scrape.
Javascript Rendering	Enable JavaScript rendering to allow scraping of dynamically loaded content.
Premium	Use a premium residential proxy instead of the default rotating proxy for better reliability.
Super Proxy	Enable the super proxy feature for enhanced proxy capabilities.
Markdown	Return the scraped HTML content in markdown format instead of raw HTML.
Wait (in ms)	Time in milliseconds to wait for the page to fully load when using JavaScript rendering.
Select Country	Choose the country for geotargeting when using the premium proxy option.
Additional Fields	Collection of optional fields:
- AI Query	Pass a user prompt to get an AI-optimized response from the scraped page.
- AI Extract Rules	Provide AI extraction rules to automatically extract structured data from the page without manual parsing.
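
To make the relationship between these properties and the outgoing request more concrete, here is a minimal sketch of how they might be turned into a request URL. The base URL comes from the Dependencies section; the `scrape` path and every query parameter name (`api_key`, `dynamic`, `premium`, `super_proxy`, `markdown`, `wait`, `country`, `ai_query`, `ai_extract_rules`) are assumptions for illustration and should be checked against the ScrapingDog API documentation. The function `build_request_url` is hypothetical and is not the node's actual implementation.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.scrapingdog.com/"  # default base URL named in Dependencies
SCRAPE_PATH = "scrape"  # assumption: endpoint path is not stated on this page


def build_request_url(api_key, url, *, js_rendering=False, premium=False,
                      super_proxy=False, markdown=False, wait_ms=0,
                      country="", ai_query="", ai_extract_rules=""):
    """Build a GET URL from the node's properties (parameter names are assumed)."""
    params = {
        "api_key": api_key,
        "url": url,
        "dynamic": "true" if js_rendering else "false",
    }
    if premium:
        params["premium"] = "true"
        if country:
            # Country targeting is only meaningful with the premium proxy.
            params["country"] = country
    if super_proxy:
        params["super_proxy"] = "true"
    if markdown:
        params["markdown"] = "true"
    if js_rendering and wait_ms > 0:
        # The wait time only applies when JavaScript rendering is on.
        params["wait"] = str(wait_ms)
    if ai_query:
        params["ai_query"] = ai_query
    if ai_extract_rules:
        params["ai_extract_rules"] = ai_extract_rules
    return BASE_URL + SCRAPE_PATH + "?" + urlencode(params)
```

For example, `build_request_url("KEY", "https://example.com", js_rendering=True, wait_ms=5000)` yields a URL whose query string carries the target URL in percent-encoded form along with the rendering and wait options.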

Output

The node outputs JSON data with the following structure when performing the "Get" operation on "Scrape URL":

- html: The raw HTML content of the scraped page as a string (unless the markdown option is enabled).
- url: The full request URL sent to the API, including query parameters.
- status: HTTP status code returned by the API.
- contentType: The Content-Type header value from the response, indicating the type of data returned (e.g., text/html).

If the markdown option is enabled, the html field contains the page content converted to markdown format.

In case of errors, the output JSON includes:

- error: Boolean true, indicating that an error occurred.
- message: Error message describing the issue.
- status: HTTP status code, if applicable.
- statusText: HTTP status text, if applicable.
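
A downstream step usually needs to tell these two output shapes apart. The sketch below shows one way to do that, using only the field names listed above. The `summarize_item` helper and the sample values are hypothetical.

```python
def summarize_item(item: dict) -> str:
    """Describe one output item, handling both the success and the error shape."""
    if item.get("error") is True:
        # status and statusText are only present "if applicable", so read them defensively.
        status = item.get("status")
        status_text = item.get("statusText", "")
        if status is not None:
            prefix = f"HTTP {status} {status_text}".strip()
        else:
            prefix = "Error"
        return f"{prefix}: {item.get('message', 'unknown error')}"
    # Success shape: html, url, status, contentType.
    return (
        f"{item['status']} {item['contentType']}: "
        f"{len(item['html'])} characters from {item['url']}"
    )
```

Checking the `error` flag first keeps the success path free of missing-key handling, since the success fields are only read once the item is known not to be an error.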

No binary data output is produced by this node.

Dependencies

- Requires an API key credential for the ScrapingDog service.
- Makes HTTP GET requests to the ScrapingDog API endpoint (default base URL: https://api.scrapingdog.com/).
- Optional proxy features depend on the ScrapingDog service configuration.
- No additional environment variables are required beyond the API key credential.

Troubleshooting

- HTTP Errors: If the API returns a non-200 status code, the node throws an error with the HTTP status and description. Common causes include invalid API keys, exceeded rate limits, and malformed requests. Verify your API key and request parameters.
- Empty or Unexpected Content: If the scraped HTML is empty or incomplete, consider enabling JavaScript rendering and increasing the wait time to allow dynamic content to load.
- Proxy Issues: When using premium or super proxies, ensure your subscription supports these features and that the selected country is valid.
- AI Extraction Failures: If AI extraction rules or queries do not return the expected results, verify the syntax and relevance of the rules or prompts.
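
The advice for empty or incomplete content can be sketched as a small escalation rule: first enable JavaScript rendering, then raise the wait time step by step. The `next_attempt` helper, the 2000 ms starting wait, the doubling step, and the 10000 ms cap are arbitrary assumptions for illustration, not values recommended by ScrapingDog. The keys reuse the property labels from the Properties table.

```python
from typing import Optional


def next_attempt(settings: dict, html: str, max_wait_ms: int = 10000) -> Optional[dict]:
    """Suggest adjusted settings after an empty result, or None if no change applies."""
    if html.strip():
        return None  # content arrived, so nothing needs to change
    updated = dict(settings)  # copy so the caller's settings are not mutated
    if not settings.get("Javascript Rendering"):
        # Step 1: turn on JavaScript rendering with a starting wait time.
        updated["Javascript Rendering"] = True
        updated.setdefault("Wait (in ms)", 2000)
        return updated
    wait = settings.get("Wait (in ms)", 0)
    if wait < max_wait_ms:
        # Step 2: double the wait time, bounded by the cap.
        updated["Wait (in ms)"] = min(max(wait * 2, 2000), max_wait_ms)
        return updated
    return None  # already at the cap; check proxy settings or the target site instead
```

A workflow could call this after each run and stop retrying once it returns `None`, at which point the proxy and AI extraction items in the list above are the next things to check.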