ScrapingDog

Get data from ScrapingDog API

Overview

This node integrates with the ScrapingDog API to scrape web pages or perform related search operations. Specifically, for the Scrape URL resource and its default operation, it fetches the HTML content of a specified URL, optionally rendering JavaScript on the page, using proxies, or applying AI-powered extraction rules.
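
Under the hood, a scrape is a single HTTP GET to the ScrapingDog endpoint with the API key and target URL passed as query parameters. The TypeScript sketch below illustrates that request; the parameter names api_key and url are assumptions based on ScrapingDog's public API and should be verified against the official documentation.

    // Minimal sketch of the request the node issues (assumed query parameter names).
    const endpoint = "https://api.scrapingdog.com/scrape";

    async function scrapeUrl(apiKey: string, targetUrl: string): Promise<string> {
      const params = new URLSearchParams({
        api_key: apiKey,   // assumed name of the API key parameter
        url: targetUrl,    // assumed name of the target URL parameter
      });
      const response = await fetch(`${endpoint}?${params}`);
      if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status} ${response.statusText}`);
      }
      return response.text(); // raw HTML of the scraped page
    }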

Common scenarios where this node is beneficial include:

  • Extracting raw HTML content from websites for further processing.
  • Scraping dynamic web pages that require JavaScript execution to fully load content.
  • Using residential or premium proxies to avoid IP blocking or geo-restrictions.
  • Converting scraped HTML into markdown format.
  • Leveraging AI to extract structured data from complex pages without manual parsing.

Practical examples:

  • You want to scrape product details from an e-commerce site that loads content dynamically via JavaScript. By enabling JavaScript rendering and setting a wait time, you ensure the full page content is captured (see the request sketch after this list).
  • You need to scrape a website restricted to certain countries; by selecting a country and enabling premium proxy, you can access geo-targeted content.
  • You want to get a clean summary of a webpage's content using AI extraction rules instead of manually parsing HTML.
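
As a sketch of how these scenarios translate into request options, the node's properties map onto additional query parameters. The names used below (dynamic, wait, premium, country) mirror ScrapingDog's documented options but are assumptions here; check them against the API reference before relying on them.

    // Hypothetical parameter mapping for a dynamic, geo-targeted scrape.
    const apiKey = process.env.SCRAPINGDOG_API_KEY ?? "";
    const params = new URLSearchParams({
      api_key: apiKey,
      url: "https://shop.example.com/product/123", // hypothetical target page
      dynamic: "true",  // assumed flag for JavaScript rendering
      wait: "5000",     // wait 5 seconds so dynamic content can load
      premium: "true",  // assumed flag for the premium residential proxy
      country: "de",    // assumed country code for geo-targeted content
    });
    const requestUrl = `https://api.scrapingdog.com/scrape?${params}`;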

Properties

  • URL to Scrape: The target URL you want to scrape.
  • Javascript Rendering: Whether to enable JavaScript rendering on the page (useful for dynamic content).
  • Premium: Use a premium residential proxy instead of the normal rotating proxy.
  • Super Proxy: Enable super proxy mode for enhanced proxy routing.
  • Markdown: Return the scraped HTML content converted into markdown format.
  • Wait (in ms): Time in milliseconds to wait after page load when JavaScript rendering is enabled, allowing the page to fully load dynamic content.
  • Select Country: Choose a country code to access geo-targeted content when using the premium proxy option. Available options: Australia, Brazil, Canada, China, France, Germany, India, Mexico, Italy, Japan, Russia, United States, United Kingdom.
  • Additional Fields: Collection of optional fields:
    - AI Query: A user prompt to get an AI-optimized response.
    - AI Extract Rules: Rules to extract data from pages using AI, avoiding manual HTML parsing (illustrated in the sketch after this list).
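
To illustrate the AI Extract Rules option, extraction rules are typically a small JSON object that names each field you want and describes it in plain language. Both the rule object and the ai_extract_rules parameter name below are assumptions for illustration; the exact schema and name should be taken from the ScrapingDog documentation.

    // Hypothetical extraction rules serialized into the request (assumed parameter name).
    const extractRules = {
      title: "The product title",
      price: "The current price including currency",
      inStock: "Whether the product is in stock (true or false)",
    };

    const params = new URLSearchParams({
      api_key: process.env.SCRAPINGDOG_API_KEY ?? "",
      url: "https://shop.example.com/product/123",    // hypothetical target page
      ai_extract_rules: JSON.stringify(extractRules), // assumed parameter name
    });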

Output

The node outputs JSON data with the following structure for the Scrape URL resource:

  • html: The raw HTML content of the scraped page as a string.
  • url: The full API request URL used for scraping.
  • status: HTTP status code returned by the scraping API.
  • contentType: The Content-Type header value from the response, indicating the type of data returned (usually text/html or similar).
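
For example, a successful item might look like this (all values are illustrative):

    // Illustrative shape of a successful output item.
    const item = {
      html: "<!DOCTYPE html><html><head>...</head><body>...</body></html>",
      url: "https://api.scrapingdog.com/scrape?api_key=...&url=https%3A%2F%2Fexample.com",
      status: 200,
      contentType: "text/html; charset=utf-8",
    };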

If an error occurs during the request, the output JSON will contain:

  • error: Boolean true indicating an error.
  • message: Error message describing what went wrong.
  • status: HTTP status code if applicable.
  • statusText: Description of the HTTP error if applicable.
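
A failed request produces an item along these lines (again illustrative):

    // Illustrative shape of an error output item.
    const errorItem = {
      error: true,
      message: "HTTP error! status: 401 Unauthorized",
      status: 401,
      statusText: "Unauthorized",
    };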

The node does not output binary data.

Dependencies

  • Requires an API key credential for the ScrapingDog service.
  • The node makes HTTP GET requests to the ScrapingDog API endpoint (https://api.scrapingdog.com/scrape by default).
  • No additional environment variables are required beyond the API key credential.
  • Network connectivity to the ScrapingDog API is necessary.

Troubleshooting

  • Common issues:

    • Invalid or missing API key: The node will return an HTTP 401 Unauthorized error.
    • Incorrect URL format: The node expects a valid URL string; malformed URLs may cause errors.
    • Rate limiting or quota exceeded on the ScrapingDog API side may result in HTTP 429 errors.
    • Enabling JavaScript rendering without sufficient wait time might lead to incomplete page content.
    • Selecting premium proxy without having access rights may cause errors.
  • Error messages:

    • HTTP error! status: XXX <StatusText>: Indicates an HTTP error from the API. Check the status code and message for details.
    • Unknown error occurred: Generic catch-all error; check network connectivity and API key validity.
  • Resolutions:

    • Verify your API key is correctly configured in n8n credentials.
    • Ensure the URL is properly formatted and accessible.
    • Increase the "Wait (in ms)" parameter when scraping dynamic pages.
    • Confirm your subscription plan supports premium or super proxy features before enabling them.
    • Review API usage limits and upgrade your plan if necessary (a retry pattern for transient 429s is sketched after this list).
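
For intermittent HTTP 429 responses, a generic client-side retry with exponential backoff can help. This is a pattern you would apply around your own calls, not behavior built into the node:

    // Generic retry-with-backoff pattern for transient 429 (rate limit) responses.
    async function fetchWithRetry(requestUrl: string, maxRetries = 3): Promise<Response> {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        const response = await fetch(requestUrl);
        if (response.status !== 429) {
          return response;
        }
        // Back off exponentially: 1s, 2s, 4s, ...
        await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
      }
      return fetch(requestUrl); // final attempt after backing off
    }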
