ScrapingDog

Get data from ScrapingDog API

Overview

The "Scrape URL" - "Get" operation of this node scrapes web pages through the ScrapingDog API. It fetches the HTML content of a given URL, can optionally render JavaScript on the page first, and returns the content as raw HTML or as markdown. This node is useful for scenarios such as:

  • Extracting webpage content for data analysis or monitoring.
  • Collecting product details, news articles, or search results without official APIs.
  • Scraping image URLs from a page.
  • Using AI-powered extraction rules to parse complex pages without manual HTML parsing.
  • Accessing geo-targeted content by specifying country proxies.

Practical examples include scraping e-commerce product pages, gathering news headlines, or collecting SEO data from search engine result pages.

Properties

| Name | Meaning |
| --- | --- |
| URL to Scrape | The target URL you want to scrape. |
| Javascript Rendering | Whether to enable JavaScript rendering on the page (useful for dynamic websites that load content via JS). |
| Premium | Use a premium residential proxy instead of the normal rotating proxy for better reliability and IP diversity. |
| Super Proxy | Enable super proxy mode for enhanced proxy routing. |
| Markdown | Return the scraped HTML content converted into markdown format. |
| Wait (in Ms) | Time in milliseconds to wait before scraping when JavaScript rendering is enabled, allowing the page to fully load dynamic content. |
| Select Country | Choose a country code to access geo-targeted content via proxy (available only with premium proxy enabled). Options: Australia, Brazil, Canada, China, France, Germany, India, Mexico, Italy, Japan, Russia, US, UK. |
| Additional Fields | Collection of optional fields: AI Query (user prompt to get an AI-optimized response) and AI Extract Rules (rules to extract data from pages using AI without manual HTML parsing). |
| Custom Headers | Allow passing custom HTTP headers with the request. |
| Session Number | String value to reuse the same proxy session across multiple requests for consistent IP usage. |
| Scrape Images | Option to scrape image URLs from the page. |
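
To make the relationship between these properties more concrete, here is a minimal sketch of how they might translate into a request URL. Only the base endpoint https://api.scrapingdog.com/ comes from this document; the `scrape` path, the query parameter names (`api_key`, `url`, `dynamic`, `premium`, `markdown`, `wait`, `country`), and the helper `build_scrape_url` are assumptions made for illustration and should be checked against the ScrapingDog API documentation.

```python
from urllib.parse import urlencode

API_BASE = "https://api.scrapingdog.com/"  # endpoint named in this document
SCRAPE_PATH = "scrape"  # assumed path, not confirmed by this document


def build_scrape_url(api_key, url, *, js_rendering=False, premium=False,
                     markdown=False, wait_ms=None, country=None):
    """Build a request URL from node-style properties (hypothetical mapping)."""
    # Parameter names below are assumptions, not taken from this document.
    params = {
        "api_key": api_key,
        "url": url,
        "dynamic": str(js_rendering).lower(),
    }
    if premium:
        params["premium"] = "true"
    if markdown:
        params["markdown"] = "true"
    # The wait time only matters when JavaScript rendering is enabled.
    if js_rendering and wait_ms is not None:
        params["wait"] = str(wait_ms)
    # Country selection is only available with the premium proxy.
    if premium and country:
        params["country"] = country
    return API_BASE + SCRAPE_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_scrape_url("YOUR_API_KEY", "https://example.com",
                           js_rendering=True, wait_ms=2000))
```

The sketch drops `wait` when JavaScript rendering is off and drops `country` when the premium proxy is off, mirroring the dependencies described in the table.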

Output

The node outputs the full HTTP response from the ScrapingDog API, which includes:

  • The scraped webpage content in the requested format (HTML or markdown).
  • If enabled, extracted image URLs.
  • If AI extraction rules are used, the parsed data according to those rules.
  • Metadata about the request and response.

The output JSON field contains the raw or processed content depending on the options selected. Binary data is not explicitly handled by this node.
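
Because the content can be plain text (HTML or markdown) or structured data (when AI extraction rules are used), a downstream step may need to tell the two apart. The following is a minimal sketch of one way to do that, assuming the response body is available as a string; `normalize_body` and the `kind`/`data` keys are hypothetical names, not part of the node's actual output.

```python
import json


def normalize_body(body):
    """Classify a response body as structured JSON or plain text (hypothetical helper)."""
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON: treat it as HTML or markdown text.
        return {"kind": "text", "data": body}
    # Only objects and arrays count as structured results here;
    # a bare number or string is still treated as text.
    if isinstance(parsed, (dict, list)):
        return {"kind": "structured", "data": parsed}
    return {"kind": "text", "data": body}


if __name__ == "__main__":
    print(normalize_body('{"title": "Example"}')["kind"])
    print(normalize_body("<html><body>Hi</body></html>")["kind"])
```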

Dependencies

  • Requires an API key credential for the ScrapingDog service.
  • Internet access to reach the ScrapingDog API endpoint at https://api.scrapingdog.com/.
  • Optional proxy configurations are managed internally by the API based on parameters like premium, super proxy, and country selection.

Troubleshooting

  • Common Issues:

    • Invalid or missing API key will cause authentication errors.
    • Incorrect URL format may lead to request failures.
    • Enabling JavaScript rendering without sufficient wait time might result in incomplete page content.
    • If geo-targeting is used without the premium proxy enabled, the country parameter is ignored.
    • Passing invalid AI extraction rules can cause parsing errors.
  • Error Messages:

    • Authentication errors indicate issues with the provided API key; verify and update credentials.
    • Timeout or network errors suggest connectivity problems or blocked access to the API.
    • Response errors related to invalid parameters require checking property values, especially boolean flags and country codes.
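
Several of the issues above can be caught before a request is sent. The sketch below shows one possible pre-flight check over the node's inputs. The helper `preflight_issues` is hypothetical, and treating the AI extraction rules as a JSON string is an assumption, not something this document confirms.

```python
import json
from urllib.parse import urlparse


def preflight_issues(url, *, js_rendering=False, premium=False,
                     wait_ms=None, country=None, ai_extract_rules=None):
    """Return a list of warnings for one set of node inputs (hypothetical helper)."""
    issues = []
    parts = urlparse(url)
    # An absolute http(s) URL with a host is required.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        issues.append("URL must be absolute, e.g. https://example.com")
    # Geo-targeting only takes effect with the premium proxy.
    if country and not premium:
        issues.append("country is ignored unless premium proxy is enabled")
    # Dynamic pages may need time to finish loading.
    if js_rendering and not wait_ms:
        issues.append("JavaScript rendering without a wait may return incomplete content")
    # Assumption: extraction rules are supplied as a JSON string.
    if ai_extract_rules is not None:
        try:
            json.loads(ai_extract_rules)
        except (TypeError, ValueError):
            issues.append("AI extract rules are not valid JSON")
    return issues


if __name__ == "__main__":
    for issue in preflight_issues("example.com", country="us"):
        print(issue)
```

Returning a list of warnings rather than raising lets a workflow decide which issues are fatal (a malformed URL) and which are only worth logging (a country that will be ignored).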
