DataForSEO

Overview

The "Parse Page Content" operation of this node's "On Page" resource fetches a web page by URL and analyzes its content. It retrieves the page's HTML. It can also emulate browser behavior, such as executing JavaScript, rendering styles and images, letting XMLHttpRequests complete, injecting custom scripts, and applying a custom user agent or screen size. This lets you inspect how the page looks and behaves once it is fully loaded in a browser.

Common scenarios where this node is beneficial include:

  • SEO audits where you need to analyze the actual rendered content of a page rather than just the raw HTML.
  • Web scraping tasks that require executing JavaScript on the page to load dynamic content.
  • Testing or monitoring websites for changes in their visible content or structure.
  • Extracting data from pages that rely heavily on client-side rendering.

Practical example:

  • You want to parse a product page that loads its prices with JavaScript. Enable JavaScript execution and browser rendering, and also XMLHttpRequest if the prices are fetched through AJAX calls. The output then contains the final rendered HTML, including the price information.
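
For reference, a request for the product-page scenario might look like the sketch below. It assumes the node calls DataForSEO's On-Page content-parsing "live" endpoint. It also assumes the three toggles map to the API fields `enable_javascript`, `enable_browser_rendering`, and `enable_xhr`. The endpoint path, these field names, and the `buildProductPageRequest` helper are all assumptions, not taken from the node's source.

```typescript
// ASSUMED endpoint path; verify against the DataForSEO On-Page API documentation.
const CONTENT_PARSING_LIVE_URL =
  "https://api.dataforseo.com/v3/on_page/content_parsing/live";

// ASSUMED request fields for a single parsing task.
interface ParseTask {
  url: string;
  enable_javascript: boolean;
  enable_browser_rendering: boolean;
  enable_xhr: boolean;
}

// Hypothetical helper: builds the POST target and JSON body for a product page
// whose prices are injected client-side. DataForSEO POST bodies are arrays of tasks.
function buildProductPageRequest(pageUrl: string): { endpoint: string; body: string } {
  const task: ParseTask = {
    url: pageUrl,
    enable_javascript: true, // run page scripts so the price markup is generated
    enable_browser_rendering: true, // emulate a full browser render
    enable_xhr: true, // let AJAX calls (e.g. a price API) finish during page load
  };
  return { endpoint: CONTENT_PARSING_LIVE_URL, body: JSON.stringify([task]) };
}

const req = buildProductPageRequest("https://shop.example.com/product/123");
console.log(req.endpoint);
console.log(req.body);
```

The body is sent as an array because DataForSEO endpoints accept a list of tasks per request, even when only one page is parsed.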

Properties

  • "Target Page URL": The URL of the web page to be parsed. This is the main input specifying which page to fetch and analyze.
  • "Load the scripts available on a page?": Boolean option to enable or disable loading and execution of JavaScript on the page. Enabling this allows dynamic content generated by scripts to be included in the output.
  • "Emulate browser rendering?": Boolean option to emulate full browser rendering, including styles, images, fonts, animations, videos, and other resources. This simulates how a real browser would display the page.
  • "Enable XMLHttpRequest on a page?": Boolean option to allow XMLHttpRequests (XHR) on the page, enabling AJAX calls to complete during page load.
  • "Additional Fields": A collection of optional advanced settings:

    • "Custom User Agent": Specify a custom user agent string to use when requesting the page.
    • "Custom Javascript": Inject custom JavaScript code to run on the page after it loads.
    • "Preset for browser screen parameters": Choose a preset for browser screen size and characteristics: Empty, Desktop, Mobile, or Tablet.
    • "Browser Screen Width": Set a custom width for the emulated browser screen, in pixels.
    • "Browser Screen Height": Set a custom height for the emulated browser screen, in pixels.
    • "Browser Screen Scale Factor": Set a scale factor for the emulated browser screen (e.g., for high-DPI displays).
    • "Store HTML of a crawled page?": Boolean option to store the raw HTML content of the crawled page in the output.
    • "Disable the popup requesting cookie consent from the user?": Boolean option to disable cookie consent popups that might block content.
    • "Accept Language": Set the HTTP Accept-Language header to specify the preferred language/locale for the request.
    • "Switch proxy pool?": Boolean option to switch between different proxy pools for the request.
    • "Proxy Pool": Select a proxy pool region to route the request through: Empty, US, or DE.
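
The sketch below shows one plausible way these properties could map onto a single DataForSEO request task. The `ParsePageOptions` shape and the `toParseTask` helper are hypothetical. The snake_case field names (`custom_js`, `browser_preset`, `ip_pool_for_scan`, and so on) are assumptions modeled on DataForSEO's public On-Page API naming, not on this node's source. Unset options and "Empty" choices are simply omitted. As a small extra, the sketch rejects malformed URLs before any API call is made.

```typescript
// Hypothetical node inputs; these property names are illustrative only.
type ScreenPreset = "" | "desktop" | "mobile" | "tablet"; // "" stands for Empty
type ProxyPool = "" | "us" | "de"; // "" stands for Empty

interface AdditionalFields {
  customUserAgent?: string;
  customJavascript?: string;
  browserPreset?: ScreenPreset;
  browserScreenWidth?: number;
  browserScreenHeight?: number;
  browserScreenScaleFactor?: number;
  storeRawHtml?: boolean;
  disableCookiePopup?: boolean;
  acceptLanguage?: string;
  switchPool?: boolean;
  proxyPool?: ProxyPool;
}

interface ParsePageOptions {
  targetPageUrl: string;
  loadScripts: boolean;
  emulateBrowserRendering: boolean;
  enableXhr: boolean;
  additionalFields?: AdditionalFields;
}

// Maps the options to one request task. The API field names are ASSUMED;
// check them against the DataForSEO documentation before relying on them.
function toParseTask(opts: ParsePageOptions): Record<string, unknown> {
  // Fail fast on malformed URLs instead of spending an API call.
  const parsed = new URL(opts.targetPageUrl); // throws TypeError if invalid
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  const task: Record<string, unknown> = {
    url: opts.targetPageUrl,
    enable_javascript: opts.loadScripts,
    enable_browser_rendering: opts.emulateBrowserRendering,
    enable_xhr: opts.enableXhr,
  };

  const f: AdditionalFields = opts.additionalFields ?? {};
  const optional: Array<[string, unknown]> = [
    ["custom_user_agent", f.customUserAgent],
    ["custom_js", f.customJavascript],
    ["browser_preset", f.browserPreset],
    ["browser_screen_width", f.browserScreenWidth],
    ["browser_screen_height", f.browserScreenHeight],
    ["browser_screen_scale_factor", f.browserScreenScaleFactor],
    ["store_raw_html", f.storeRawHtml],
    ["disable_cookie_popup", f.disableCookiePopup],
    ["accept_language", f.acceptLanguage],
    ["switch_pool", f.switchPool],
    ["ip_pool_for_scan", f.proxyPool],
  ];
  // Send only the fields that were set; unset values and "Empty" choices are skipped.
  for (const [key, value] of optional) {
    if (value !== undefined && value !== "") task[key] = value;
  }
  return task;
}
```

Omitting unset fields, rather than sending `null`, leaves the API's own defaults in effect.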

Output

The node outputs an array of JSON objects representing the parsed page content and related metadata. The exact structure depends on the API response but typically includes:

  • Parsed content elements extracted from the page.
  • Metadata about the page load, such as status codes, timing, and any errors encountered.
  • Optionally, the raw HTML of the page if the "Store HTML of a crawled page?" option is enabled.
  • Information about resources loaded during browser emulation if applicable.

Binary output is not documented for this operation. If binary data were returned, it would typically represent files or media fetched while parsing the page. The node's primary output is JSON.
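
A caller can turn a raw response into a flat list of results by walking the task/result/item nesting that DataForSEO v3 responses use, as sketched below. The interfaces are a simplified assumption of that response shape, and `collectItems` is a hypothetical helper. The constant `20000` is the status code DataForSEO uses for a successful ("Ok.") response or task. Consult the API documentation for the full set of fields and error codes.

```typescript
// ASSUMED, simplified shape of a DataForSEO v3 response; real responses carry more fields.
interface DfsTask {
  id?: string;
  status_code: number;
  status_message: string;
  result: Array<{ items?: unknown[] | null }> | null;
}

interface DfsResponse {
  status_code: number;
  status_message: string;
  tasks: DfsTask[] | null;
}

// Status code DataForSEO uses for success.
const DFS_OK = 20000;

// Hypothetical helper: gathers all result items and reports failed tasks separately.
function collectItems(resp: DfsResponse): { items: unknown[]; errors: string[] } {
  // A non-OK top-level status means the whole request failed (e.g. bad credentials).
  if (resp.status_code !== DFS_OK) {
    return { items: [], errors: [`${resp.status_code}: ${resp.status_message}`] };
  }
  const items: unknown[] = [];
  const errors: string[] = [];
  for (const task of resp.tasks ?? []) {
    if (task.status_code !== DFS_OK) {
      errors.push(`task ${task.id ?? "?"} ${task.status_code}: ${task.status_message}`);
      continue;
    }
    for (const result of task.result ?? []) {
      items.push(...(result.items ?? []));
    }
  }
  return { items, errors };
}
```

Checking status at both the request and the task level matters because a request can succeed overall while an individual task inside it fails.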

Dependencies

  • Requires an active connection to the DataForSEO API service.
  • Needs DataForSEO API credentials configured in n8n for authentication with DataForSEO.
  • Network access from n8n to the DataForSEO API. The target page itself must be publicly reachable, because DataForSEO's servers fetch it.
  • Proxy pools, if used, are selected through the "Switch proxy pool?" and "Proxy Pool" options.
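
DataForSEO's API documentation describes HTTP Basic authentication using the account login and API password. If you call the API outside n8n, for example to check that your credentials work, you can build the header as sketched below. The function names are hypothetical. To stay free of runtime-specific APIs such as Node's `Buffer`, the sketch encodes ASCII credentials only.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard base64 (RFC 4648) with "=" padding, restricted to ASCII input.
function base64Ascii(input: string): string {
  const bytes: number[] = [];
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if (code > 0x7f) throw new Error("This sketch only encodes ASCII credentials");
    bytes.push(code);
  }
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;
    // Pack up to three bytes into a 24-bit group; missing bytes count as zero.
    const group =
      (bytes[i] << 16) |
      ((remaining > 1 ? bytes[i + 1] : 0) << 8) |
      (remaining > 2 ? bytes[i + 2] : 0);
    out += BASE64_ALPHABET[(group >> 18) & 63];
    out += BASE64_ALPHABET[(group >> 12) & 63];
    out += remaining > 1 ? BASE64_ALPHABET[(group >> 6) & 63] : "=";
    out += remaining > 2 ? BASE64_ALPHABET[group & 63] : "=";
  }
  return out;
}

// Builds the Authorization header value for HTTP Basic auth (RFC 7617).
function basicAuthHeader(login: string, password: string): string {
  return `Basic ${base64Ascii(`${login}:${password}`)}`;
}
```

Inside n8n, the credential store handles this. Keep real credentials out of workflow code.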

Troubleshooting

  • Common issues:

    • Invalid or unreachable URL: Ensure the "Target Page URL" is correct and accessible.
    • API authentication errors: Verify that the API key credential is correctly set up and has necessary permissions.
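    • Timeout or slow responses: Enabling full browser rendering and JavaScript execution increases processing time. Allow for longer runs (for example, by raising the workflow timeout in n8n's workflow settings), or disable rendering options you don't need.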
    • Proxy errors: If a page cannot be reached through the selected proxy pool, try another "Proxy Pool" region or change the "Switch proxy pool?" setting.
  • Error messages:

    • "Something went wrong": Generic error indicating failure in the operation; check network connectivity, API credentials, and input parameters.
    • NodeOperationError with specific messages may indicate invalid inputs or API response errors; review the node's input parameters and try again.
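
Full rendering with JavaScript can make calls slow. Callers working outside n8n may therefore want a per-attempt timeout with a few retries and exponential backoff. The sketch below is a generic helper with hypothetical names and is not part of the node. Inside n8n, the node's built-in "Retry On Fail" setting serves a similar purpose.

```typescript
// Resolves after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Rejects if `promise` does not settle within `ms`. The underlying work is NOT
// cancelled; a real HTTP call should also be aborted (e.g. with AbortController).
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

interface RetryOptions {
  retries: number; // extra attempts after the first one
  timeoutMs: number; // time limit for each attempt
  baseDelayMs: number; // backoff delays: baseDelayMs, 2x, 4x, ...
}

// Runs `attempt` until it succeeds or the retries are used up.
async function retryWithTimeout<T>(attempt: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= opts.retries; i++) {
    try {
      return await withTimeout(attempt(), opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (i < opts.retries) await sleep(opts.baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}
```

Retrying helps with timeouts and transient network errors. Authentication failures and invalid parameters fail the same way on every attempt, so a production version would stop early for those.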
