HeadlessX

Interact with the HeadlessX API for web scraping, screenshots, and PDF generation

Overview

The HeadlessX node renders web pages and extracts their content through the HeadlessX API. It supports fetching HTML content, posting data to web pages, capturing screenshots, generating PDFs, and performing advanced rendering tasks.

The 🎭 Advanced Render operation allows users to render a web page at a specified URL with fine-grained control over how the page is loaded, rendered, and captured. This includes options for custom headers, user agent strings, viewport dimensions, waiting strategies, scrolling behavior, and more.

Common scenarios where this node is beneficial include:

  • Extracting dynamically generated content from JavaScript-heavy websites.
  • Capturing full-page or partial screenshots of web pages for monitoring or archival.
  • Generating PDFs of web pages with customized layouts.
  • Automating web scraping workflows that require interaction with complex web pages.
  • Testing website appearance under different device emulations or network conditions.

Practical examples:

  • A marketing team wants to capture daily screenshots of their landing page in both desktop and mobile views, hiding promotional popups and disabling animations to get clean images.
  • A data analyst needs to scrape product details from an e-commerce site that loads content dynamically after scrolling.
  • A developer wants to generate PDFs of documentation pages with dark mode enabled for better readability.
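
The first scenario above (clean desktop and mobile screenshots) could be expressed as two option sets, one per device. The following is a minimal sketch, not the node's actual request format: the preset dimensions come from the Device Emulation property described under Properties, but every key name (`deviceEmulation`, `hideElements`, and so on), the preset identifiers, and the example selectors are hypothetical.

```python
# Device presets (width, height in px) as documented for Device Emulation.
# The preset identifiers used as keys are hypothetical.
DEVICE_PRESETS = {
    "desktop": (1920, 1080),
    "mobile": (375, 667),
    "mobile_landscape": (667, 375),
    "tablet": (768, 1024),
    "tablet_landscape": (1024, 768),
}


def screenshot_options(device, hide_selectors):
    """Build one screenshot option set. All key names are assumptions."""
    if device not in DEVICE_PRESETS:
        raise ValueError(f"unknown device preset: {device!r}")
    width, height = DEVICE_PRESETS[device]
    return {
        "deviceEmulation": device,
        "width": width,
        "height": height,
        "fullPage": True,
        "disableAnimations": True,
        # Hide Elements takes a comma-separated list of CSS selectors.
        "hideElements": ", ".join(hide_selectors),
        "format": "png",
    }


# Example selectors for promotional popups; purely illustrative.
popups = [".promo-popup", "#newsletter-modal"]
daily_captures = [screenshot_options(d, popups) for d in ("desktop", "mobile")]
```

Keeping the presets in one table means the desktop and mobile captures differ only in the device name, which makes it easy to add a tablet capture later.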

Properties

Name: Meaning
URL: The web page URL to process. Must be a valid URL string.
Advanced Options: Collection of additional settings to customize the rendering process:
- Custom Headers: HTTP headers to send with the request (e.g., Authorization tokens).
- Extra Wait Time (MS): Additional milliseconds to wait for dynamic content to load after initial page load. Default 5000 ms.
- Return Partial on Timeout: Whether to return whatever content was loaded if the page load times out. Default true.
- Scroll to Bottom: Whether to scroll through the entire page to trigger lazy loading of content. Default true.
- Timeout (MS): Maximum time to wait for the page to load before timing out. Range 5000 to 300000 ms. Default 60000 ms.
- User Agent: Custom user agent string to use for the request. Leave empty for automatic realistic rotation.
- Viewport: Browser viewport dimensions: width (320-3840 px, default 1920) and height (240-2160 px, default 1080).
- Wait for Network Idle: Whether to wait until network activity settles before proceeding. Default true.
- Wait Until: When to consider navigation complete: options are Network Idle (recommended), DOM Content Loaded, or Load Event. Default is Network Idle.
Additional Options: Collection of screenshot-specific settings:
- Capture Full Page: Capture the entire page or just the visible viewport. Default true.
- Custom Height: Custom viewport height in pixels (100-4000), used if device emulation is set to custom.
- Custom Width: Custom viewport width in pixels (100-4000), used if device emulation is set to custom.
- Dark Mode: Enable dark mode styling for the screenshot. Default false.
- Device Emulation: Choose device type for screenshot: Custom, Desktop (1920x1080), Mobile Phone (375x667), Mobile Phone Landscape (667x375), Tablet (768x1024), Tablet Landscape (1024x768). Default Desktop.
- Disable Animations: Disable CSS animations and transitions before taking screenshot. Default true.
- Extra Wait Time (MS): Additional wait time before taking the screenshot (0-30000 ms). Default 2000 ms.
- Format: Image format for screenshot: JPEG (lossy), PNG (lossless), WebP (modern compression). Default PNG.
- Hide Elements: CSS selectors of elements to hide before screenshot (comma-separated).
- Quality: Image quality (1-100) for JPEG and WebP formats. Default 80.
- Remove Elements: CSS selectors of elements to remove before screenshot (comma-separated).
- Scroll Behavior: Scrolling animation style during screenshot: Auto, Instant, Smooth. Default Auto.
- Timeout: Request timeout in milliseconds (1000-120000). Default 30000 ms.
- User Agent: Custom user agent string for screenshot requests. Leave empty for default.
- Wait for Network Idle: Wait for network activity to finish before taking screenshot. Default true.
- Wait for Selector: CSS selector to wait for before taking screenshot (e.g., ".content-loaded").
Advanced Options (JSON): JSON object with advanced options passed directly to the API for rendering customization.
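
When filling the Advanced Options (JSON) field by hand, it can help to check values against the documented ranges before sending them. The sketch below starts from the defaults and limits listed for the Advanced Options collection above and serializes the result to JSON. The key names (`timeout`, `viewport_width`, `wait_until`, and so on) and the `wait_until` values are assumptions made for illustration; the source does not specify the names the API expects.

```python
import json

# Documented ranges for the Advanced Options collection: (min, max, default).
LIMITS = {
    "timeout": (5000, 300000, 60000),
    "viewport_width": (320, 3840, 1920),
    "viewport_height": (240, 2160, 1080),
}

# Hypothetical identifiers for Network Idle, DOM Content Loaded, Load Event.
WAIT_UNTIL = {"networkidle", "domcontentloaded", "load"}


def build_advanced_options(**overrides):
    """Return documented defaults merged with overrides, after range checks."""
    options = {name: default for name, (_, _, default) in LIMITS.items()}
    options.update({
        "extra_wait_time": 5000,
        "return_partial_on_timeout": True,
        "scroll_to_bottom": True,
        "wait_for_network_idle": True,
        "wait_until": "networkidle",
    })
    unknown = set(overrides) - set(options)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options.update(overrides)
    for name, (low, high, _) in LIMITS.items():
        if not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if options["extra_wait_time"] < 0:
        raise ValueError("extra_wait_time must not be negative")
    if options["wait_until"] not in WAIT_UNTIL:
        raise ValueError(f"wait_until must be one of {sorted(WAIT_UNTIL)}")
    return options


# Example: a slow page gets a longer timeout; everything else stays default.
payload = json.dumps(build_advanced_options(timeout=120000), indent=2)
```

Rejecting out-of-range values locally gives a clearer error than a failed request, but the API remains the authority on what it accepts.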

Output

The node outputs JSON data containing the results of the rendering operation. Depending on the operation, the output may include:

  • Extracted HTML content or processed page content.
  • Binary data representing screenshots or PDFs encoded appropriately.
  • Metadata about the request and response.

For the 🎭 Advanced Render operation, the output typically includes the rendered page content or image data according to the specified options.

If binary data is output (such as screenshots or PDFs), it contains the captured rendering of the web page in the configured format and dimensions.
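
A downstream step may want to confirm what kind of binary payload it received before saving it. The sketch below decodes a base64 string and identifies PNG, JPEG, WebP, and PDF content by their standard file signatures. The item shape (a dictionary with a base64 string under `data`) is an assumption for illustration; the exact structure of this node's output is not described in the source.

```python
import base64
from typing import Tuple


def sniff_format(payload: bytes) -> str:
    """Identify a payload by its leading bytes (standard file signatures)."""
    if payload.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if payload.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if payload[:4] == b"RIFF" and payload[8:12] == b"WEBP":
        return "webp"
    if payload.startswith(b"%PDF-"):
        return "pdf"
    return "unknown"


def decode_binary_item(item: dict) -> Tuple[str, bytes]:
    """Decode an item assumed to hold base64 content under the 'data' key."""
    payload = base64.b64decode(item["data"])
    return sniff_format(payload), payload


# Example with a stand-in payload that only carries a PDF header.
example_item = {"data": base64.b64encode(b"%PDF-1.7\n").decode("ascii")}
kind, raw = decode_binary_item(example_item)
```

Checking the signature rather than trusting a file extension catches cases where, for example, an error page was returned instead of an image.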

Dependencies

  • Requires access to the HeadlessX API service.
  • Requires an API authentication token credential configured in n8n.
  • Network connectivity to the target URLs.
  • Proper configuration of request timeouts and user agents to avoid blocking or rate limiting by target sites.

Troubleshooting

  • Timeouts: If the page takes too long to load, increase the "Timeout (MS)" or "Extra Wait Time (MS)" properties. Enabling "Return Partial on Timeout" can help retrieve partial content.
  • Content not loading fully: Ensure "Scroll to Bottom" is enabled to trigger lazy loading. Adjust "Wait Until" and "Wait for Network Idle" settings to better match the page's loading behavior.
  • Incorrect viewport or device emulation: Verify viewport dimensions and device emulation settings to match the desired screen size and orientation.
  • Missing elements in screenshots: Check the "Hide Elements" and "Remove Elements" CSS selectors; a selector that is too broad can hide or remove content you need.
  • Authentication issues: Provide necessary custom headers (e.g., Authorization) in "Custom Headers" to access protected resources.
  • User agent problems: Some sites block unknown user agents; specify a realistic user agent string if automatic rotation fails.
  • API errors: Check that the API key credential is valid and has sufficient permissions. Review error messages for HTTP status codes and adjust accordingly.
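
The timeout advice above can be automated by retrying with progressively longer timeouts, capped at the documented maximum of 300000 ms. This is a sketch under stated assumptions: `render` is a hypothetical callable standing in for whatever performs the request, and it is assumed to raise `TimeoutError` when the page does not load in time.

```python
TIMEOUT_MIN_MS = 5000    # documented lower bound for Timeout (MS)
TIMEOUT_MAX_MS = 300000  # documented upper bound for Timeout (MS)


def escalating_timeouts(start_ms=60000, factor=2.0, attempts=3):
    """Return a list of timeouts that grow by `factor`, kept within range."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    timeouts = []
    current = max(TIMEOUT_MIN_MS, min(start_ms, TIMEOUT_MAX_MS))
    for _ in range(attempts):
        timeouts.append(int(current))
        current = min(current * factor, TIMEOUT_MAX_MS)
    return timeouts


def render_with_retries(render, url, **schedule):
    """Call the hypothetical render(url, timeout_ms) until it succeeds.

    Re-raises the last TimeoutError if every attempt times out.
    """
    last_error = None
    for timeout_ms in escalating_timeouts(**schedule):
        try:
            return render(url, timeout_ms)
        except TimeoutError as exc:
            last_error = exc
    raise last_error
```

For example, `escalating_timeouts()` starts at the default 60000 ms and doubles on each attempt; with more attempts the schedule flattens out at the maximum instead of exceeding it.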
