HeadlessX

Interact with HeadlessX API for web scraping, screenshots, and PDF generation

Overview

This node integrates with the HeadlessX API to extract HTML, capture screenshots, generate PDFs, and render content. The "📄 Extract HTML (POST)" operation sends a POST request to a given URL and returns the resulting HTML. It is useful for scraping dynamic web pages that require POST data submission or interaction before the HTML can be obtained.

Common scenarios include:

  • Scraping search results or filtered content from websites that use POST requests.
  • Extracting HTML after submitting forms or interacting with APIs that respond with HTML.
  • Automating data extraction workflows where direct GET requests are insufficient.

Practical example:

  • You want to scrape product listings from an e-commerce site that filters products by category or price range through POST parameters. With the "📄 Extract HTML (POST)" operation, you provide the URL and any advanced options, and the node returns the updated HTML for further processing.

Properties

  • URL: The web page URL to which the POST request will be sent. Must be a valid URL.
  • Advanced Options: Additional settings to customize the HTTP request and page loading behavior:
      - Custom Headers: Add extra HTTP headers (e.g., Authorization tokens).
      - Extra Wait Time (MS): Extra delay for dynamic content.
      - Return Partial on Timeout: Return partial content if a timeout occurs.
      - Scroll to Bottom: Scroll through the page to trigger lazy loading.
      - Timeout (MS): Maximum time to wait for the page to load.
      - User Agent: Custom user agent string; leave empty for automatic rotation.
      - Viewport: Set the browser viewport width and height.
      - Wait for Network Idle: Wait until network activity settles.
      - Wait Until: When to consider navigation complete (network idle, DOM loaded, or load event).
  • Advanced Options (JSON): JSON string that passes additional advanced options directly to the API for fine control over the request and rendering.

Note: The "Advanced Options" collection and the "Advanced Options (JSON)" string are mutually exclusive ways to specify detailed request parameters.
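
As a rough illustration of the "Advanced Options (JSON)" route, the sketch below builds such a JSON string in Python. The key names (timeout, extraWaitTime, scrollToBottom, returnPartialOnTimeout, headers) and the helper build_advanced_options_json are assumptions made for this example; they are not confirmed HeadlessX API field names, so check them against the API before relying on them.

```python
import json


def build_advanced_options_json(timeout_ms=30000, extra_wait_ms=0,
                                scroll_to_bottom=False, return_partial=False,
                                headers=None):
    """Serialize request options into a JSON string.

    NOTE: every key name below is hypothetical and only mirrors the
    option labels listed in this document.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    if extra_wait_ms < 0:
        raise ValueError("extra_wait_ms must not be negative")

    options = {
        "timeout": timeout_ms,                      # assumed key for "Timeout (MS)"
        "extraWaitTime": extra_wait_ms,             # assumed key for "Extra Wait Time (MS)"
        "scrollToBottom": scroll_to_bottom,         # assumed key for "Scroll to Bottom"
        "returnPartialOnTimeout": return_partial,   # assumed key for "Return Partial on Timeout"
    }
    if headers:
        options["headers"] = dict(headers)          # assumed key for "Custom Headers"
    return json.dumps(options)


options_json = build_advanced_options_json(
    timeout_ms=60000,
    scroll_to_bottom=True,
    headers={"Authorization": "Bearer TOKEN"},
)
```

The resulting string could then be pasted into the "Advanced Options (JSON)" field instead of filling in the "Advanced Options" collection.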

Output

The node outputs an array of items, each containing a json field with the extracted HTML content from the POST request. The structure typically includes:

{
  "html": "<!DOCTYPE html><html>...</html>"
}

  • The html field contains the full HTML markup retrieved from the target URL after the POST request and any dynamic content loading.
  • If configured, partial content may be returned in case of timeouts.
  • No binary data output is produced by this operation.
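
A downstream step might post-process these items along the following lines. This is a minimal sketch using only the Python standard library, assuming each item has the shape {"json": {"html": "..."}} described above. The helper summarize_items and the looks_complete flag are illustrative: the flag is a simple heuristic (presence of a closing html tag), not a signal provided by the API.

```python
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collect the page title and all link targets from an HTML string."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_items(items):
    """Return one summary dict per output item (hypothetical helper)."""
    summaries = []
    for item in items:
        html = item.get("json", {}).get("html", "")
        parser = TitleAndLinkParser()
        parser.feed(html)
        parser.close()
        summaries.append({
            "title": parser.title.strip(),
            "links": parser.links,
            # Heuristic only: truncated (partial) HTML usually lacks </html>.
            "looks_complete": "</html>" in html.lower(),
        })
    return summaries
```

The looks_complete flag can help spot partial content when "Return Partial on Timeout" is enabled, although a page may legitimately omit the closing tag.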

Dependencies

  • Requires an active connection to the HeadlessX API service.
  • Needs an API authentication token credential configured in n8n to authorize requests.
  • The node uses Puppeteer-like headless browser capabilities via the API to render and extract HTML content.
  • Network access to the target URLs must be allowed from the environment running n8n.

Troubleshooting

  • Timeouts: If the page takes too long to load, increase the "Timeout (MS)" or "Extra Wait Time (MS)" options to allow more time for dynamic content.
  • Partial Content Returned: If "Return Partial on Timeout" is enabled, partial HTML may be returned when the timeout is reached. Disable it to force errors on incomplete loads.
  • Invalid URL: Ensure the URL property is a valid and reachable URL; otherwise, the node will throw an error.
  • Authentication Errors: Verify that the API key credential is correctly set up and has permissions to access the HeadlessX API.
  • Dynamic Content Not Loaded: Enable "Scroll to Bottom" and "Wait for Network Idle" to ensure lazy-loaded content is fully rendered before extraction.
  • Custom Headers Misconfiguration: Incorrect header names or values can cause request failures; double-check header entries.
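
The URL and header checks above can be approximated before the node runs. The following is a sketch under stated assumptions: check_request_inputs is a hypothetical helper, not part of the node, and it only checks syntax (an http/https scheme, a host, header names made of the token characters HTTP allows, no line breaks in values). It cannot tell whether the URL is reachable.

```python
import re
from urllib.parse import urlparse

# Characters permitted in an HTTP header field name ("token" syntax).
_HEADER_NAME = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def check_request_inputs(url, headers=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    if not parsed.netloc:
        problems.append("URL is missing a host name")

    for name, value in (headers or {}).items():
        if not _HEADER_NAME.fullmatch(name):
            problems.append(f"Invalid header name: {name!r}")
        text = str(value)
        if "\r" in text or "\n" in text:
            problems.append(f"Header {name!r} contains a line break")

    return problems
```

For example, check_request_inputs("example.com") reports both a missing scheme and a missing host, because a URL without a scheme is parsed as a bare path.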

Common error messages:

  • "Unknown operation": Indicates an unsupported operation was selected; verify the operation name.
  • API errors with status codes (e.g., 401 Unauthorized, 404 Not Found): Check credentials and URL correctness.
  • Network errors: Confirm network connectivity and firewall rules.
