HeadlessX

Interact with HeadlessX API for web scraping, screenshots, and PDF generation

Overview

The HeadlessX node provides an interface to the HeadlessX API for web scraping, content extraction, screenshot capture, PDF generation, and web page rendering. The 📝 Extract Content (GET) operation fetches a specified web page URL and extracts its content.

This node is beneficial in scenarios such as:

  • Automatically retrieving and processing web page content for data analysis or monitoring.
  • Extracting textual or structured content from websites without manual browsing.
  • Integrating web content extraction into automated workflows for reporting or data pipelines.

For example, you can use this node to extract the main content of a news article by providing its URL, then process or store that content downstream in your workflow.

Properties

Name | Meaning
URL | The web page URL to process. Must be a valid URL string. Example: https://example.com
Additional Options | Collection of optional parameters to customize the content extraction request:
- Timeout (MS) | Override the default request timeout in milliseconds. A value of 0 uses the server's default timeout.
- Wait Until | Defines when navigation is considered successful before extracting content. Options: Load, DOMContentLoaded, Network Idle. Default is Load.

Note: The above properties are specifically for the 📝 Extract Content (GET) operation under the Default resource.
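The way these properties combine into a request can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the query parameter names (url, waitUntil, timeout) and the lowercase Wait Until values are assumptions, and the real HeadlessX API may use different ones.

```typescript
// Hypothetical parameter names and values; the real HeadlessX API may differ.
type WaitUntil = "load" | "domcontentloaded" | "networkidle";

interface ExtractOptions {
  timeoutMs?: number; // 0 or undefined: let the server use its default timeout
  waitUntil?: WaitUntil; // the node's documented default is Load
}

function buildExtractQuery(url: string, options: ExtractOptions = {}): URLSearchParams {
  // new URL() throws a TypeError on malformed input, so bad URLs fail early.
  const parsed = new URL(url);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  const query = new URLSearchParams({ url: parsed.toString() });
  query.set("waitUntil", options.waitUntil ?? "load");

  // Only send a timeout when the caller overrides the server default.
  if (options.timeoutMs !== undefined && options.timeoutMs > 0) {
    query.set("timeout", String(options.timeoutMs));
  }
  return query;
}
```

Leaving the timeout out when it is 0 mirrors the documented behavior that 0 means "use the server's default".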

Output

The node outputs JSON data representing the extracted content from the specified URL. The exact structure depends on the HeadlessX API response but generally includes the HTML or processed content retrieved from the web page.

If errors occur during execution, the output JSON will contain an error field with the error message, the operation name, and a timestamp.

No binary data output is indicated for this operation.
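Downstream steps may want to separate successful items from failed ones. The sketch below assumes that a failed item is recognizable by a string-valued error field, as described above; the operation and timestamp field names are hypothetical, since the exact keys are not stated here.

```typescript
// Only the `error` field name comes from the description above;
// `operation` and `timestamp` are hypothetical key names.
interface ErrorItem {
  error: string;
  operation?: string;
  timestamp?: string;
}

type OutputItem = Record<string, unknown>;

// Type guard: an item counts as failed when it carries a string `error` field.
function isErrorItem(item: OutputItem): item is OutputItem & ErrorItem {
  return typeof item["error"] === "string";
}

function partitionItems(items: OutputItem[]): { ok: OutputItem[]; failed: ErrorItem[] } {
  const ok: OutputItem[] = [];
  const failed: ErrorItem[] = [];
  for (const item of items) {
    if (isErrorItem(item)) {
      failed.push(item);
    } else {
      ok.push(item);
    }
  }
  return { ok, failed };
}
```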

Dependencies

  • Requires an active connection to the HeadlessX API service.
  • Needs an API key credential configured in n8n for authentication with the HeadlessX API.
  • Network access to the target URLs to perform HTTP requests.
  • No additional environment variables are explicitly required beyond the API credentials.

Troubleshooting

  • Common Issues:

    • Invalid or unreachable URL: Ensure the URL is correct and accessible from the n8n environment.
    • Request timeouts: Adjust the "Timeout (MS)" property if the target site is slow to respond.
    • Authentication errors: Verify that the API key credential is correctly set up and has necessary permissions.
    • Network restrictions: Confirm that firewall or proxy settings allow outbound requests to the HeadlessX API and target URLs.
  • Error Messages:

    • "Unknown operation: ...": Indicates an unsupported operation was selected; verify the operation name.
    • API-related errors include HTTP status codes and messages returned by the HeadlessX API, which may indicate issues like rate limiting, invalid parameters, or server errors.
    • Timeout errors: increase the "Timeout (MS)" setting or check network connectivity.
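As a rough aid for reading API-related errors, the sketch below maps HTTP status codes to the categories mentioned above. The mapping follows general HTTP semantics; which codes the HeadlessX API actually returns for each situation is an assumption, and the category names are illustrative only.

```typescript
type ErrorCategory =
  | "authentication"
  | "rate_limit"
  | "invalid_parameters"
  | "server_error"
  | "other";

// Mapping based on standard HTTP status meanings, not on HeadlessX specifics.
function classifyStatus(status: number): ErrorCategory {
  if (status === 401 || status === 403) return "authentication";
  if (status === 429) return "rate_limit";
  if (status === 400 || status === 422) return "invalid_parameters";
  if (status >= 500 && status <= 599) return "server_error";
  return "other";
}
```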

If the node is configured to continue on fail, a failure returns error details in the output JSON for each failed item.
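The continue-on-fail behavior can be pictured with a self-contained sketch. It does not use the n8n API: the extract callback stands in for the real HeadlessX request and is kept synchronous for brevity (a real call would be awaited), and the operation value and output key names other than error are hypothetical.

```typescript
type Json = Record<string, unknown>;

function runWithContinueOnFail(
  urls: string[],
  extract: (url: string) => Json, // stand-in for the real API call
  continueOnFail: boolean,
): Json[] {
  const results: Json[] = [];
  for (const url of urls) {
    try {
      results.push(extract(url));
    } catch (err) {
      // Without continue-on-fail, the first failure aborts the whole run.
      if (!continueOnFail) throw err;
      // Otherwise the failed item is replaced by an error record.
      results.push({
        error: err instanceof Error ? err.message : String(err),
        operation: "extractContent", // hypothetical operation identifier
        timestamp: new Date().toISOString(),
      });
    }
  }
  return results;
}
```

With continueOnFail set to true, the output keeps one entry per input item, so later steps can still line results up with their source URLs.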

This summary is based solely on static analysis of the provided source code and property definitions.
