HeadlessX

Interact with HeadlessX API for web scraping, screenshots, and PDF generation

Actions: 8

Overview

This node enables batch processing of multiple web pages using the HeadlessX API, supporting operations such as capturing screenshots, extracting clean text content, retrieving raw HTML source, and generating PDF documents. It is particularly useful for automating tasks that involve processing many URLs at once, such as bulk website monitoring, content scraping, visual regression testing, or archiving web pages.

Practical examples include:

- Taking screenshots of a list of competitor websites to monitor design changes.
- Extracting article text from multiple news sites for sentiment analysis.
- Downloading raw HTML from several product pages for offline parsing.
- Generating PDFs of invoices or reports from various URLs in one go.

Properties

Name	Meaning
Batch Type	Type of batch processing to run: 📸 Screenshot Batch (capture screenshots of multiple pages simultaneously); 📝 Content Extraction Batch (extract clean text content); 📄 HTML Extraction Batch (get the raw HTML source); 📋 PDF Generation Batch (generate PDF documents).
URLs (JSON Array)	JSON array of URLs to process in batch (maximum 10 URLs). Example: `["https://example.com", "https://another-site.com"]`
Additional Options	Options that control page rendering and screenshot capture: Capture Full Page (boolean): capture the entire page rather than only the visible viewport; Device Emulation (options): Desktop, Mobile, Tablet, or Custom; Custom Width/Height (numbers): viewport size used when Device Emulation is Custom; Dark Mode (boolean): enable dark mode; Disable Animations (boolean): disable CSS animations and transitions; Extra Wait Time (ms): additional wait before capture; Format (options): jpeg, png, or webp; Quality (number): image quality for jpeg/webp, from 1 to 100; Hide Elements (string): CSS selectors of elements to hide before capture; Remove Elements (string): CSS selectors of elements to remove before capture; Scroll Behavior (options): auto, instant, or smooth scrolling; Timeout (ms): request timeout; User Agent (string): custom user agent string; Wait for Network Idle (boolean): wait for network activity to finish; Wait for Selector (string): CSS selector to wait for before capture.
Advanced Batch Options	Advanced batch settings: Concurrency Limit (number): number of URLs processed simultaneously (1-5 recommended); Error Handling (options): Continue on error, Stop on first error, or Retry failed URLs once; Timeout (ms): maximum time to wait per URL; Wait Between Requests (ms): delay between processing each URL.
Simplify	Whether to return a simplified version of the response instead of raw data (boolean)
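Some Additional Options depend on each other: Quality only applies to jpeg and webp, and Custom Width/Height are only used with Custom device emulation. The following minimal TypeScript sketch shows a pre-flight check for these constraints. The `AdditionalOptions` property names and the `checkAdditionalOptions` helper are hypothetical and do not come from the node's source; treating Custom Width/Height as required for Custom emulation is also an assumption.

```typescript
// Hypothetical shape of the "Additional Options" collection; the property
// names are illustrative, not taken from the node's source code.
type DeviceEmulation = "desktop" | "mobile" | "tablet" | "custom";
type ImageFormat = "jpeg" | "png" | "webp";

interface AdditionalOptions {
  fullPage?: boolean;
  device?: DeviceEmulation;
  customWidth?: number;
  customHeight?: number;
  format?: ImageFormat;
  quality?: number;
  extraWaitMs?: number;
  timeoutMs?: number;
}

// Returns human-readable problems; an empty array means no conflicts were found.
function checkAdditionalOptions(opts: AdditionalOptions): string[] {
  const problems: string[] = [];

  // Quality is documented as 1-100 and only meaningful for jpeg/webp.
  if (opts.quality !== undefined) {
    if (opts.quality < 1 || opts.quality > 100) {
      problems.push("Quality must be between 1 and 100");
    }
    if (opts.format === "png") {
      problems.push("Quality is only used with jpeg or webp");
    }
  }

  // Assumption: Custom Width/Height are required when Device Emulation is Custom.
  if (opts.device === "custom") {
    if (!opts.customWidth || opts.customWidth <= 0) {
      problems.push("Custom Width must be a positive number when Device Emulation is Custom");
    }
    if (!opts.customHeight || opts.customHeight <= 0) {
      problems.push("Custom Height must be a positive number when Device Emulation is Custom");
    }
  }

  // Wait and timeout values are durations in milliseconds.
  if ((opts.extraWaitMs ?? 0) < 0 || (opts.timeoutMs ?? 0) < 0) {
    problems.push("Extra Wait Time and Timeout cannot be negative");
  }

  return problems;
}

const issues = checkAdditionalOptions({ device: "custom", customWidth: 1280, format: "png", quality: 80 });
console.log(issues);
```

With these inputs the check flags two problems: a Quality value combined with png output, and the missing Custom Height.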

Output

The node outputs one item per processed URL. Each item's `json` field holds the result of the batch operation for that URL:

- Screenshot Batch: the output includes image data in the selected format (JPEG, PNG, or WebP). Any binary data returned is the captured screenshot image.
- Content Extraction Batch: the output contains the clean text extracted from each page.
- HTML Extraction Batch: the output provides the raw HTML source of each page.
- PDF Generation Batch: the output includes the generated PDF document as binary data.
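To illustrate the output shape, the sketch below builds one n8n-style item per URL: a `json` object and, for screenshots and PDFs, a `binary` entry holding base64 data and a MIME type. It assumes the API returns binary results as a base64 string and that png is the default image format; `toOutputItem` and the `json` field names (`text`, `html`, `mimeType`) are hypothetical.

```typescript
// Hypothetical sketch of one output item per URL, modelled loosely on n8n's
// item shape: a `json` object plus optional named `binary` entries.
type BatchType = "screenshot" | "content" | "html" | "pdf";
type ImageFormat = "jpeg" | "png" | "webp";

interface BinaryEntry {
  data: string; // base64-encoded file contents
  mimeType: string;
  fileName: string;
}

interface OutputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

const IMAGE_MIME_TYPES: Record<ImageFormat, string> = {
  jpeg: "image/jpeg",
  png: "image/png",
  webp: "image/webp",
};

// Assumption: `payload` is base64 for screenshots/PDFs and plain text otherwise.
function toOutputItem(
  batchType: BatchType,
  url: string,
  payload: string,
  format: ImageFormat = "png", // assumed default format
): OutputItem {
  if (batchType === "screenshot" || batchType === "pdf") {
    const mimeType = batchType === "pdf" ? "application/pdf" : IMAGE_MIME_TYPES[format];
    const extension = batchType === "pdf" ? "pdf" : format;
    return {
      json: { url, batchType, mimeType },
      binary: { data: { data: payload, mimeType, fileName: `${batchType}.${extension}` } },
    };
  }
  // Content Extraction returns clean text; HTML Extraction returns raw markup.
  const field = batchType === "content" ? "text" : "html";
  return { json: { url, batchType, [field]: payload } };
}

const item = toOutputItem("screenshot", "https://example.com", "iVBORw0KGgo=", "png");
console.log(item.binary?.["data"]?.mimeType);
```

Keeping text results in `json` and files in `binary` lets downstream nodes treat screenshots and PDFs as files while text and HTML stay directly readable.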

If the "Simplify" option is enabled, the output will be a streamlined version focusing on essential data rather than full raw responses.

If an error occurs while processing a URL and Error Handling is set to "Continue on error", the output item for that URL contains an error field describing the issue, along with metadata such as the operation name and a timestamp.
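The Advanced Batch Options and the error behavior described above can be sketched as a small worker pool. This is an illustration under stated assumptions, not the node's implementation: `runBatch`, `processUrl`, and the `BatchResult` shape are hypothetical; "Retry failed URLs once" is assumed to produce an error item after the second failure; "Stop on first error" is assumed to rethrow once in-flight URLs settle; and Wait Between Requests is applied per worker.

```typescript
// Hypothetical option names mirroring "Advanced Batch Options".
type ErrorMode = "continue" | "stop" | "retryOnce";

interface BatchOptions {
  concurrency: number; // Concurrency Limit (1-5 recommended)
  waitBetweenMs: number; // Wait Between Requests, applied per worker after each URL
  timeoutMs: number; // per-URL Timeout
  errorMode: ErrorMode; // Error Handling
}

type BatchResult =
  | { url: string; ok: true; data: unknown }
  | { url: string; ok: false; error: string; operation: string; timestamp: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Rejects if `task` has not settled within `ms`; the underlying request is not cancelled.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    task.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

async function runBatch(
  urls: string[],
  processUrl: (url: string) => Promise<unknown>, // hypothetical per-URL API call
  opts: BatchOptions,
  operation = "batchProcessing", // hypothetical operation name
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  let next = 0;
  let stopped = false;
  let firstError: unknown;

  // One attempt, or two when "Retry failed URLs once" is selected.
  const attempt = async (url: string): Promise<unknown> => {
    const tries = opts.errorMode === "retryOnce" ? 2 : 1;
    let lastError: unknown;
    for (let i = 0; i < tries; i++) {
      try {
        return await withTimeout(processUrl(url), opts.timeoutMs);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };

  // Each worker pulls the next unprocessed URL until none remain or a stop is requested.
  const worker = async (): Promise<void> => {
    while (!stopped && next < urls.length) {
      const index = next++;
      const url = urls[index];
      try {
        results[index] = { url, ok: true, data: await attempt(url) };
      } catch (err) {
        if (opts.errorMode === "stop") {
          if (!stopped) {
            stopped = true;
            firstError = err;
          }
          return;
        }
        results[index] = {
          url,
          ok: false,
          error: err instanceof Error ? err.message : String(err),
          operation,
          timestamp: new Date().toISOString(),
        };
      }
      if (opts.waitBetweenMs > 0) await sleep(opts.waitBetweenMs);
    }
  };

  const workerCount = Math.max(1, Math.min(opts.concurrency, urls.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  if (stopped) throw firstError;
  return results;
}
```

Results keep the input order regardless of which worker finishes first, so each output item still lines up with its URL.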

Dependencies

- Requires an active connection to the HeadlessX API service via an API key credential.
- Uses the HeadlessX API base URL configured in the credentials.
- No external dependencies are required beyond API access.
- The target URLs must be reachable over the network for processing to succeed.

Troubleshooting

Common Issues:
- Supplying more than the maximum of 10 URLs in a batch may cause errors.
- An incorrectly formatted JSON array for URLs can cause parsing failures.
- Network timeouts occur when target URLs are slow or unresponsive.
- Invalid CSS selectors in "Hide Elements", "Remove Elements", or "Wait for Selector" can cause errors.
- A low Concurrency Limit may slow down batch processing.
Error Messages:
- "Unknown operation": an unsupported operation was selected; make sure the "Batch Processing" operation is chosen.
- API request errors with HTTP status codes: may indicate authentication issues, rate limiting, or invalid parameters.
- Timeout errors: increase the Timeout values (in Additional Options or Advanced Batch Options) or check network conditions.
Resolutions:
- Validate the JSON input for URLs carefully.
- Adjust concurrency and timeout settings based on network and server performance.
- Set Error Handling to "Continue on error" so the batch can complete partially despite some failures.
- Verify that the CSS selectors used for hiding or removing elements are correct and match elements on the target pages.
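Several of the URL-related issues above can be caught before any request is sent. Below is a minimal sketch of validating the URLs (JSON Array) input. The `parseUrlList` helper is hypothetical, and rejecting empty lists and non-http(s) entries are assumptions; the source only states the 10-URL maximum.

```typescript
const MAX_BATCH_URLS = 10; // maximum stated for the URLs (JSON Array) property

// Parses the raw "URLs (JSON Array)" string and throws a descriptive error on bad input.
function parseUrlList(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error('URLs must be a valid JSON array, e.g. ["https://example.com"]');
  }
  // Assumption: an empty list is treated as an error.
  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("URLs must be a non-empty JSON array");
  }
  if (parsed.length > MAX_BATCH_URLS) {
    throw new Error(`At most ${MAX_BATCH_URLS} URLs are allowed per batch, got ${parsed.length}`);
  }
  // Assumption: only absolute http(s) URLs are accepted.
  return parsed.map((entry: unknown, i: number) => {
    if (typeof entry !== "string" || !/^https?:\/\/\S+$/i.test(entry.trim())) {
      throw new Error(`Entry ${i} is not an http(s) URL: ${JSON.stringify(entry)}`);
    }
    return entry.trim();
  });
}

const urls = parseUrlList('["https://example.com", "https://another-site.com"]');
console.log(urls.length);
```

Failing fast with a specific message (bad JSON, too many URLs, or a malformed entry) is easier to act on than a generic API error returned partway through a batch.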

Links and References

- HeadlessX API Documentation (for detailed API capabilities and parameters)
- n8n Documentation (general guidance on creating and using nodes)
- CSS Selectors Reference (to help with element targeting)