Overview
The HeadlessX node integrates with the HeadlessX API to perform web automation tasks such as web scraping, content extraction, screenshot capture, PDF generation, and page rendering. The 📝 Extract Content (POST) operation sends a POST request to a specified URL and extracts the resulting content. This is useful for web pages that return dynamic content only in response to form submissions or other POST-based interactions.
Common scenarios include:
- Submitting search forms or filters on websites and extracting the returned data.
- Interacting with APIs or endpoints that require POST requests to fetch HTML or JSON content.
- Automating workflows where content needs to be extracted after sending specific data to a web service.
Practical example:
- A user wants to submit a login form or a search query via POST and then scrape the resulting page content for further processing or analysis.
Properties
| Name | Meaning |
|---|---|
| URL | The URL of the web page to which the POST request will be sent. Must be a valid URL. |
| Advanced Options (JSON) | A JSON object containing additional advanced options to customize the POST request and page processing behavior. |
Advanced Options (for POST operations)
| Name | Meaning |
|---|---|
| Custom Headers | Additional HTTP headers to include in the POST request, e.g., Authorization tokens or custom cookies. |
| Extra Wait Time (MS) | Additional time in milliseconds to wait after page load to allow dynamic content to render fully. |
| Return Partial on Timeout | Whether to return partial content if the page load times out instead of failing completely. |
| Scroll to Bottom | Whether to scroll through the page to trigger lazy loading of content before extraction. |
| Timeout (MS) | Maximum time in milliseconds to wait for the page to load before timing out. |
| User Agent | Custom user agent string to use for the request; leave empty to enable automatic realistic rotation. |
| Viewport | Browser viewport dimensions (width and height in pixels), used to emulate different screen sizes. |
| Wait for Network Idle | Whether to wait until network activity settles before proceeding with content extraction. |
| Wait Until | Defines when navigation is considered complete. Options include network idle, DOM content loaded, and the load event. |
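As a rough illustration, the options in the table above could be assembled into the JSON object passed to the Advanced Options (JSON) property. The source lists the option labels but not their JSON keys, so every key name and value below (`timeout`, `extraWaitTime`, `waitUntil`, and so on) is an assumption made for illustration; check the HeadlessX API documentation for the actual field names.

```python
import json

# Hypothetical key names: the documentation above gives option labels,
# not JSON keys, so each key and value here is an assumption.
advanced_options = {
    "headers": {"Authorization": "Bearer <token>"},  # Custom Headers
    "extraWaitTime": 2000,             # Extra Wait Time (MS)
    "returnPartialOnTimeout": False,   # Return Partial on Timeout
    "scrollToBottom": True,            # Scroll to Bottom
    "timeout": 60000,                  # Timeout (MS)
    "userAgent": "",                   # User Agent; empty enables rotation
    "viewport": {"width": 1366, "height": 768},  # Viewport
    "waitForNetworkIdle": True,        # Wait for Network Idle
    "waitUntil": "networkidle",        # Wait Until (assumed value)
}

# Serialize to the JSON text that would be pasted into the property.
advanced_options_json = json.dumps(advanced_options, indent=2)
print(advanced_options_json)
```

Building the object in code and serializing it avoids hand-written JSON mistakes such as trailing commas or unquoted keys.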
Output
The node outputs an array of items, each containing a json field with the extracted content from the POST request to the specified URL. The exact structure of the JSON depends on the response from the HeadlessX API and typically includes the HTML or processed content retrieved from the target page.
This operation focuses on textual content. Binary data is not typical for content extraction, although it would be included in the output if the operation produced any.
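A minimal sketch of how a downstream step might read the output described above, assuming each item is a dictionary with a `json` field. The `html` key is hypothetical, since the exact structure depends on the HeadlessX API response, and the sample data is made up.

```python
from typing import Any


def extract_text(items: list[dict[str, Any]], field: str = "html") -> list[str]:
    """Collect one text field from each item's `json` payload.

    `field` is a hypothetical key; the real key depends on the API response.
    Items that lack the key, or hold a non-string value, are skipped.
    """
    texts = []
    for item in items:
        payload = item.get("json", {})
        value = payload.get(field)
        if isinstance(value, str):
            texts.append(value)
    return texts


# Sample shaped like the node output; the contents are invented.
sample_items = [
    {"json": {"html": "<h1>Results</h1>"}},
    {"json": {"status": "empty"}},
]
print(extract_text(sample_items))
```

Skipping items without the expected key, rather than raising, keeps one unexpected response from stopping the rest of the batch.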
Dependencies
- Requires an active connection to the HeadlessX API, authenticated via an API key credential.
- The node expects the HeadlessX API base URL and authentication credentials to be configured in n8n.
- No other external dependencies are required beyond the HeadlessX API service.
Troubleshooting
- Timeouts: If the page takes too long to load, increase "Timeout (MS)". If the page loads but dynamic content is still missing, increase "Extra Wait Time (MS)" to give it more time to render.
- Partial Content Returned: If the node returns partial content because of a timeout, consider disabling "Return Partial on Timeout" so that the node fails instead, then investigate the cause.
- Invalid URL: Ensure the URL provided is valid and accessible from the environment where n8n runs.
- Authentication Issues: If using custom headers for authorization, verify that the headers are correctly set and the token is valid.
- Dynamic Content Not Loaded: Enable "Scroll to Bottom" and "Wait for Network Idle" to ensure lazy-loaded content is fully rendered before extraction.
- User Agent Problems: If the target site blocks certain user agents, try setting a custom user agent string.
Common error messages relate to unknown operations (should not occur if configured properly), network errors, or API authentication failures. Review the error details and adjust configuration accordingly.
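The "Invalid URL" item above can be partly checked before the node runs. The sketch below uses Python's standard `urllib.parse` and assumes that only `http` and `https` URLs are acceptable; the helper name `is_valid_http_url` is hypothetical. It checks syntax only and cannot confirm that the URL is reachable from the n8n environment.

```python
from urllib.parse import urlparse


def is_valid_http_url(url: str) -> bool:
    """Return True if `url` has an http(s) scheme and a host part.

    Syntax check only: it does not test whether the host is reachable.
    """
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ("https://example.com/search", "example.com/search", "ftp://example.com"):
    print(candidate, is_valid_http_url(candidate))
```

A URL without a scheme, such as `example.com/search`, fails this check because `urlparse` treats the whole string as a path.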