Overview
This node integrates with the Workfloows API to scrape content from web pages. It allows users to extract data such as HTML, Markdown, and links from a specified URL. The node is useful for automating data extraction tasks from websites without manual copying, enabling workflows like content aggregation, link analysis, or preparing data for further processing.
Practical examples include:
- Scraping product descriptions or reviews from e-commerce sites.
- Extracting all hyperlinks from a news article for research purposes.
- Converting webpage content into Markdown format for documentation or publishing.
Properties
| Name | Meaning |
|---|---|
| URL | Full URL of the web page to scrape (including protocol). Example: https://workfloows.com |
| Simplify | Whether to return a simplified version of the response instead of the raw data (true or false) |
| Options | Collection of additional options (listed in the rows below). |
| Options: Use Proxy | Whether to use a proxy to scrape the web page (true or false) |
| Options: Return HTML | Whether to return the scraped content as HTML (true or false) |
| Options: Return Markdown | Whether to return the scraped content as Markdown (true or false) |
| Options: Return Links | Whether to return the links found in the scraped page (true or false) |
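
To make the shape of these properties concrete, the following minimal sketch models them as a plain parameter set and checks that the URL includes a protocol, as the URL property requires. The names `ScrapeOptions` and `build_scrape_params`, the key names, and the default values are all assumptions for illustration; they are not the node's internal parameter names or its actual defaults.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional
from urllib.parse import urlparse


@dataclass
class ScrapeOptions:
    """Hypothetical mirror of the Options collection; defaults are placeholders."""

    use_proxy: bool = False
    return_html: bool = False
    return_markdown: bool = False
    return_links: bool = False


def build_scrape_params(
    url: str,
    simplify: bool = True,
    options: Optional[ScrapeOptions] = None,
) -> Dict[str, Any]:
    """Validate the URL and collect the properties into one dictionary."""
    parsed = urlparse(url)
    # The URL property must be a full URL, including the protocol.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"URL must include a protocol and host, got: {url!r}")
    return {
        "url": url,
        "simplify": simplify,
        "options": asdict(options or ScrapeOptions()),
    }


if __name__ == "__main__":
    params = build_scrape_params(
        "https://workfloows.com",
        options=ScrapeOptions(return_markdown=True, return_links=True),
    )
    print(params)
```

Rejecting a URL without a protocol before the request is sent surfaces the mistake early, rather than as an error or empty response from the API.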
Output
The node outputs JSON data containing the results of the web scraping operation. Depending on the "Simplify" property:
- If Simplify is enabled (`true`), the output contains only the relevant extracted data fields (e.g., text content, links, HTML, or Markdown) under a `results` key.
- If Simplify is disabled (`false`), the output includes the full raw response from the API, which may contain additional metadata or nested structures.
The output structure varies based on the selected options (HTML, Markdown, links), but generally includes the scraped content in the requested formats.
The node does not output binary data.
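
A downstream step usually has to cope with both output shapes. The sketch below shows one way to do that, assuming only what is stated above: that simplified output places the extracted fields under a `results` key. The function name `extract_scraped_content` and the field names `markdown` and `links` in the sample item are hypothetical, and the layout of the raw response is not documented here, so the raw item is passed through unchanged.

```python
from typing import Any, Dict


def extract_scraped_content(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return the scraped fields from a single output item."""
    results = item.get("results")
    if isinstance(results, dict):
        # Simplify enabled: the extracted fields sit under "results".
        return results
    # Simplify disabled: the raw API response, whose structure is not
    # documented here, so it is returned as-is for the caller to inspect.
    return item


if __name__ == "__main__":
    # Sample item with assumed field names, for illustration only.
    simplified_item = {
        "results": {
            "markdown": "# Example page",
            "links": ["https://workfloows.com"],
        }
    }
    content = extract_scraped_content(simplified_item)
    print(content.get("links", []))
```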
Dependencies
- Requires an API key credential for authenticating with the Workfloows API.
- The node makes HTTP requests to the Workfloows API endpoints.
- No additional environment variables are required beyond the API key credential configuration.
Troubleshooting
Common issues:
- An invalid or missing API key causes authentication failures.
- An invalid or unreachable URL may result in errors or empty responses.
- Misconfiguring the proxy option may lead to connection timeouts or failures.
- Requesting multiple output formats simultaneously may increase response size and processing time.
Error messages:
- Authentication errors typically indicate problems with the API key; verify the key is correct and has necessary permissions.
- Network errors suggest connectivity issues; check internet access and proxy settings if enabled.
- API rate limits may cause request rejections; consider monitoring usage or upgrading the plan.
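
For rate-limit rejections, a common mitigation is to retry with exponential backoff. The sketch below is a generic helper, not part of the node or the Workfloows API: `RateLimitError` is a hypothetical exception that your own request function would raise when it detects a rate-limit rejection, and the attempt count and delays are arbitrary starting values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's request function
    when the API rejects a request because of rate limiting."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimitError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller see the error.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Authentication and network errors are deliberately not retried here, since retrying does not fix a wrong API key or a misconfigured proxy.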
Links and References
- Workfloows API Documentation (for detailed API capabilities and parameters)
- n8n Documentation (for general guidance on using custom nodes and credentials)