Overview
This node integrates with the Handinger API to scrape web pages according to a selected action. It can fetch website content, fetch metadata, take screenshots, or process fetched content with a large language model (LLM). Typical uses are automating data extraction from websites, generating structured data from web content, and capturing visual snapshots of pages. For example, it can return cleaned page content as Markdown or HTML, extract metadata for analysis, take screenshots with custom viewport settings, or run an LLM over the fetched content using a prompt and a JSON schema.
Use Case Examples
- Extracting and cleaning article content from a news website in Markdown format.
- Taking a screenshot of a product page with specific viewport dimensions for visual documentation.
- Fetching metadata from a blog for SEO analysis.
- Using the LLM action to process website content with a custom prompt and JSON schema for structured data extraction.
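To make the LLM use case more concrete, here is a minimal sketch of a prompt and a JSON schema for pulling article fields out of a page. The field names (title, author, publishedAt) and the variable names are illustrative assumptions. Whether the JSON Schema property expects the schema as text is also an assumption; check the Handinger API documentation for the exact format.

```typescript
// Illustrative prompt for the LLM action (assumed wording, not from the Handinger docs).
const extractionPrompt =
  "Extract the article title, author, and publication date from this page.";

// A standard JSON Schema object describing the structured result we want back.
// The property names are hypothetical examples.
const articleSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    author: { type: "string" },
    publishedAt: { type: "string", format: "date" },
  },
  required: ["title"],
  additionalProperties: false,
};

// Assumption: the JSON Schema property is filled in as text, so serialize the object.
const jsonSchemaText = JSON.stringify(articleSchema, null, 2);
```

Keeping the schema small and marking only the essential fields as required tends to make extraction failures easier to diagnose.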
Properties
| Name | Meaning |
|---|---|
| Action | Specifies the type of data to fetch or process from the website. Options include LLM processing, content fetching, metadata fetching, or taking a screenshot. |
| Website URL | The URL of the web page to fetch, capture, or process. |
| Fresh | Boolean flag to indicate whether to fetch fresh data (bypass cache). |
| Prompt | The prompt to use for the LLM processing (required when action is LLM). |
| JSON Schema | The JSON schema to use for the LLM processing (required when action is LLM). |
| Content Type | The format to return the content in, either Markdown or HTML (used when action is Content). |
| Link Style | How links are presented in the fetched content (used when action is Content). |
| Clean Content | Boolean flag to clean the content before returning it (used when action is Content). |
| Inline Images | Boolean flag to include inline images in the content (used when action is Content). |
| Advanced Scraping | Boolean flag to enable advanced scraping techniques (used when action is Content or Metadata). |
| Image Type | The image format for the screenshot, either PNG or JPEG (used when action is Screenshot). |
| Viewport Width | The width of the viewport for the screenshot (used when action is Screenshot). |
| Viewport Height | The height of the viewport for the screenshot (used when action is Screenshot). |
| Timeout | The timeout duration in milliseconds for the screenshot request (used when action is Screenshot). |
| Delay | The delay in milliseconds before taking the screenshot (used when action is Screenshot). |
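The table above ties most properties to a specific action. One way to picture that is as a discriminated union keyed on the action, sketched below. All type and field names here (NodeParams, jsonSchema, viewportWidth, and so on) are hypothetical and chosen for illustration; they are not the node's internal parameter names.

```typescript
// Properties shared by every action.
interface BaseParams {
  url: string;      // Website URL
  fresh?: boolean;  // bypass cache when true
}

interface LlmParams extends BaseParams {
  action: "llm";
  prompt: string;      // required when action is LLM
  jsonSchema: string;  // required when action is LLM
}

interface ContentParams extends BaseParams {
  action: "content";
  contentType?: "markdown" | "html";
  linkStyle?: string;
  cleanContent?: boolean;
  inlineImages?: boolean;
  advancedScraping?: boolean;
}

interface MetadataParams extends BaseParams {
  action: "metadata";
  advancedScraping?: boolean;
}

interface ScreenshotParams extends BaseParams {
  action: "screenshot";
  imageType?: "png" | "jpeg";
  viewportWidth?: number;
  viewportHeight?: number;
  timeout?: number;  // milliseconds
  delay?: number;    // milliseconds
}

type NodeParams = LlmParams | ContentParams | MetadataParams | ScreenshotParams;

// Returns a list of human-readable problems; an empty list means nothing required is missing.
function missingRequired(params: NodeParams): string[] {
  const problems: string[] = [];
  if (params.url.trim() === "") problems.push("Website URL is required");
  if (params.action === "llm") {
    if (params.prompt.trim() === "") problems.push("Prompt is required when action is LLM");
    if (params.jsonSchema.trim() === "") problems.push("JSON Schema is required when action is LLM");
  }
  return problems;
}
```

Modeling the parameters this way lets the compiler reject, for instance, a viewport width on a Metadata request, while the runtime check catches empty required fields.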
Output
JSON
- success - Indicates whether the API request was successful.
- response - The response data from the Handinger API, which varies based on the selected action.
- error - Error message if the request failed (present only if success is false).
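The three output fields described above could be modeled as follows. This is a sketch only: the interface name and the helper are hypothetical, and the response is typed as unknown because its shape depends on the selected action.

```typescript
// Hypothetical model of the node's JSON output.
interface HandingerNodeOutput {
  success: boolean;
  response?: unknown; // shape varies with the selected action
  error?: string;     // present only when success is false
}

// Returns the response on success, or throws with the reported error message.
function unwrapResponse(output: HandingerNodeOutput): unknown {
  if (!output.success) {
    throw new Error(output.error ?? "Request failed without an error message");
  }
  return output.response;
}
```

A downstream step can call a helper like this once and then work with the response directly, instead of checking the success flag in several places.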
Dependencies
- Handinger API with authentication credentials
Troubleshooting
- Invalid or missing API credentials result in authentication errors.
- Incorrect or malformed URLs can cause request failures.
- If a screenshot request times out, or the page needs more time before capture, adjust the Timeout and Delay properties.
- Errors in LLM prompt or JSON schema formatting can cause processing failures.
- If the node is set to continue on fail, errors will be returned in the output JSON; otherwise, the node execution will stop on error.
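Two of the points above, malformed URLs and the continue-on-fail behavior, can be illustrated with a short sketch. The function runItem and its request parameter are hypothetical stand-ins for the node's internal logic and for the actual Handinger API call; only the standard URL constructor is a real API here.

```typescript
interface ItemResult {
  success: boolean;
  response?: unknown;
  error?: string;
}

// Validates the URL, performs the request, and either returns the error in the
// output (continueOnFail = true) or rethrows it to stop execution.
async function runItem(
  url: string,
  request: (url: string) => Promise<unknown>, // hypothetical stand-in for the API call
  continueOnFail: boolean,
): Promise<ItemResult> {
  try {
    const parsedUrl = new URL(url); // throws a TypeError on malformed input
    if (parsedUrl.protocol !== "http:" && parsedUrl.protocol !== "https:") {
      throw new Error(`Unsupported protocol: ${parsedUrl.protocol}`);
    }
    return { success: true, response: await request(parsedUrl.href) };
  } catch (err) {
    if (!continueOnFail) throw err;
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, error: message };
  }
}
```

Checking the URL before sending the request turns a vague request failure into a clear error message, which is easier to act on in either mode.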
Links
- Handinger API Documentation - Official documentation for the Handinger API used by this node.