
Hyperbrowser

Interact with websites using Hyperbrowser

Overview

The Hyperbrowser node lets a workflow interact with websites through operations such as scraping, crawling, and AI-powered data extraction. It also supports browser automation driven by different agents, including AI agents for web browsing and computer control.

The Extract operation pulls structured data from a given webpage URL based on an extraction query and an optional schema. Use it to programmatically retrieve specific information from web pages, such as product prices, reviews, or other targeted content, without parsing HTML by hand.

Practical examples:

  • Extracting all product prices from an e-commerce page.
  • Retrieving contact information from a company website.
  • Pulling structured data like event dates or article metadata from news sites.
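As a sketch of the product-price example, the Extraction Query and Extraction Schema might be prepared as shown below. The field names (`products`, `name`, `price`, `currency`) are illustrative assumptions; the node does not require any particular fields. The schema is written as a TypeScript object only so it can be serialized into the JSON text that the Extraction Schema property parses.

```typescript
// Hypothetical Extraction Query for the "all product prices" example.
const extractionQuery =
  "Extract the name and price of every product listed on the page";

// A JSON Schema describing the shape we want back.
// Field names are illustrative assumptions, not required by the node.
const extractionSchema = {
  type: "object",
  properties: {
    products: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "number" },
          currency: { type: "string" },
        },
        required: ["name", "price"],
      },
    },
  },
  required: ["products"],
};

// The Extraction Schema property is parsed as JSON, so serialize the object.
const schemaText = JSON.stringify(extractionSchema, null, 2);
console.log(extractionQuery);
console.log(schemaText);
```

Marking `name` and `price` as required tells the extractor which fields matter most, while `currency` stays optional for pages that do not show it.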

Properties

Name                 Meaning
URL                  The webpage URL to extract data from.
Extraction Query     A natural-language description of the data to extract (e.g., "Extract all product prices").
Extraction Schema    Optional JSON schema that defines the structure of the extracted data, giving precise control over its shape.
Options              Collection of additional settings:
  - Use Proxy        Whether to route requests through a proxy server.
  - Proxy Country    The country code of the proxy server, when a proxy is used.
  - Solve CAPTCHAs   Whether to attempt to solve CAPTCHAs encountered during extraction.
  - Timeout (Ms)     Maximum time, in milliseconds, to wait for page navigation before timing out.
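The sketch below shows how the properties above fit together and how an invalid schema can be reported early. `ExtractOptions`, `ExtractParams`, and `buildExtractParams` are hypothetical names for illustration, not part of the node or the Hyperbrowser SDK. Dropping Proxy Country when no proxy is used simply follows from the property description.

```typescript
// Hypothetical shapes mirroring the Extract properties listed above.
interface ExtractOptions {
  useProxy?: boolean;
  proxyCountry?: string; // only meaningful when useProxy is true
  solveCaptchas?: boolean;
  timeoutMs?: number; // page-navigation timeout in milliseconds
}

interface ExtractParams {
  url: string;
  query: string;
  schema?: Record<string, unknown>;
  options: ExtractOptions;
}

// Hypothetical helper: parses the schema text and cleans up the options.
function buildExtractParams(
  url: string,
  query: string,
  schemaText: string,
  options: ExtractOptions,
): ExtractParams {
  // The schema is optional: an empty field means the data shape is
  // inferred from the query instead.
  let schema: Record<string, unknown> | undefined;
  if (schemaText.trim() !== "") {
    try {
      schema = JSON.parse(schemaText);
    } catch (err) {
      throw new Error(`Extraction Schema is not valid JSON: ${(err as Error).message}`);
    }
  }
  // Proxy Country has no effect without a proxy, so drop it.
  const cleaned: ExtractOptions = { ...options };
  if (!cleaned.useProxy) {
    delete cleaned.proxyCountry;
  }
  return { url, query, schema, options: cleaned };
}

const params = buildExtractParams(
  "https://example.com/products",
  "Extract all product prices",
  '{"type":"object","properties":{"prices":{"type":"array","items":{"type":"number"}}}}',
  { useProxy: false, proxyCountry: "US", solveCaptchas: true, timeoutMs: 30000 },
);
console.log(JSON.stringify(params, null, 2));
```

Parsing the schema before anything else turns a malformed schema into one clear error message instead of an opaque failure later in the run.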

Note: Only the properties that apply to the Extract operation are listed here.

Output

The output JSON object for the Extract operation contains:

  • url: The original URL processed.
  • extractedData: The data extracted from the webpage, structured according to the provided extraction schema or inferred from the extraction query.
  • status: Information about the outcome of the extraction (for example, whether it succeeded or failed).
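A downstream step typically reads `extractedData` and treats empty or unexpected results separately. The sketch below assumes the Extraction Schema requested an object of the form `{ prices: number[] }`; `ExtractOutput` and `readPrices` are hypothetical names, and `status` is left untyped because its exact values are not specified here.

```typescript
// Hypothetical shape of one Extract output item.
// `status` is typed loosely because its exact values are not documented here.
interface ExtractOutput {
  url: string;
  extractedData: unknown;
  status: unknown;
}

// Assumes the Extraction Schema asked for { prices: number[] }.
function readPrices(item: ExtractOutput): number[] {
  const data = item.extractedData as { prices?: unknown } | null;
  if (data === null || typeof data !== "object" || !Array.isArray(data.prices)) {
    // Empty or unexpected data often points to a schema/query mismatch.
    return [];
  }
  // Keep only numeric entries; the page may yield text like "n/a".
  return data.prices.filter((p): p is number => typeof p === "number");
}

const item: ExtractOutput = {
  url: "https://example.com/products",
  extractedData: { prices: [19.99, 5, "n/a"] },
  status: "success", // placeholder; check the real value in your node's output
};
const prices = readPrices(item);
const total = prices.reduce((sum, p) => sum + p, 0);
console.log(prices, total);
```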

Subsequent workflow steps can use this structured data directly.

Dependencies

  • Requires an API key credential for the Hyperbrowser service.
  • Relies on the external Hyperbrowser SDK to perform web interactions and data extraction.
  • Requires network access to the target URLs and, if Use Proxy is enabled, a working proxy configuration.

Troubleshooting

  • Common issues:

    • An invalid or unreachable URL causes the extraction to fail.
    • An incorrect or malformed extraction schema can lead to errors or empty results.
    • Network restrictions or a misconfigured proxy can block access to the page.
    • If Solve CAPTCHAs is disabled, a CAPTCHA on the page can block the extraction.
  • Error messages:

    • "Operation "extract" is not supported": Indicates an unsupported operation was selected; ensure "extract" is chosen.
    • Timeout errors if the page takes too long to load; consider increasing the timeout setting.
    • JSON parse errors for the extraction schema; verify the schema is valid JSON.
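Several of the input-side issues above can be caught before the node runs. The hypothetical `preflight` helper below checks that the URL parses (assuming only http and https pages are targeted), that a non-empty schema is valid JSON describing an object (the usual form for extraction schemas), and that a timeout, if set, is positive. It cannot detect an unreachable URL or a blocked proxy; those only show up when the request is actually made.

```typescript
// Hypothetical pre-flight check covering the input-side failures above.
// Returns a list of human-readable problems; an empty list means "looks OK".
function preflight(url: string, schemaText: string, timeoutMs?: number): string[] {
  const problems: string[] = [];

  // 1. The URL must parse and (by assumption) use http or https.
  try {
    const parsed = new URL(url);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      problems.push(`Unsupported URL protocol: ${parsed.protocol}`);
    }
  } catch {
    problems.push(`Invalid URL: ${url}`);
  }

  // 2. A non-empty schema must be valid JSON describing an object.
  if (schemaText.trim() !== "") {
    try {
      const schema = JSON.parse(schemaText);
      if (schema === null || typeof schema !== "object" || Array.isArray(schema)) {
        problems.push("Extraction Schema must be a JSON object");
      }
    } catch (err) {
      problems.push(`Extraction Schema is not valid JSON: ${(err as Error).message}`);
    }
  }

  // 3. The timeout, if set, must be a positive, finite number of milliseconds.
  if (timeoutMs !== undefined && !(Number.isFinite(timeoutMs) && timeoutMs > 0)) {
    problems.push(`Timeout must be a positive number of milliseconds, got ${timeoutMs}`);
  }
  return problems;
}

console.log(preflight("https://example.com", '{"type":"object"}', 30000));
console.log(preflight("ftp://example.com", "{bad", -5));
```

Collecting every problem instead of stopping at the first one lets a user fix all inputs in a single pass.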

To resolve these issues, verify the input parameters, check network connectivity, and adjust options such as Timeout (Ms) and Solve CAPTCHAs as needed.
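When slow pages are the problem, one option is to retry with a progressively longer timeout. The generic helper below is a sketch under stated assumptions: `attempt` is a hypothetical stand-in for whatever performs one extraction with a given timeout (it is not a Hyperbrowser SDK function), and failures that more time will not fix, such as schema errors, can be excluded from retries through `shouldRetry`.

```typescript
// Hypothetical signature for one extraction attempt with a given timeout.
type Attempt<T> = (timeoutMs: number) => Promise<T>;

// Retries `attempt`, multiplying the timeout by `factor` after each failure.
async function withGrowingTimeout<T>(
  attempt: Attempt<T>,
  initialMs: number,
  maxAttempts = 3,
  factor = 2,
  shouldRetry: (err: unknown) => boolean = () => true,
): Promise<T> {
  let timeoutMs = initialMs;
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attemptNo = 1; attemptNo <= maxAttempts; attemptNo++) {
    try {
      return await attempt(timeoutMs);
    } catch (err) {
      lastError = err;
      // Errors such as a malformed schema will not improve with more time.
      if (!shouldRetry(err)) break;
      timeoutMs *= factor; // give slow pages more time on the next try
    }
  }
  throw lastError;
}

// Fake attempt for illustration: "loads" only when given at least 60 s.
const fakeAttempt: Attempt<string> = async (timeoutMs) => {
  if (timeoutMs < 60000) throw new Error(`Timed out after ${timeoutMs} ms`);
  return `ok at ${timeoutMs} ms`;
};

withGrowingTimeout(fakeAttempt, 15000).then(console.log);
```

Doubling keeps the number of attempts small while still reaching a generous timeout; capping `maxAttempts` keeps a genuinely unreachable page from stalling the workflow.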
