CloudBrowser

Interact with websites using a cloud-based browser instance

Overview

The CloudBrowser node interacts with websites through a cloud-based browser instance. Its Content - Get PDF From Website operation navigates to a specified URL and generates a PDF snapshot of the webpage, which lets you automate PDF capture of web pages without a local browser installation.

Common scenarios include:

  • Archiving web pages in PDF format for record-keeping or offline access.
  • Generating printable versions of dynamic web content.
  • Automating report generation from web dashboards or analytics pages.

Example: Automatically navigate to a news article URL and generate a PDF version for distribution or storage.

Properties

Each property is listed by name, followed by its meaning:

  • URL to Navigate: The URL of the webpage to open and convert into a PDF.
  • Navigation Options: Options controlling page navigation behavior:
      - Wait Until: When to consider navigation finished (Load, Domcontentloaded, Networkidle0, Networkidle2).
      - Timeout (Ms): Max wait time for navigation.
  • Browser Configuration: Settings for the browser instance:
      - Browser Type: Chrome, Chromium, or ChromeHeadlessShell.
      - Headless Mode: Run browser without UI.
      - Stealth Mode: Enable stealth to avoid detection.
      - Keep Open (Seconds): Time before auto-closing browser (0 = never).
      - Label: Instance name.
      - Save Session: Save session for reuse.
      - Recover Session: Recover saved session.
  • Custom Arguments: Additional command-line arguments passed to the browser on startup.
  • Ignored Default Arguments: Default browser arguments to ignore when launching.
  • Proxy Configuration: Proxy server settings:
      - Host, Port, Username, Password.
  • PDF Options: PDF generation options:
      - Format: Paper size (A0, A1, A2, A3, A4, A5, A6, Legal, Letter, Tabloid).
      - Landscape: Generate PDF in landscape orientation.
      - Print Background: Include background graphics.
      - Scale: Rendering scale (0.1 to 2).
      - Margin: Margins in millimeters (Top, Right, Bottom, Left).
      - Page Ranges: Specific pages to print (e.g., "1-5, 8, 11-13").
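
Several of the properties above accept only a fixed set of values or a bounded range, so it can help to check them before running the node. The following is a minimal sketch, not the node's own validation: the function name check_options is hypothetical, and comparing Wait Until case-insensitively is an assumption, since the list above only shows the display names.

```python
# Allowed values copied from the property list above.
ALLOWED_FORMATS = {"A0", "A1", "A2", "A3", "A4", "A5", "A6", "Legal", "Letter", "Tabloid"}
ALLOWED_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle0", "networkidle2"}


def check_options(wait_until: str, pdf_format: str, scale: float) -> list:
    """Return a list of problems; an empty list means the values look valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Assumption: the display names map to lowercase internal values.
    if wait_until.lower() not in ALLOWED_WAIT_UNTIL:
        problems.append(f"unknown Wait Until value: {wait_until!r}")
    if pdf_format not in ALLOWED_FORMATS:
        problems.append(f"unknown Format value: {pdf_format!r}")
    # Scale is documented as 0.1 to 2.
    if not 0.1 <= scale <= 2:
        problems.append(f"Scale out of range (0.1 to 2): {scale}")
    return problems


# Usage: a valid combination yields no problems.
print(check_options("Networkidle0", "A4", 1.0))
# Usage: each invalid value is reported separately.
print(check_options("idle", "B5", 3))
```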

Output

The node outputs JSON data containing:

  • url: The final URL of the loaded webpage.
  • title: The page title.
  • pdf: The generated PDF as a base64-encoded string, prefixed with the data URI header data:application/pdf;base64, so the full value has the form data:application/pdf;base64,....
  • pdfBinary: The raw binary buffer of the PDF file.
  • filename: Suggested filename for the PDF, e.g., webpage_<timestamp>.pdf.
  • fileExtension: Always "pdf".
  • mimeType: Always "application/pdf".

This output allows downstream nodes to save the PDF file, send it via email, or upload it to storage services.
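
As an illustration of consuming this output outside the workflow, the sketch below strips the data URI prefix from the pdf field and decodes the remainder into raw bytes. The function name pdf_bytes_from_item and the sample item are hypothetical; only the field names and the prefix come from the description above.

```python
import base64

# Prefix described in the Output section.
PDF_PREFIX = "data:application/pdf;base64,"


def pdf_bytes_from_item(item: dict) -> bytes:
    """Decode the `pdf` data URI of one output item into raw PDF bytes.

    Hypothetical helper for illustration only.
    """
    data_uri = item["pdf"]
    if not data_uri.startswith(PDF_PREFIX):
        raise ValueError("pdf field does not start with the expected data URI prefix")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(data_uri[len(PDF_PREFIX):], validate=True)


# Usage with a made-up item shaped like the node's output.
sample = {
    "filename": "webpage_0.pdf",
    "pdf": PDF_PREFIX + base64.b64encode(b"%PDF-1.4 sample").decode("ascii"),
}
data = pdf_bytes_from_item(sample)

# The bytes could then be written to disk under the suggested filename:
# with open(sample["filename"], "wb") as fh:
#     fh.write(data)
```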

Dependencies

  • Requires an active internet connection to reach the target URL.
  • Uses a cloud browser service accessible via API at https://production.cloudbrowser.ai/api/v1/Browser/Open.
  • Requires an API token credential for authentication with the cloud browser service.
  • The Puppeteer library is used internally to control the browser session.
  • No local browser installation needed; all browsing happens remotely.

Troubleshooting

  • No WebSocket address received from the browser service: Indicates failure to open a browser instance. Check API token validity and service availability.
  • Navigation timeout: If the page takes too long to load, increase the Timeout (Ms) value under Navigation Options.
  • Invalid URL or unreachable site: Ensure the URL is correct and accessible from the cloud browser environment.
  • PDF generation errors: Verify PDF options are valid; unsupported margin values or page ranges may cause failures.
  • Session recovery issues: If using saved sessions, ensure session data exists and is not corrupted.
  • Proxy configuration problems: Incorrect proxy details can prevent navigation; verify host, port, and credentials.
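
Since malformed page ranges are one of the listed causes of PDF generation errors, a quick pre-check can catch them before the node runs. This is a minimal sketch under stated assumptions: the function name parse_page_ranges is hypothetical, and the accepted grammar (comma-separated page numbers or start-end pairs, 1-based) is inferred from the example "1-5, 8, 11-13" rather than taken from the service's specification.

```python
import re

# One entry is either a single page ("8") or an inclusive range ("1-5").
_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_page_ranges(spec: str) -> list:
    """Parse a Page Ranges string into (start, end) tuples, or raise ValueError.

    Hypothetical helper; the grammar is an assumption based on the documented example.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        match = _RANGE.match(part)
        if not match:
            raise ValueError(f"malformed page range: {part!r}")
        start = int(match.group(1))
        # A single page is treated as a range of length one.
        end = int(match.group(2) or start)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        ranges.append((start, end))
    return ranges


# Usage with the example from the property list.
print(parse_page_ranges("1-5, 8, 11-13"))  # [(1, 5), (8, 8), (11, 13)]
```

Note that this sketch rejects an empty string; the node itself may treat an empty Page Ranges value as "all pages".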
