Overview
This node uses the Browserless API to scrape web pages through browser automation. It lets users navigate to a URL, wait for specific elements or events, inject scripts or styles, set HTTP headers, authenticate, and configure browser launch options. It is useful for extracting data from dynamic web pages that require JavaScript execution or complex interactions, such as scraping product details from e-commerce sites, gathering social media content, or automating form submissions.
Use Case Examples
- Scrape product information from an e-commerce site by specifying CSS selectors for product titles and prices.
- Wait for a specific element to appear on a page before extracting its content, useful for pages that load data asynchronously.
- Inject custom JavaScript into a page to manipulate the DOM or extract data not directly accessible via selectors.
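
The first two use cases can be sketched as a request body that lists the selectors to extract and, optionally, a selector to wait for. This is a minimal sketch, not the node's actual implementation: the helper `build_scrape_body` is hypothetical, and the key names `url`, `elements`, `selector`, `timeout`, and `waitForSelector` are assumptions based on the property names in this document, so check them against the Browserless documentation before relying on them.

```python
from typing import Any, Optional


def build_scrape_body(
    url: str,
    selectors: list[str],
    wait_for: Optional[str] = None,
    timeout_ms: int = 10_000,
) -> dict[str, Any]:
    """Assemble a scrape request body (key names are assumptions)."""
    body: dict[str, Any] = {
        "url": url,
        # One entry per CSS selector to extract.
        "elements": [{"selector": s, "timeout": timeout_ms} for s in selectors],
    }
    if wait_for is not None:
        # Wait for this selector first, for content that loads asynchronously.
        body["waitForSelector"] = {"selector": wait_for, "timeout": timeout_ms}
    return body


# Hypothetical page and selectors, for illustration only.
body = build_scrape_body(
    "https://example.com/products",
    [".product-title", ".product-price"],
    wait_for=".product-list",
)
```

Leaving `wait_for` unset omits the wait step entirely, which suits pages whose content is present in the initial HTML.
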
Properties
| Name | Meaning |
|---|---|
| Url | The URL of the web page to scrape. This is a required input. |
| Elements | A collection of CSS selectors and optional timeouts to specify which elements to scrape from the page. |
| Wait For Timeout | Time in milliseconds to wait before proceeding, useful for waiting for page content to load. |
| Wait For Selector | Specify selectors to wait for with options for visibility, hidden state, and timeout. |
| Goto Options | Options for navigating to the URL, including referer, timeout, and waitUntil events. |
| Wait For Event | Specify an event and timeout to wait for during scraping. |
| Wait For Function | A JavaScript function to evaluate in the browser context, with polling and timeout options. |
| Add Script Tag | Scripts to inject into the page, specified by URL, path, content, type, and id. |
| Add Style Tag | CSS styles to inject into the page, specified by URL, path, or raw content. |
| Set Extra HTTP Headers | Additional HTTP headers to include in requests. |
| Authenticate | Username and password for HTTP authentication. |
| Viewport | Settings for the browser viewport, including width, height, device scale factor, and mobile/landscape/touch options. |
| Emulate Media Type | Media type to emulate in the browser, e.g., screen or print. |
| Timeout | Override the system-level timeout for the request in milliseconds. |
| Html | Raw HTML content to load instead of navigating to a URL. |
| User Agent | User agent string to use for the browser session. |
| Best Attempt | If true, the node attempts to proceed even if awaited events fail or time out. |
| Enable Cookies | Enable or disable cookie handling. |
| Cookies | Array of cookie objects to set in the browser session. |
| Block Ads | Enable or disable ad-blocking extensions during the session. |
| Set JavaScript Enabled | Enable or disable JavaScript execution in the browser. |
| Enable Launch | Whether to launch a new browser instance. |
| Launch | Options for launching the browser, including arguments, viewport, devtools, headless mode, and more. |
| Reject Resource Types | Resource types to block from loading, such as images, scripts, or stylesheets. |
| Reject Request Pattern | Patterns of requests to block during scraping. |
| Request Interceptors | Patterns and corresponding responses to intercept and fulfill requests. |
| Debug Opts | Options to enable debugging features like console logs, cookies, HTML, network, and screenshots. |
| Use Custom Body | Whether to use a custom JSON body for the request instead of the standard parameters. |
| Custom Body | Custom JSON body to send with the request, allowing full control over scraping parameters. |
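
When Use Custom Body is enabled, the Custom Body field expects a JSON document. The sketch below shows one way to assemble such a document in Python and drop options that are not set. The helper `to_custom_body` is hypothetical, and the camelCase keys are assumed from the property names above (only `bestAttempt` and `rejectResourceTypes` are spelled this way elsewhere in this document); the option values are illustrative.

```python
import json
from typing import Any


def to_custom_body(**options: Any) -> str:
    """Serialize the options that are set into a JSON string, skipping None."""
    body = {key: value for key, value in options.items() if value is not None}
    return json.dumps(body, indent=2)


custom_body = to_custom_body(
    url="https://example.com/products",
    elements=[{"selector": ".product-title"}],
    gotoOptions={"waitUntil": "networkidle2", "timeout": 30_000},
    rejectResourceTypes=["image", "stylesheet"],
    setExtraHTTPHeaders={"Accept-Language": "en-US"},
    bestAttempt=True,
    userAgent=None,  # unset options are left out of the body
)
print(custom_body)
```

The printed string can be pasted into the Custom Body field; keeping unset options out of the body avoids sending explicit nulls that the API may not accept.
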
Output
JSON
- data - The scraped data extracted from the web page based on the specified selectors and options.
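
A downstream step usually needs to flatten the data field into something simpler. This document only states that the output contains data; the nested shape used below (a list of entries, each with a `selector` and a `results` list whose items carry `text`) is an assumption, and both `texts_by_selector` and the sample output are hypothetical.

```python
from typing import Any


def texts_by_selector(output: dict[str, Any]) -> dict[str, list[str]]:
    """Map each selector to the trimmed text of its matched elements."""
    collected: dict[str, list[str]] = {}
    for entry in output.get("data", []):
        texts = [r.get("text", "").strip() for r in entry.get("results", [])]
        collected[entry.get("selector", "")] = texts
    return collected


# Hypothetical output, for illustration only.
sample_output = {
    "data": [
        {"selector": ".product-title", "results": [{"text": " Blue Mug "}]},
        {"selector": ".product-price", "results": []},
    ]
}
summary = texts_by_selector(sample_output)
```

An empty list for a selector is a quick signal that the selector matched nothing on the page.
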
Dependencies
- Browserless API
Troubleshooting
- Timeout errors occur when the page takes too long to load or elements do not appear within the specified timeout. Increase the timeout values or enable the bestAttempt option.
- Authentication failures occur when the username or password is incorrect. Verify the credentials.
- Selectors may match no elements. Check that the CSS selectors are correct and that the elements exist on the page.
- Blocked resources can cause incomplete page loads. Adjust the rejectResourceTypes or blockAds settings.
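
The timeout advice above can be sketched as a helper that prepares a more forgiving retry of a failed request. This is a hypothetical helper under the same assumptions about key names (`elements`, `waitForSelector`, `timeout`); only `bestAttempt` is named in this document.

```python
import copy
from typing import Any


def relax_timeouts(body: dict[str, Any], factor: float = 2.0) -> dict[str, Any]:
    """Return a copy of body with longer timeouts and bestAttempt enabled."""
    relaxed = copy.deepcopy(body)  # leave the caller's body untouched
    relaxed["bestAttempt"] = True
    for element in relaxed.get("elements", []):
        if "timeout" in element:
            element["timeout"] = int(element["timeout"] * factor)
    wait = relaxed.get("waitForSelector")
    if isinstance(wait, dict) and "timeout" in wait:
        wait["timeout"] = int(wait["timeout"] * factor)
    return relaxed


# Hypothetical request that timed out on the first attempt.
original = {
    "url": "https://example.com",
    "elements": [{"selector": "h1", "timeout": 5000}],
    "waitForSelector": {"selector": "h1", "timeout": 5000},
}
retry_body = relax_timeouts(original)
```

Working on a deep copy keeps the original body available for comparison if the retry also fails.
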
Links
- Browserless Web Scraping Documentation - Official documentation for using Browserless API for web scraping.