Actions
- Deep SERPAPI Actions
- Universal Scraping API Actions
- Crawler Actions
Overview
The "Universal Scraping API" node with the "Web Unlocker" operation is designed to bypass common web scraping protections and restrictions on target websites. It enables users to retrieve the fully rendered HTML content of a webpage, including JavaScript-rendered elements, by simulating a real browser environment. This is particularly useful for scraping data from sites that use anti-bot measures such as CAPTCHAs, IP blocking, or require JavaScript execution to load content.
Common scenarios where this node is beneficial include:
- Extracting data from websites that heavily rely on client-side rendering.
- Accessing content behind geo-restrictions by specifying a country proxy.
- Avoiding detection by blocking unnecessary resource types like images or fonts.
- Automating data collection from sites protected by anti-scraping technologies.
Practical example:
- A user wants to scrape product details from an e-commerce site that loads prices dynamically via JavaScript and blocks requests from non-browser clients. Using this node with JS rendering enabled and headless browsing, the user can obtain the complete page content as seen in a real browser.
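For orientation, the sketch below shows the kind of request the node issues on your behalf. The endpoint URL, header name, and body field names are illustrative assumptions, not documented Scrapeless API details; in the node itself you only fill in the properties listed in the next section.

```typescript
// Hypothetical sketch of a Web Unlocker call. The endpoint URL, auth
// header, and field names are assumptions for illustration only.
async function unlock(apiKey: string, targetUrl: string): Promise<string> {
  const response = await fetch('https://api.scrapeless.com/unlocker/request', {
    method: 'POST',
    headers: {
      'x-api-token': apiKey, // assumed auth header
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url: targetUrl,  // Target URL property
      js_render: true, // Js Render: execute JavaScript before returning HTML
      headless: true,  // Headless: no visible browser UI
      country: 'ANY',  // Country: "World Wide", i.e. no geo restriction
    }),
  });
  if (!response.ok) {
    throw new Error(`Web Unlocker request failed with status ${response.status}`);
  }
  return response.text(); // fully rendered HTML, as a real browser would see it
}
```

With JS rendering and headless mode both enabled, the returned HTML includes the dynamically loaded prices from the e-commerce example above.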
Properties
| Name | Meaning |
|---|---|
| Target URL | The URL of the webpage to unlock and scrape. Must be a valid HTTP/HTTPS address. |
| Js Render | Whether to enable JavaScript rendering on the page. When true, the node will execute JavaScript to render dynamic content before returning the result. |
| Headless | Whether to run the browser in headless mode (without a visible UI). Typically set to true for automated scraping tasks. |
| Country | The geographic location from which the request should appear to originate. Useful for bypassing geo-blocks or accessing region-specific content. Options include many countries worldwide and "World Wide" (ANY) for no restriction. |
| Js Instructions | JSON array of instructions for controlling the JavaScript rendering process, such as waiting times or custom scripts to execute during page load. Default is to wait 100 milliseconds. |
| Block | JSON object specifying resources and URLs to block during page loading to speed up scraping and reduce bandwidth. For example, blocking images, fonts, and scripts from specified URLs. |
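To make the two JSON-valued properties concrete, here are plausible values; the field names are assumptions inferred from the descriptions above, so check them against the service's current documentation before relying on them.

```typescript
// Illustrative values for the Js Instructions and Block properties.
// Field names are assumptions based on the descriptions above.
const jsInstructions = [
  { wait: 500 }, // wait 500 ms for dynamic content (the default is 100 ms)
];

const block = {
  resources: ['image', 'font', 'stylesheet'], // resource types to skip loading
  urls: ['https://ads.example.com'],          // hypothetical URL to block
};
```

Blocking images and fonts usually reduces page weight and load time considerably without affecting the HTML you want to extract.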
Output
The node outputs a JSON object containing the scraped webpage data after unlocking it. The exact structure depends on the response from the Universal Scraping API but typically includes:
- The full HTML content of the unlocked page.
- Metadata about the request or response.
- Any extracted data if applicable.
If the API returns binary data (e.g., screenshots or files), it is included in the node's binary output field as the raw data fetched from the target URL.
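Downstream nodes consume this output like any other n8n item. As a sketch, a Code node placed after this one could pull a value out of the rendered page; the `html` field name here is an assumption about the response shape:

```typescript
// n8n Code node ("Run Once for All Items") following the Universal
// Scraping API node. The `html` field name is an assumed output key.
return $input.all().map((item) => ({
  json: {
    // Naive extraction: grab the page <title> from the rendered HTML.
    title: String(item.json.html ?? '')
      .match(/<title>([^<]*)<\/title>/i)?.[1] ?? null,
  },
}));
```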
Dependencies
- Requires an API key credential for the Scrapeless service to authenticate requests.
- Depends on the Universal Scraping API service provided by Scrapeless.
- Network access to the target URLs and possibly proxy servers depending on the selected country option.
- The n8n environment must have internet connectivity and be properly configured to call external APIs.
Troubleshooting
Common issues:
- Invalid or missing API credentials will cause authentication failures.
- Incorrect or malformed Target URL may lead to request errors.
- Selecting a country with restricted access or unavailable proxies might result in blocked requests.
- Improperly formatted JSON in Js Instructions or Block properties can cause parsing errors.
Error messages:
"Unsupported resource": Occurs if the Resource parameter is not set to "universalScrapingApi".- API errors related to rate limits or invalid parameters will be returned from the Scrapeless service; check your API usage and input values.
- Timeout or network errors may happen if the target website is unreachable or slow to respond.
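For transient timeout or network errors, retrying with backoff before failing the workflow is a common mitigation; n8n's per-node "Retry On Fail" setting covers the simple case, but a minimal generic sketch in code looks like this:

```typescript
// Generic retry-with-backoff wrapper for calls that may hit transient
// timeouts or network errors (e.g., a Web Unlocker request).
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: 1 s, 2 s, 4 s.
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** i));
    }
  }
  throw lastError;
}
```

Usage would be to wrap whatever function issues the request, e.g. `await withRetry(() => unlock(apiKey, url))` with the hypothetical helper sketched in the Overview.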
Resolutions:
- Verify API key validity and permissions.
- Ensure URLs are correct and accessible.
- Adjust Js Instructions to allow sufficient time for page rendering.
- Use the Block property to disable loading heavy resources that may slow down or block scraping.