
Scrapeless Official

Official Scrapeless nodes for n8n


Overview

The "Scrapeless Official" node provides web scraping and crawling capabilities through multiple resources, including a "Crawler" resource. When using the Crawler resource with the Scrape operation, the node scrapes a specified URL. This is useful when you want to extract structured information from web pages, such as product details, articles, or other publicly available content.

Typical use cases include:

  • Extracting product prices and descriptions from e-commerce sites.
  • Gathering news headlines or article summaries from media websites.
  • Collecting data from public directories or listings.

For example, you can input a URL like https://example.com and the node will scrape the page content according to the configured scraping logic (defined in the underlying SDK or API).
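Under the hood, a scrape request of this kind typically boils down to a single authenticated API call. The sketch below illustrates the general shape of such a call; the endpoint path and header name are assumptions for illustration, not the documented Scrapeless API.

```javascript
// Hypothetical sketch of the kind of request the node makes internally.
// The endpoint URL and the `x-api-token` header name are assumptions;
// consult the Scrapeless API documentation for the real values.
async function scrapePage(apiKey, targetUrl) {
  const response = await fetch('https://api.scrapeless.com/scrape', { // assumed endpoint
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-token': apiKey, // assumed header name
    },
    body: JSON.stringify({ url: targetUrl }),
  });
  if (!response.ok) {
    throw new Error(`Scrape failed with status ${response.status}`);
  }
  return response.json();
}
```

In n8n itself you never call this directly; the node handles the request once the URL property and credential are set.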

Properties

Name: URL to Crawl
Meaning: The web address (URL) of the page to be scraped. Accepts a single URL; batch crawling requires the Scrapeless SDK, as noted in the documentation.

Output

The output of the node is an array of items, each containing a json field with the scraped data extracted from the target URL. The exact structure of the JSON depends on the scraping rules applied by the underlying service but generally includes key-value pairs representing the extracted content.
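Downstream nodes receive that array in n8n's standard item shape, where each item wraps its data in a `json` field. As an illustration, a Code node could post-process the scraped items like this; the `title` and `content` keys are assumptions about what the scrape returns, not guaranteed fields.

```javascript
// Sketch of post-processing Scrapeless output in an n8n Code node.
// Each item carries a `json` field; the `title` and `content` keys
// shown here are assumed examples and depend on the scraped page.
function summarizeItems(items) {
  return items.map((item) => ({
    json: {
      title: item.json.title ?? null,            // assumed key
      contentLength: (item.json.content ?? '').length, // assumed key
    },
  }));
}
```

Returning items in the same `{ json: ... }` shape keeps them compatible with any node placed after the Code node.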

If the node returns binary data (not shown here), it typically represents downloaded files or images related to the scraped content.

Dependencies

  • Requires an API key credential for the Scrapeless service (referred to generically as an API authentication token).
  • The node internally calls Scrapeless APIs to perform scraping and crawling operations.
  • Proper network access to the target URLs is necessary.
  • For batch crawling scenarios, users should refer to the Scrapeless SDK documentation as indicated in the property hint.

Troubleshooting

  • Common issues:

    • Invalid or missing URL: Ensure the "URL to Crawl" is a valid and reachable web address.
    • API authentication errors: Verify that the API key credential is correctly configured and has sufficient permissions.
    • Network connectivity problems: Confirm that the n8n instance can access external websites and the Scrapeless API endpoints.
    • Rate limiting or blocking by target websites: Some sites may block automated scraping; consider adding delays or using proxy services if supported.
  • Error messages:

    • "Unsupported resource": Occurs if an invalid resource name is provided; ensure "crawler" is selected as the resource.
    • Errors returned from the Scrapeless API will be passed through; check the error message for details and consult Scrapeless documentation.
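For the rate-limiting case above, a common mitigation is to wrap the scraping call in a retry with exponential backoff. This is a generic pattern, not part of the node itself; the function name and defaults below are illustrative.

```javascript
// Generic retry-with-exponential-backoff sketch for transient failures
// such as rate limiting. `fn` is any async scraping call; delays grow
// as baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ...
async function withRetry(fn, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err; // out of attempts: rethrow
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

In n8n you can achieve a similar effect with the node's built-in "Retry On Fail" setting, or by adding a Wait node between retries.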
