
Scrapfly

Scrapfly data collection APIs for web page scraping, screenshots, and AI data extraction

Overview

This node integrates with Scrapfly, a service providing data collection APIs for web scraping, screenshots, and AI-powered data extraction. Specifically, the Scrape API Request operation allows users to perform HTTP requests to scrape web pages or APIs with advanced features like proxy rotation, session management, and anti-scraping protection.

Common scenarios include:

  • Extracting data from websites that require custom headers or specific HTTP methods.
  • Bypassing anti-bot protections using built-in anti-scraping features.
  • Using proxy pools to avoid IP bans or access geo-restricted content.
  • Managing sessions to maintain cookies and fingerprints across multiple requests.

Practical example:

  • Scraping product details from an e-commerce site by sending a GET request with custom headers and rotating proxies to avoid detection (a sketch of the equivalent API call follows below).
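
As an illustration, the sketch below shows roughly the kind of request the node issues to Scrapfly's Scrape API on your behalf; in practice the node builds this call for you. The parameter names used here (key, url, asp, country, proxy_pool, session, and the headers[...] encoding) are assumptions mirroring the node properties documented below, so consult Scrapfly's API documentation for the authoritative names.

    // A minimal sketch of a Scrapfly Scrape API call (TypeScript, Node 18+ global fetch).
    // Parameter names are assumptions based on the node properties described below.
    async function scrapeProductPage(): Promise<void> {
      const params = new URLSearchParams({
        key: process.env.SCRAPFLY_API_KEY ?? '',      // the API key credential
        url: 'https://example-shop.com/product/123',  // page to scrape
        asp: 'true',                                  // Anti-Scraping Protection
        country: 'us',                                // proxy geolocation
        proxy_pool: 'public_residential_pool',        // assumed pool identifier
        session: 'productcrawl1',                     // alphanumeric session name
      });
      // Custom headers; the exact query-string encoding is an assumption.
      params.append('headers[Accept-Language]', 'en-US');

      const response = await fetch(`https://api.scrapfly.io/scrape?${params.toString()}`);
      const payload = await response.json();
      console.log(payload);
    }

    scrapeProductPage().catch(console.error);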

Properties

  • URL: The web page URL to scrape.
  • Method: The HTTP method to use for the request. Options: GET, HEAD, OPTIONS, PATCH, POST, PUT.
  • Additional Fields: A collection of optional parameters:
    • Body: The HTTP request body (for methods such as POST or PUT).
    • Headers: Custom HTTP headers as key-value pairs to include in the request (e.g., Accept-Language: en-US).
    • Proxy Pool: The proxy pool to use for the request. Options: Public Datacenter Pool, Public Residential Pool.
    • Country: The country code for proxy geolocation (e.g., us for United States).
    • Anti-Scraping Protection (asp): Enable to bypass anti-bot protections automatically.
    • Session: A named session string to reuse cookies, fingerprint, and proxy across multiple scrapes. Must be alphanumeric and at most 255 characters.
    • Session Sticky Proxy: Whether to reuse the same proxy IP within the session (best effort). Defaults to true.
    • Debug: Enable debug mode to get detailed logs for troubleshooting.
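
For illustration, the e-commerce scenario from the Overview could be expressed as the object below. The interface and field names are hypothetical (the real node collects these values through the n8n editor UI), but the values mirror the properties listed above.

    // Hypothetical shape mirroring the properties above; illustration only.
    interface ScrapflyScrapeParameters {
      url: string;
      method: 'GET' | 'HEAD' | 'OPTIONS' | 'PATCH' | 'POST' | 'PUT';
      additionalFields?: {
        body?: string;                     // request body for POST/PUT
        headers?: Record<string, string>;  // custom headers as key-value pairs
        proxyPool?: 'Public Datacenter Pool' | 'Public Residential Pool';
        country?: string;                  // e.g. "us"
        asp?: boolean;                     // Anti-Scraping Protection
        session?: string;                  // alphanumeric, max 255 characters
        sessionStickyProxy?: boolean;      // defaults to true
        debug?: boolean;
      };
    }

    const productScrape: ScrapflyScrapeParameters = {
      url: 'https://example-shop.com/product/123',
      method: 'GET',
      additionalFields: {
        headers: { 'Accept-Language': 'en-US' },
        proxyPool: 'Public Residential Pool',
        country: 'us',
        asp: true,
        session: 'productcrawl1',
      },
    };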

Output

The node outputs an array of JSON objects representing the response from the Scrapfly API for each input item processed. Each JSON object typically contains:

  • The scraped data or API response content.
  • Metadata about the request such as status codes, headers, and any error messages if applicable.

Binary data is not typical for this operation, which focuses on JSON/text responses; if it were returned, it would represent downloaded files or screenshots.
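
A downstream n8n Code node (mode: Run Once for All Items) can pick the relevant pieces out of each output item, as in the sketch below. $input.all() and the { json: ... } return shape are standard Code node conventions; the nested field names (result.content, result.status_code) are assumptions about Scrapfly's response shape, so inspect one real item to confirm them.

    // Downstream n8n Code node: flatten each Scrapfly response item.
    // The nested field names are assumptions; adjust after inspecting real output.
    const flattened = $input.all().map((item) => {
      const response = item.json;
      return {
        json: {
          content: response.result?.content ?? null,         // scraped page body
          statusCode: response.result?.status_code ?? null,  // upstream HTTP status
          raw: response,                                      // keep the full payload
        },
      };
    });

    return flattened;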

Dependencies

  • Requires an active Scrapfly API key credential configured in n8n.
  • Internet access to reach Scrapfly's API endpoints.
  • Proxy usage is optional and relies on the Scrapfly proxy pool selected in the node properties.

Troubleshooting

  • Common issues:

    • Invalid or missing API key: Ensure the Scrapfly API key credential is correctly set up.
    • Network errors or timeouts: Check internet connectivity and Scrapfly service status.
    • Incorrect URL or unsupported HTTP method: Verify the URL format and method compatibility.
    • Proxy-related errors: If using proxy pools, ensure the selected pool is available and supports the target region.
    • Session misconfiguration: Session names must be alphanumeric and at most 255 characters.
  • Error messages:

    • Authentication failures usually indicate invalid API credentials.
    • HTTP errors (4xx, 5xx) reflect issues with the target server or request parameters.
    • Debug mode can be enabled to get more detailed error information for diagnosis.
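
Building on the checks above, a small pre-flight helper can catch the URL and session issues before a request is ever sent. This sketch only encodes the constraints stated in this document and is not part of the node itself.

    // Illustrative pre-flight checks mirroring the common issues listed above.
    function validateScrapeConfig(url: string, session?: string): string[] {
      const problems: string[] = [];

      // Incorrect URL format.
      try {
        new URL(url);
      } catch {
        problems.push(`Invalid URL: ${url}`);
      }

      // Session names must be alphanumeric and at most 255 characters.
      if (session !== undefined) {
        if (!/^[A-Za-z0-9]+$/.test(session)) {
          problems.push('Session name must be alphanumeric.');
        }
        if (session.length > 255) {
          problems.push('Session name must be at most 255 characters.');
        }
      }

      return problems;
    }

    // Surfaces the session problem before the request is sent.
    console.log(validateScrapeConfig('https://example-shop.com/product/123', 'my session!'));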

Links and References

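  • Scrapfly website and API documentation: https://scrapfly.io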