Scrapfly

Scrapfly data collection APIs for web page scraping, screenshots, and AI data extraction

Actions (4)

Scrape Actions
- Scrape Web Page
- Scrape API Request
Extraction Actions
- Extract Data From an HTML, Text, or Markdown Document Using AI
Screenshot Actions
- Capture Web Page Screenshot

Overview

This node integrates with Scrapfly's data collection APIs to scrape web pages, capture screenshots, manage account information, and extract structured data from documents using AI. Specifically, the Extraction resource's Extract Data From an HTML, Text, or Markdown Document Using AI operation accepts document content and returns structured data parsed by Scrapfly's AI-powered extraction.

Common use cases include:

- Automatically extracting structured information (e.g., product details, metadata, summaries) from raw HTML, text, or markdown content.
- Parsing web page content fetched elsewhere into usable JSON data without hand-writing parsers.
- Using AI models to interpret complex or unstructured documents for data extraction.

Practical example:

You have scraped a product page’s HTML and want to extract the product name, price, and availability automatically. Pass the HTML as the Body, set Content Type to `text/html`, and optionally supply an Extraction Prompt or Extraction Template. The node returns structured JSON containing the extracted fields.
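The request behind this scenario can be sketched in Python. This is a minimal sketch, not the node's implementation: it assumes the Extraction API lives at `https://api.scrapfly.io/extraction` and accepts the API key, content type, and prompt as query parameters named `key`, `content_type`, and `extraction_prompt`, with the document sent as the raw POST body. Verify these names against Scrapfly's API reference. The function only builds the request; sending it (for example with `urllib.request.urlopen`) would require a real API key.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; verify against Scrapfly's API reference before use.
EXTRACTION_ENDPOINT = "https://api.scrapfly.io/extraction"


def build_extraction_request(api_key: str, body: str, content_type: str, prompt: str) -> Request:
    """Build, but do not send, a POST request asking Scrapfly to extract data with AI."""
    params = {
        "key": api_key,                # assumed name of the API key parameter
        "content_type": content_type,  # MIME type of the document, e.g. "text/html"
        "extraction_prompt": prompt,   # assumed name of the prompt parameter
    }
    return Request(
        f"{EXTRACTION_ENDPOINT}?{urlencode(params)}",
        data=body.encode("utf-8"),     # the document itself travels as the request body
        headers={"Content-Type": content_type},
        method="POST",
    )


html = (
    "<html><body><h1>Blue Mug</h1>"
    "<span class='price'>$12.99</span><p>In stock</p></body></html>"
)
req = build_extraction_request(
    api_key="YOUR_API_KEY",
    body=html,
    content_type="text/html",
    prompt="Extract the product name, price, and availability as JSON",
)
print(req.get_method(), req.full_url)
```

Sending the document in the body rather than the query string keeps large HTML pages clear of URL length limits.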

Properties

Property	Description
Body	The full content of the document you want to extract data from. Must be provided as a string in the format specified by the Content Type property (e.g., HTML, plain text, markdown).
Content Type	Specifies the MIME type of the document content passed in the Body. Common values include `text/html`, `text/plain`, or `text/markdown`. This informs the extraction engine how to interpret the content.
Additional Fields	A collection of optional parameters that customize extraction:
- URL: Base URL used to resolve relative links in the document.
- Charset: Character encoding of the document; use `auto` to detect it automatically.
- Extraction Template: JSON template defining the structured data to extract.
- Extraction Prompt: Text prompt telling the AI what data to extract.
- Extraction Model: The AI model to use for extraction.
- Webhook Name: If set, the extraction request is queued and the results are sent asynchronously to the named webhook.
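Because every Additional Field is optional, a request builder would normally send only the fields the user has filled in. The Python sketch below illustrates that filtering. The mapping from field labels to query parameter names (`url`, `charset`, `extraction_template`, and so on) is hypothetical, inferred from the labels rather than taken from the node's source or Scrapfly's documentation.

```python
from urllib.parse import urlencode

# Hypothetical mapping from n8n field labels to Scrapfly query parameter names.
FIELD_TO_PARAM = {
    "URL": "url",
    "Charset": "charset",
    "Extraction Template": "extraction_template",
    "Extraction Prompt": "extraction_prompt",
    "Extraction Model": "extraction_model",
    "Webhook Name": "webhook_name",
}


def additional_fields_to_params(fields: dict) -> dict:
    """Translate the Additional Fields that are set into query parameters."""
    params = {}
    for label, value in fields.items():
        if label not in FIELD_TO_PARAM:
            raise ValueError(f"Unknown additional field: {label}")
        if value in (None, ""):
            continue  # unset fields are omitted rather than sent empty
        params[FIELD_TO_PARAM[label]] = value
    return params


fields = {
    "URL": "https://example.com/products/blue-mug",
    "Charset": "auto",
    "Extraction Prompt": "",
    "Webhook Name": None,
}
params = additional_fields_to_params(fields)
print(params)
print(urlencode(params))
```

Omitting empty fields, instead of sending them as blank strings, avoids accidentally overriding server-side defaults such as charset detection.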

Output

The node outputs an array of JSON objects containing the structured data extracted from the input document. The exact structure depends on the extraction template, prompt, or model used, but it typically consists of key-value pairs for the parsed fields.

If the webhook option is used, the output may represent a queued request confirmation rather than immediate extraction results.
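Downstream, the response has to become n8n items, which wrap each JSON object in a `json` key. The sketch below assumes a synchronous response of the form `{"data": ..., "content_type": ...}` and treats any response without a `data` key as a queued-request acknowledgement from the webhook path. Both response shapes, and the `queued` flag the sketch adds, are illustrative assumptions rather than Scrapfly's documented format.

```python
import json
from typing import Any


def _as_object(value: Any) -> dict[str, Any]:
    # n8n items carry JSON objects, so wrap bare scalars (strings, numbers) in one.
    return value if isinstance(value, dict) else {"value": value}


def to_items(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert an extraction response into n8n-style items (response shape assumed)."""
    if "data" not in response:
        # Assumed webhook case: no extracted data yet, so pass the acknowledgement through.
        return [{"json": {"queued": True, "response": response}}]
    data = response["data"]
    if isinstance(data, list):
        return [{"json": _as_object(entry)} for entry in data]
    return [{"json": _as_object(data)}]


sync_response = json.loads(
    '{"data": {"name": "Blue Mug", "price": "12.99", "available": true},'
    ' "content_type": "application/json"}'
)
print(to_items(sync_response))
```

A list under `data` becomes one item per entry, which lets later nodes process each extracted record independently.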

This operation is not documented as producing binary output.

Dependencies

- Requires an active Scrapfly API key credential configured in n8n to authenticate requests.
- Depends on Scrapfly’s external AI-powered extraction service.
- Requires network access to the Scrapfly API endpoints.
- Optional: a reachable webhook endpoint, if you use asynchronous extraction via Webhook Name.

Troubleshooting

- Empty or incorrect extraction results: Ensure the Body content matches the declared Content Type and that the Extraction Template or Extraction Prompt correctly describes the desired data.
- Character encoding issues: If extracted text appears garbled, specify the correct Charset instead of `auto`.
- API authentication errors: Verify that the Scrapfly API key credential is valid and has sufficient permissions.
- Webhook delivery failures: Confirm that the specified webhook endpoint is reachable and configured to accept POST requests.
- Rate limit or quota exceeded: The Scrapfly API may limit usage; check your account status and usage quotas.
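Garbled text of this kind usually comes from decoding bytes with the wrong character set. The short Python demonstration below shows UTF-8 bytes misread as Latin-1 and then decoded correctly; it illustrates the general failure mode only and says nothing about how Scrapfly's `auto` detection works internally.

```python
# UTF-8 bytes for a product name containing non-ASCII characters.
raw = "Café Crème".encode("utf-8")

# Decoding with the wrong charset produces mojibake instead of raising an error.
garbled = raw.decode("latin-1")
print(garbled)  # CafÃ© CrÃ¨me

# Decoding with the correct charset restores the original text.
correct = raw.decode("utf-8")
print(correct)  # Café Crème

# If only the garbled string is available, re-encoding as Latin-1 recovers the bytes.
repaired = garbled.encode("latin-1").decode("utf-8")
assert repaired == correct
```

Because Latin-1 maps every byte to a character, a wrong guess never fails loudly, which is why an explicit Charset is the safer choice when the source encoding is known.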