ScrapeNinja

Consume the ScrapeNinja Web Scraping API. See the full documentation at https://scrapeninja.net/docs/

Overview

The ScrapeNinja node's "Scrape Single Page (Fast)" operation allows you to scrape the HTML content of a single web page quickly, without executing JavaScript. It uses ScrapeNinja's high-performance API endpoint that mimics a real browser's network request, making it suitable for extracting data from most websites that do not require client-side rendering.

Common scenarios:

  • Extracting product details, prices, or metadata from e-commerce pages.
  • Collecting article content or headlines from news sites.
  • Gathering structured data from public directories or listings.

Practical example:
You can use this node to fetch and parse the HTML of a product page, then process the result in subsequent n8n nodes to extract specific information like price, title, or images.
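
As a concrete sketch, the HTML returned by the node can be post-processed in a subsequent n8n Code node (Run Once for Each Item mode). This assumes the cheerio package is available to the Code node (on self-hosted n8n, external modules must be allowed via NODE_FUNCTION_ALLOW_EXTERNAL); the CSS selectors are hypothetical placeholders:

// n8n Code node sketch: parse the HTML returned by ScrapeNinja.
// Assumes cheerio is importable; the selectors below are hypothetical.
const cheerio = require('cheerio');

const $ = cheerio.load($json.body); // "body" holds the raw page HTML

return {
  json: {
    title: $('h1.product-title').first().text().trim(),
    price: $('.price').first().text().trim(),
    image: $('img.product-image').first().attr('src') || null,
  },
};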


Properties

Below are the supported input properties for the "Scrape Single Page (Fast)" operation:

  • URL to Scrape (String, required): The target URL to scrape. Example: https://example.com. You can test your request's IP and headers using the provided test URLs.
  • Headers (String[]): Custom request headers, one per line in the form "HeaderName: value". User-Agent and other basic headers are added automatically by ScrapeNinja.
  • Retry Count (Number): Number of retry attempts when a retry condition is met (HTTP failure, unexpected text found, or unexpected status code). Default: 1.
  • Geo Location (Options): Proxy geo location or custom proxy. Each attempt may use a different IP when a geo option is selected. Choices include US, Europe, Australia, and others, or a custom/premium proxy.
  • Custom Proxy URL (String): Premium or custom proxy URL. Only shown when "Geo Location" is set to "[Custom or Premium Proxy]". See the proxy setup guide.
  • Text Not Expected (String[]): Text patterns that, if found in the response, trigger a retry with another proxy.
  • Status Not Expected (Number[]): HTTP status codes that trigger a retry with another proxy. By default, 403 and 502 are included.
  • Extractor (Custom JS) (String): Custom JavaScript function for extracting data from the HTML. It receives the page HTML as input and a Cheerio parser as cheerio, and must return a JSON object. See the example below this list.
  • Follow Redirects (Boolean): Whether to follow HTTP redirects. Default: true.
  • Timeout (Seconds) (Number): Timeout per attempt, in seconds. Default: 10.
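
For reference, here is a minimal extractor sketch matching the contract described above (the page HTML arrives as input, the Cheerio parser as cheerio, and a JSON object must be returned); the selectors are illustrative only:

function extract(input, cheerio) {
  // input: raw page HTML; cheerio: Cheerio library provided by ScrapeNinja
  const $ = cheerio.load(input);
  return {
    title: $('title').text().trim(),
    headings: $('h2').map((i, el) => $(el).text().trim()).get(),
  };
}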

Output

The output of this operation is a JSON object containing the scraped page's content and related metadata. The exact structure may vary depending on the configuration and whether a custom extractor is used, but typically includes:

{
  "status": 200,
  "headers": { /* response headers */ },
  "body": "<!DOCTYPE html> ...", // Raw HTML content of the page
  "url": "https://example.com",
  "timing": { /* timing info */ }
}
  • If a custom extractor is provided, the output will be the result of your extraction function (a JSON object).
  • If an error occurs and "Continue On Fail" is enabled, the output will contain an error field with the error message and optional details.

Note: This node does not output binary data.
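
When "Continue On Fail" is enabled, a failed item carries an error field instead of the fields above. A minimal downstream check in a Code node, assuming that output shape:

// Downstream Code node sketch: separate failed scrapes from good ones.
// Assumes the error/body fields described above.
if ($json.error) {
  return { json: { failed: true, reason: $json.error } };
}
return { json: { failed: false, htmlLength: $json.body.length } };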


Dependencies

  • External Service: Requires access to the ScrapeNinja API.
  • API Key: You must configure ScrapeNinja API credentials in n8n.
  • Optional: Premium or custom proxies require additional setup, as described in the proxy setup guide.
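
For orientation, the node wraps a plain HTTPS API, so the same scrape can be reproduced outside n8n. The sketch below is illustrative only: the endpoint host and auth header are assumptions based on a typical RapidAPI setup, so verify them against the ScrapeNinja docs and your own account:

// Direct API call sketch. The endpoint and header names are ASSUMPTIONS;
// confirm them at https://scrapeninja.net/docs/ before relying on this.
const response = await fetch('https://scrapeninja.p.rapidapi.com/scrape', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'X-RapidAPI-Key': process.env.SCRAPENINJA_API_KEY, // your API key
  },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const result = await response.json(); // roughly the shape shown under Output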

Troubleshooting

Common issues:

  • Invalid or missing API key: Ensure your ScrapeNinja credentials are correctly configured in n8n.
  • Blocked requests or CAPTCHAs: Some sites may block scraping or present CAPTCHAs. Try changing the geo location or using a premium proxy.
  • Timeouts: Increase the "Timeout (Seconds)" property if the target site is slow to respond.
  • Unexpected output: If the returned HTML is not as expected, check if the site requires JavaScript rendering (consider using the "Scrape Single Page (Browser, Slow)" operation instead).

Error messages:

  • "error": "Request failed with status code 403": The site blocked the request. Try a different geo location or proxy.
  • "error": "Text Not Expected found": The specified unwanted text was detected in the response. Adjust your "Text Not Expected" patterns or increase retries.
  • "error": "Timeout exceeded": The request took too long. Increase the timeout setting.

Links and References

  • ScrapeNinja documentation: https://scrapeninja.net/docs/