ScrapeNinja

Consume the ScrapeNinja Web Scraping API. See the full documentation at https://scrapeninja.net/docs/

Overview

The ScrapeNinja node's "Scrape Single Page (Fast)" operation allows you to scrape the HTML content of a single web page quickly, without executing JavaScript. It uses ScrapeNinja's high-performance API endpoint that mimics a real browser's network request, making it suitable for extracting data from most websites that do not require client-side rendering.

Common scenarios:

  • Extracting product details, prices, or metadata from e-commerce pages.
  • Collecting article content or headlines from news sites.
  • Gathering structured data from public directories or listings.

Practical example:
You can use this node to fetch and parse the HTML of a product page, then process the result in subsequent n8n nodes to extract specific information like price, title, or images.
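
As a concrete sketch, the HTML returned by the node can be post-processed in a subsequent n8n Code node (Run Once for Each Item mode). This assumes the cheerio package is available to the Code node (on self-hosted n8n, external modules must be allowed via NODE_FUNCTION_ALLOW_EXTERNAL); the CSS selectors are hypothetical placeholders:

// n8n Code node sketch: parse the HTML returned by ScrapeNinja.
// Assumes cheerio is importable; the selectors below are hypothetical.
const cheerio = require('cheerio');

const $ = cheerio.load($json.body); // "body" holds the raw page HTML

return {
  json: {
    title: $('h1.product-title').first().text().trim(),
    price: $('.price').first().text().trim(),
    image: $('img.product-image').first().attr('src') || null,
  },
};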


Properties

Below are the supported input properties for the "Scrape Single Page (Fast)" operation:

  • URL to Scrape (String, required): The target URL to scrape. Example: https://example.com. You can test your request's IP and headers using the provided test URLs.
  • Headers (String[]): Custom request headers, one per line in the form "HeaderName: value". User-Agent and other basic headers are added automatically by ScrapeNinja.
  • Retry Count (Number): Number of retry attempts when a retry condition is met (HTTP failure, unexpected text found, or unexpected status code). Default: 1.
  • Geo Location (Options): Proxy geo location or custom proxy. Each attempt may use a different IP when a geo option is selected. Choices include US, Europe, Australia, and others, or a custom/premium proxy.
  • Custom Proxy URL (String): Premium or custom proxy URL. Only shown when "Geo Location" is set to "[Custom or Premium Proxy]". See the proxy setup guide.
  • Text Not Expected (String[]): Text patterns that, if found in the response, trigger a retry with another proxy.
  • Status Not Expected (Number[]): HTTP status codes that trigger a retry with another proxy. By default, 403 and 502 are included.
  • Extractor (Custom JS) (String): Custom JavaScript function for extracting data from the HTML. It receives the page HTML as input and a Cheerio parser as cheerio, and must return a JSON object. See the example below this list.
  • Follow Redirects (Boolean): Whether to follow HTTP redirects. Default: true.
  • Timeout (Seconds) (Number): Timeout per attempt, in seconds. Default: 10.
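
For reference, here is a minimal extractor sketch matching the contract described above (the page HTML arrives as input, the Cheerio parser as cheerio, and a JSON object must be returned); the selectors are illustrative only:

function extract(input, cheerio) {
  // input: raw page HTML; cheerio: Cheerio library provided by ScrapeNinja
  const $ = cheerio.load(input);
  return {
    title: $('title').text().trim(),
    headings: $('h2').map((i, el) => $(el).text().trim()).get(),
  };
}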

Output

The output of this operation is a JSON object containing the scraped page's content and related metadata. The exact structure may vary depending on the configuration and whether a custom extractor is used, but typically includes:

{
  "status": 200,
  "headers": { /* response headers */ },
  "body": "<!DOCTYPE html> ...", // Raw HTML content of the page
  "url": "https://example.com",
  "timing": { /* timing info */ }
}
  • If a custom extractor is provided, the output will be the result of your extraction function (a JSON object).
  • If an error occurs and "Continue On Fail" is enabled, the output will contain an error field with the error message and optional details.

Note: This node does not output binary data.
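
When "Continue On Fail" is enabled, a failed item carries an error field instead of the fields above. A minimal downstream check in a Code node, assuming that output shape:

// Downstream Code node sketch: separate failed scrapes from good ones.
// Assumes the error/body fields described above.
if ($json.error) {
  return { json: { failed: true, reason: $json.error } };
}
return { json: { failed: false, htmlLength: $json.body.length } };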


Dependencies

  • External Service: Requires access to the ScrapeNinja API.
  • API Key: You must configure ScrapeNinja API credentials in n8n.
  • Optional: Premium or custom proxies require additional setup, as described in the proxy setup guide.
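
For orientation, the node wraps a plain HTTPS API, so the same scrape can be reproduced outside n8n. The sketch below is illustrative only: the endpoint host and auth header are assumptions based on a typical RapidAPI setup, so verify them against the ScrapeNinja docs and your own account:

// Direct API call sketch. The endpoint and header names are ASSUMPTIONS;
// confirm them at https://scrapeninja.net/docs/ before relying on this.
const response = await fetch('https://scrapeninja.p.rapidapi.com/scrape', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'X-RapidAPI-Key': process.env.SCRAPENINJA_API_KEY, // your API key
  },
  body: JSON.stringify({ url: 'https://example.com' }),
});
const result = await response.json(); // roughly the shape shown under Output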

Troubleshooting

Common issues:

  • Invalid or missing API key: Ensure your ScrapeNinja credentials are correctly configured in n8n.
  • Blocked requests or CAPTCHAs: Some sites may block scraping or present CAPTCHAs. Try changing the geo location or using a premium proxy.
  • Timeouts: Increase the "Timeout (Seconds)" property if the target site is slow to respond.
  • Unexpected output: If the returned HTML is not as expected, check if the site requires JavaScript rendering (consider using the "Scrape Single Page (Browser, Slow)" operation instead).

Error messages:

  • "error": "Request failed with status code 403": The site blocked the request. Try a different geo location or proxy.
  • "error": "Text Not Expected found": The specified unwanted text was detected in the response. Adjust your "Text Not Expected" patterns or increase retries.
  • "error": "Timeout exceeded": The request took too long. Increase the timeout setting.

Links and References

  • ScrapeNinja documentation: https://scrapeninja.net/docs/