ScrapeNinja

Consume the ScrapeNinja Web Scraping API. See the full documentation at https://scrapeninja.net/docs/

Overview

The ScrapeNinja node's "Scrape Single Page (Browser, Slow)" operation scrapes a single web page in a real browser environment (headless Chrome). Because JavaScript executes on the target page, this operation can scrape dynamic content that requires JS rendering, handle complex interactions, and capture screenshots. It is particularly useful for:

  • Extracting data from modern websites that rely heavily on client-side JavaScript.
  • Bypassing basic anti-bot protections by mimicking real browser behavior.
  • Capturing rendered HTML, screenshots, and even specific iframe contents.

Practical examples:

  • Scraping product details from e-commerce sites with dynamic loading.
  • Collecting news articles from sites that render content via JavaScript.
  • Taking screenshots of landing pages for monitoring or archival purposes.

Properties

Below are the supported input properties for this operation, with their display names, types, and meanings:

  • URL to Scrape (string): The URL of the web page to scrape.
  • Headers (string[]): Custom request headers, one per line ("HeaderName: value"). User-Agent and other basic headers are added automatically.
  • Retry Count (number): Number of retry attempts when a retry condition is met (e.g., HTTP errors, unexpected text, or unexpected status codes).
  • Geo Location (options): Proxy geo location or custom proxy selection. When a geo option is used, each attempt may go through a different IP.
  • Custom Proxy URL (string): Premium or custom proxy URL; used only when Geo Location is set to "[Custom or Premium Proxy]".
  • Text Not Expected (string[]): Text patterns that, if found in the response, trigger a retry with another proxy.
  • Status Not Expected (number[]): HTTP status codes that trigger a retry with another proxy. Defaults include 403 and 502.
  • Extractor (Custom JS) (string): Custom JavaScript function for extracting data from the HTML. It receives the page HTML and a Cheerio parser as arguments and must return a JSON object.
  • Timeout (Seconds) (number): Timeout per attempt, in seconds, for JS-based scraping.
  • Wait For Selector (string): CSS selector to wait for before the page is considered loaded.
  • Dump Iframe (string): Name of an iframe to dump; waits for this iframe to appear in the DOM.
  • Wait For Selector in Iframe (string): CSS selector to wait for inside the specified iframe.
  • Extractor Target Iframe (boolean): Run the custom extractor on the iframe HTML instead of the main page.
  • Block Images (boolean): Block images in Chrome to speed up page loading.
  • Block Media (CSS, Fonts) (boolean): Block CSS and fonts in Chrome to speed up page loading.
  • Screenshot (boolean): Take a screenshot of the page (slower when enabled).
  • Catch Ajax Headers URL Mask (string): If set, captures and dumps XHR requests/responses whose URL matches this mask.
  • Post-Load Wait Time (number): Seconds to wait after page load (1–12s). Use this if automatic waiting fails.
  • Viewport Settings (JSON) (string): Advanced: custom viewport size/settings as a JSON object. Default is 1920x1080.
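As described above, the "Extractor (Custom JS)" property takes a function that receives the page HTML and a Cheerio parser and must return a JSON object. A minimal sketch of such a function is shown below; the function name and the selectors are illustrative only, so check the ScrapeNinja documentation for the exact expected signature:

```javascript
// Illustrative custom extractor: receives the raw page HTML and a Cheerio
// instance (per the property description above) and returns a plain JSON
// object. The selectors below are examples; adapt them to your target page.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return {
    // First <h1> text, whitespace-trimmed
    title: $('h1').first().text().trim(),
    // All href attributes of <a> elements, as a plain array
    links: $('a')
      .map((i, el) => $(el).attr('href'))
      .get(),
  };
}
```

The extractor runs inside ScrapeNinja's environment, so it should not reference outside variables and should always return a serializable object, even when nothing matches (e.g., an empty array rather than undefined).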

Output

The output will be a JSON object containing the results of the scraping operation. The structure can vary depending on the options selected, but typically includes:

  • Extracted Data: If a custom extractor is provided, the returned JSON object from your extractor function.
  • HTML Content: The full HTML of the page (or iframe, if specified).
  • Screenshot (optional): If enabled, binary data representing the screenshot (as a file attachment).
  • XHR/Ajax Data (optional): If "Catch Ajax Headers URL Mask" is set, relevant request/response data.
  • Meta Information: Such as HTTP status, headers, timing, and possibly proxy/geolocation info.

Note: If binary data (like screenshots) is included, it will be available in the binary output field.
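For orientation only, a run with a custom extractor might produce output roughly shaped like the fragment below. The field names here are hypothetical, not taken from this document; consult the ScrapeNinja API documentation for the exact response structure:

```json
{
  "info": {
    "statusCode": 200,
    "headers": {}
  },
  "body": "<html>...</html>",
  "extractor": {
    "result": { "title": "Example Domain" }
  }
}
```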


Dependencies

  • External Service: Requires access to the ScrapeNinja API.
  • API Key: You must configure the scrapeNinjaApi credential in n8n.
  • Proxy (optional): For custom proxies, follow the proxy setup guide.

Troubleshooting

Common Issues:

  • Invalid API Key: Ensure your ScrapeNinja API credentials are correctly configured in n8n.
  • Timeouts: Increase the "Timeout (Seconds)" property if the target site is slow to load.
  • Blocked Requests: Some sites may still block scraping despite browser emulation. Try changing the Geo Location or using a premium proxy.
  • Extractor Errors: If your custom JS extractor throws an error, ensure it returns a valid JSON object and uses the correct function signature.
  • Binary Output Handling: If you enable screenshots, make sure downstream nodes can handle binary data.

Error Messages:

  • { "error": "<message>", "details": "<additional details>" }: General error format. Check the message and details for clues (e.g., network errors, invalid selectors, extractor exceptions).
  • HTTP Status Not Expected: If you see retries or failures due to status codes, adjust the "Status Not Expected" list as needed.
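For instance, a retry configuration along the following lines widens the retry triggers; the text patterns and status codes are illustrative, so choose values that match the block pages your target site actually serves:

```
Text Not Expected:
  Access Denied
  captcha
Status Not Expected:
  403, 429, 502
```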

Links and References

  • ScrapeNinja documentation: https://scrapeninja.net/docs/
