Overview
This node allows you to save a web page to the Wayback Machine, an internet archive service that captures and stores snapshots of web pages over time. It is useful for preserving the state of a webpage at a specific moment, archiving content for future reference, or ensuring access to pages that might change or be removed.
Common scenarios include:
- Archiving important web pages for compliance or record-keeping.
- Capturing dynamic content with JavaScript interactions.
- Saving pages that return error status codes, or capturing a page together with its outlinks.
- Generating screenshots of web pages for visual records.
For example, you can use this node to save a news article URL to the Wayback Machine, optionally capturing all outlinks and taking a full-page screenshot.
Properties
| Name | Meaning |
|---|---|
| URL | The URL of the page to save to the Wayback Machine. |
| Capture All | Whether to capture pages even if they return HTTP errors (status 4xx or 5xx). By default, only pages with status 200 are captured. |
| Capture Cookie | An extra HTTP cookie value to send when capturing the target page. Useful for authenticated or personalized content. |
| Capture Outlinks | Whether to automatically capture all outlinks found on the page, including links in PDFs, JSON, RSS, and MRSS feeds. |
| Capture Screenshot | Whether to capture a full-page screenshot in PNG format, stored separately in the Wayback Machine. |
| Delay Wayback Machine Availability | Whether the capture becomes available after approximately 12 hours instead of immediately, reducing system load. API responses remain the same regardless. |
| Email Result | Whether to send an email report of the captured URLs to the user's email address. |
| Force Get | Whether to force a simple HTTP GET request to capture the page. By default, a HEAD request is made first to decide whether to capture with a headless browser or a plain GET; enabling this option skips that decision and always uses GET. |
| If Not Archived Within | Only capture the page if the latest existing capture is older than this time delta (e.g., "3d 5h 20m" or seconds like "120"). Supports two comma-separated values: first for main capture, second for outlinks. Defaults to 45 minutes. |
| JS Behavior Timeout | Number of seconds (0–30) to run JavaScript after page load to trigger dynamic content loading (e.g., image hover, scroll). Default is 5 seconds. Set to 0 to skip JS execution for faster capture. |
| Outlinks Availability | Whether to return timestamps of the last capture for all outlinks. |
| Skip First Archive | Whether to skip the check for whether this is the first capture of the URL, which speeds up the process if that information is not needed. |
| Target Username | Username to use in login forms on the target page, if authentication is required. |
| Target Password | Password to use in login forms on the target page, if authentication is required. |
| Use User-Agent | Custom HTTP User-Agent string to use when capturing the target page. |
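
As a rough illustration of how the properties above might translate into the form data sent to the save endpoint, here is a minimal Python sketch. The form field names follow the public Save Page Now API as far as is known; whether the node uses exactly these names is an assumption, and `build_form` and the two mapping dictionaries are hypothetical helpers that are not part of the node.

```python
# Hypothetical mapping from the node's property names to form fields.
# The field names are assumed to match the Save Page Now API.
BOOLEAN_FIELDS = {
    "Capture All": "capture_all",
    "Capture Outlinks": "capture_outlinks",
    "Capture Screenshot": "capture_screenshot",
    "Delay Wayback Machine Availability": "delay_wb_availability",
    "Email Result": "email_result",
    "Force Get": "force_get",
    "Outlinks Availability": "outlinks_availability",
    "Skip First Archive": "skip_first_archive",
}
VALUE_FIELDS = {
    "Capture Cookie": "capture_cookie",
    "If Not Archived Within": "if_not_archived_within",
    "Target Username": "target_username",
    "Target Password": "target_password",
    "Use User-Agent": "use_user_agent",
}


def build_form(url, options=None):
    """Turn a URL plus optional node properties into a flat form dict."""
    options = options or {}
    form = {"url": url}

    # Boolean switches are only sent when enabled (assumed to be sent as "1").
    for name, field in BOOLEAN_FIELDS.items():
        if options.get(name):
            form[field] = "1"

    # Free-text values are sent only when non-empty.
    for name, field in VALUE_FIELDS.items():
        value = options.get(name)
        if value:
            form[field] = str(value)

    # The table above gives a valid range of 0 to 30 seconds.
    timeout = options.get("JS Behavior Timeout")
    if timeout is not None:
        if not 0 <= timeout <= 30:
            raise ValueError("JS Behavior Timeout must be between 0 and 30 seconds")
        form["js_behavior_timeout"] = str(timeout)

    return form
```

For instance, `build_form("https://example.com", {"Capture Outlinks": True, "JS Behavior Timeout": 0})` would yield a dict containing `url`, `capture_outlinks`, and `js_behavior_timeout`, leaving every other option at the service default.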
Output
The node outputs JSON data representing the response from the Wayback Machine's save API. This typically includes metadata about the saved capture such as the archived URL, timestamp, and status.
If the capture includes a screenshot, it is stored separately by the Wayback Machine but not directly output as binary data by this node.
Output structure example (simplified):
{
  "archived_snapshots": {
    "closest": {
      "available": true,
      "url": "https://web.archive.org/web/20230101000000/https://example.com",
      "timestamp": "20230101000000"
    }
  },
  "url": "https://example.com",
  "timestamp": "20230101000000",
  "status": "success"
}
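
A downstream step will often only need the archived URL. The following Python sketch shows one way to pull it out of an item shaped like the simplified example above. The helper name `archived_url` is hypothetical, and real responses may carry different or additional fields, so missing keys are treated as "no snapshot".

```python
def archived_url(item):
    """Return the closest snapshot URL, or None if no snapshot is reported."""
    # Every level is optional here because the example above is simplified.
    closest = item.get("archived_snapshots", {}).get("closest", {})
    if closest.get("available"):
        return closest.get("url")
    return None
```

Returning `None` instead of raising keeps the helper usable in workflows where some items have no capture yet.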
Dependencies
- Requires an API key credential for the Internet Archive service configured in n8n.
- Makes HTTP POST requests to https://web.archive.org/save with form data parameters.
- No additional external dependencies beyond standard HTTP and authentication helpers.
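
To make the request shape concrete, here is a minimal Python sketch that builds (but does not send) a form-encoded POST to the save endpoint using only the standard library. The `Authorization: LOW <access key>:<secret>` header format is an assumption based on the Internet Archive's S3-style keys; how the node's credential is actually attached is not stated here. `build_save_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

SAVE_ENDPOINT = "https://web.archive.org/save"


def build_save_request(form, access_key, secret_key):
    """Build a POST request carrying the form fields as urlencoded data."""
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
        # Assumed credential format; adjust to match your actual API key setup.
        "Authorization": f"LOW {access_key}:{secret_key}",
    }
    body = urlencode(form).encode("utf-8")
    return Request(SAVE_ENDPOINT, data=body, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call; keeping construction separate from sending makes the request easy to inspect before any network traffic happens.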
Troubleshooting
- Invalid URL Error: If the provided URL is malformed or cannot be parsed, the node throws an "Invalid URL" error. Ensure the URL is correctly formatted and includes the protocol (e.g., https://).
- Authentication Errors: Missing or invalid API credentials will cause authentication failures. Verify that the API key credential is properly set up.
- Timeouts or Slow Responses: Capturing pages with heavy JavaScript or many outlinks may take longer. Lower the JS Behavior Timeout (or set it to 0) or disable Capture Outlinks to speed up the capture.
- Capture Not Available Immediately: If "Delay Wayback Machine Availability" is enabled, captures appear after ~12 hours. This is expected behavior to reduce server load.
- HTTP Errors on Target Page: By default, pages returning HTTP errors (4xx, 5xx) are not captured unless "Capture All" is enabled.
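
The "Invalid URL" case can be caught before the node runs, for example in a preceding step. The Python sketch below is one possible pre-check, not the node's own validation logic: it assumes only `http` and `https` URLs are acceptable, and `validate_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def validate_url(url):
    """Return the trimmed URL if it has an http(s) scheme and a host."""
    parsed = urlparse(url.strip())
    # A missing protocol (e.g. "example.com") leaves the scheme empty.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Invalid URL")
    return parsed.geturl()
```

Calling `validate_url("example.com")` raises `ValueError`, which mirrors the advice above to always include the protocol.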