Playwright - Page Source

Basic Example Node

Overview

This node captures the full HTML source content of a web page using Playwright. It waits until the page's network activity is idle to ensure the page is fully loaded before retrieving the source. This is useful for scenarios where you need to extract or analyze the complete HTML content of a webpage, such as web scraping, content verification, or archiving.
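The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the node's actual implementation: the helper name capturePageSource and the PageLike interface are assumptions introduced here. PageLike only mirrors the two Playwright Page methods the sketch relies on (waitForLoadState and content), so a real Playwright Page can be passed in directly.

```typescript
// Minimal structural type covering the two Playwright Page methods used here.
// (Assumption: declared locally so the sketch needs no imports.)
interface PageLike {
  waitForLoadState(
    state?: "load" | "domcontentloaded" | "networkidle",
    options?: { timeout?: number },
  ): Promise<void>;
  content(): Promise<string>;
}

// Hypothetical helper: wait until network activity is idle,
// then return the serialized HTML of the page.
async function capturePageSource(page: PageLike): Promise<string> {
  await page.waitForLoadState("networkidle");
  return page.content();
}
```

With Playwright, the call would look like `const html = await capturePageSource(page);` once the page has been navigated to the target URL.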

Use Case Examples

Extracting the HTML source of a webpage after it has fully loaded to scrape data.
Saving the HTML content of a page for offline analysis or debugging.
Verifying that a webpage has loaded specific elements by inspecting its source.

Output

Binary

Outputs the full HTML page source as a binary file named 'page-source.html' with MIME type 'text/html'.

JSON

pageSource - The full HTML source content of the loaded webpage as a string.
status - The result of the operation. It is always 'success' when the page source was retrieved.
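As a rough illustration of the two outputs described above, the sketch below assembles them from an HTML string. The function name buildPageSourceOutput and the PageSourceOutput shape are assumptions made for this example. The file name, MIME type, and JSON field names come from this page. The binary content is kept as a plain string here for simplicity, whereas a real node would likely store it in an encoded form.

```typescript
// Hypothetical output shape; only the field values are taken from the docs above.
interface PageSourceOutput {
  json: { pageSource: string; status: "success" };
  binary: { fileName: string; mimeType: string; data: string };
}

// Hypothetical helper: build both outputs from the captured HTML.
function buildPageSourceOutput(html: string): PageSourceOutput {
  return {
    json: {
      pageSource: html, // full HTML source as a string
      status: "success", // set once the source has been retrieved
    },
    binary: {
      fileName: "page-source.html",
      mimeType: "text/html",
      data: html, // assumption: left unencoded in this sketch
    },
  };
}
```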

Dependencies

Requires a Playwright browser manager to control and interact with the browser page.

Troubleshooting

If the node fails to retrieve the page source, ensure the browser manager is properly configured and the page has fully loaded.
Waiting for the network idle state might time out if the page has continuous network activity, such as polling or streaming requests; consider adjusting the wait conditions if necessary.
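One way to adjust the wait conditions is to bound the network idle wait and fall back to the page's load event when it does not settle in time. The sketch below is an assumption about how this could be done, not a built-in option of the node: the helper name capturePageSourceWithFallback, the PageLike interface, and the 10 second default are all hypothetical. The waitForLoadState states and its timeout option are the ones Playwright's Page provides.

```typescript
// Minimal structural type covering the two Playwright Page methods used here.
// (Assumption: declared locally so the sketch needs no imports.)
interface PageLike {
  waitForLoadState(
    state?: "load" | "domcontentloaded" | "networkidle",
    options?: { timeout?: number },
  ): Promise<void>;
  content(): Promise<string>;
}

// Hypothetical helper: try network idle with a bounded timeout,
// then fall back to the plain load event if that wait fails.
async function capturePageSourceWithFallback(
  page: PageLike,
  idleTimeoutMs: number = 10000, // assumed default, tune per site
): Promise<string> {
  try {
    await page.waitForLoadState("networkidle", { timeout: idleTimeoutMs });
  } catch {
    // Simplification: any failure of the idle wait is treated as a timeout.
    await page.waitForLoadState("load");
  }
  return page.content();
}
```

The trade-off is that the fallback may return the source before late requests have finished, so the captured HTML can be less complete on pages that keep loading content in the background.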
