Overview
This node captures the full HTML source of a web page using Playwright. It waits until the page's network activity is idle before retrieving the source, so that content loaded by scripts after the initial response is more likely to be included. This is useful when you need to extract or analyze the complete HTML of a web page, for example for web scraping, content verification, or archiving.
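The wait-then-read sequence described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the `PageLike` interface and the `capturePageSource` function are hypothetical names, and the sketch assumes the node relies on Playwright's `page.waitForLoadState('networkidle')` and `page.content()`, which this document does not confirm.

```typescript
// Hypothetical structural type covering the two Playwright Page methods used here.
// A real Playwright Page object satisfies this shape.
interface PageLike {
  waitForLoadState(
    state?: "load" | "domcontentloaded" | "networkidle",
    options?: { timeout?: number },
  ): Promise<void>;
  content(): Promise<string>;
}

// Assumed flow: wait for network activity to settle, then read the page HTML.
async function capturePageSource(page: PageLike): Promise<string> {
  await page.waitForLoadState("networkidle");
  return page.content();
}
```

Because the function only depends on the two methods it calls, it can be exercised with a stub page in tests as well as with a real Playwright page.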
Use Case Examples
- Extracting the HTML source of a webpage after it has fully loaded to scrape data.
- Saving the HTML content of a page for offline analysis or debugging.
- Verifying that a webpage has loaded specific elements by inspecting its source.
Output
Binary
Outputs the full HTML page source as a binary file named 'page-source.html' with MIME type 'text/html'.
JSON
- pageSource: The full HTML source content of the loaded webpage as a string.
- status: Indicates the success status of the operation, always 'success' if the page source was retrieved.
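As a rough illustration of the output described above, the sketch below models one result as a typed object. The `PageSourceOutput` type and the `describeOutput` function are hypothetical; only the field names `pageSource` and `status`, the file name, and the MIME type come from this document. How the binary payload itself is encoded is not stated here, so it is left out.

```typescript
// Hypothetical shape of one output item, based only on the fields listed above.
interface PageSourceOutput {
  json: {
    pageSource: string; // full HTML source as a string
    status: "success"; // set when the source was retrieved
  };
  binary: {
    fileName: string; // 'page-source.html' per this document
    mimeType: string; // 'text/html' per this document
  };
}

// Builds the assumed output description for a retrieved HTML string.
function describeOutput(html: string): PageSourceOutput {
  return {
    json: { pageSource: html, status: "success" },
    binary: { fileName: "page-source.html", mimeType: "text/html" },
  };
}
```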
Dependencies
- Requires a Playwright browser manager to control and interact with the browser page.
Troubleshooting
- If the node fails to retrieve the page source, ensure the browser manager is properly configured and the page has fully loaded.
- Waiting for the network idle state might time out if the page has continuous network activity, such as polling or streaming requests; consider adjusting the wait conditions if necessary.
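One possible way to adjust the wait conditions is to bound the network idle wait and fall back to a weaker load state when it times out. The sketch below is an assumption-laden example, not a documented option of this node: `WaitablePage` and `capturePageSourceWithFallback` are hypothetical names, the 10-second default is arbitrary, and the check on `err.name === "TimeoutError"` assumes Playwright's timeout errors carry that name.

```typescript
// Hypothetical structural type for the Playwright Page methods used below.
interface WaitablePage {
  waitForLoadState(
    state?: "load" | "domcontentloaded" | "networkidle",
    options?: { timeout?: number },
  ): Promise<void>;
  content(): Promise<string>;
}

// Tries a bounded network idle wait; on timeout, settles for DOMContentLoaded.
async function capturePageSourceWithFallback(
  page: WaitablePage,
  idleTimeoutMs: number = 10000, // arbitrary default chosen for illustration
): Promise<string> {
  try {
    await page.waitForLoadState("networkidle", { timeout: idleTimeoutMs });
  } catch (err) {
    // Assumption: timeout errors are identified by the name "TimeoutError".
    const isTimeout = err instanceof Error && err.name === "TimeoutError";
    if (!isTimeout) {
      throw err; // unrelated failures (closed page, crashed browser) still surface
    }
    await page.waitForLoadState("domcontentloaded");
  }
  return page.content();
}
```

The trade-off is that a page captured after the fallback may be missing content that was still loading, so the result should be treated as a best effort.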