Overview
This node extracts the transcript of a YouTube video by automating a headless browser session. It navigates to the specified YouTube video page, handles cookie consent dialogs if present, opens the transcript panel, and scrapes the transcript text along with timestamps.
Common scenarios where this node is useful:
- Automatically retrieving subtitles or captions from YouTube videos for content analysis.
- Generating searchable text data from video content.
- Archiving or indexing spoken content from videos.
- Integrating video transcripts into workflows for translation, summarization, or sentiment analysis.
Practical example:
You provide a YouTube video URL or ID as input, and the node returns a structured transcript with timestamps and text segments, which you can then use downstream in your workflow for further processing.
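Because the input can be either a bare ID or a full URL, it has to be reduced to a single video ID before navigation. The node's actual normalization logic is not shown here, so the following is only a minimal sketch. The function name `extractYoutubeId`, the list of accepted URL shapes, and the assumption that IDs are 11 characters from `A-Z`, `a-z`, `0-9`, `_` and `-` are all illustrative.

```typescript
// Hypothetical helper, not taken from the node's source.
// Assumption: video IDs are 11 characters drawn from [A-Za-z0-9_-].
const ID = "[A-Za-z0-9_-]{11}";

// Each pattern captures the video ID in group 1.
const ID_PATTERNS: RegExp[] = [
  // Bare ID, e.g. "dQw4w9WgXcQ"
  new RegExp(`^(${ID})$`),
  // Watch URL, with "v" anywhere in the query string
  new RegExp(`^(?:https?://)?(?:www\\.|m\\.)?youtube\\.com/watch\\?(?:.*&)?v=(${ID})(?:[&#].*)?$`),
  // Short link, e.g. "https://youtu.be/<id>?t=10"
  new RegExp(`^(?:https?://)?youtu\\.be/(${ID})(?:[?#].*)?$`),
  // Embed and Shorts URLs
  new RegExp(`^(?:https?://)?(?:www\\.|m\\.)?youtube\\.com/(?:embed|shorts)/(${ID})(?:[?#/].*)?$`),
];

function extractYoutubeId(input: string): string | null {
  const trimmed = input.trim();
  for (const pattern of ID_PATTERNS) {
    const match = pattern.exec(trimmed);
    if (match) return match[1];
  }
  // Unrecognized input: let the caller decide how to report it.
  return null;
}
```

Returning `null` for unrecognized input, rather than throwing, leaves the choice of error message to the caller.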
Properties
| Name | Meaning |
|---|---|
| Youtube Video ID or Url | The YouTube video identifier or full URL from which to extract the transcript. |
| Options | Collection of optional settings: |
| - Debug Mode | Enable debug mode to show additional options and logs. |
| - Open DevTools | When debug mode is enabled, automatically open Chrome DevTools for inspection. |
| - Slow Motion | When debug mode is enabled, slow down Puppeteer operations for better visibility. |
| - Use Debugger Statement | When debug mode is enabled, pause execution at a debugger statement before processing the transcript. |
| - Wait After Transcript | When debug mode is enabled, keep the browser open after transcript extraction for manual inspection. |
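The debug options above could plausibly translate into browser launch settings as sketched below. The option names `headless`, `devtools` and `slowMo` are Puppeteer launch options; everything else is an assumption made for illustration: the `DebugOptions` field names, the 250 ms default delay, and the idea that debug mode switches off headless mode. The node's real mapping may differ.

```typescript
// Hypothetical shape of the node's "Options" collection (field names assumed).
interface DebugOptions {
  debugMode?: boolean;
  openDevTools?: boolean;
  slowMotion?: boolean;
}

// Subset of Puppeteer launch options used in this sketch.
interface LaunchSettings {
  headless: boolean;
  devtools: boolean;
  slowMo: number; // delay in milliseconds added to each Puppeteer operation
}

function buildLaunchSettings(opts: DebugOptions, slowMoMs = 250): LaunchSettings {
  const debug = opts.debugMode === true;
  return {
    // Assumption: a visible browser window is only wanted while debugging.
    headless: !debug,
    // The sub-options only take effect when debug mode itself is enabled.
    devtools: debug && opts.openDevTools === true,
    slowMo: debug && opts.slowMotion === true ? slowMoMs : 0,
  };
}
```

Gating every sub-option on `debugMode` mirrors the "When debug mode is enabled" wording in the table: a stray `openDevTools: true` has no effect on its own.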
Output
The node outputs an array of items, each containing:
{
  "youtubeId": "string",
  "transcript": [
    {
      "timestamp": "string", // e.g., "0:01", "1:23"
      "text": "string", // Transcript text segment
      "seconds": number // Timestamp converted to total seconds (integer)
    },
    ...
  ]
}
- youtubeId: The normalized YouTube video ID used for extraction.
- transcript: An array of transcript segments, each with:
  - timestamp: The displayed timestamp string.
  - text: The corresponding transcript text.
  - seconds: The timestamp converted into total seconds for easier time-based processing.
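The relationship between `timestamp` and `seconds` can be illustrated with a small conversion function. This is a sketch, not the node's own code: the name `timestampToSeconds` is hypothetical, and support for an hours component (`H:MM:SS`) is an assumption based on how YouTube displays long videos.

```typescript
// Hypothetical helper: converts "M:SS" or "H:MM:SS" into total seconds.
function timestampToSeconds(timestamp: string): number {
  const parts = timestamp.trim().split(":");
  const valid =
    parts.length >= 1 &&
    parts.length <= 3 &&
    parts.every((part) => /^\d+$/.test(part));
  if (!valid) {
    throw new Error(`Unrecognized timestamp: "${timestamp}"`);
  }
  // Fold left to right: each step shifts the running total by a factor of 60.
  return parts.reduce((total, part) => total * 60 + Number(part), 0);
}

// timestampToSeconds("0:01")    -> 1
// timestampToSeconds("1:23")    -> 83
// timestampToSeconds("1:02:03") -> 3723
```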
If the video does not have a transcript or cannot be found, the node throws an error unless configured to continue on failure.
No binary data output is produced by this node.
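As an illustration of downstream use, the sketch below works on the output shape described above. It might sit in a later code step that prepares text for summarization or pulls out a time window. The function names `transcriptToText` and `segmentsBetween` are hypothetical and not part of the node.

```typescript
// Mirrors one entry of the "transcript" array in the node's output.
interface TranscriptSegment {
  timestamp: string;
  text: string;
  seconds: number;
}

// Join all segments into one plain-text string, e.g. as input for summarization.
function transcriptToText(segments: TranscriptSegment[]): string {
  return segments
    .map((segment) => segment.text.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

// Select segments that start within [startSeconds, endSeconds).
function segmentsBetween(
  segments: TranscriptSegment[],
  startSeconds: number,
  endSeconds: number,
): TranscriptSegment[] {
  return segments.filter(
    (segment) => segment.seconds >= startSeconds && segment.seconds < endSeconds,
  );
}
```

Filtering on the numeric `seconds` field avoids re-parsing the display string in every downstream step.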
Dependencies
- Puppeteer Extra with the stealth plugin: Used to automate Chromium browser interactions while avoiding detection.
- Requires a working environment capable of launching Chromium/Chrome browsers (e.g., Docker container with necessary dependencies or local machine with Chrome installed).
- No external API keys or credentials are required.
- Optional debug flags can be passed via node parameters or CLI arguments to control browser behavior.
Troubleshooting
Common issues:
- Browser launch failures:
  - May occur if the environment lacks dependencies that Chromium needs or if sandboxing is restricted.
  - Resolution: Ensure all Chromium dependencies are installed and disable sandboxing if needed.
- Transcript button not found:
  - Happens if the video has no transcript available or the video ID/URL is invalid.
  - Resolution: Verify that the video URL or ID is correct and that the video actually has a transcript.
- Cookie consent dialog blocking navigation:
  - The node attempts to detect and click cookie consent buttons but may fail if YouTube changes its UI.
  - Resolution: Update the node or inspect the page manually with debug mode enabled.
- Timeouts waiting for elements:
  - Network slowness or UI changes can cause selector waits to time out.
  - Resolution: Enable debug mode and increase wait times if necessary.
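For the sandboxing case, restricted environments such as some Docker containers are commonly handled by passing Chromium's `--no-sandbox` and `--disable-setuid-sandbox` flags at launch. The sketch below only builds the argument list. The function name `buildChromiumArgs` and its parameter are hypothetical, and how the node actually exposes this setting is not documented here.

```typescript
// Hypothetical helper: assembles extra Chromium command-line arguments.
function buildChromiumArgs(disableSandbox: boolean): string[] {
  const args: string[] = [];
  if (disableSandbox) {
    // Both are real Chromium flags. Disabling the sandbox weakens isolation,
    // so this is only appropriate when the container itself is the boundary.
    args.push("--no-sandbox", "--disable-setuid-sandbox");
  }
  return args;
}
```

The resulting array would typically be passed as the `args` entry of the Puppeteer launch options.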
Error messages:
- "Failed to launch the browser: ..."
  - Indicates Puppeteer could not start Chromium. Check environment setup.
- "The video with ID ... either does not exist or does not have a transcript available."
  - Means no transcript was found for the given video.
- "Failed to extract transcript: ..."
  - General error during scraping; possibly due to DOM structure changes.
Links and References
- Puppeteer Extra GitHub – Puppeteer plugins including stealth mode.
- YouTube Captions & Transcripts – Official YouTube help on transcripts.
- n8n Documentation – For general node usage and troubleshooting.
