YouTube Transcript

Extract transcripts from YouTube videos with enhanced language priority

Overview

This node extracts information from YouTube videos, supporting two main operations:

  • Get Video Info: Retrieves basic metadata about a YouTube video such as title, author, thumbnail, and embed HTML.
  • Get Transcript: Extracts the transcript (captions) of a YouTube video in various formats and languages, with options to customize output for AI agents or subtitle formats.

Common scenarios:

  • Automatically fetching video metadata for cataloging or display purposes.
  • Extracting subtitles or transcripts for accessibility, content analysis, or translation workflows.
  • Preparing structured transcript data optimized for AI processing, summarization, or natural language understanding.

Practical examples:

  • A content manager uses the node to gather video titles and thumbnails to populate a media library.
  • A researcher extracts transcripts from educational videos to analyze spoken content.
  • An AI workflow ingests simplified, timestamped transcripts with summaries for further NLP tasks.

Properties

  • Video URL: YouTube video URL or video ID to identify the target video.
  • Additional Options: Collection of optional settings:
    • Include Timestamps: Whether to include timestamps in the transcript output (true/false).
    • Format Output: Output format of the transcript: "Raw Text", "Structured" (with timestamps), "SRT Format", or "VTT Format".
    • AI Agent Mode: Optimize output for AI agents by simplifying and structuring the transcript (true/false).
    • Include Summary: When AI Agent Mode is enabled, whether to include a human-readable summary of the transcript (true/false).
    • Max Length: Maximum length (in characters) of the transcript text to return (range: 100 to 50,000).
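
As a rough illustration of how these options relate to each other, the sketch below models them as a TypeScript type and normalizes a partial input. The field names, the defaults, and the choice to clamp (rather than reject) an out-of-range Max Length are all assumptions made for this example; they are not taken from the node's source.

```typescript
// Hypothetical option shape mirroring the properties listed above.
// Field names and defaults are illustrative, not the node's real parameter keys.
type OutputFormat = "raw" | "structured" | "srt" | "vtt";

interface TranscriptOptions {
  includeTimestamps: boolean;
  format: OutputFormat;
  aiAgentMode: boolean;
  includeSummary: boolean;
  maxLength: number;
}

const MIN_LENGTH = 100;
const MAX_LENGTH = 50000;

// Fill in assumed defaults and keep maxLength inside the documented range.
function normalizeOptions(input: Partial<TranscriptOptions>): TranscriptOptions {
  const aiAgentMode = input.aiAgentMode ?? false;
  const requested = input.maxLength ?? MAX_LENGTH;
  return {
    includeTimestamps: input.includeTimestamps ?? false,
    format: input.format ?? "raw",
    aiAgentMode,
    // Include Summary only has an effect when AI Agent Mode is enabled.
    includeSummary: aiAgentMode && (input.includeSummary ?? false),
    // Clamp to the documented 100..50,000 character range (assumed behavior).
    maxLength: Math.min(MAX_LENGTH, Math.max(MIN_LENGTH, Math.floor(requested))),
  };
}

// Cut the transcript text to at most maxLength characters.
function truncate(text: string, maxLength: number): string {
  return text.length <= maxLength ? text : text.slice(0, maxLength);
}

const options = normalizeOptions({ format: "srt", maxLength: 250 });
console.log(options.format, options.maxLength);
```

Whether the node itself truncates mid-word or at a segment boundary is not stated here, so the `truncate` helper only shows the simplest possible reading of "maximum length in characters".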

Output

The node outputs JSON data with structure depending on the operation:

For "Get Video Info":

  • Fields include:
    • title: Video title.
    • author_name: Channel or author name.
    • author_url: URL of the author/channel.
    • type: Content type (usually "video").
    • height, width: Video dimensions.
    • version: Version info.
    • provider_name, provider_url: Provider details (YouTube).
    • thumbnail_height, thumbnail_width, thumbnail_url: Thumbnail image info.
    • html: Embed HTML snippet.
    • url: Full YouTube video URL.
    • video_id: The video identifier.
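
The metadata fields above correspond closely to a standard oEmbed response. The sketch below builds a request URL for YouTube's public oEmbed endpoint and maps a response body onto a subset of the listed fields. The helper names are hypothetical, and the assumption that `url` and `video_id` are added by the node (rather than returned by the endpoint) is an inference from the field list, not something confirmed by the node's source. The HTTP request itself is left out so the example stays free of network and library dependencies.

```typescript
// Subset of the "Get Video Info" fields; names follow the list above.
interface VideoInfo {
  title: string;
  author_name: string;
  author_url: string;
  thumbnail_url: string;
  html: string;
  url: string;
  video_id: string;
}

// Build the oEmbed request URL for a given 11-character video ID.
function buildOEmbedUrl(videoId: string): string {
  const watchUrl = "https://www.youtube.com/watch?v=" + videoId;
  return "https://www.youtube.com/oembed?url=" + encodeURIComponent(watchUrl) + "&format=json";
}

// Map a raw oEmbed JSON body onto VideoInfo, defaulting missing strings to "".
function toVideoInfo(videoId: string, body: string): VideoInfo {
  const data = JSON.parse(body) as Record<string, unknown>;
  const str = (key: string): string => {
    const value = data[key];
    return typeof value === "string" ? value : "";
  };
  return {
    title: str("title"),
    author_name: str("author_name"),
    author_url: str("author_url"),
    thumbnail_url: str("thumbnail_url"),
    html: str("html"),
    // Assumed to be added by the caller rather than returned by oEmbed.
    url: "https://www.youtube.com/watch?v=" + videoId,
    video_id: videoId,
  };
}
```

The string returned by `buildOEmbedUrl` can be fetched with any HTTP client, and the response text passed to `toVideoInfo`.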

For "Get Transcript":

  • Depending on options, output can be:
    • Raw Text: Plain concatenated transcript text.
    • Structured: Array of segments with text, start time, duration, and index.
    • SRT Format: SubRip subtitle formatted string.
    • VTT Format: WebVTT subtitle formatted string.
  • Additional fields:
    • language: Language code of the transcript.
    • video_id: Video identifier.
    • segments_count: Number of transcript segments.
    • total_duration: Total duration covered by transcript segments (seconds).
  • If AI Agent Mode is enabled, output includes:
    • full_text: Complete transcript text.
    • word_count: Number of words in transcript.
    • segments_count: Number of segments.
    • total_duration: Duration in seconds.
    • Optionally, a summary field containing a human-readable summary extracted from transcript start and end segments.
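
To make the relationship between the output formats concrete, here is a minimal sketch that converts structured segments into SRT and WebVTT strings and derives the AI Agent Mode counters. The `Segment` shape follows the field list above, but the function names are hypothetical, and computing `total_duration` as the end time of the last segment is an assumption; the node may calculate it differently.

```typescript
// Structured segment, as described above: text, start, duration, index.
interface Segment {
  index: number;
  text: string;
  start: number;    // seconds
  duration: number; // seconds
}

// Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT).
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const pad = (n: number, width: number): string => {
    let s = String(n);
    while (s.length < width) s = "0" + s;
    return s;
  };
  const h = Math.floor(totalSec / 3600);
  const m = Math.floor(totalSec / 60) % 60;
  const s = totalSec % 60;
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + msSeparator + pad(ms, 3);
}

// SRT: numbered cues starting at 1, comma before milliseconds.
function toSrt(segments: Segment[]): string {
  const cues = segments.map((seg, i) =>
    (i + 1) + "\n" +
    formatTimestamp(seg.start, ",") + " --> " +
    formatTimestamp(seg.start + seg.duration, ",") + "\n" +
    seg.text
  );
  return cues.join("\n\n") + "\n";
}

// WebVTT: "WEBVTT" header line, dot before milliseconds, cue numbers optional.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) =>
    formatTimestamp(seg.start, ".") + " --> " +
    formatTimestamp(seg.start + seg.duration, ".") + "\n" +
    seg.text
  );
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}

// Counters reported in AI Agent Mode (total_duration rule is an assumption).
function toAgentOutput(segments: Segment[]) {
  const full_text = segments.map((seg) => seg.text.trim()).join(" ");
  const word_count = full_text.split(/\s+/).filter((w) => w.length > 0).length;
  const last = segments[segments.length - 1];
  return {
    full_text,
    word_count,
    segments_count: segments.length,
    total_duration: last ? last.start + last.duration : 0,
  };
}
```

The only differences between the two subtitle formats in this sketch are the `WEBVTT` header, the millisecond separator, and the cue numbering, which is why both share `formatTimestamp`.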

Dependencies

  • Requires internet access to query YouTube's public oEmbed endpoint and internal InnerTube API endpoints.
  • Uses Axios HTTP client for requests with session management and custom headers mimicking a browser.
  • No special credentials are required; all data is fetched from publicly accessible YouTube endpoints.
  • No environment variables or external API keys needed beyond standard network connectivity.

Troubleshooting

Common issues:

  • Invalid Video URL or ID: The node validates the video identifier format and throws an error if invalid. Ensure the input is a correct YouTube URL or 11-character video ID.
  • No Transcript Available: Some videos may not have captions or transcripts available in the requested language, causing errors or empty results. Try changing the language or using "auto" detection.
  • Network or Timeout Errors: Since the node relies on web requests, network failures or slow responses may cause errors. Check internet connectivity and retry.
  • API Changes: YouTube's internal APIs may change, potentially breaking transcript extraction. In such cases, fallback methods are attempted but may also fail.
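
The first issue above, input validation, can be sketched as follows. This is a minimal illustration that accepts a bare 11-character ID or a few common URL shapes; the set of URL patterns and the function name are assumptions, and the node's actual validation may accept more or fewer forms.

```typescript
// A YouTube video ID is 11 characters from the set A-Z, a-z, 0-9, "_" and "-".
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// URL shapes handled by this sketch (an assumed, non-exhaustive list).
const URL_PATTERNS: RegExp[] = [
  /[?&]v=([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,
  /youtu\.be\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,
  /youtube\.com\/(?:embed|shorts|live)\/([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])/,
];

// Return the video ID, or throw with the error message quoted in this document.
function extractVideoId(input: string): string {
  const trimmed = input.trim();
  if (VIDEO_ID.test(trimmed)) {
    return trimmed;
  }
  for (const pattern of URL_PATTERNS) {
    const match = pattern.exec(trimmed);
    const id = match ? match[1] : undefined;
    if (id) {
      return id;
    }
  }
  throw new Error("Invalid YouTube URL or video ID");
}

console.log(extractVideoId("https://youtu.be/dQw4w9WgXcQ"));
```

The negative lookahead after each capture group rejects identifiers longer than 11 characters instead of silently truncating them.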

Error messages and resolutions:

  • "Invalid YouTube URL or video ID": Verify the input video URL or ID format.
  • "Failed to get YouTube transcript: ...": May indicate that no captions were found or that the API call failed; try a different language or check that the video is available.
  • "Failed to get video info: ...": Usually a network problem or an invalid video ID; verify the input and connectivity.
  • The node logs detailed debug messages that can help trace the exact failure point.
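
A downstream workflow step could branch on these messages. The sketch below maps the three documented message prefixes to a suggested action; the `Advice` type, the function name, and the decision about which failures are worth retrying are assumptions for illustration only.

```typescript
type FailureKind = "invalid_input" | "transcript" | "video_info" | "unknown";

interface Advice {
  kind: FailureKind;
  retry: boolean; // whether retrying the same input might help (assumed policy)
  hint: string;
}

// Classify an error message using the texts quoted in this document.
function adviseOnError(message: string): Advice {
  if (message.indexOf("Invalid YouTube URL or video ID") !== -1) {
    return {
      kind: "invalid_input",
      retry: false,
      hint: "Verify the video URL or 11-character video ID.",
    };
  }
  if (message.indexOf("Failed to get YouTube transcript") !== -1) {
    return {
      kind: "transcript",
      retry: true,
      hint: "Try a different language or check that the video has captions.",
    };
  }
  if (message.indexOf("Failed to get video info") !== -1) {
    return {
      kind: "video_info",
      retry: true,
      hint: "Check connectivity and the video ID.",
    };
  }
  return {
    kind: "unknown",
    retry: false,
    hint: "Inspect the node's debug messages.",
  };
}
```

Substring matching is used instead of exact comparison because the documented messages end with a variable detail after the colon.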

This summary reflects static analysis of the provided source code and property definitions without runtime execution.
