
Speech2Text

Convert speech to text using Speech2Text API

Overview

This node converts speech to text from an audio file provided via a URL. It supports two transcription providers: "Google Gemini" and "OpenAI API". The user supplies the audio file URL and a prompt that guides how the transcription is extracted from the audio content.

Common scenarios where this node is beneficial include:

  • Transcribing recorded meetings, interviews, or podcasts into text.
  • Extracting spoken content from audio files for further processing or analysis.
  • Automating note-taking or caption generation from audio sources.

Practical example:

  • A user has a podcast episode hosted online and wants to generate a transcript automatically. They provide the podcast audio URL and select the preferred provider (e.g., Google Gemini). The node returns the transcribed text based on the audio content.

Properties

| Name | Meaning |
| --- | --- |
| Provider | Selects the speech-to-text service provider. Options: "Google Gemini", "OpenAI API". |
| Audio File URL | URL of the audio file to be converted to text. This must be a publicly accessible link. |
| Prompt | Custom prompt guiding the transcription process. Example: "Extract the content from the audio file and DO NOT add any content to the answer". |
| https://trolyai.me (outputFormat) | Notice field indicating the format of the output data (informational only, no input expected). |
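
The properties above can be checked before the node runs. The following is a minimal sketch, not part of the node itself: the function name `validate_params` is hypothetical, and the rule that the prompt must be non-empty is an assumption, since the source does not say whether the prompt is required.

```python
from urllib.parse import urlparse

# Display names taken from the Provider property above.
SUPPORTED_PROVIDERS = ("Google Gemini", "OpenAI API")


def validate_params(provider: str, audio_url: str, prompt: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if provider not in SUPPORTED_PROVIDERS:
        problems.append(f"unsupported provider: {provider!r}")
    parsed = urlparse(audio_url)
    # Only the URL shape is checked here; reachability is a separate concern.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("audio URL must be an absolute http(s) link")
    # Assumption: an empty prompt is treated as an error.
    if not prompt.strip():
        problems.append("prompt must not be empty")
    return problems
```

Returning a list of problems instead of raising on the first one lets a workflow report every misconfigured field at once.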

Output

The node outputs JSON data containing the transcription result of the audio file. The source code does not define the exact structure, but the output is expected to include:

  • A textual transcription of the audio content under a key such as text or similar.
  • Possibly metadata about the transcription process (e.g., confidence scores, provider used).

No binary data output is indicated by the source code.
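
Because the output structure is not fixed, downstream code should read the transcription defensively. The sketch below assumes the result is a flat JSON object; the helper name `extract_transcript` and the candidate keys other than "text" are hypothetical and should be adjusted to the actual output.

```python
from typing import Any, Optional

# Hypothetical key names: the documentation only says "a key such as text or similar".
CANDIDATE_KEYS = ("text", "transcript", "transcription")


def extract_transcript(item: dict[str, Any]) -> Optional[str]:
    """Return the first non-empty string found under a candidate key, else None."""
    for key in CANDIDATE_KEYS:
        value = item.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None
```

A caller can treat a `None` result as a signal to inspect the raw output rather than passing an empty transcript further down the workflow.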

Dependencies

  • Requires internet access to fetch the audio file from the provided URL.
  • Requires valid credentials/API keys configured in n8n for the selected provider (Google Gemini or OpenAI API).
  • The node depends on external APIs for speech-to-text conversion, so requests are subject to the selected provider's authentication and API quota.
  • No other explicit environment variables or configurations are mentioned.

Troubleshooting

Common Issues

  • Invalid or inaccessible audio URL: If the audio file URL is incorrect, private, or behind authentication, the node will fail to retrieve the audio.
  • Unsupported audio format: Providers may require specific audio formats or codecs; unsupported formats can cause errors.
  • Missing or invalid API credentials: Without proper API keys or tokens for the chosen provider, the node cannot perform transcription.
  • Network issues: Connectivity problems can prevent accessing the audio file or the provider's API.
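
Several of these issues can be caught with a pre-flight check on the audio URL. The sketch below is one possible approach using only the Python standard library. The function name `check_audio_url` is hypothetical, the content-type rule is a heuristic rather than a provider requirement, and some servers reject HEAD requests even when a normal download would succeed.

```python
import urllib.error
import urllib.request


def check_audio_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return (ok, reason) for a HEAD request against the audio URL."""
    try:
        # Request() raises ValueError when the URL has no scheme.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
    except ValueError:
        return False, "invalid URL"
    except urllib.error.HTTPError as exc:
        # Typical for private files: 401, 403, or 404.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, TimeoutError) as exc:
        return False, f"network error: {exc}"
    # Heuristic: accept audio, video containers, and generic binary downloads.
    if not content_type.startswith(("audio/", "video/", "application/octet-stream")):
        return False, f"unexpected content type: {content_type or 'missing'}"
    return True, "ok"
```

The `opener` parameter exists so the check can be exercised without network access by passing a stand-in for `urllib.request.urlopen`.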

Error Messages and Resolutions

  • Failed to fetch audio file: Check the URL accessibility and ensure it is publicly reachable.
  • Authentication error with provider: Verify that the API credentials are correctly set up in n8n.
  • Transcription timeout or failure: Large audio files or a slow network may cause timeouts; consider shortening the audio or improving connectivity.
  • Unsupported provider option: Ensure the provider is one of the supported options, "Google Gemini" or "OpenAI API" (the values "gemini" and "openAi" appear to be their internal identifiers).
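
For transient timeouts, a caller outside the node can also retry the request with an increasing delay. This is a generic sketch, not behavior the node is known to implement: `with_retries` is a hypothetical helper, and it assumes the transcription call raises Python's built-in `TimeoutError` on timeout.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() until it succeeds, retrying on TimeoutError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Retrying does not help when the audio file is simply too large for the provider, so shortening the audio remains the first thing to try.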


Note: The above summary is based solely on static analysis of the provided source code and property definitions. Runtime behavior and dynamic responses depend on actual API implementations and configurations.
