Overview
This node converts speech to text from an audio file supplied as a URL. It supports two transcription providers, "Google Gemini" and "OpenAI API". The user supplies the audio file URL and a prompt that guides how the transcription is extracted from the audio content.
Common scenarios where this node is beneficial include:
- Transcribing recorded meetings, interviews, or podcasts into text.
- Extracting spoken content from audio files for further processing or analysis.
- Automating note-taking or caption generation from audio sources.
Practical example:
- A user has a podcast episode hosted online and wants to generate a transcript automatically. They provide the podcast audio URL and select the preferred provider (e.g., Google Gemini). The node returns the transcribed text based on the audio content.
Properties
| Name | Meaning |
|---|---|
| Provider | Selects the speech-to-text service provider. Options: "Google Gemini", "OpenAI API". |
| Audio File URL | URL of the audio file to be converted to text. This must be a publicly accessible link. |
| Prompt | Custom prompt guiding the transcription process. Example: "Extract the content from the audio file and DO NOT add any content to the answer". |
| https://trolyai.me (outputFormat) | Notice field indicating the format of the output data (informational only, no input expected). |
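As a rough illustration of how these properties fit together, the inputs can be modeled as a small typed object. This is a sketch only: the field names `provider`, `audioUrl`, and `prompt` and the helper `validateParams` are hypothetical and are not taken from the node's source. Treating an empty prompt as a problem is also an assumption of this sketch.

```typescript
// Hypothetical model of the node's inputs; the field names are assumptions.
type Provider = "gemini" | "openAi";

interface SpeechToTextParams {
  provider: Provider;
  audioUrl: string; // must be a publicly accessible link
  prompt: string; // guides how the transcription is extracted
}

// Returns a list of human-readable problems; an empty list means the input looks usable.
function validateParams(params: SpeechToTextParams): string[] {
  const problems: string[] = [];
  // Only http(s) links can be publicly fetched over the web.
  if (!/^https?:\/\/\S+$/i.test(params.audioUrl.trim())) {
    problems.push("Audio File URL must be an http(s) URL");
  }
  // Assumption: an empty prompt is treated as a mistake.
  if (params.prompt.trim().length === 0) {
    problems.push("Prompt must not be empty");
  }
  return problems;
}

const exampleParams: SpeechToTextParams = {
  provider: "gemini",
  audioUrl: "https://example.com/podcast-episode.mp3", // placeholder URL
  prompt: "Extract the content from the audio file and DO NOT add any content to the answer",
};
```

Calling `validateParams(exampleParams)` yields an empty list, while a non-HTTP URL or a blank prompt each add one entry.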
Output
The node outputs JSON data containing the transcription result of the audio file. The exact structure is not explicitly detailed in the source code, but typically it includes:
- A textual transcription of the audio content under a key such as "text" or similar.
- Possibly metadata about the transcription process (e.g., confidence scores, provider used).
No binary data output is indicated by the source code.
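Because the exact output structure is not documented, downstream code may want to read the transcription defensively. The following is a minimal sketch; the candidate key names (`text`, `transcription`, `transcript`) and the helper `extractTranscript` are guesses for illustration, not keys confirmed by the node.

```typescript
// Candidate keys are guesses; the real key depends on the node's implementation.
const CANDIDATE_KEYS = ["text", "transcription", "transcript"];

// Returns the first non-empty string found under a candidate key, or undefined.
function extractTranscript(json: unknown): string | undefined {
  if (typeof json !== "object" || json === null) {
    return undefined;
  }
  const record = json as Record<string, unknown>;
  for (const key of CANDIDATE_KEYS) {
    const value = record[key];
    if (typeof value === "string" && value.trim().length > 0) {
      return value;
    }
  }
  return undefined;
}
```

In an n8n Code node this could be applied to each item's `json` property; an `undefined` result signals that the output needs to be inspected manually.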
Dependencies
- Requires internet access to fetch the audio file from the provided URL.
- Requires valid credentials/API keys configured in n8n for the selected provider (Google Gemini or OpenAI API).
- The node depends on external APIs for speech-to-text conversion; thus, proper authentication and API quota management are necessary.
- No other explicit environment variables or configurations are mentioned.
Troubleshooting
Common Issues
- Invalid or inaccessible audio URL: If the audio file URL is incorrect, private, or behind authentication, the node will fail to retrieve the audio.
- Unsupported audio format: Providers may require specific audio formats or codecs; unsupported formats can cause errors.
- Missing or invalid API credentials: Without proper API keys or tokens for the chosen provider, the node cannot perform transcription.
- Network issues: Connectivity problems can prevent accessing the audio file or the provider's API.
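Several of these issues can be narrowed down before running the node by sending a HEAD request to the audio URL (for example with `curl -I`) and looking at the status code and `Content-Type` header. The sketch below interprets such a result. The `ProbeResult` shape, the `diagnoseProbe` helper, and the rule that audio should be served as `audio/*` are assumptions for illustration; some servers deliver valid audio under other content types.

```typescript
// Result of a HEAD request against the audio URL (hypothetical shape).
// Assumes any redirects were already followed by the client.
interface ProbeResult {
  status: number; // final HTTP status code
  contentType: string | null; // Content-Type header, if present
}

// Maps a probe result to a likely cause; undefined means no obvious problem.
function diagnoseProbe(result: ProbeResult): string | undefined {
  if (result.status === 401 || result.status === 403) {
    return "URL appears to be private or behind authentication";
  }
  if (result.status < 200 || result.status >= 300) {
    return "URL is not reachable (HTTP " + result.status + ")";
  }
  const type = (result.contentType || "").toLowerCase();
  if (type.indexOf("audio/") !== 0) {
    return "Content-Type is not audio/*; the format may be unsupported";
  }
  return undefined;
}
```

A result such as `{ status: 200, contentType: "audio/mpeg" }` produces no diagnosis, which suggests that any remaining failure lies with the credentials or the provider rather than the URL.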
Error Messages and Resolutions
- Failed to fetch audio file: Check the URL accessibility and ensure it is publicly reachable.
- Authentication error with provider: Verify that the API credentials are correctly set up in n8n.
- Transcription timeout or failure: Large audio files or slow network may cause timeouts; consider shortening audio or improving connectivity.
- Unsupported provider option: Ensure the provider value matches one of the supported options ("gemini" or "openAi").
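When the provider is set through an expression, the display name and the internal value are easy to confuse. The sketch below resolves either form to the internal value. The pairing of "Google Gemini" with `gemini` and "OpenAI API" with `openAi` is inferred from this document rather than confirmed by the source, and `resolveProvider` is a hypothetical helper.

```typescript
type ProviderValue = "gemini" | "openAi";

// Assumed pairing of display names with internal values.
const PROVIDER_BY_LABEL: Record<string, ProviderValue> = {
  "Google Gemini": "gemini",
  "OpenAI API": "openAi",
};

// Accepts an internal value or a display name; throws on anything else.
function resolveProvider(input: string): ProviderValue {
  if (input === "gemini" || input === "openAi") {
    return input;
  }
  // hasOwnProperty guards against inherited keys such as "toString".
  if (Object.prototype.hasOwnProperty.call(PROVIDER_BY_LABEL, input)) {
    return PROVIDER_BY_LABEL[input];
  }
  throw new Error("Unsupported provider option: " + input);
}
```

Note that the match is case-sensitive, so `openai` is rejected while `openAi` is accepted.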
Links and References
- Google Cloud Speech-to-Text
- OpenAI API Documentation
- n8n Documentation - Creating Nodes
- Best Practices for Audio Transcription
Note: The above summary is based solely on static analysis of the provided source code and property definitions. Runtime behavior and dynamic responses depend on actual API implementations and configurations.