Overview
This node transcribes audio recordings through a Google Gemini API endpoint. It accepts either audio URLs or binary audio files as input. Users choose the model used for transcription and can optionally set start and end times to transcribe only a segment of the audio.
Common scenarios include:
- Transcribing podcast episodes or interviews hosted online by providing their URLs.
- Processing locally uploaded audio files (e.g., voice memos, meeting recordings) in binary form.
- Extracting text from specific segments of longer audio files by specifying start and end times.
Practical examples:
- A user wants to convert an online lecture recording into text. They provide the URL of the audio file, select a suitable speech-to-text model, and receive the transcription output.
- Another user uploads multiple audio files directly to n8n, which are then transcribed in batch.
Properties
| Name | Meaning |
|---|---|
| Server URL | The base URL of the Google Gemini API endpoint to send transcription requests to. |
| API Key | The API key credential used to authenticate requests to the Google Gemini service. |
| Model | The identifier of the speech-to-text model to use for transcription. Can be selected from a list or entered manually. |
| Input Type | Specifies whether the audio input is provided as URLs ("Audio URL(s)") or as binary file data ("Binary File(s)"). |
| URL(s) | One or more comma-separated URLs pointing to audio files to transcribe. Only shown if Input Type is "Audio URL(s)". |
| Input Data Field Name(s) | The name(s) of the binary fields containing audio data to process. Multiple field names can be comma-separated. Only shown if Input Type is "Binary File(s)". |
| Simplify Output | Boolean flag. When enabled, the node returns a simplified version of the transcription response that is easier to consume. |
| Options - Start Time | Optional start time offset within the audio to begin transcription, formatted as MM:SS or HH:MM:SS. |
| Options - End Time | Optional end time offset within the audio to stop transcription, formatted as MM:SS or HH:MM:SS. |
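The Start Time and End Time formats can be checked before a workflow runs. The following is a minimal sketch, not the node's own parsing logic: `parse_offset` and `check_range` are hypothetical helper names, and the rules that minutes and seconds stay within 0 to 59 and that the start must precede the end are assumptions rather than documented behavior.

```python
def parse_offset(value: str) -> int:
    """Convert an 'MM:SS' or 'HH:MM:SS' string to a number of seconds."""
    parts = value.strip().split(":")
    if len(parts) not in (2, 3) or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected MM:SS or HH:MM:SS, got {value!r}")
    numbers = [int(p) for p in parts]
    # Assumption: the last two components (minutes, seconds) must be 0-59.
    if any(n > 59 for n in numbers[-2:]):
        raise ValueError(f"minutes and seconds must be 0-59, got {value!r}")
    total = 0
    for n in numbers:
        total = total * 60 + n
    return total


def check_range(start: str = "", end: str = "") -> None:
    """Raise ValueError when both offsets are set and start is not before end."""
    # Assumption: an empty string means the option was left unset.
    if start and end and parse_offset(start) >= parse_offset(end):
        raise ValueError(f"start {start!r} must be earlier than end {end!r}")


print(parse_offset("01:30"))
print(parse_offset("1:02:03"))
```

Running a check like this in a Code node ahead of the transcription step is one way to surface a malformed offset early instead of relying on how the API handles it.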
Output
The node outputs JSON data representing the transcription results returned by the Google Gemini API. If the "Simplify Output" option is enabled, the response is transformed into a more concise format focusing on the main transcription text.
With binary input, the node reads the binary audio data but does not emit any binary data; the output is still the transcription result as JSON.
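To illustrate what a simplified result might look like, the sketch below reduces a raw response to its text. It assumes the raw output follows the `candidates[].content.parts[].text` layout of a Gemini `generateContent` response; the `simplify` function and the `text` output key are hypothetical, and the node's actual simplified format may differ.

```python
def simplify(response: dict) -> dict:
    """Collect the text parts of each candidate into a single string."""
    texts = []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        # Non-text parts have no "text" key and contribute an empty string.
        texts.append("".join(part.get("text", "") for part in parts))
    return {"text": "\n".join(texts)}


# Illustrative sample only, not real API output.
raw = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [{"text": "Hello "}, {"text": "world"}],
            },
            "finishReason": "STOP",
        }
    ]
}
print(simplify(raw))
```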
Dependencies
- Requires access to the Google Gemini API endpoint specified by the Server URL.
- An API key credential with permissions to use the Google Gemini speech-to-text models.
- Network connectivity from the n8n instance to the Google Gemini API.
Troubleshooting
- Invalid API Key or Authentication Errors: Ensure the API key is valid, active, and has appropriate permissions.
- Incorrect Server URL: Verify the Server URL is correct and reachable.
- Model Not Found: Confirm the selected model ID exists and is accessible under your account.
- Malformed Audio URLs: Check that audio URLs are publicly accessible and correctly formatted.
- Binary Field Names Incorrect: When using binary input, ensure the specified field names match those in the incoming data.
- Time Format Errors: Start and end times must be in MM:SS or HH:MM:SS format; an invalid value may cause an error or be ignored.
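Two of the checks above, malformed URLs and mismatched binary field names, can be approximated before the node runs. This is a rough sketch under stated assumptions: `split_list`, `invalid_urls`, and `missing_fields` are hypothetical helpers, only `http` and `https` URLs are treated as valid, and the `binary` argument stands in for the binary property names on an incoming item. It checks formatting only and cannot tell whether a URL is publicly reachable.

```python
from urllib.parse import urlparse


def split_list(value: str) -> list:
    """Split a comma-separated string, dropping blanks and surrounding spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def invalid_urls(urls: str) -> list:
    """Return the entries that are not well-formed http(s) URLs."""
    bad = []
    for url in split_list(urls):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(url)
    return bad


def missing_fields(field_names: str, binary: dict) -> list:
    """Return the requested field names that are absent from the item's binary data."""
    return [name for name in split_list(field_names) if name not in binary]


print(invalid_urls("https://example.com/a.mp3, ftp://example.com/b.mp3, notaurl"))
print(missing_fields("data, audio2", {"data": {}}))
```

Because the URL field is comma-separated, a URL that itself contains a comma would be split into pieces by this sketch and reported as invalid.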