Groq Voice Models

Process audio using Groq Voice Models

Overview

This node processes audio files using Groq Voice Models to perform speech-to-text transcription. It downloads an audio file from a provided URL, sends it to the Groq API for transcription, and returns the transcribed text or detailed JSON data depending on the selected response format.

Common scenarios where this node is beneficial include:

- Transcribing interviews, podcasts, or meetings from audio files.
- Converting voice notes or recorded lectures into searchable text.
- Running multilingual transcription with a Groq model chosen for accuracy or cost.
- Extracting per-segment timestamps and metadata in Verbose JSON mode for further analysis.

Practical example:

A user provides a URL to an MP3 recording of a meeting, selects the "Whisper Large V3 Turbo" model for balanced performance, optionally specifies the language as English ("en"), and chooses the plain text response format. The node outputs the full transcript as text for downstream processing or storage.

Properties

| Name | Meaning |
| --- | --- |
| File URL | URL of the audio file to process. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. Maximum file size is 25MB. |
| Model | The Groq model to use for processing the audio. Options: • Whisper Large V3: best for error-sensitive tasks and multilingual support. • Whisper Large V3 Turbo: best price-performance ratio for multilingual support. • Distil Whisper Large V3 (English): best for English-only tasks with lower cost. |
| Language | Optional language code (e.g., "en" for English). Specifying this improves transcription accuracy. |
| Response Format | Format of the transcription response. Options: • JSON: basic JSON response. • Verbose JSON: includes timestamps and metadata for each segment. • Text: plain text transcription output. |
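
As a rough illustration of how these properties might translate into request fields, the sketch below maps the display names to API values. The node's actual implementation is not shown on this page, so everything here is an assumption: the model IDs and the `model`, `language`, and `response_format` field names follow Groq's OpenAI-compatible transcription endpoint, and `build_form_fields` is a hypothetical helper.

```python
# Assumed mapping from the node's display names to Groq model IDs.
MODEL_IDS = {
    "Whisper Large V3": "whisper-large-v3",
    "Whisper Large V3 Turbo": "whisper-large-v3-turbo",
    "Distil Whisper Large V3 (English)": "distil-whisper-large-v3-en",
}

# Assumed mapping from the Response Format options to API values.
RESPONSE_FORMATS = {
    "JSON": "json",
    "Verbose JSON": "verbose_json",
    "Text": "text",
}


def build_form_fields(model, response_format, language=None):
    """Build the non-file form fields for a transcription request (hypothetical helper)."""
    if model not in MODEL_IDS:
        raise ValueError(f"unknown model: {model}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response format: {response_format}")
    fields = {
        "model": MODEL_IDS[model],
        "response_format": RESPONSE_FORMATS[response_format],
    }
    # Language is optional, so the field is only sent when a code is given.
    if language:
        fields["language"] = language
    return fields
```

For the practical example above, `build_form_fields("Whisper Large V3 Turbo", "Text", "en")` would produce the three fields that accompany the uploaded audio file.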

Output

The node outputs an array with one item per input. Each item contains a json field with the following structure:

On success (success: true):
- If Response Format is set to Text:
```
{
  "success": true,
  "text": "<transcribed plain text>"
}
```
- If Response Format is JSON or Verbose JSON:
```
{
  "success": true,
  "data": { /* transcription JSON object returned by Groq API */ }
}
```
  The JSON includes either a simple transcription or detailed segments with timestamps and metadata depending on the chosen format.
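
In Verbose JSON mode, the segment timestamps can be turned into readable lines downstream. The following is a minimal sketch that assumes the `data` object carries a Whisper-style `segments` list whose entries have `start`, `end` (both in seconds), and `text` fields; the exact shape returned by the Groq API should be checked against its documentation. The function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS, dropping fractions."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segment_lines(data):
    """Return one '[start - end] text' line per segment (assumed field names)."""
    lines = []
    for segment in data.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines
```

If `data` has no `segments` key (for example, in basic JSON mode), `segment_lines` returns an empty list rather than failing.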

On failure (success: false):
```
{
  "success": false,
  "error": "<error message>"
}
```

The node does not output binary data.
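
Because the output shape depends on the response format, downstream code has to handle three cases. The sketch below is one possible way to pull the transcript out of a single output item. The `success`, `text`, `data`, and `error` keys come from the structures above; the assumption that `data` itself contains a `text` field follows the usual Whisper-style JSON response and is not confirmed here. `extract_transcript` is a hypothetical helper.

```python
def extract_transcript(item):
    """Return the transcript text from one output item, or raise on failure."""
    payload = item["json"]
    if not payload.get("success"):
        # Failure items carry only an error message.
        raise RuntimeError(payload.get("error", "unknown error"))
    if "text" in payload:
        # Text response format: the transcript is a top-level string.
        return payload["text"]
    # JSON / Verbose JSON: assume the API object exposes a "text" field.
    return payload["data"].get("text", "")
```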

Dependencies

- Requires an active Groq API key credential for authentication.
- Downloads the audio file from the provided URL; the URL must point directly to a supported audio file.
- Uses temporary local storage to save the downloaded audio before uploading it to the Groq API.
- Supports audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
- Maximum file size accepted by the API is 25MB.
- If a file needs converting or shrinking, preprocess it externally with ffmpeg (e.g., to reduce size or change format).
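
The format and size limits can be checked before a workflow runs. The sketch below is a rough pre-check only: it inspects the URL's file extension, whereas the node itself appears to validate the downloaded content type, and it treats 25MB as 25 × 1024 × 1024 bytes, which is an assumption about how the limit is measured. `check_audio` is a hypothetical helper, and the caller is expected to supply the file size.

```python
import posixpath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}
# Assumption: the 25MB limit is interpreted as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def check_audio(url, size_bytes):
    """Return a list of problems found; an empty list means the file looks acceptable."""
    # Use only the path component so query strings do not affect the extension.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    problems = []
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file size exceeds 25MB")
    return problems
```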

Troubleshooting

- No API key provided!
  Ensure that the Groq API key credential is configured correctly in n8n.
- Failed to download file: [status text]
  Check that the provided File URL is correct, accessible, and points directly to an audio file.
- Unsupported content type: [type]. The URL must point to an audio file.
  The file at the URL is not a supported audio format. Convert the file to a supported format such as FLAC or MP3 using tools like ffmpeg.
- Processing failed
  This can occur if the file exceeds the 25MB limit or if the API rejects the request. Consider preprocessing the audio to reduce size or check API usage limits.
- File size exceeds 25MB
  Use ffmpeg to convert and downsample the audio, e.g.:
```
ffmpeg -i <input_file> -ar 16000 -ac 1 -c:a flac output.flac
```
If the node is set to continue on fail, errors will be returned in the output JSON instead of stopping execution.

Links and References

- Groq API Documentation (audio transcription endpoint)
- FFmpeg Official Site (audio conversion and preprocessing commands)
