Actions
- Chat Actions
- Embedding Actions
- Image Actions
- Speech Recognition Actions
- Text to Speech Actions
Overview
This node integrates with the DeepInfra API to perform speech recognition tasks, specifically transcription and translation of audio files. It accepts audio either from a direct URL or as binary data from a previous node, and processes it with OpenAI Whisper models.
Typical use cases include:
- Translating spoken content in audio files into text in another language.
- Converting audio speech into text transcripts for further analysis or storage.
- Processing audio from URLs or directly from binary data within n8n workflows.
For example, you can provide the URL of an audio file containing foreign-language speech and get back an English translation, or pass recorded audio as binary data and receive a transcript.
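The two input modes can be sketched in Python as follows. This is an illustrative sketch, not the node's own source. The item layout (a `binary` map whose entries carry base64-encoded `data` and a `fileName`) is an assumption modelled on n8n's in-memory binary format, and `resolve_audio` and its injectable `fetch` callable are hypothetical names.

```python
import base64
from typing import Callable, Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def default_fetch(url: str) -> bytes:
    """Download the audio file over HTTP(S)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def resolve_audio(
    item: dict,
    input_type: str,
    audio_url: Optional[str] = None,
    binary_property: str = "data",
    fetch: Callable[[str], bytes] = default_fetch,
) -> tuple[bytes, str]:
    """Return (audio_bytes, file_name) for either input mode.

    input_type mirrors the node's "URL" / "Binary Data" options. The item layout
    {"binary": {prop: {"data": <base64>, "fileName": ...}}} is an assumption.
    """
    if input_type == "URL":
        if not audio_url:
            raise ValueError("Audio URL is required when Input Type is URL")
        # Use the last path segment (query string ignored) as the upload file name.
        file_name = urlparse(audio_url).path.rsplit("/", 1)[-1] or "audio.mp3"
        return fetch(audio_url), file_name
    if input_type == "Binary Data":
        binary = item.get("binary", {}).get(binary_property)
        if binary is None:
            raise KeyError(f"No binary property named '{binary_property}' on the input item")
        return base64.b64decode(binary["data"]), binary.get("fileName", "audio.mp3")
    raise ValueError(f"Unknown input type: {input_type}")
```

Injecting `fetch` keeps the download step swappable (for example, for a client with custom timeouts) and lets both branches be exercised without network access.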
Properties
| Name | Meaning |
|---|---|
| Model | The speech recognition model to use. Options: "Openai Whisper-Large-V3", "Openai Whisper-Large-V3-Turbo" |
| Input Type | Specifies whether the audio input is provided as a URL or as binary data. Options: "URL", "Binary Data" |
| Audio URL | (Required if Input Type is URL) The URL of the audio file to transcribe or translate |
| Binary Property | (Required if Input Type is Binary Data) The name of the binary property containing the audio data |
| Options | Additional optional parameters: |
| - Language | ISO 639-1 code of the spoken language (for example, "en"), used to guide transcription |
| - Prompt | Optional text prompt to influence the model's style or continue a previous segment |
| - Temperature | Sampling temperature between 0 and 1; higher values make the output more random |
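A minimal sketch of how these properties might be turned into request fields, with the documented constraints checked up front. The display-label-to-model-ID mapping is an assumption (DeepInfra lists the models as "openai/whisper-large-v3" and "openai/whisper-large-v3-turbo"), and `build_form_fields` is a hypothetical helper, not part of the node.

```python
import re
from typing import Optional

# Assumed mapping from the node's display labels to DeepInfra model IDs.
MODEL_IDS = {
    "Openai Whisper-Large-V3": "openai/whisper-large-v3",
    "Openai Whisper-Large-V3-Turbo": "openai/whisper-large-v3-turbo",
}


def build_form_fields(
    model_label: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> dict[str, str]:
    """Translate node properties into form fields; unset or empty options are omitted."""
    if model_label not in MODEL_IDS:
        raise ValueError(f"Unsupported model: {model_label}")
    fields = {"model": MODEL_IDS[model_label]}
    if language is not None:
        # ISO 639-1 codes are two lowercase letters, such as "en" or "de".
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"Language must be an ISO 639-1 code, got {language!r}")
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("Temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields
```

Validating locally surfaces a bad language code or temperature as a clear node error instead of a less specific API error.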
Output
The node outputs JSON data representing the result of the speech recognition operation:
- For the Translate operation, the output JSON contains the translated text returned by the DeepInfra API.
- The structure matches the API response from DeepInfra's audio translation endpoint.
- No binary output is produced by this operation.
Example output snippet (simplified):
```json
{
  "text": "Translated text of the audio content"
}
```
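The request behind the Translate operation can be sketched with the standard library alone. The sketch assumes the OpenAI-compatible route "/audio/translations" under the base URL listed in Dependencies, and the OpenAI convention of a multipart upload with a "file" field. The node itself uses the openai client, so this only illustrates the wire format, and `build_translation_request` and `translate` are hypothetical names.

```python
import json
import uuid
from urllib.request import Request, urlopen

BASE_URL = "https://api.deepinfra.com/v1/openai"  # from the Dependencies section


def build_translation_request(api_key: str, fields: dict[str, str], audio: bytes, file_name: str) -> Request:
    """Build a multipart/form-data POST to the assumed /audio/translations route."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio itself travels in a part named "file" (OpenAI convention, assumed here).
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return Request(
        f"{BASE_URL}/audio/translations",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def translate(req: Request) -> dict:
    """Send the request and wrap the JSON body as an n8n-style output item."""
    with urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read())
    return {"json": body}  # for example {"json": {"text": "..."}}
```

Passing the response body through unchanged is what keeps the output structure identical to the API response described above.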
Dependencies
- Requires an active API key credential for the DeepInfra API.
- Uses the DeepInfra OpenAI-compatible API endpoint at https://api.deepinfra.com/v1/openai.
- Bundles the openai npm package (API client) and the axios npm package (HTTP requests), and uses the Node.js built-in fs, path, and os modules for temporary file handling.
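The temporary-file pattern can be sketched in Python, with the tempfile module standing in for the node's fs/path/os calls. The idea is to write the audio into the OS temp directory, hand the path to the API client, and delete the file even if the call fails. `temp_audio_file` is a hypothetical helper, not the node's code.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temp_audio_file(audio: bytes, suffix: str = ".mp3") -> Iterator[str]:
    """Write audio to a file in the OS temp directory and always delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix, prefix="deepinfra-audio-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(audio)
        yield path
    finally:
        # Runs on normal exit and when the code inside the `with` block raises.
        if os.path.exists(path):
            os.remove(path)
```

Tying cleanup to a `finally` block is what prevents failed API calls from leaving audio files behind in the temp directory.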
Troubleshooting
Common issues:
- Invalid or missing API key will cause authentication errors.
- Providing an invalid or inaccessible audio URL will result in download failures.
- Incorrect binary property name may cause the node to fail reading audio data.
- Unsupported audio formats or corrupted files might lead to API errors.
Error messages:
- Network or HTTP errors when fetching audio URL: check URL accessibility and network connection.
- API errors related to model usage or parameters: verify model selection and options.
- File system errors during temporary file creation/deletion: ensure proper permissions on temp directory.
Resolutions:
- Confirm API key validity and permissions.
- Verify audio URL correctness or binary data presence.
- Use a supported audio format such as MP3.
- Ensure n8n has write access to the OS temp directory.
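The resolutions above can be turned into a quick pre-flight check before any API call. This is a hedged sketch: the extension list is taken from the formats OpenAI documents for its Whisper API and is only assumed to match what DeepInfra accepts, and `preflight` is a hypothetical helper.

```python
import os
import tempfile
from typing import Optional
from urllib.parse import urlparse

# Formats OpenAI documents for Whisper; assumed, not confirmed, to match DeepInfra.
SUPPORTED_EXTENSIONS = {".flac", ".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".ogg", ".wav", ".webm"}


def preflight(api_key: Optional[str], file_name: str, audio_url: Optional[str] = None) -> list[str]:
    """Return a list of likely problems; an empty list means no issue was detected."""
    problems = []
    if not api_key or not api_key.strip():
        problems.append("Missing DeepInfra API key")
    if audio_url is not None and urlparse(audio_url).scheme not in ("http", "https"):
        problems.append(f"Audio URL must use http or https: {audio_url}")
    ext = os.path.splitext(file_name)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"Possibly unsupported audio format: {ext or '(no extension)'}")
    temp_dir = tempfile.gettempdir()
    if not os.access(temp_dir, os.W_OK):
        problems.append(f"Temp directory is not writable: {temp_dir}")
    return problems
```

These checks cannot confirm that a key is valid or that a URL is reachable; they only rule out the configuration mistakes that are detectable locally.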