Overview
The AIConnect node integrates with OpenAI-compatible APIs and supports multiple resources, including Audio. With the Audio resource and the Create Transcription operation, the node transcribes an audio file into text using the selected AI model.
This is useful for scenarios such as:
- Converting recorded meetings, interviews, or podcasts into searchable text.
- Generating subtitles or captions for videos.
- Automating transcription workflows in content production or customer support.
For example, you can upload an MP3 recording of a conference call and receive a text transcript that can be further processed or stored.
Properties
| Name | Meaning |
|---|---|
| Model | The AI model to use for transcription (loaded dynamically). |
| File | The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm. |
| Language | Optional ISO-639-1 code specifying the language spoken in the audio. |
| Prompt | Optional guiding text to influence the transcription style or continue from a previous segment. |
| Response Format | Format of the transcription output. Options: JSON, Text, SRT (subtitle), Verbose JSON, VTT (subtitle). |
| Temperature | Sampling temperature controlling randomness in transcription (0 to 1). |
| Timestamp Granularities | When using verbose JSON response format, choose timestamp detail level: Word or Segment. |
| Additional Options | Collection of extra options; currently supports a "User" identifier string representing the end-user. |
| Simplify Output | Whether to return a simplified version of the transcription response instead of raw data (default: true). |
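To make the table above concrete, here is a minimal Python sketch of how these properties could map onto the form fields of an OpenAI-compatible transcription request. The helper `build_transcription_fields` is hypothetical and is not the node's actual implementation. The field names follow the OpenAI-style API, the model name `whisper-1` is only an example (the node loads models dynamically), and sending a `user` field is an assumption based on the Additional Options entry.

```python
from typing import List, Optional, Sequence, Tuple


def build_transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
    timestamp_granularities: Sequence[str] = (),
    user: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return the non-file multipart form fields as (name, value) pairs.

    Hypothetical helper: the audio file itself would be attached
    separately as the "file" part of the multipart request.
    """
    fields = [("model", model), ("response_format", response_format)]
    if language:
        fields.append(("language", language))
    if prompt:
        fields.append(("prompt", prompt))
    if temperature is not None:
        fields.append(("temperature", str(temperature)))
    # Timestamp granularities only apply to verbose JSON; a repeated
    # "name[]" key is the usual way to send an array in form data.
    if response_format == "verbose_json":
        for granularity in timestamp_granularities:
            fields.append(("timestamp_granularities[]", granularity))
    # Assumption: the "User" additional option is sent as a "user" field.
    if user:
        fields.append(("user", user))
    return fields


if __name__ == "__main__":
    for name, value in build_transcription_fields(
        "whisper-1",
        language="en",
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    ):
        print(f"{name}={value}")
```

Optional properties are left out of the request when unset, so the API applies its own defaults.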
Output
The node outputs an array of items where each item contains a json field with the transcription result.
- If Simplify Output is enabled, the output will be a streamlined transcription text or structured object depending on the response format.
- If disabled, the output includes the full raw response from the transcription API, which may contain detailed metadata, timestamps, confidence scores, etc.
- When subtitle formats like SRT or VTT are selected, the output contains the transcription formatted accordingly.
- Binary data output is not indicated for this operation.
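When the SRT response format is selected, a downstream step may want the individual cues rather than one block of text. The following Python sketch parses standard SRT text into a list of cues. The function name `parse_srt` is hypothetical, and where the SRT string sits inside the item's json field depends on the node's output, so the sketch simply takes the string as input.

```python
import re
from typing import Dict, List


def parse_srt(srt_text: str) -> List[Dict[str, str]]:
    """Split SRT text into cues with index, start, end, and text."""
    cues: List[Dict[str, str]] = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        # A valid cue has an index line, a timing line, and at least one text line.
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (part.strip() for part in lines[1].split("-->", 1))
        cues.append(
            {
                "index": lines[0].strip(),
                "start": start,
                "end": end,
                # Multi-line cue text is joined with single spaces.
                "text": " ".join(line.strip() for line in lines[2:]),
            }
        )
    return cues


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\nHello everyone.\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\nLet's get\nstarted.\n"
    )
    for cue in parse_srt(sample):
        print(cue["start"], cue["end"], cue["text"])
```

Blocks that do not look like a cue are skipped rather than raising an error, which keeps the sketch tolerant of stray whitespace or a trailing newline.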
Dependencies
- Requires an active connection to an OpenAI-compatible API endpoint supporting audio transcription.
- Needs an API authentication token configured in n8n credentials (referred to generically as an API key credential).
- The node dynamically loads available audio models via an internal method.
- Supported audio file formats must be provided as input.
Troubleshooting
Common issues:
- Unsupported audio file format or corrupted file may cause transcription failure.
- Missing or invalid API authentication token will prevent successful API calls.
- Specifying an unsupported language code might lead to inaccurate or failed transcription.
- Combining a response format with options it does not support (e.g., timestamp granularities without verbose JSON) may cause errors.
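Several of the issues above can be caught before the request is sent. The following Python sketch is a hypothetical pre-flight check, not something the node is documented to perform. It assumes the file type can be judged from the file extension, that the response format uses the API-style value `verbose_json`, and that a two-letter alphabetic string is a reasonable stand-in for an ISO-639-1 code.

```python
import os
from typing import List, Optional, Sequence

# Formats listed in the File property description.
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def find_config_problems(
    filename: str,
    response_format: str = "json",
    timestamp_granularities: Sequence[str] = (),
    temperature: Optional[float] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension or '(none)'}")
    if timestamp_granularities and response_format != "verbose_json":
        problems.append(
            "timestamp granularities require the verbose_json response format"
        )
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")
    # Shape check only: this does not verify the code against the ISO list.
    if language is not None and not (len(language) == 2 and language.isalpha()):
        problems.append("language should be a two-letter ISO-639-1 code")
    return problems


if __name__ == "__main__":
    print(find_config_problems("call.wav", "srt", ["word"], 1.5, "english"))
```

The check cannot detect a corrupted file or an invalid API token; those failures only surface in the API response.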
Error messages:
- "The resource \"audio\" is not supported!": indicates misconfiguration of the resource parameter.
- API errors related to authentication or quota limits should be resolved by verifying credentials and usage limits.
- Errors about missing required parameters (like model or file) require ensuring all mandatory inputs are set.