Actions
Overview
The "Create Transcript" operation of the Speech resource in this node converts audio or video files into text transcripts. It is designed to transcribe spoken content from binary audio/video input, supporting features like speaker diarization, language specification, and timestamp granularity. This node is useful for automating transcription workflows such as generating meeting notes, creating subtitles, or analyzing audio content.
Practical examples:
- Transcribing recorded interviews or podcasts into searchable text.
- Generating captions for videos automatically.
- Extracting dialogue from customer support calls for quality analysis.
Properties
| Name | Meaning |
|---|---|
| Binary Input Field | The name of the binary property containing the audio or video file to be transcribed. |
| Additional Fields | A collection of optional parameters to customize transcription: |
| - Transcript Model ID | The transcription model to use; currently only "scribe_v1" is available. |
| - Language Code | ISO-639-1 or ISO-639-3 code specifying the language of the audio. If omitted, the language will be auto-detected. |
| - Tag Audio Events | Whether to tag audio events (e.g., laughter, footsteps) in the transcript. Defaults to true. |
| - Number of Speakers | Maximum number of speakers expected in the audio, aiding speaker identification. Range: 1 to 32. |
| - Timestamps Granularity | Level of detail for timestamps in the transcript: None, Word, or Character. Default is Word. |
| - Speaker Diarization | Whether to annotate which speaker is talking throughout the audio. Defaults to false. |
| - Enable Logging | Enables logging of the transcription process. When false, zero retention mode is used (history features unavailable). Defaults to true. |
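The mapping from these properties to the underlying API request can be sketched as follows. The snake_case field names (`model_id`, `diarize`, `num_speakers`, `timestamps_granularity`, `tag_audio_events`, `language_code`) follow the public ElevenLabs speech-to-text API; the helper itself is a hypothetical illustration, not part of the node's source:

```typescript
// Hypothetical helper: maps the node's "Additional Fields" onto the form
// fields expected by the ElevenLabs speech-to-text API. Field names are
// based on the public API docs; verify against the current reference.
interface AdditionalFields {
  languageCode?: string;        // ISO-639-1 / ISO-639-3; omit to auto-detect
  tagAudioEvents?: boolean;     // default true
  numSpeakers?: number;         // 1..32
  timestampsGranularity?: "none" | "word" | "character"; // default "word"
  diarize?: boolean;            // default false
}

function buildTranscriptParams(fields: AdditionalFields): Record<string, string> {
  const params: Record<string, string> = {
    model_id: "scribe_v1", // currently the only available model
  };
  if (fields.languageCode) params.language_code = fields.languageCode;
  params.tag_audio_events = String(fields.tagAudioEvents ?? true);
  if (fields.numSpeakers !== undefined) {
    if (fields.numSpeakers < 1 || fields.numSpeakers > 32) {
      throw new Error("Number of Speakers must be between 1 and 32");
    }
    params.num_speakers = String(fields.numSpeakers);
  }
  params.timestamps_granularity = fields.timestampsGranularity ?? "word";
  params.diarize = String(fields.diarize ?? false);
  return params;
}
```

Leaving a field unset falls back to the defaults listed in the table above, so an empty `AdditionalFields` object yields a valid request.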
Output
The node outputs a JSON object containing the transcription result: the transcribed text plus metadata such as timestamps (at the chosen granularity), speaker labels when diarization is enabled, and tagged audio events when that option is selected. The output contains no binary data, only the textual result and its metadata.
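As an illustration, a diarized, word-level result might look like the following. This shape is an assumption based on the public ElevenLabs API, not a guaranteed contract; inspect an actual node execution for the exact keys:

```typescript
// Illustrative result shape -- an assumption, not a guaranteed contract.
interface TranscriptWord {
  text: string;
  start: number;       // start time in seconds
  end: number;         // end time in seconds
  speaker_id?: string; // present when speaker diarization is enabled
}

interface TranscriptResult {
  language_code: string;
  text: string;             // full transcript
  words?: TranscriptWord[]; // present when granularity is Word or Character
}

const example: TranscriptResult = {
  language_code: "en",
  text: "Hello world",
  words: [
    { text: "Hello", start: 0.0, end: 0.4, speaker_id: "speaker_0" },
    { text: "world", start: 0.5, end: 0.9, speaker_id: "speaker_0" },
  ],
};
```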
Dependencies
- Requires an API key credential for authentication with the ElevenLabs API.
- The node sends requests to the ElevenLabs speech-to-text endpoint (/speech-to-text).
- Proper configuration of the API key credential in n8n is necessary.
- The input audio/video must be provided as binary data within the specified binary input field.
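Outside n8n, the same call can be reproduced directly against the API. A minimal sketch of the request metadata, assuming the standard `https://api.elevenlabs.io` base URL and `xi-api-key` authentication header from the public ElevenLabs documentation:

```typescript
// Minimal sketch of the underlying HTTP call's metadata. Base URL and
// header name follow the public ElevenLabs API documentation.
const BASE_URL = "https://api.elevenlabs.io/v1";

function buildTranscriptRequest(
  apiKey: string,
): { url: string; headers: Record<string, string> } {
  return {
    url: `${BASE_URL}/speech-to-text`,
    // The body is multipart/form-data carrying the audio file, so no
    // explicit Content-Type header is set here.
    headers: { "xi-api-key": apiKey },
  };
}

// Usage (not executed here): attach the binary as the "file" form field,
// e.g. form.append("file", audioBlob, "meeting.mp3"), then POST with fetch.
```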
Troubleshooting
Common issues:
- Providing an incorrect or empty binary input field name will cause the node to fail to find the audio data.
- Using unsupported audio formats or corrupted files may lead to transcription errors.
- Specifying an invalid language code might cause fallback to auto-detection or errors.
- Enabling speaker diarization without multiple speakers may produce unexpected results.
Error messages:
- Authentication errors indicate missing or invalid API credentials; verify the API key setup.
- Request failures due to network issues or API limits should be retried or checked against service status.
- Validation errors on parameters (e.g., number of speakers out of range) require correcting input values.
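A small classifier along these lines can turn raw HTTP status codes into the troubleshooting hints above. The status-to-cause mapping uses conventional HTTP semantics and is a sketch, not the node's actual error handling:

```typescript
// Map an HTTP status code to a human-readable troubleshooting hint.
// Based on conventional HTTP semantics; a sketch, not the node's real logic.
function transcriptErrorHint(status: number): string {
  if (status === 401 || status === 403) {
    return "Authentication error: verify the ElevenLabs API key credential.";
  }
  if (status === 400 || status === 422) {
    return "Validation error: check parameter values (e.g. Number of Speakers 1-32).";
  }
  if (status === 429) {
    return "Rate limited: retry later or check your plan's API limits.";
  }
  if (status >= 500) {
    return "Service error: retry the request or check the ElevenLabs service status.";
  }
  return `Unexpected status ${status}: inspect the response body for details.`;
}
```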