Actions
Overview
The "Create Transcript" operation of the Speech resource in this node converts audio or video files into text transcripts using an external speech-to-text API. This is useful for automating transcription tasks such as generating subtitles, creating searchable text from meetings or interviews, and enabling voice command processing.
Typical use cases include:
- Transcribing recorded podcasts or webinars.
- Converting customer support calls to text for analysis.
- Creating captions for videos automatically.
- Extracting text from voice notes or audio messages.
Users provide an audio/video file as binary input, and the node returns a detailed transcription with optional features like speaker diarization, timestamps, and tagging of audio events.
Properties
| Name | Meaning |
|---|---|
| Binary Input Field | The name of the binary property containing the audio/video file to transcribe. |
| Transcript Model ID | The identifier of the transcription model to use. Currently, only "scribe_v1" is available. |
| Language Code | ISO-639-1 or ISO-639-3 language code to specify the language of the audio. If left empty, the language will be auto-detected. |
| Tag Audio Events | Whether to tag audio events (e.g., laughter, footsteps) in the transcription output. Boolean value (true/false). |
| Number of Speakers | Maximum number of speakers expected in the audio. Helps improve speaker prediction accuracy. Accepts values between 1 and 32. |
| Timestamps Granularity | Level of detail for timestamps in the transcript. Options: none, word, or character. Default is word. |
| Speaker Diarization | Whether to annotate which speaker is talking throughout the audio. Boolean value (true/false). |
| Enable Logging | Whether to enable logging on the API side. When disabled, the API runs in zero-retention mode and history features are unavailable. Boolean value (true/false). |
Note: There is also a hidden property used internally for request configuration and query string parameters.
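To make the property-to-request mapping concrete, below is a minimal sketch of the HTTP call the node performs under the hood. The request field names (`model_id`, `language_code`, `tag_audio_events`, `num_speakers`, `timestamps_granularity`, `diarize`) and the `enable_logging` query parameter are assumptions about the API's exact naming; they mirror the properties above but should be checked against the ElevenLabs API reference.

```typescript
// Illustrative sketch only; field names are assumed, not confirmed.
import { readFile } from "node:fs/promises";

async function createTranscript(apiKey: string, filePath: string) {
  const audio = await readFile(filePath);

  const form = new FormData();
  form.append("file", new Blob([audio]), "recording.mp3");
  form.append("model_id", "scribe_v1");          // only model currently available
  form.append("language_code", "en");            // omit to auto-detect
  form.append("tag_audio_events", "true");       // tag laughter, footsteps, etc.
  form.append("num_speakers", "2");              // 1-32; aids speaker prediction
  form.append("timestamps_granularity", "word"); // none | word | character
  form.append("diarize", "true");                // annotate who is speaking

  // enable_logging=false would request zero-retention mode (no history).
  const url = "https://api.elevenlabs.io/v1/speech-to-text?enable_logging=true";
  const res = await fetch(url, {
    method: "POST",
    headers: { "xi-api-key": apiKey },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);
  return res.json();
}
```

Within the node itself, these values come from the properties table above rather than being hard-coded.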
Output
The node outputs JSON data representing the transcription result. This typically includes:
- The full transcribed text.
- Optional speaker labels if diarization is enabled.
- Timestamps for words or characters depending on granularity settings.
- Tags for audio events if enabled.
- Metadata about the transcription process.
The node reads the binary audio/video input and returns the transcription in the JSON output field; no binary output is generated by this operation.
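As a rough guide to what consumers of this output can expect, here is a sketch of the result shape with diarization and word-level timestamps enabled. The field names (`language_code`, `text`, `words`, `speaker_id`, and so on) are illustrative assumptions; the actual structure depends on the model and API version.

```typescript
// Illustrative shape only; actual field names depend on the API version.
interface TranscriptWord {
  text: string;
  start: number;       // seconds; present when granularity is word/character
  end: number;
  speaker_id?: string; // present when speaker diarization is enabled
  type?: "word" | "spacing" | "audio_event"; // audio_event when tagging is on
}

interface TranscriptResult {
  language_code: string;         // detected or user-supplied language
  language_probability?: number; // confidence of language detection
  text: string;                  // full transcribed text
  words?: TranscriptWord[];      // per-word detail, if requested
}
```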
Dependencies
- Requires an active API key credential for the ElevenLabs API.
- The node sends requests to the ElevenLabs speech-to-text endpoint (/speech-to-text).
- The user must supply the audio/video file as binary data within the workflow.
- Optional parameters depend on supported transcription models and API capabilities.
Troubleshooting
Common issues:
- An incorrect or missing binary input field name prevents the node from finding the audio data (see the check sketched after this list).
- Unsupported language codes or transcription model IDs may cause errors or a fallback to default settings.
- Enabling speaker diarization without specifying a realistic number of speakers can reduce accuracy.
- Network or authentication errors occur if the API key is invalid or the quota has been exceeded.
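For the binary-field issue, a quick way to debug is a Code node placed before this node. The snippet below is a hypothetical check, assuming the default binary field name "data"; adjust `field` to match your Binary Input Field setting.

```typescript
// Hypothetical n8n Code node: verify the binary property exists before
// the Speech node runs, and report what is actually available if not.
for (const item of $input.all()) {
  const field = "data"; // must match the node's Binary Input Field setting
  if (!item.binary?.[field]) {
    throw new Error(
      `Binary property "${field}" not found; available: ` +
        Object.keys(item.binary ?? {}).join(", "),
    );
  }
}
return $input.all();
```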
Error messages:
- Errors related to missing binary data: Ensure the binary property name matches the actual input.
- API authentication errors: Verify that the API key credential is correctly configured.
- Model not found or unsupported language: Check the model ID and language code inputs.
- Rate limit or quota exceeded: Wait or upgrade your API plan.