ElevenLabs

WIP

Actions (13)

Speech Actions
- Text to Speech
- Speech to Speech
Voice Actions
History Actions
User Actions
- Get User Info
- Get User Subscription

Overview

This page covers the "Text to Speech" operation of the ElevenLabs node, found under the "Speech" resource. The operation converts input text into spoken audio using a selectable voice and model, with optional settings to tune the result. It is useful for automating voice generation in workflows such as audio content production, voice assistants, and accessibility tools.

Practical examples:

- Generating podcast intros or announcements from text scripts.
- Creating voiceovers for videos or presentations automatically.
- Building chatbots or virtual agents that respond with synthesized speech.
- Producing audio versions of articles or documents for accessibility.

Properties

Name	Meaning
Text	The text string that will be converted into speech.
Voice ID	The voice to use for speech synthesis. Can be selected from a searchable list of available voices or specified by ID.
Additional Fields	Optional advanced settings including:
- Binary Name	Custom name for the output binary data field containing the audio.
- File Name	Custom file name for the generated audio file.
- Model Name or ID	Identifier of the voice model to use for synthesis, selectable from a list or specified by ID.
- Output Format	Audio format of the output (e.g., mp3_44100_128).
- Seed	Numeric seed value to produce deterministic voice output.
- Similarity Boost	Number between 0 and 1 to control how closely the voice matches the original speaker's style.
- Speaker Boost	Boolean to enable or disable speaker boost feature.
- Stability	Number between 0 and 1 controlling voice stability during synthesis.
- Stitching	Boolean to enable stitching, which provides context by passing previous text to improve continuity.
- Streaming Latency	Integer (0-4) to optimize streaming latency at some cost to quality.
- Style	Number between 0 and 1 to exaggerate the voice style for expressive speech.
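
The properties above can be read as the ingredients of a single HTTP request. The Python sketch below shows one way to validate the documented ranges and assemble such a request without sending it. `build_tts_request` is a hypothetical helper, and the endpoint path, the query parameter names (`output_format`, `optimize_streaming_latency`), and the JSON field names (`model_id`, `voice_settings`, `use_speaker_boost`, `seed`) are assumptions modelled on the public ElevenLabs REST API, not taken from the node's source.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.elevenlabs.io/v1"  # base URL listed under Dependencies


def _check_unit(name, value):
    """Ensure a voice setting lies in the documented 0..1 range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def build_tts_request(text, voice_id, *, model_id=None,
                      output_format="mp3_44100_128", stability=None,
                      similarity_boost=None, style=None, speaker_boost=None,
                      seed=None, streaming_latency=None):
    """Return (url, json_body) for a text-to-speech call (hypothetical helper)."""
    if not text or not voice_id:
        raise ValueError("Text and Voice ID are required")

    # Query string: field names are assumptions, see the note above.
    query = {"output_format": output_format}
    if streaming_latency is not None:
        if not isinstance(streaming_latency, int) or not 0 <= streaming_latency <= 4:
            raise ValueError("Streaming Latency must be an integer from 0 to 4")
        query["optimize_streaming_latency"] = streaming_latency

    # Only include the settings the caller actually provided.
    voice_settings = {}
    for key, value in (("stability", stability),
                       ("similarity_boost", similarity_boost),
                       ("style", style)):
        if value is not None:
            voice_settings[key] = _check_unit(key, value)
    if speaker_boost is not None:
        voice_settings["use_speaker_boost"] = bool(speaker_boost)

    body = {"text": text}
    if model_id is not None:
        body["model_id"] = model_id
    if voice_settings:
        body["voice_settings"] = voice_settings
    if seed is not None:
        body["seed"] = int(seed)

    url = f"{BASE_URL}/text-to-speech/{voice_id}?{urlencode(query)}"
    return url, body
```

Validating the ranges before the request is built surfaces configuration mistakes locally instead of as an API error.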

Output

The node outputs the generated speech as binary data: an audio file encoded in the selected output format. The default is MP3 at a 44.1 kHz sample rate and 128 kbps bitrate. The name of the output binary property can be customized via the "Binary Name" additional field, and the file name via the "File Name" field.

The JSON output typically includes metadata about the synthesis request and may include references to the audio data, but the primary payload is the binary audio file ready for further processing or saving.
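
As a rough illustration of handling the binary output downstream, the Python sketch below splits a format string such as `mp3_44100_128` into its parts and writes audio bytes to a file named after the chosen file name. Both `parse_output_format` and `save_audio` are hypothetical helpers. Reading the format string as underscore-separated codec, sample rate, and bitrate is an assumption based on the one example given in this document, and using the codec as the file extension is only a convenience that may not suit every format.

```python
from pathlib import Path


def parse_output_format(fmt):
    """Split e.g. 'mp3_44100_128' into (codec, sample_rate_hz, bitrate_kbps)."""
    parts = fmt.split("_")
    if len(parts) < 2:
        raise ValueError(f"unexpected output format: {fmt!r}")
    codec = parts[0]
    sample_rate_hz = int(parts[1])
    # Assumption: the bitrate part is optional in some format strings.
    bitrate_kbps = int(parts[2]) if len(parts) > 2 else None
    return codec, sample_rate_hz, bitrate_kbps


def save_audio(audio, directory, file_name="speech",
               output_format="mp3_44100_128"):
    """Write audio bytes to <directory>/<file_name>.<codec> and return the path."""
    codec, _, _ = parse_output_format(output_format)
    path = Path(directory) / f"{file_name}.{codec}"
    path.write_bytes(audio)
    return path
```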

Dependencies

- Requires an API key credential for ElevenLabs API authentication.
- Depends on the ElevenLabs cloud service for text-to-speech conversion.
- Needs network access to https://api.elevenlabs.io/v1.
- No other external dependencies are indicated.
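
To check both dependencies (a working API key and network access) outside the workflow, something like the following Python sketch could be used. It relies only on the standard library. The `xi-api-key` header name and the `/user` path are assumptions based on the public ElevenLabs REST API rather than on this node's source, and both function names are hypothetical.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.elevenlabs.io/v1"  # base URL listed above


def build_user_info_request(api_key):
    """Build (but do not send) a GET request carrying the API key."""
    if not api_key:
        raise ValueError("API key is missing")
    return urllib.request.Request(
        f"{BASE_URL}/user",
        headers={"xi-api-key": api_key},  # assumed header name
        method="GET",
    )


def check_credentials(api_key, timeout=10.0):
    """Return True if the API answers with HTTP 200, False on any URL error."""
    try:
        request = build_user_info_request(api_key)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        # Covers both network failures and HTTP error statuses (HTTPError).
        return False
```

A `False` result does not distinguish a bad key from a network problem; inspecting the caught exception would be needed for that.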

Troubleshooting

Common issues:
- Invalid or missing API key: Ensure the API key credential is correctly configured.
- Unsupported voice ID or model ID: Verify the voice and model IDs exist and are accessible.
- Network connectivity problems: Confirm internet access and API endpoint availability.
- Incorrect output format: Use supported audio formats; otherwise, the node may fail or produce unusable audio.
Error messages:
- Authentication errors usually indicate invalid or expired API credentials.
- Validation errors may occur if required fields like "Text" or "Voice ID" are missing.
- Rate limiting or quota exceeded errors from the API require checking usage limits or upgrading the plan.
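
The error categories above can be sketched as a small lookup from HTTP status code to troubleshooting hint. The Python example below is illustrative only: the status codes chosen for each category (401, 400/422, 429, 5xx) follow common HTTP conventions and are an assumption, not something this document confirms for the ElevenLabs API, and `classify_status` and `troubleshooting_hint` are hypothetical names.

```python
from typing import Optional

# Hints paraphrase the troubleshooting notes above.
HINTS = {
    "authentication": "Check that the API key credential is valid and not expired.",
    "validation": "Check required fields such as Text and Voice ID, "
                  "and the voice, model, and output format values.",
    "rate_limit": "Check usage limits or upgrade the plan.",
    "service": "Confirm network access and API endpoint availability, then retry.",
}


def classify_status(status):
    """Map an HTTP status code to a coarse error category (assumed mapping)."""
    if 200 <= status < 300:
        return "ok"
    if status == 401:
        return "authentication"
    if status in (400, 422):
        return "validation"
    if status == 429:
        return "rate_limit"
    if status >= 500:
        return "service"
    return "unknown"


def troubleshooting_hint(status) -> Optional[str]:
    """Return a hint for the status code, or None if no hint applies."""
    return HINTS.get(classify_status(status))
```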