Actions (13)
- Speech Actions
- Voice Actions
- History Actions
- User Actions
Overview
The ElevenLabs node provides text-to-speech (TTS) functionality under the "Speech" resource with the "Text to Speech" operation. It converts input text into spoken audio using selectable voice models and various customization options. This node is useful for automating voice generation in workflows such as creating audio content, voice assistants, accessibility tools, or any scenario where converting text to natural-sounding speech is needed.
Practical examples:
- Generating podcast intros or announcements from text scripts.
- Creating voiceovers for videos or presentations automatically.
- Building chatbots or virtual agents that respond with synthesized speech.
- Producing audio versions of articles or documents for accessibility.
Properties
| Name | Meaning |
|---|---|
| Text | The text string that will be converted into speech. |
| Voice ID | The voice to use for speech synthesis. Can be selected from a searchable list of available voices or specified by ID. |
| Additional Fields | Optional advanced settings including: |
| - Binary Name | Custom name for the output binary data field containing the audio. |
| - File Name | Custom file name for the generated audio file. |
| - Model Name or ID | Identifier of the voice model to use for synthesis, selectable from a list or specified by ID. |
| - Output Format | Audio format of the output (e.g., mp3_44100_128). |
| - Seed | Numeric seed value to produce deterministic voice output. |
| - Similarity Boost | Number between 0 and 1 to control how closely the voice matches the original speaker's style. |
| - Speaker Boost | Boolean to enable or disable speaker boost feature. |
| - Stability | Number between 0 and 1 controlling voice stability during synthesis. |
| - Stitching | Boolean to enable stitching, which provides context by passing previous text to improve continuity. |
| - Streaming Latency | Integer (0-4) to optimize streaming latency at some cost to quality. |
| - Style | Number between 0 and 1 to exaggerate the voice style for expressive speech. |
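The ranges in the table above can be checked before a request is sent. The following is a minimal sketch of such a check, not the node's actual implementation. The endpoint path, the body keys (`model_id`, `voice_settings`, `similarity_boost`, `use_speaker_boost`, `previous_text`), and the query parameter `optimize_streaming_latency` are assumptions about the ElevenLabs REST API and may not match what the node sends. The function name `build_tts_request` is hypothetical.

```python
from typing import Any, Dict, Optional

# Base URL as listed under Dependencies.
BASE_URL = "https://api.elevenlabs.io/v1"


def _check_unit_interval(name: str, value: Optional[float]) -> None:
    """Raise ValueError if value is set but outside the range 0..1."""
    if value is not None and not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


def build_tts_request(
    text: str,
    voice_id: str,
    *,
    model_id: Optional[str] = None,
    output_format: str = "mp3_44100_128",
    seed: Optional[int] = None,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
    style: Optional[float] = None,
    speaker_boost: Optional[bool] = None,
    streaming_latency: Optional[int] = None,
    previous_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented fields and return a request description.

    Nothing is sent over the network; the result is a plain dict.
    """
    if not text:
        raise ValueError("Text is required")
    if not voice_id:
        raise ValueError("Voice ID is required")

    _check_unit_interval("Stability", stability)
    _check_unit_interval("Similarity Boost", similarity_boost)
    _check_unit_interval("Style", style)
    if streaming_latency is not None and streaming_latency not in range(0, 5):
        raise ValueError("Streaming Latency must be an integer from 0 to 4")

    # Keep only the voice settings the caller actually provided.
    # The key names here are assumed, not taken from the node's source.
    voice_settings = {
        key: value
        for key, value in {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": speaker_boost,
        }.items()
        if value is not None
    }

    body: Dict[str, Any] = {"text": text}
    if model_id is not None:
        body["model_id"] = model_id
    if seed is not None:
        body["seed"] = seed
    if previous_text is not None:
        # Assumed mapping of the "Stitching" option to a request field.
        body["previous_text"] = previous_text
    if voice_settings:
        body["voice_settings"] = voice_settings

    params: Dict[str, Any] = {"output_format": output_format}
    if streaming_latency is not None:
        params["optimize_streaming_latency"] = streaming_latency

    return {
        "url": f"{BASE_URL}/text-to-speech/{voice_id}",
        "params": params,
        "json": body,
    }
```

Returning a plain dict keeps the sketch free of network calls, so the validation can be exercised without an API key.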
Output
The node outputs the generated speech audio as binary data, encoded in the selected output format. The default is `mp3_44100_128`: MP3 at a 44.1 kHz sample rate and 128 kbps bitrate. The name of the output binary property can be customized via the "Binary Name" additional field, and the file name can be set via the "File Name" field.
The JSON output typically includes metadata about the synthesis request and may include references to the audio data, but the primary payload is the binary audio file ready for further processing or saving.
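An output format identifier such as `mp3_44100_128` appears to pack the codec, sample rate, and bitrate into one string. The sketch below splits such an identifier into its parts. The `codec_samplerate[_bitrate]` pattern is inferred from the single example in this document, so other identifiers may not follow it. The names `AudioFormat` and `parse_output_format` are hypothetical.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the identifier has no bitrate part


def parse_output_format(value: str) -> AudioFormat:
    """Split an identifier like 'mp3_44100_128' into its components."""
    parts = value.split("_")
    # Expect a codec followed by one or two purely numeric parts.
    if (
        len(parts) not in (2, 3)
        or not parts[0]
        or not all(part.isdigit() for part in parts[1:])
    ):
        raise ValueError(f"Unrecognized output format: {value!r}")
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(parts[0], int(parts[1]), bitrate)


if __name__ == "__main__":
    print(parse_output_format("mp3_44100_128"))
```

A parser like this can help a downstream step pick a file extension or reject a mistyped format before the request is made.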
Dependencies
- Requires an API key credential for ElevenLabs API authentication.
- Depends on ElevenLabs cloud service for text-to-speech conversion.
- Needs network access to https://api.elevenlabs.io/v1.
- No other external dependencies are indicated.
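Inside n8n the API key is stored in the node's credential. To check a key outside the workflow, a small helper along the following lines may be useful. This is a sketch under assumptions: the `xi-api-key` header name reflects the ElevenLabs REST API as commonly documented, and the environment variable `ELEVENLABS_API_KEY` and the function `auth_headers` are hypothetical names chosen for illustration.

```python
import os
from typing import Dict, Mapping


def auth_headers(
    env: Mapping[str, str] = os.environ,
    var: str = "ELEVENLABS_API_KEY",  # hypothetical variable name
) -> Dict[str, str]:
    """Build the authentication header from an environment mapping."""
    api_key = env.get(var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {var} to an ElevenLabs API key")
    # Header name is an assumption about the ElevenLabs API.
    return {"xi-api-key": api_key}
```

Passing the environment as a parameter lets the helper be tested with a plain dict instead of real credentials.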
Troubleshooting
Common issues:
- Invalid or missing API key: Ensure the API key credential is correctly configured.
- Unsupported voice ID or model ID: Verify the voice and model IDs exist and are accessible.
- Network connectivity problems: Confirm internet access and API endpoint availability.
- Incorrect output format: Use supported audio formats; otherwise, the node may fail or produce unusable audio.
Error messages:
- Authentication errors usually indicate invalid or expired API credentials.
- Validation errors may occur if required fields like "Text" or "Voice ID" are missing.
- Rate limiting or quota exceeded errors from the API require checking usage limits or upgrading the plan.
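The error categories above can be mapped onto HTTP status codes when inspecting a failed request. The mapping below follows general HTTP conventions and is an assumption; the document does not state which status codes the ElevenLabs API or the node returns. The function names are hypothetical.

```python
def describe_failure(status_code: int) -> str:
    """Map an HTTP status code to one of the troubleshooting categories."""
    if status_code in (401, 403):
        return "authentication: check that the API key credential is valid"
    if status_code == 404:
        return "not found: verify the voice ID and model ID"
    if status_code in (400, 422):
        return "validation: check required fields such as Text and Voice ID"
    if status_code == 429:
        return "rate limit: check usage limits or upgrade the plan"
    if status_code >= 500:
        return "service: the API endpoint may be temporarily unavailable"
    return "unexpected: inspect the response body for details"


def is_retryable(status_code: int) -> bool:
    """Only rate-limit and server-side failures are worth retrying."""
    return status_code == 429 or status_code >= 500


if __name__ == "__main__":
    for code in (401, 422, 429, 503):
        print(code, describe_failure(code), is_retryable(code))
```

Retrying only on 429 and 5xx responses avoids repeating requests that will fail the same way, such as those with a bad key or a missing field.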