ElevenLabs

WIP

Actions (16)

Speech Actions
Voice Actions
History Actions
User Actions
- Get User Info
- Get User Subscription

Overview

The ElevenLabs node exposes speech-related operations. This page covers the Speech resource with the Text to Speech operation, which converts input text into spoken audio using a selected voice and model. It is useful for automating voice generation in applications such as virtual assistants, audiobooks, accessibility tools, or any scenario requiring natural-sounding synthesized speech.

Users can customize the voice characteristics, output format, and other parameters to tailor the audio output to their needs. For example, you might convert customer support responses into speech for an interactive voice response system or generate narration for video content automatically.

Properties

Name	Meaning
Text	The text string that will be converted into speech.
Voice ID	The identifier of the voice to use for synthesis. Can be selected from a searchable list of available voices or entered manually by ID.
Model Name or ID	The specific voice model to use for generating speech. Selectable from a list or specified by ID.
Language Code	ISO 639-1 language code to enforce a specific language for the model (only supported by certain models).
Stability	Controls how consistent the voice sounds from one generation to the next; a number between 0 and 1 where higher values mean a steadier, less varied delivery.
Similarity Boost	Adjusts how closely the generated voice matches the original voice; a value between 0 and 1.
Style	Exaggerates the voice style; a number between 0 and 1.
Speaker Boost	Boolean flag to activate speaker boost feature.
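Seed	A numeric seed that makes text-to-speech sampling deterministic on a best-effort basis; repeating a request with the same seed and parameters should produce the same audio, though this is not guaranteed. Range: 0 to 4294967295.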
Apply Text Normalization	Controls text normalization behavior: Auto (system decides), On (always applied), Off (skipped).
Use PVC as IVC	Boolean to choose whether to use the IVC (Instant Voice Clone) version of the voice instead of the PVC (Professional Voice Clone) version.
Stitching	Enables stitching mode to provide context by passing previous and next request IDs for smoother audio transitions.
Previous Request IDs	Comma-separated list of up to 3 request IDs representing prior audio samples for stitching context.
Next Request IDs	Comma-separated list of up to 3 request IDs representing subsequent audio samples for stitching context.
Additional Fields	Collection of optional fields including:
- Binary Name	Custom name for the binary output data.
- File Name	Custom file name for the generated audio file.
- Streaming Latency	Numeric setting (0-4) to optimize streaming latency at some cost to quality. Values range from no optimization (0) to max optimization with text normalizer off (4).
- Output Format	Audio output format options such as MP3 (various bitrates), PCM (various sample rates and bit depths), and μ-Law.
- Enable Logging	Boolean to enable or disable logging; disabling results in zero retention mode (no history features).
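
As a rough illustration of how the properties above might map onto a request body, here is a minimal Python sketch. The function name `build_tts_body`, its default values, and the JSON field names (`model_id`, `voice_settings`, `use_speaker_boost`, `apply_text_normalization`) are assumptions based on the public ElevenLabs REST API, not taken from the node's source. Only the numeric ranges come from the table.

```python
from typing import Any, Dict, Optional

MAX_SEED = 4294967295  # upper bound quoted in the Seed row


def _unit_interval(name: str, value: float) -> float:
    """Return value unchanged if it lies in [0, 1], otherwise raise ValueError."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def build_tts_body(
    text: str,
    model_id: str,
    stability: float = 0.5,          # illustrative default, not from the source
    similarity_boost: float = 0.75,  # illustrative default, not from the source
    style: float = 0.0,
    speaker_boost: bool = True,
    seed: Optional[int] = None,
    language_code: Optional[str] = None,
    text_normalization: str = "auto",
) -> Dict[str, Any]:
    """Validate the documented ranges and assemble a JSON-serializable body."""
    if not text:
        raise ValueError("text must not be empty")
    if text_normalization not in ("auto", "on", "off"):
        raise ValueError("text_normalization must be 'auto', 'on', or 'off'")

    body: Dict[str, Any] = {
        "text": text,
        "model_id": model_id,
        "apply_text_normalization": text_normalization,
        "voice_settings": {
            "stability": _unit_interval("stability", stability),
            "similarity_boost": _unit_interval("similarity_boost", similarity_boost),
            "style": _unit_interval("style", style),
            "use_speaker_boost": speaker_boost,
        },
    }
    # Optional fields are only included when the caller sets them.
    if seed is not None:
        if not 0 <= seed <= MAX_SEED:
            raise ValueError(f"seed must be between 0 and {MAX_SEED}")
        body["seed"] = seed
    if language_code:
        body["language_code"] = language_code
    return body
```

Rejecting out-of-range values before the request is sent gives a clearer error than waiting for the API to refuse it.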

Output

The node outputs the generated speech audio as binary data. The binary data contains the audio file encoded in the selected output format (e.g., MP3, PCM). The output includes metadata such as the file name and binary property name if customized.

The json output field typically contains metadata about the request and response. The main payload is the binary audio data, which is suitable for playback, storage, or further processing.
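
When the audio is saved downstream, the file extension usually has to match the selected output format. The sketch below parses format identifiers of the shape `mp3_44100_128`, `pcm_16000`, or `ulaw_8000`. That naming scheme is an assumption based on the public ElevenLabs API, and the helper names and the `.pcm` / `.ulaw` extensions are hypothetical choices made for illustration.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # only present for MP3-style identifiers
    extension: str


# Hypothetical mapping from codec prefix to file extension.
_EXTENSIONS = {"mp3": "mp3", "pcm": "pcm", "ulaw": "ulaw"}


def parse_output_format(value: str) -> AudioFormat:
    """Split an identifier such as 'mp3_44100_128' into its components."""
    parts = value.split("_")
    if len(parts) not in (2, 3) or parts[0] not in _EXTENSIONS:
        raise ValueError(f"unrecognized output format: {value!r}")
    codec = parts[0]
    sample_rate = int(parts[1])  # raises ValueError if not numeric
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate, bitrate, _EXTENSIONS[codec])


def file_name_for(base_name: str, output_format: str) -> str:
    """Build a file name whose extension matches the output format."""
    return f"{base_name}.{parse_output_format(output_format).extension}"
```

Raw PCM and μ-Law data carry no header, so a consumer needs the sample rate from the parsed format to play them back correctly.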

Dependencies

- Requires an active API key credential for the ElevenLabs service.
- Network access to the ElevenLabs API endpoint (https://api.elevenlabs.io/v1).
- Proper configuration of authentication credentials within n8n.
- Optional: Access to voice and model lists via API for selection.
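
To make these requirements concrete, the following sketch assembles (but does not send) an HTTP request against the endpoint listed above. The `/text-to-speech/{voice_id}` path, the `xi-api-key` header, and the `output_format` query parameter reflect the public ElevenLabs REST API as commonly documented. They are assumptions here, since this page does not show how the node builds its requests, and `build_tts_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.elevenlabs.io/v1"  # endpoint named in Dependencies


def build_tts_request(
    api_key: str,
    voice_id: str,
    body: Dict[str, Any],
    output_format: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare a POST request; the caller decides when to send it."""
    # Percent-encode the voice ID so it cannot alter the URL path.
    url = f"{BASE_URL}/text-to-speech/{quote(voice_id, safe='')}"
    if output_format:
        url += "?" + urlencode({"output_format": output_format})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed authentication header
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call and, under the same assumptions, return the audio bytes in the response body.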

Troubleshooting

- Invalid Voice ID or Model ID: Ensure the voice and model identifiers are correct and available in your ElevenLabs account.
- API Authentication Errors: Verify that the API key credential is correctly configured and has the necessary permissions.
- Unsupported Language Code: Only certain models support a forced language code; check that the selected model accepts one.
- Streaming Latency Settings Impact Quality: Higher Streaming Latency values trade audio quality for speed; lower the value if the output sounds degraded.
- Binary Data Handling: Make sure downstream nodes or systems can handle the binary audio data properly.
- Stitching Context Errors: When using stitching, ensure the previous and next request IDs are valid, comma-separated, and limited to 3 per field.
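
Formatting problems with stitching IDs can be caught before a request is made. The sketch below checks a comma-separated string against the limit of 3 IDs stated in Properties. The function name `parse_request_ids` is hypothetical, and the whitespace check is only a crude assumption about what a malformed ID looks like. It cannot tell whether an ID actually exists on the ElevenLabs side.

```python
from typing import List

MAX_STITCH_IDS = 3  # limit stated for Previous/Next Request IDs


def parse_request_ids(raw: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty request IDs."""
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if len(ids) > MAX_STITCH_IDS:
        raise ValueError(
            f"at most {MAX_STITCH_IDS} request IDs are allowed, got {len(ids)}"
        )
    for request_id in ids:
        # An ID with inner whitespace usually means a missing comma.
        if any(ch.isspace() for ch in request_id):
            raise ValueError(f"request ID contains whitespace: {request_id!r}")
    return ids
```

An empty input yields an empty list, which a caller can treat as "no stitching context for this side".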