ElevenLabs

Generate natural-sounding speech using ElevenLabs AI

Overview

The ElevenLabs node converts text into natural-sounding speech using ElevenLabs AI. You choose a voice and a model, and you can customize the voice parameters. Typical uses include voiceovers for videos, audio content for accessibility, and interactive voice applications.

For example, you can pass in a motivational quote and get back an MP3 audio file spoken by the selected voice. You can also fine-tune voice characteristics such as style, stability, and similarity boost.

Properties

Name: Meaning
API Key Notice: Notice informing the user that an API key from ElevenLabs is required, with a link to obtain it.
Text: The text string to be converted into speech. Required.
Voice Name or ID: Select a voice from a list or specify a voice ID via expression. Determines which voice will speak the text. Required.
Model Name or ID: Select a speech synthesis model from a list or specify a model ID via expression. Required.
Language Support Notice: Notice explaining that language codes are only supported by the Turbo v2.5 model; other models will error if a language code is provided.
Options (additionalFields): A collection of optional settings:
  • Binary Name: Property name to store binary audio data.
  • File Name: Name of the generated audio file.
  • Output Format: Audio format (e.g., FLAC, MP3, WAVE).
  • Voice Settings: Parameters to adjust voice characteristics:
    • Similarity Boost (0-1): How closely the generated voice adheres to the original voice.
    • Stability (0-1): How consistent the voice stays across re-generations; lower values allow more variation.
    • Style (0-1): Amount of style exaggeration applied.
    • Speaker Boost (boolean): Enhances clarity and reduces background noise.
  • Streaming Latency: Level of latency optimization for streaming; stronger levels reduce latency at some cost in quality.
  • Text Normalization: Controls normalization of input text before generation (auto/on/off). Cannot be set to on with the Turbo v2.5 model.
  • Language Code: ISO 639-1 language code (e.g., "en", "de"). Only works with Turbo v2.5 model.
  • Next Text: Text following current input to improve prosody when concatenating speech.
  • Previous Text: Text preceding current input to improve prosody.
  • Seed: Numeric seed for consistent voice output (0-4294967295).
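
As a rough illustration of how the properties above could translate into a request, the sketch below assembles the URL, query parameters, and JSON body for the ElevenLabs text-to-speech endpoint. The API field names (model_id, voice_settings, apply_text_normalization, and so on) follow the public ElevenLabs REST API. The helper build_tts_request, the camelCase option keys, and the mapping itself are assumptions made for illustration and are not taken from the node's source.

```python
from typing import Any, Dict, Optional

BASE_URL = "https://api.elevenlabs.io/v1"

# Hypothetical mapping from node option keys to ElevenLabs body fields.
BODY_FIELDS = {
    "languageCode": "language_code",
    "textNormalization": "apply_text_normalization",
    "previousText": "previous_text",
    "nextText": "next_text",
    "seed": "seed",
}

# Hypothetical mapping from node voice settings to voice_settings keys.
VOICE_FIELDS = {
    "stability": "stability",
    "similarityBoost": "similarity_boost",
    "style": "style",
    "speakerBoost": "use_speaker_boost",
}


def build_tts_request(text: str, voice_id: str, model_id: str,
                      options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble URL, query parameters, and JSON body without sending anything."""
    options = options or {}
    body: Dict[str, Any] = {"text": text, "model_id": model_id}

    # Copy only the options the caller actually set.
    for option_name, api_name in BODY_FIELDS.items():
        if option_name in options:
            body[api_name] = options[option_name]

    voice = options.get("voiceSettings", {})
    settings = {api: voice[name] for name, api in VOICE_FIELDS.items() if name in voice}
    if settings:
        body["voice_settings"] = settings

    # Output format and latency optimization travel as query parameters.
    params: Dict[str, Any] = {}
    if "outputFormat" in options:
        params["output_format"] = options["outputFormat"]
    if "streamingLatency" in options:
        params["optimize_streaming_latency"] = options["streamingLatency"]

    return {"url": f"{BASE_URL}/text-to-speech/{voice_id}", "params": params, "body": body}


request = build_tts_request(
    "Believe you can and you're halfway there.",
    voice_id="VOICE_ID",  # placeholder, not a real voice ID
    model_id="eleven_turbo_v2_5",
    options={"languageCode": "en", "seed": 42,
             "voiceSettings": {"stability": 0.5, "similarityBoost": 0.75}},
)
print(request["url"])
print(request["body"])
```

Options that are not set are left out of the request entirely, so the API's own defaults apply to them.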

Output

The node writes the generated speech audio to the binary property named in Binary Name (default: "data"). The audio file format corresponds to the chosen Output Format (e.g., MP3, FLAC, WAVE), and the binary data can be passed to later nodes or saved to a file.

The node also outputs JSON metadata about the generation process alongside the binary audio, which remains the primary output.
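
To make the output shape more concrete, here is a minimal sketch of what an item carrying the audio might look like and how the bytes could be recovered from it. It assumes n8n's default in-memory binary mode, where the binary property holds base64 data plus mimeType and fileName. The MIME types, the "speech" file name, the json contents, and the helpers make_item and read_audio are illustrative assumptions, not the node's documented behavior.

```python
import base64
from typing import Any, Dict

# Assumed MIME types and file extensions for the formats named above.
FORMATS = {
    "MP3": ("audio/mpeg", "mp3"),
    "FLAC": ("audio/flac", "flac"),
    "WAVE": ("audio/wav", "wav"),
}


def make_item(audio: bytes, output_format: str = "MP3",
              binary_name: str = "data", file_name: str = "speech") -> Dict[str, Any]:
    """Build an n8n-style item with the audio stored under binary_name."""
    mime_type, extension = FORMATS[output_format]
    return {
        "json": {"format": output_format},  # placeholder metadata, not the real fields
        "binary": {
            binary_name: {
                "data": base64.b64encode(audio).decode("ascii"),
                "mimeType": mime_type,
                "fileName": f"{file_name}.{extension}",
            }
        },
    }


def read_audio(item: Dict[str, Any], binary_name: str = "data") -> bytes:
    """Decode the base64 payload back into raw audio bytes."""
    return base64.b64decode(item["binary"][binary_name]["data"])


item = make_item(b"\x00\x01fake-audio-bytes", "MP3")
audio = read_audio(item)
print(item["binary"]["data"]["fileName"], len(audio))
```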

Dependencies

  • An API key from ElevenLabs to authenticate requests.
  • The API key configured as a credential in n8n.
  • Network access to the ElevenLabs API endpoint (https://api.elevenlabs.io/v1).
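
One way to check these dependencies outside n8n is to prepare an authenticated request against the API yourself. The sketch below builds a GET request for the ElevenLabs voices endpoint using the xi-api-key header from the public API. The environment variable name ELEVENLABS_API_KEY and the helper build_voices_request are assumptions for illustration, and the network call is left commented out so the example runs offline.

```python
import os
import urllib.request

BASE_URL = "https://api.elevenlabs.io/v1"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Prepare an authenticated GET request that lists available voices."""
    if not api_key:
        raise ValueError("An ElevenLabs API key is required")
    return urllib.request.Request(
        f"{BASE_URL}/voices",
        headers={"xi-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )


# ELEVENLABS_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("ELEVENLABS_API_KEY", "placeholder-key")
request = build_voices_request(api_key)
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)  # 200 suggests the key and network path work
```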

Troubleshooting

  • Missing or invalid API key: The node will fail if the API key is not set or incorrect. Ensure you have obtained a valid API key from the ElevenLabs dashboard and configured it properly in n8n.
  • Unsupported language code: Providing a language code with a model other than Turbo v2.5 will cause errors. Use language codes only with Turbo v2.5.
  • Invalid voice or model ID: Selecting or specifying a voice or model ID that does not exist will result in errors. Use the provided dropdown lists or verify IDs carefully.
  • Audio format issues: Choosing an unsupported or incompatible output format may cause failures. Stick to the listed formats.
  • Latency optimization trade-offs: Using strong latency optimizations may reduce audio quality or cause mispronunciations, especially with the "Maximum + No Text Normalization" option.
  • Text normalization conflicts: Setting Text Normalization to on with the Turbo v2.5 model is not allowed and will cause errors.
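
Several of the issues above can be caught before a request is sent. The following sketch checks a set of options against the constraints described in this document: language codes only with Turbo v2.5, no enabled text normalization on Turbo v2.5, and the stated numeric ranges. The function check_options and the option keys are hypothetical, and eleven_turbo_v2_5 is assumed to be the model ID behind the "Turbo v2.5" entry in the model list.

```python
from typing import Any, Dict, List

TURBO_V2_5 = "eleven_turbo_v2_5"  # assumed model ID for Turbo v2.5
MAX_SEED = 4294967295


def check_options(model_id: str, options: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []

    if "languageCode" in options and model_id != TURBO_V2_5:
        problems.append("Language Code is only supported by Turbo v2.5")

    if options.get("textNormalization") == "on" and model_id == TURBO_V2_5:
        problems.append("Text Normalization cannot be enabled on Turbo v2.5")

    seed = options.get("seed")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        problems.append("Seed must be between 0 and 4294967295")

    # Voice settings documented with a 0-1 range.
    for name in ("stability", "similarityBoost", "style"):
        value = options.get("voiceSettings", {}).get(name)
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")

    return problems


print(check_options("eleven_multilingual_v2", {"languageCode": "de"}))
print(check_options(TURBO_V2_5, {"languageCode": "de", "seed": 7}))
```

Invalid voice or model IDs and a wrong API key cannot be detected this way; those only surface once the API responds.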
