
ElevenLabs

WIP

Overview

The ElevenLabs node's Speech - Text to Speech operation converts input text into spoken audio using a specified voice and TTS model. It supports advanced features such as including character-level timing data, selecting different output audio formats, and fine-tuning voice characteristics like stability and style.

This node is useful for automating voice content generation in applications such as:

  • Creating voiceovers for videos or presentations.
  • Generating audio responses for chatbots or virtual assistants.
  • Producing podcasts or audiobooks from text scripts.
  • Accessibility tools that read text aloud.

For example, you can input a motivational quote and generate an MP3 audio file of it spoken by a chosen voice, optionally receiving detailed timing information for subtitles or lip-syncing.
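As a rough illustration of what the node does under the hood, the sketch below builds such a request. The endpoint path and `xi-api-key` header follow the public ElevenLabs API; the voice ID, API key, and default model ID are placeholders, not real identifiers.

```python
# Sketch: assembling the Text to Speech request the node sends on your behalf.
# VOICE_ID / API_KEY are placeholders; "eleven_multilingual_v2" is an assumed
# default model ID, configurable via the node's "Model Name or ID" field.
API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(text, voice_id, api_key,
                      output_format="mp3_44100_128",
                      model_id="eleven_multilingual_v2"):
    """Return the URL, query params, headers, and JSON body for a TTS call."""
    return {
        "url": f"{API_BASE}/text-to-speech/{voice_id}",
        "params": {"output_format": output_format},
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "json": {"text": text, "model_id": model_id},
    }

req = build_tts_request("Believe you can and you're halfway there.",
                        voice_id="VOICE_ID", api_key="API_KEY")
# POSTing req["url"] with req["json"] returns binary MP3 audio.
```

A POST to this URL returns the raw audio bytes, which the node attaches to the item as binary data.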

Properties

  • Text: The text string to be converted into speech.
  • Include Character Timing: Whether to include character-level timing information in the response. If enabled, the output is JSON containing base64-encoded audio plus timing data instead of binary audio alone.
  • Voice ID: The voice to use for speech synthesis. Can be selected from a searchable list or specified by ID.
  • Additional Fields: A collection of optional parameters:
    • Binary Name: Custom name for the output binary data.
    • File Name: Custom filename for the generated audio.
    • Streaming Latency: Level of latency optimization (0-4).
    • Output Format: Audio format (e.g., MP3 128 kbps, PCM variants, μ-law).
    • Language Code: ISO 639-1 code to enforce a language.
    • Model Name or ID: Select or specify the TTS model.
    • Stability: Voice stability level (0-1).
    • Similarity Boost: Voice similarity boost (0-1).
    • Style: Degree of voice style exaggeration (0-1).
    • Speaker Boost: Enable speaker boost (boolean).
    • Seed: Numeric seed for deterministic output.
    • Enable Logging: Enable or disable request logging.
    • Text Normalization: Control text normalization (auto, on, off).
    • Use PVC as IVC: Boolean flag affecting which voice version is used.
    • Stitching: Enable context stitching across requests.
    • Previous Request IDs: Comma-separated list of prior request IDs for stitching.
    • Next Request IDs: Comma-separated list of subsequent request IDs for stitching.
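The voice-tuning fields above map onto the `voice_settings` object the ElevenLabs API accepts. The sketch below shows one plausible way such a mapping could work; clamping out-of-range values to the documented 0-1 ranges is an assumption about sensible node behavior, not confirmed implementation detail.

```python
# Sketch: building a voice_settings object from the optional Additional Fields.
# Field names mirror the ElevenLabs voice_settings shape; clamping to 0-1 is
# an assumed safeguard matching the documented parameter ranges.
def clamp01(value):
    return max(0.0, min(1.0, value))

def build_voice_settings(stability=None, similarity_boost=None,
                         style=None, speaker_boost=None):
    """Assemble voice_settings, omitting any field the user left unset."""
    settings = {}
    if stability is not None:
        settings["stability"] = clamp01(stability)
    if similarity_boost is not None:
        settings["similarity_boost"] = clamp01(similarity_boost)
    if style is not None:
        settings["style"] = clamp01(style)
    if speaker_boost is not None:
        settings["use_speaker_boost"] = bool(speaker_boost)
    return settings

settings = build_voice_settings(stability=0.5, style=1.3, speaker_boost=True)
# style is pulled back into the documented 0-1 range; unset fields are omitted.
```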

Output

The form of the node's output depends on the "Include Character Timing" option:

  • If disabled, the output is raw binary audio data in the specified format (e.g., MP3), attached to the item as binary data under the configured binary property name.
  • If enabled, the output is a JSON object in the item's json field containing:
    • Base64-encoded audio data.
    • Detailed character-level timing information useful for subtitle synchronization or lip-syncing.

The output binary data can be saved or streamed as an audio file.
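When character timing is enabled, a downstream step has to decode the JSON itself. The sketch below assumes the with-timestamps response shape of the ElevenLabs API (`audio_base64`, plus an `alignment` object with `characters` and `character_start_times_seconds`); the sample payload is fabricated for illustration.

```python
import base64

# Sketch: consuming the JSON output produced when "Include Character Timing"
# is enabled. Field names follow the ElevenLabs with-timestamps response;
# the sample payload below is fabricated.
def to_cues(response):
    """Decode the audio bytes and pair each character with its start time."""
    audio = base64.b64decode(response["audio_base64"])
    alignment = response["alignment"]
    cues = list(zip(alignment["characters"],
                    alignment["character_start_times_seconds"]))
    return audio, cues

sample = {
    "audio_base64": base64.b64encode(b"\xff\xf3audio").decode(),
    "alignment": {
        "characters": ["H", "i"],
        "character_start_times_seconds": [0.0, 0.12],
    },
}
audio_bytes, cues = to_cues(sample)
# cues pairs each character with its start time, e.g. ("H", 0.0), ("i", 0.12),
# which is the raw material for subtitle cues or lip-sync keyframes.
```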

Dependencies

  • Requires an API key credential for ElevenLabs API authentication.
  • Network access to the ElevenLabs API endpoint (https://api.elevenlabs.io/v1).
  • Proper configuration of the node credentials within n8n to authorize requests.

Troubleshooting

  • Common issues:

    • Invalid or missing API key will cause authentication errors.
    • Specifying an unsupported voice ID or model may result in errors or no audio output.
    • Enabling character timing without proper handling of JSON output might cause downstream processing failures.
    • Incorrect output format selection could lead to incompatible audio files.
  • Error messages:

    • Authentication failures: Check API key validity and permissions.
    • Voice/model not found: Verify the voice ID and model ID are correct and available.
    • Rate limiting or quota exceeded: Monitor usage limits on the ElevenLabs account.
    • Network errors: Ensure stable internet connection and API endpoint accessibility.
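The error cases above can be triaged mechanically from the HTTP status code. The mapping below reflects standard HTTP semantics (401 authentication, 404 not found, 429 rate limiting); the exact error bodies ElevenLabs returns may differ.

```python
# Sketch: mapping common HTTP status codes to the troubleshooting hints above.
# Standard HTTP semantics are assumed; actual ElevenLabs error payloads may
# carry additional detail worth surfacing.
TROUBLESHOOTING = {
    401: "Authentication failure: check API key validity and permissions.",
    404: "Voice or model not found: verify the voice ID and model ID.",
    429: "Rate limit or quota exceeded: check ElevenLabs account usage limits.",
}

def explain_error(status_code):
    """Translate an HTTP status code into an actionable troubleshooting hint."""
    return TROUBLESHOOTING.get(
        status_code,
        f"Unexpected status {status_code}: check network access to the API endpoint.",
    )
```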
