Actions
Overview
The ElevenLabs node's Speech - Text to Speech operation converts input text into spoken audio using a specified voice model. It supports advanced features such as including character-level timing data, selecting different output audio formats, and fine-tuning voice characteristics like stability and style.
This node is beneficial for automating voice content generation in applications such as:
- Creating voiceovers for videos or presentations.
- Generating audio responses for chatbots or virtual assistants.
- Producing podcasts or audiobooks from text scripts.
- Accessibility tools that read text aloud.
For example, you can input a motivational quote and generate an MP3 audio file of it spoken by a chosen voice, optionally receiving detailed timing information for subtitles or lip-syncing.
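Outside n8n, this operation corresponds to a single HTTP request against the ElevenLabs API. Below is a minimal Python sketch of building that request; the endpoint path, `xi-api-key` header, and the model and format identifiers follow the public ElevenLabs API but should be treated as assumptions here, and the key and voice ID are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_multilingual_v2",
                      output_format: str = "mp3_44100_128") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    url = f"{API_BASE}/text-to-speech/{voice_id}?output_format={output_format}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Sending it (uncomment to run against a live key; writes the MP3 response to disk):
# with urllib.request.urlopen(build_tts_request("YOUR_KEY", "YOUR_VOICE_ID",
#                                               "Stay hungry, stay foolish.")) as resp:
#     open("quote.mp3", "wb").write(resp.read())
```

Within n8n the node assembles this request for you from the properties described below.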
Properties
| Name | Meaning |
|---|---|
| Text | The text string to be converted into speech. |
| Include Character Timing | Whether to include character-level timing info in the response. If enabled, the output includes JSON with base64 audio plus timing data instead of just binary audio. |
| Voice ID | The voice to use for speech synthesis. Can be selected from a searchable list or specified by ID. |
| Additional Fields | A collection of optional parameters, listed in the table below. |

| Option | Meaning |
|---|---|
| Binary Name | Custom name for the output binary data. |
| File Name | Custom filename for the generated audio file. |
| Streaming Latency | Level of latency optimization (0-4). |
| Output Format | Audio format (e.g., MP3 128 kbps, PCM variants, μ-law). |
| Language Code | ISO 639-1 code to enforce an output language. |
| Model Name or ID | Select or specify the TTS model. |
| Stability | Voice stability level (0-1). |
| Similarity Boost | Voice similarity boost (0-1). |
| Style | Degree of voice style exaggeration (0-1). |
| Speaker Boost | Enable speaker boost (boolean). |
| Seed | Numeric seed for deterministic output. |
| Enable Logging | Enable or disable request logging. |
| Text Normalization | Control text normalization (auto, on, off). |
| Use PVC as IVC | Boolean flag affecting which voice version is used. |
| Stitching | Enable context stitching across requests. |
| Previous Request IDs | Comma-separated list of prior request IDs for stitching. |
| Next Request IDs | Comma-separated list of subsequent request IDs for stitching. |
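Several of these fields (Stability, Similarity Boost, Style, Speaker Boost) travel together as a voice-settings object in the underlying API request body. A hedged sketch of assembling such an object, clamping each 0-1 value into range; the exact field names are assumptions based on the public ElevenLabs API:

```python
def build_voice_settings(stability: float = 0.5, similarity_boost: float = 0.75,
                         style: float = 0.0, use_speaker_boost: bool = True) -> dict:
    """Assemble a voice-settings payload, clamping each 0-1 value into range."""
    clamp = lambda v: max(0.0, min(1.0, float(v)))
    return {
        "stability": clamp(stability),
        "similarity_boost": clamp(similarity_boost),
        "style": clamp(style),
        "use_speaker_boost": bool(use_speaker_boost),
    }
```

Clamping mirrors what the node's 0-1 sliders enforce in the UI, so out-of-range expression values do not produce API errors.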
Output
The shape of the node's output depends on the "Include Character Timing" option:
- If disabled, the item carries raw binary audio in the selected format (e.g., MP3), attached as binary data under the configured binary property name.
- If enabled, the item's json field contains an object with:
  - Base64-encoded audio data.
  - Character-level timing information, useful for subtitle synchronization or lip-syncing.
The binary output can be saved or streamed as an audio file.
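When character timing is enabled, a downstream step typically needs to decode the base64 audio and walk the timing arrays. The sketch below assumes the JSON carries `audio_base64` plus an `alignment` object with parallel `characters`, `character_start_times_seconds`, and `character_end_times_seconds` arrays (the shape of the public ElevenLabs with-timestamps response; verify against your actual node output):

```python
import base64

def split_timed_output(payload: dict, audio_path: str) -> list:
    """Decode the base64 audio to a file and pair each character with its timing."""
    with open(audio_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_base64"]))
    align = payload["alignment"]
    # One (character, start_seconds, end_seconds) tuple per character.
    return list(zip(align["characters"],
                    align["character_start_times_seconds"],
                    align["character_end_times_seconds"]))
```

The returned tuples can then be grouped into words or subtitle cues as needed.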
Dependencies
- Requires an API key credential for ElevenLabs API authentication.
- Network access to the ElevenLabs API endpoint (https://api.elevenlabs.io/v1).
- Proper configuration of the node credentials within n8n to authorize requests.
Troubleshooting
Common issues:
- Invalid or missing API key will cause authentication errors.
- Specifying an unsupported voice ID or model may result in errors or no audio output.
- Enabling character timing without proper handling of JSON output might cause downstream processing failures.
- Incorrect output format selection could lead to incompatible audio files.
Error messages:
- Authentication failures: Check API key validity and permissions.
- Voice/model not found: Verify the voice ID and model ID are correct and available.
- Rate limiting or quota exceeded: Monitor usage limits on the ElevenLabs account.
- Network errors: Ensure stable internet connection and API endpoint accessibility.
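Rate-limit errors are usually transient, so retrying with exponential backoff (in a wrapper script, or via the node's retry settings) is a common mitigation. A generic sketch, not tied to any particular HTTP library; `RateLimitError` is a hypothetical stand-in for an HTTP 429 response:

```python
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (rate limited) response."""

def call_with_retry(send, max_attempts: int = 4, base_delay: float = 1.0,
                    sleep=time.sleep):
    """Call send(), retrying with exponential backoff on rate-limit errors."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Authentication and not-found errors, by contrast, are not transient and should fail fast rather than be retried.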
Links and References
- ElevenLabs API Documentation (for detailed API capabilities)
- n8n Expressions Documentation (for dynamic property values)
- ISO 639-1 Language Codes (for language code reference)