Overview
The ElevenLabs Speech to Speech operation transforms input speech audio into new speech audio using AI voice cloning and synthesis. This node is useful for voice dubbing, creating personalized voice assistants, or generating speech in different styles or languages while preserving the original speaker's characteristics.
For example, you can input an audio clip of a person speaking and generate a new clip in which the same words are spoken with a different style, clarity, or accent. The node supports fine-tuning voice parameters such as similarity to the original voice, stability, style, and noise reduction.
Properties
| Name | Meaning |
|---|---|
| Voice Name or ID | Select a voice from a list or specify a voice ID to use for speech synthesis. |
| Model Name or ID | Select a speech model from a list or specify a model ID. The "Turbo v2.5" model supports language codes; others do not. |
| Binary Name | Name of the binary property where the generated audio data will be stored (default: "data"). |
| File Name | Name of the generated audio file (default: "voice"). |
| Output Format | Audio format of the generated speech. Options include FLAC (16 kHz or 24 kHz), MP3 (44.1 kHz at 128 kbps or 64 kbps), MULAW (16 kHz), and WAVE (16 kHz or 24 kHz). |
| Similarity Boost | How closely the generated voice matches the original voice (range 0-1). 0 means more freedom, 1 means very similar. |
| Stability | Controls variation across re-generations of the voice (range 0-1). 0 means more variable, 1 means more stable. |
| Style | Amount of style applied to the voice (range 0-1). 0 is neutral, 1 is maximum style. |
| Speaker Boost | Boolean. When enabled, enhances voice clarity and reduces background noise. |
| Streaming Latency | Optimize streaming latency with options from no optimization (best quality) to maximum optimization (lowest latency but potentially lower quality). Includes a mode that disables text normalization for fastest response. |
| Text Normalization | Controls how text is normalized before generation. Options are Auto (system decides), On (always normalize), Off (never normalize). Cannot be set to On for the Turbo v2.5 model. |
| Language Code | ISO 639-1 language code (e.g., "en", "de", "fr"). Only supported by Turbo v2.5 model. |
| Next Text | Text that follows the current text, used to improve prosody when concatenating multiple generations. |
| Previous Text | Text that precedes the current text, also used to improve prosody. |
| Seed | Numeric seed (0 to 4294967295) to fix randomness for consistent voice output. |
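The numeric ranges in the table can be checked before a request is sent. The following is a minimal sketch of such a check, not the node's own validation logic; the key names (`similarity_boost`, `stability`, `style`, `seed`) and the function name are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

SEED_MAX = 4294967295  # upper bound from the Seed row (2**32 - 1)
# Hypothetical key names for the three properties documented with a 0-1 range.
UNIT_RANGE_FIELDS = ("similarity_boost", "stability", "style")


def validate_voice_settings(settings: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems: List[str] = []
    for name in UNIT_RANGE_FIELDS:
        if name not in settings:
            continue
        value = settings[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if (isinstance(value, bool)
                or not isinstance(value, (int, float))
                or not 0 <= value <= 1):
            problems.append(f"{name} must be a number between 0 and 1, got {value!r}")
    if "seed" in settings:
        seed = settings["seed"]
        if (isinstance(seed, bool)
                or not isinstance(seed, int)
                or not 0 <= seed <= SEED_MAX):
            problems.append(f"seed must be an integer between 0 and {SEED_MAX}, got {seed!r}")
    return problems
```

Calling `validate_voice_settings({"stability": 1.5, "seed": -1})` would return two messages, one per out-of-range value.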
Output
The node outputs the generated speech audio as binary data attached to the specified binary property (default name: "data"). The binary data contains the audio file in the selected output format (e.g., MP3, FLAC, WAV). The JSON output typically includes metadata about the generation request and may contain information such as the voice ID, model ID, and any relevant status messages.
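To make the output shape concrete, here is a rough sketch of an item carrying audio under a binary property. It assumes the common n8n layout in which each item has `json` and `binary` keys and the binary payload is held in memory as base64; n8n can also be configured to store binary data outside the item, and the exact metadata this node writes is not specified here. The helper names and the `.mp3` defaults are hypothetical.

```python
import base64
from typing import Any, Dict


def build_item(audio_bytes: bytes, binary_name: str = "data", file_name: str = "voice",
               extension: str = "mp3", mime_type: str = "audio/mpeg") -> Dict[str, Any]:
    """Build an item that mimics the assumed output shape of the node."""
    return {
        "json": {},  # generation metadata would go here; contents are not specified
        "binary": {
            binary_name: {
                "data": base64.b64encode(audio_bytes).decode("ascii"),
                "mimeType": mime_type,
                "fileName": f"{file_name}.{extension}",
                "fileExtension": extension,
            }
        },
    }


def read_audio(item: Dict[str, Any], binary_name: str = "data") -> bytes:
    """Decode the audio bytes stored under the given binary property."""
    return base64.b64decode(item["binary"][binary_name]["data"])
```

A downstream step that expects the audio under a different property name only needs to pass that name as `binary_name`.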
Dependencies
- Requires an API key credential from ElevenLabs to authenticate requests.
- The node communicates with the ElevenLabs API endpoint at https://api.elevenlabs.io/v1.
- Proper configuration of the API key credential in n8n is necessary.
- The user must select valid voice and model IDs available via the ElevenLabs service.
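For orientation, the sketch below shows roughly what an authenticated request against that base URL might look like when prepared outside n8n. It only builds the request object and sends nothing. The `/speech-to-speech/{voice_id}` path, the `xi-api-key` header, the `output_format` query parameter, and the `mp3_44100_128` value are assumptions based on the public ElevenLabs API and should be verified against its reference; the audio upload body is omitted.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.elevenlabs.io/v1"  # base URL from the Dependencies list


def build_request(api_key: str, voice_id: str,
                  output_format: str = "mp3_44100_128") -> Request:
    """Prepare (but do not send) a POST request for a speech-to-speech call.

    Path, header name, and query parameter are assumptions, not taken from this page.
    """
    query = urlencode({"output_format": output_format})
    url = f"{BASE_URL}/speech-to-speech/{quote(voice_id, safe='')}?{query}"
    return Request(url, method="POST", headers={"xi-api-key": api_key})
```

In n8n itself none of this is written by hand: the credential supplies the API key and the node assembles the request.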
Troubleshooting
- Error due to unsupported language code: If a language code is provided with a model other than Turbo v2.5, the API will return an error. Solution: Use language codes only with the Turbo v2.5 model.
- Invalid voice or model ID: Selecting or specifying an invalid voice or model ID will cause the request to fail. Solution: Use the provided load options to select valid voices and models, or verify the IDs.
- API authentication errors: A missing or incorrect API key will result in authentication failures. Solution: Ensure the API key credential is correctly configured.
- Audio format issues: Specifying an unsupported output format, or a format that downstream steps cannot handle, might cause problems. Solution: Use one of the supported formats listed under "Output Format".
- Latency vs. quality tradeoff: Aggressive streaming latency optimization may degrade audio quality. Solution: Adjust the "Streaming Latency" option accordingly.
- Text normalization conflicts: Setting text normalization to "On" with the Turbo v2.5 model is not allowed and will cause errors. Solution: Set text normalization to "Off" or "Auto" as appropriate.
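The two model-related conflicts above (language code and text normalization) can be caught before the API is called. The following is a minimal sketch of such a pre-flight check; the model ID `eleven_turbo_v2_5`, the lowercase option values, and the function name are assumptions made for illustration.

```python
from typing import List, Optional

TURBO_V2_5 = "eleven_turbo_v2_5"  # assumed model ID for the "Turbo v2.5" model


def find_conflicts(model_id: str, language_code: Optional[str] = None,
                   text_normalization: str = "auto") -> List[str]:
    """Return descriptions of option combinations the page documents as errors."""
    conflicts: List[str] = []
    is_turbo = model_id == TURBO_V2_5
    # Language codes are documented as supported only by Turbo v2.5.
    if language_code and not is_turbo:
        conflicts.append(
            f"language code {language_code!r} is only supported by Turbo v2.5"
        )
    # Text normalization "On" is documented as not allowed with Turbo v2.5.
    if text_normalization.lower() == "on" and is_turbo:
        conflicts.append("text normalization cannot be 'On' with Turbo v2.5")
    return conflicts
```

An empty result means neither documented conflict applies; it does not guarantee that the API will accept the request.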