Overview
The ElevenLabs node provides speech-related functionality. This page covers the Speech resource with the Text to Speech operation, which converts input text into spoken audio using a selected voice model. It is useful for automating voice generation in applications such as virtual assistants, audiobooks, accessibility tools, or any scenario requiring natural-sounding synthesized speech.
Users can customize the voice characteristics, output format, and other parameters to tailor the audio output to their needs. For example, you might convert customer support responses into speech for an interactive voice response system or generate narration for video content automatically.
Properties
| Name | Meaning |
|---|---|
| Text | The text string that will be converted into speech. |
| Voice ID | The identifier of the voice to use for synthesis. Can be selected from a searchable list of available voices or entered manually by ID. |
| Model Name or ID | The specific voice model to use for generating speech. Selectable from a list or specified by ID. |
| Language Code | ISO 639-1 language code to enforce a specific language for the model (only supported by certain models). |
| Stability | Controls how stable the voice sounds; a number between 0 and 1 where higher values mean more stability. |
| Similarity Boost | Adjusts how closely the generated voice matches the original voice; a value between 0 and 1. |
| Style | Exaggerates the voice style; a number between 0 and 1. |
| Speaker Boost | Boolean flag to activate speaker boost feature. |
| Seed | A numeric seed to make the text-to-speech deterministic; same seed and text produce identical audio. Range: 0 to 4294967295. |
| Apply Text Normalization | Controls text normalization behavior: Auto (system decides), On (always applied), Off (skipped). |
| Use PVC as IVC | Boolean to choose whether to use the IVC version of the voice instead of the PVC version. |
| Stitching | Enables stitching mode to provide context by passing previous and next request IDs for smoother audio transitions. |
| Previous Request IDs | Comma-separated list of up to 3 request IDs representing prior audio samples for stitching context. |
| Next Request IDs | Comma-separated list of up to 3 request IDs representing subsequent audio samples for stitching context. |
| Additional Fields | Collection of optional fields including: |
| - Binary Name | Custom name for the binary output data. |
| - File Name | Custom file name for the generated audio file. |
| - Streaming Latency | Numeric setting (0-4) to optimize streaming latency at some cost to quality. Values range from no optimization (0) to max optimization with text normalizer off (4). |
| - Output Format | Audio output format options such as MP3 (various bitrates), PCM (various sample rates and bit depths), and μ-Law. |
| - Enable Logging | Boolean to enable or disable logging; disabling results in zero retention mode (no history features). |
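As a rough illustration of how the properties above might be checked and combined into a request, here is a minimal Python sketch. The helper name `build_tts_request` and the default values are hypothetical. The path, query, and body key names (`model_id`, `voice_settings`, `optimize_streaming_latency`, and so on) are assumed from the public ElevenLabs HTTP API; this page does not describe how the node maps its properties internally.

```python
from typing import Any, Optional


def build_tts_request(
    text: str,
    voice_id: str,
    model_id: str,
    stability: float = 0.5,          # illustrative default, not from the node
    similarity_boost: float = 0.75,  # illustrative default, not from the node
    style: float = 0.0,
    speaker_boost: bool = True,
    seed: Optional[int] = None,
    output_format: str = "mp3_44100_128",  # assumed format identifier
    streaming_latency: int = 0,
) -> dict[str, Any]:
    """Hypothetical helper: validate property ranges and assemble a request."""
    # Stability, Similarity Boost, and Style are documented as 0..1 values.
    for name, value in (
        ("stability", stability),
        ("similarity_boost", similarity_boost),
        ("style", style),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Seed is documented as 0..4294967295.
    if seed is not None and not 0 <= seed <= 4294967295:
        raise ValueError("seed must be between 0 and 4294967295")

    # Streaming Latency is documented as 0..4.
    if streaming_latency not in range(5):
        raise ValueError("streaming_latency must be an integer from 0 to 4")

    body: dict[str, Any] = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
            "style": style,
            "use_speaker_boost": speaker_boost,
        },
    }
    if seed is not None:
        body["seed"] = seed

    return {
        "path": f"/text-to-speech/{voice_id}",
        "query": {
            "output_format": output_format,
            "optimize_streaming_latency": streaming_latency,
        },
        "body": body,
    }
```

Calling `build_tts_request("Hello", "voice123", "model_abc", seed=42)` returns a dictionary with `path`, `query`, and `body` entries. Out-of-range values raise `ValueError` before any network call is attempted.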
Output
The node outputs the generated speech audio as binary data. The binary data contains the audio file encoded in the selected output format (e.g., MP3, PCM). The output includes metadata such as the file name and binary property name if customized.
The json output field typically contains metadata about the request and response, while the main payload is the binary audio data, which is suitable for playback, storage, or further processing.
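When storing the binary output, the file extension should match the selected output format. The sketch below derives a default file name from a format identifier. The helper `default_file_name` is hypothetical, the identifiers (`mp3_44100_128`, `pcm_16000`, `ulaw_8000`) are assumed to follow the ElevenLabs API naming, and the `.pcm` and `.ulaw` extensions are only a common convention for raw audio.

```python
# Assumed mapping from codec prefix to a conventional file extension.
EXTENSIONS = {"mp3": ".mp3", "pcm": ".pcm", "ulaw": ".ulaw"}


def default_file_name(output_format: str, base: str = "speech") -> str:
    """Hypothetical helper: pick a file name that matches the audio format."""
    # The codec is the part before the first underscore, e.g. "mp3".
    codec = output_format.split("_", 1)[0].lower()
    if codec not in EXTENSIONS:
        raise ValueError(f"unrecognized output format: {output_format!r}")
    return base + EXTENSIONS[codec]


print(default_file_name("mp3_44100_128"))       # speech.mp3
print(default_file_name("pcm_16000", "intro"))  # intro.pcm
```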
Dependencies
- Requires an active API key credential for the ElevenLabs service.
- Network access to the ElevenLabs API endpoint (https://api.elevenlabs.io/v1).
- Proper configuration of authentication credentials within n8n.
- Optional: Access to voice and model lists via API for selection.
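One way to check the API key and network access outside n8n is to request the voice list directly. The sketch below only constructs the request. It assumes the `/voices` path and the `xi-api-key` header from the public ElevenLabs API, and `build_voices_request` is a hypothetical helper name.

```python
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Hypothetical helper: prepare a GET request for the voice list."""
    if not api_key:
        raise ValueError("an API key is required")
    return urllib.request.Request(
        f"{API_BASE}/voices",
        # Header name assumed from the ElevenLabs API; not stated on this page.
        headers={"xi-api-key": api_key, "Accept": "application/json"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` would send the request. An HTTP 401 response at that point most likely indicates a problem with the key rather than with the node configuration.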
Troubleshooting
- Invalid Voice ID or Model ID: Ensure the voice and model identifiers are correct and available in your ElevenLabs account.
- API Authentication Errors: Verify that the API key credential is correctly configured and has necessary permissions.
- Unsupported Language Code: Only certain models support forced language codes; check compatibility.
- Streaming Latency Settings Impact Quality: Higher Streaming Latency values reduce latency at some cost to audio quality; lower the value if the output sounds degraded.
- Binary Data Handling: Make sure downstream nodes or systems can handle the binary audio data properly.
- Stitching Context Errors: When using stitching, ensure previous and next request IDs are valid and correctly formatted.
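For the stitching fields, a small check before the request can catch formatting problems early. This is a minimal sketch: `parse_request_ids` is a hypothetical helper that splits the comma-separated value and enforces the limit of 3 IDs described in the Properties table. It does not verify that the IDs exist on the ElevenLabs side.

```python
def parse_request_ids(raw: str, limit: int = 3) -> list[str]:
    """Hypothetical helper: split a comma-separated ID list and enforce the limit."""
    # Trim whitespace around each ID and drop empty entries such as "a,,b".
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if len(ids) > limit:
        raise ValueError(f"at most {limit} request IDs are allowed, got {len(ids)}")
    return ids


print(parse_request_ids(" a1, b2 ,,c3 "))  # ['a1', 'b2', 'c3']
```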