ElevenLabs

WIP

Overview

The "Speech to Speech" operation of the ElevenLabs node converts input speech audio into a new speech audio output using a selected voice model. This transformation can be used to modify or clone voices, apply different speaking styles, or generate speech with specific voice characteristics.

Common scenarios include:

  • Voice cloning or voice conversion for content creation.
  • Generating speech in a particular voice style or tone.
  • Enhancing audio content by applying voice similarity boosts or stability adjustments.
  • Creating personalized audio messages or narrations with custom voice parameters.

For example, you could input an audio clip of someone speaking and output a version of that speech rendered in a celebrity's voice or a synthetic voice tailored to your brand.

Properties

  • Voice ID: The identifier of the voice to use for generating the speech output. You can select from a list of available voices or enter a specific voice ID manually.
  • Additional Fields: A collection of optional parameters to customize the speech synthesis:
- Binary Name: Name of the binary output data (default: "data").
- File Name: Name of the output audio file (default: "voice").
- Model Name or ID: Select or specify the model to use.
- Output Format: Audio format of the generated speech (e.g., mp3_44100_128).
- Seed: Fixed seed number for reproducibility.
- Similarity Boost: Value between 0 and 1 to control how closely the output matches the original voice.
- Speaker Boost: Boolean to activate speaker boost feature.
- Stability: Value between 0 and 1 controlling voice stability.
- Stitching: Boolean to enable stitching, which provides context by passing previous text.
- Streaming Latency: Integer from 0 to 4; higher values reduce latency at some cost to audio quality.
- Style: Value between 0 and 1 to exaggerate voice style.
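
The numeric ranges listed above can be checked before a request is sent. The following is a minimal sketch, not the node's own validation: the function name and the snake_case field names are hypothetical, and only the ranges themselves (0 to 1 for Similarity Boost, Stability, and Style; integer 0 to 4 for Streaming Latency) come from the property descriptions.

```python
from typing import Any, Dict, List

# Fields described above as "value between 0 and 1" (key names are hypothetical).
UNIT_RANGE_FIELDS = ("similarity_boost", "stability", "style")


def validate_additional_fields(fields: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []

    for name in UNIT_RANGE_FIELDS:
        if name not in fields:
            continue  # optional field, nothing to check
        value = fields[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append(f"{name} must be a number between 0 and 1")

    if "streaming_latency" in fields:
        value = fields["streaming_latency"]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not 0 <= value <= 4:
            problems.append("streaming_latency must be an integer from 0 to 4")

    return problems
```

Calling `validate_additional_fields({"style": 1.5})` would report one problem, while a dictionary with in-range values returns an empty list.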

Output

The node outputs the generated speech audio as binary data under a configurable binary property name (default "data"). The binary data is the synthesized speech audio file in the specified output format (e.g., MP3).

The JSON output typically includes metadata about the generated audio, such as file name and format, alongside the binary audio content.
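
To make the shape of such an output concrete, here is an illustrative sketch of an item that pairs JSON metadata with base64-encoded binary audio. It is not the node's actual implementation: the function name and every key name are hypothetical, and deriving the file extension from the part of the output format before the first underscore is an assumption based on the "mp3_44100_128" example.

```python
import base64
from typing import Any, Dict


def build_output_item(
    audio: bytes,
    binary_name: str = "data",        # default Binary Name from the properties list
    file_name: str = "voice",         # default File Name from the properties list
    output_format: str = "mp3_44100_128",
) -> Dict[str, Any]:
    """Build a dictionary that mimics an item with JSON metadata and binary audio."""
    # Assumption: the codec is the prefix of the format string, e.g. "mp3".
    extension = output_format.split("_", 1)[0]
    full_name = f"{file_name}.{extension}"
    return {
        "json": {"fileName": full_name, "format": output_format},
        "binary": {
            binary_name: {
                # Binary payloads are commonly carried as base64 text.
                "data": base64.b64encode(audio).decode("ascii"),
                "fileName": full_name,
                "fileExtension": extension,
            }
        },
    }
```

A downstream step would read the audio from the property named by Binary Name, so changing that field changes where the audio is found.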

Dependencies

  • Requires an API key credential for ElevenLabs API authentication.
  • Depends on ElevenLabs cloud service endpoints for speech synthesis.
  • Network connectivity to https://api.elevenlabs.io/v1 is necessary.
  • Optional: Access to voice and model lists via API for selection.
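
The dependencies above amount to an authenticated HTTPS request against the base URL. The sketch below only assembles the URL and headers and performs no network call. The base URL is the one listed above; the `speech-to-speech/{voice_id}` path, the `xi-api-key` header, and the `output_format` and `optimize_streaming_latency` query parameters are assumptions drawn from ElevenLabs' public API and are not confirmed by this page. The function name is hypothetical.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote, urlencode

BASE_URL = "https://api.elevenlabs.io/v1"  # base URL from the Dependencies list


def build_speech_to_speech_request(
    voice_id: str,
    api_key: str,
    output_format: Optional[str] = None,
    streaming_latency: Optional[int] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for a speech-to-speech call; nothing is sent."""
    query: Dict[str, str] = {}
    if output_format is not None:
        query["output_format"] = output_format  # assumed parameter name
    if streaming_latency is not None:
        query["optimize_streaming_latency"] = str(streaming_latency)  # assumed name

    # Percent-encode the voice ID so unexpected characters cannot alter the path.
    url = f"{BASE_URL}/speech-to-speech/{quote(voice_id, safe='')}"
    if query:
        url += "?" + urlencode(query)

    headers = {"xi-api-key": api_key}  # assumed authentication header
    return url, headers
```

The input audio itself would be sent as the request body by whichever HTTP client is used; that part is omitted here because it depends on the client library.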

Troubleshooting

  • Invalid Voice ID: If the voice ID is incorrect or not found, the API will return an error. Verify the voice ID or select from the provided list.
  • Authentication Errors: Ensure the API key credential is correctly configured and has sufficient permissions.
  • Unsupported Output Format: Using an unsupported audio format may cause failures. Use one of the supported formats like "mp3_44100_128".
  • Latency vs Quality Tradeoff: Setting streaming latency optimization too high may degrade audio quality.
  • Missing Input Data: Ensure the input speech data is correctly provided and accessible to the node.
  • API Rate Limits: Excessive requests may lead to throttling; monitor usage accordingly.
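
For the rate-limit case in particular, a common mitigation is to retry with exponential backoff. The following is a generic sketch rather than anything the node is known to do: `RateLimitError` is a hypothetical exception that the caller's own request function would raise when the API signals throttling, and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's request function when throttled."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimitError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Wait base_delay, then 2x, then 4x, and so on between attempts.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the requested delays instead of actually waiting.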
