Speech TTS Node

Synthesize text to speech using the given voice profile.

Overview

This node converts text into speech audio using a specified voice profile. It supports two Chinese dialects, Mandarin and Cantonese. It is useful for applications that require automated voice generation, such as creating audio content, voice assistants, or accessibility features. For example, you can input a text string in Mandarin and receive either a base64-encoded audio string or a URL pointing to the generated audio file.

Properties

Name | Meaning
Chinese Language | Selects the Chinese dialect for text-to-speech synthesis. Options: "Mandarin" or "Cantonese" (yue). Required when generating audio in these dialects.
Return Type | Determines the format of the output audio. Options: "Base64_audio" (base64 encoded audio string) or "File_url" (a URL pointing to the audio file).
Voice Profile ID | The identifier of the voice profile to use for speech synthesis. This controls the voice characteristics like tone and style.
Text | The text string to be converted into speech.
Options | Additional optional parameters:
  - Fragment Interval: Controls the length of pause between sentences (range 0 to 1, default 0.3).
  - Temperature: Controls randomness of the TTS model output (range 0 to 1, default 1).
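
The properties above can be pictured as a small typed parameter object. The following is a minimal sketch, not the node's actual implementation: the type names, field names, and the `buildTtsParams` helper are hypothetical, and only the option values, ranges, and defaults are taken from the table.

```typescript
// Hypothetical parameter shapes; the field names are assumptions,
// not the TTS service's documented schema.
type ChineseLanguage = "Mandarin" | "Cantonese";
type TtsReturnType = "Base64_audio" | "File_url";

interface TtsOptions {
  fragmentInterval?: number; // pause between sentences, range 0 to 1
  temperature?: number; // randomness of the model output, range 0 to 1
}

interface TtsParams {
  chineseLanguage: ChineseLanguage;
  returnType: TtsReturnType;
  voiceProfileId: string;
  text: string;
  fragmentInterval: number;
  temperature: number;
}

// Reject values outside the documented 0 to 1 range.
function assertUnitRange(name: string, value: number): void {
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(`${name} must be between 0 and 1, got ${value}`);
  }
}

// Apply the documented defaults (0.3 and 1) and validate the ranges.
function buildTtsParams(
  chineseLanguage: ChineseLanguage,
  returnType: TtsReturnType,
  voiceProfileId: string,
  text: string,
  options: TtsOptions = {},
): TtsParams {
  const fragmentInterval = options.fragmentInterval ?? 0.3;
  const temperature = options.temperature ?? 1;
  assertUnitRange("Fragment Interval", fragmentInterval);
  assertUnitRange("Temperature", temperature);
  return {
    chineseLanguage,
    returnType,
    voiceProfileId,
    text,
    fragmentInterval,
    temperature,
  };
}
```

Validating the two numeric options before sending anything keeps an out-of-range value from surfacing later as an opaque service error.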

Output

The node outputs JSON data containing the response from the TTS service. Depending on the selected "Return Type," the output JSON will include either:

  • A base64-encoded audio string representing the synthesized speech, or
  • A URL linking to the generated audio file.

No binary data is directly output by the node; audio is provided as encoded strings or URLs.

Example output JSON structure:

{
  "audio": "<base64_encoded_audio_string>"
}

or

{
  "file_url": "https://example.com/path/to/audio/file.mp3"
}
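
A downstream step usually has to tell the two output shapes apart. The sketch below shows one way to do that, assuming the output carries exactly the `audio` or `file_url` keys from the examples above; the `TtsAudio` type and the `readTtsOutput` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical discriminated union over the two documented output shapes.
type TtsAudio =
  | { kind: "base64"; data: string }
  | { kind: "url"; url: string };

// Inspect the node's output JSON and report which form of audio it carries.
function readTtsOutput(json: unknown): TtsAudio {
  if (typeof json !== "object" || json === null) {
    throw new Error("TTS output is not a JSON object");
  }
  const record = json as Record<string, unknown>;
  const audio = record["audio"];
  const fileUrl = record["file_url"];
  if (typeof audio === "string" && audio.length > 0) {
    return { kind: "base64", data: audio };
  }
  if (typeof fileUrl === "string" && fileUrl.length > 0) {
    return { kind: "url", url: fileUrl };
  }
  throw new Error('TTS output contains neither "audio" nor "file_url"');
}
```

Callers can then switch on `kind` and either decode the base64 string or download the file from the URL.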

Dependencies

  • Requires an API token credential for authentication with the external TTS service.
  • Needs the API domain URL configured via credentials.
  • The node makes HTTP POST requests to the /speech/tts endpoint of the configured API domain.
  • No additional environment variables are required beyond the API token and domain configuration.
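
To make the request described above concrete, here is a rough sketch of how the POST to `/speech/tts` could be assembled from the two credential values. The `Bearer` authorization scheme, the JSON content type, and every name in the code are assumptions; the source only states that a token and a domain are required, so check the provider's API documentation for the real header format.

```typescript
// Hypothetical credential and request shapes, for illustration only.
interface TtsCredentials {
  apiDomain: string; // e.g. "https://api.example.com"
  apiToken: string;
}

interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Describe the POST request without sending it.
function buildTtsRequest(
  credentials: TtsCredentials,
  payload: Record<string, unknown>,
): HttpRequestSpec {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const base = credentials.apiDomain.replace(/\/+$/, "");
  return {
    method: "POST",
    url: `${base}/speech/tts`,
    headers: {
      // Assumption: the service expects a bearer token.
      Authorization: `Bearer ${credentials.apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

Because the function only describes the request, the result can be inspected in isolation and then handed to whichever HTTP client is in use.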

Troubleshooting

  • Common issues:

    • Invalid or missing API token may cause authentication failures.
    • Incorrect voice profile ID or unsupported Chinese language option may result in errors or no audio output.
    • Network connectivity problems can lead to request timeouts or failures.
  • Error messages:

    • "Error tts request: <message>": Indicates a failure during the HTTP request to the TTS service. Check API token validity, endpoint URL, and network status.
    • If the node logs "Request error:" followed by details, inspect those details for clues about malformed requests or server-side issues.
  • Resolution tips:

    • Verify that the API token and domain are correctly set in credentials.
    • Ensure the voice profile ID matches one available in your TTS service account.
    • Confirm that the text input is not empty and properly formatted.
    • Adjust "Fragment Interval" and "Temperature" options if the output audio quality or style is unsatisfactory.
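
Several of the resolution tips can be checked before the node runs, for example in a preceding Code step. The sketch below is one possible pre-flight check; the `TtsPreflightInput` shape and the `preflightProblems` function are hypothetical and are not part of the node itself.

```typescript
// Hypothetical input shape gathering the values the tips above refer to.
interface TtsPreflightInput {
  apiToken?: string;
  apiDomain?: string;
  voiceProfileId?: string;
  text?: string;
}

// Return a human-readable list of problems; an empty list means
// none of the checked values is missing or blank.
function preflightProblems(input: TtsPreflightInput): string[] {
  const isBlank = (value?: string): boolean =>
    value === undefined || value.trim() === "";
  const problems: string[] = [];
  if (isBlank(input.apiToken)) problems.push("API token is missing");
  if (isBlank(input.apiDomain)) problems.push("API domain is missing");
  if (isBlank(input.voiceProfileId)) problems.push("Voice Profile ID is missing");
  if (isBlank(input.text)) problems.push("Text is empty");
  return problems;
}
```

This catches only missing or blank values. Whether a token is valid or a voice profile ID exists in the account can be confirmed only by the TTS service.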

Links and References

  • Refer to your TTS service provider’s API documentation for details on voice profiles, supported languages, and parameter tuning.
  • See the n8n documentation on creating custom nodes for further customization guidance.
