
Speech TTS Node

Synthesize text to speech using the given voice profile.

Overview

This node converts text into speech audio using a specified voice profile. It supports two Chinese dialects, Mandarin and Cantonese. It is useful for applications that need automated voice generation, such as audio content creation, voice assistants, or accessibility tools. For example, you can input a Mandarin sentence and receive either a URL to the generated audio file or a base64-encoded audio string of the spoken text.

Properties

Name | Meaning
Chinese Language | Selects the Chinese dialect for speech synthesis. Options: "Mandarin" or "Cantonese". Required only when generating audio in these dialects.
Return Type | Determines the format of the output audio. Options: "Base64_audio" (base64-encoded audio string) or "File_url" (URL to the audio file).
Voice Profile ID | The identifier of the voice profile to use for speech synthesis. Required; it specifies which voice characteristics to apply.
Text | The text string to be converted into speech.
Options | Additional parameters controlling speech synthesis:
- Fragment Interval: Length of pause between sentences (0 to 1, default 0.3).
- Temperature: Controls randomness of the TTS model output (0 to 1, default 1.0).
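
The constraints listed above can be checked before a workflow runs. The following is a minimal sketch in Python, not the node's own implementation: the `TtsParams` class, its field names, and the `validate` helper are hypothetical and only mirror the property names and ranges documented here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values as documented in the Properties table.
ALLOWED_LANGUAGES = {"Mandarin", "Cantonese"}
ALLOWED_RETURN_TYPES = {"Base64_audio", "File_url"}


@dataclass
class TtsParams:
    """Hypothetical container mirroring the node's properties."""
    voice_profile_id: str
    text: str
    return_type: str
    chinese_language: Optional[str] = None  # only needed for Chinese output
    fragment_interval: float = 0.3          # documented default
    temperature: float = 1.0                # documented default


def validate(params: TtsParams) -> List[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if not params.voice_profile_id.strip():
        problems.append("Voice Profile ID is empty")
    if not params.text.strip():
        problems.append("Text is empty")
    if params.return_type not in ALLOWED_RETURN_TYPES:
        problems.append(f"Return Type {params.return_type!r} is not supported")
    if (params.chinese_language is not None
            and params.chinese_language not in ALLOWED_LANGUAGES):
        problems.append(f"Chinese Language {params.chinese_language!r} is not supported")
    if not 0 <= params.fragment_interval <= 1:
        problems.append("Fragment Interval must be between 0 and 1")
    if not 0 <= params.temperature <= 1:
        problems.append("Temperature must be between 0 and 1")
    return problems
```

Calling `validate(TtsParams("", "hi", "File_url"))` would report only the empty voice profile ID, since every other value is within the documented limits.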

Output

The node outputs JSON data containing the response from the speech synthesis API. Depending on the selected return type, the output JSON will include either:

  • A base64-encoded audio string representing the synthesized speech, or
  • A URL pointing to the generated audio file.

The node does not emit binary data. All audio is returned inside the JSON structure, either as a base64 string or as a URL.
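
Because the audio arrives inside JSON, a downstream step has to decode it or follow the URL itself. The sketch below shows one way to do that in Python. The name of the JSON field that carries the audio is not documented here, so the caller passes it in as `field`; the `read_audio` helper is hypothetical.

```python
import base64
from typing import Any, Dict, Tuple


def read_audio(result: Dict[str, Any], return_type: str, field: str) -> Tuple[str, Any]:
    """Extract the audio from the node's output JSON.

    `field` is the key holding the audio value; its real name depends on the
    speech service's response and is an assumption of this sketch.
    Returns ("url", str) for File_url and ("bytes", bytes) for Base64_audio.
    """
    value = result.get(field)
    if not isinstance(value, str) or not value:
        raise ValueError(f"output JSON has no string field {field!r}")
    if return_type == "File_url":
        return ("url", value)
    if return_type == "Base64_audio":
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return ("bytes", base64.b64decode(value, validate=True))
    raise ValueError(f"unknown return type {return_type!r}")
```

The decoded bytes can then be written to disk with `open(path, "wb")`. The audio container format is determined by the speech service, not by this helper.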

Dependencies

  • Requires an API token credential for authentication with the external speech synthesis service.
  • Needs the API domain URL configured via credentials.
  • The node makes HTTP POST requests to the /speech/tts endpoint of the configured API domain.
  • No other external dependencies are required.
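
To reproduce the node's request outside n8n, for instance when checking credentials, a request along the following lines may help. This is a sketch under assumptions: only the POST method and the /speech/tts path come from this page. The `Authorization: Bearer` header scheme, the JSON body encoding, and the `build_tts_request` helper are illustrative guesses; consult your service's API documentation for the actual format.

```python
import json
import urllib.request


def build_tts_request(api_domain: str, api_token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /speech/tts endpoint."""
    # Tolerate a trailing slash in the configured domain.
    url = api_domain.rstrip("/") + "/speech/tts"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the service accepts a bearer token.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=30)` sends it. Building the request separately makes the URL and headers easy to inspect first.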

Troubleshooting

  • Common issues:

    • Invalid or missing API token or domain configuration will cause authentication failures.
    • Incorrect or empty voice profile ID may result in errors or no audio output.
    • Providing unsupported Chinese dialect values may cause the API to reject the request.
    • Network or API downtime can lead to request errors.
  • Error messages:

    • "Error tts request: ..." indicates a failure during the API call. Check API credentials, network connectivity, and parameter correctness.
    • If the node returns an error object in the output JSON, verify the input parameters and ensure the voice profile ID and text are valid.
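
When isolating a failure by calling the API directly from Python, the exception type can point to which of the causes above applies. The sketch below is illustrative only: the mapping of HTTP status codes to causes is an assumption about how the service reports errors, and `classify_failure` is a hypothetical helper.

```python
import urllib.error


def classify_failure(exc: Exception) -> str:
    """Map an exception from urllib to a likely troubleshooting category."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "credentials"   # token or domain configuration
        if 400 <= exc.code < 500:
            return "parameters"    # e.g. voice profile ID, dialect, text
        return "service"           # server-side failure or downtime
    if isinstance(exc, (urllib.error.URLError, TimeoutError)):
        return "network"           # DNS, connection, or timeout problems
    return "unknown"
```

A typical use is to wrap the `urlopen` call in `try`/`except Exception as exc` and log `classify_failure(exc)` alongside the original message.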

Links and References

  • Refer to your speech synthesis service's API documentation for details on voice profiles, supported languages, and options.
  • See the n8n documentation on creating custom nodes for guidance on further customization.
