Doubao TTS

Convert text to speech using the Doubao TTS API

Overview

This node converts input text into speech audio using the Doubao TTS (Text-to-Speech) API. It is useful for automating voice content generation, such as creating audio versions of articles, notifications, or interactive voice responses. For example, you can input a product description and generate an MP3 audio file with a female or male voice, adjusting speed, pitch, volume, and emotion to suit your needs.

Properties

Name              Meaning
Text              The text string to convert into speech.
Voice Type        Selects the voice style: "BV001 (Female)" or "BV002 (Male)".
Audio Encoding    Output audio format: "MP3" or "WAV".
Speed Ratio       Speech speed multiplier, from 0.5 (half speed) to 2.0 (double speed).
Volume Ratio      Volume level multiplier, from 0.1 (quiet) to 3.0 (loud).
Pitch Ratio       Pitch adjustment multiplier, from 0.5 (lower pitch) to 2.0 (higher pitch).
Emotion           Emotional tone of the speech: "Normal", "Happy", or "Sad".
Custom Filename   Optional custom filename (without extension) for the output audio file. Supports expressions. If empty or invalid, an auto-generated name is used.
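
The documented ranges for the three ratio properties can be checked before a workflow passes values into the node. The sketch below is illustrative only: `RATIO_RANGES` and `clampRatio` are hypothetical names, and whether the node itself clamps or rejects out-of-range values is not stated here, so clamping is an assumption.

```typescript
// Ranges copied from the Properties table; all identifiers are hypothetical.
interface RatioRange {
  min: number;
  max: number;
}

const RATIO_RANGES: Record<"speedRatio" | "volumeRatio" | "pitchRatio", RatioRange> = {
  speedRatio: { min: 0.5, max: 2.0 },
  volumeRatio: { min: 0.1, max: 3.0 },
  pitchRatio: { min: 0.5, max: 2.0 },
};

type RatioName = keyof typeof RATIO_RANGES;

// Pull a value back into its documented range (assumed behavior, not the node's).
function clampRatio(name: RatioName, value: number): number {
  if (isNaN(value)) {
    throw new Error(name + " must be a number");
  }
  const range = RATIO_RANGES[name];
  return Math.min(range.max, Math.max(range.min, value));
}
```

For example, `clampRatio("speedRatio", 3)` yields the upper bound of the Speed Ratio range, while an in-range value is returned unchanged.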

Output

The node outputs one item per input with the following structure:

  • json:

    • success: Boolean indicating if synthesis succeeded.
    • reqid: Request ID returned by the API.
    • operation: Operation type (always "query").
    • message: Status message from the API.
    • sequence: Sequence number from the API response.
    • audioData: Base64-encoded audio data string.
    • mimeType: MIME type of the audio (e.g., audio/mp3 or audio/wav).
    • size: Size in bytes of the decoded audio buffer.
    • text: Original input text.
    • voiceType: Selected voice type.
    • encoding: Audio encoding format.
    • fileName: Final filename used for the audio file.
    • addition: Additional data from the API response (if any).
  • binary:

    • audio: Contains the audio file data with properties:
      • data: Base64-encoded audio content.
      • mimeType: MIME type matching the encoding.
      • fileName: Filename including extension.
      • fileExtension: File extension (mp3 or wav).
      • fileSize: Size in bytes as a string.

This binary data can be used directly in subsequent nodes for saving or playback.
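
For downstream code that consumes these items, the structure above could be typed roughly as follows. This is a sketch: the field names come from the list above, but the TypeScript types (for example, `sequence` as a number) are assumptions, as is the idea that `size` and `fileSize` describe the same byte count. `decodedByteLength` and `sizesAgree` are hypothetical helpers.

```typescript
// `json` part of an output item; types are assumed, field names are documented.
interface DoubaoTtsJson {
  success: boolean;
  reqid: string;
  operation: string;
  message: string;
  sequence: number;
  audioData: string;
  mimeType: string;
  size: number;
  text: string;
  voiceType: string;
  encoding: string;
  fileName: string;
  addition?: unknown;
}

// `binary.audio` part of an output item.
interface DoubaoTtsBinaryAudio {
  data: string;
  mimeType: string;
  fileName: string;
  fileExtension: "mp3" | "wav";
  fileSize: string;
}

// Byte length that a padded base64 string decodes to, computed without decoding.
function decodedByteLength(base64: string): number {
  const clean = base64.replace(/\s+/g, "");
  if (clean.length === 0) {
    return 0;
  }
  const padding = clean.slice(-2) === "==" ? 2 : clean.slice(-1) === "=" ? 1 : 0;
  return (clean.length / 4) * 3 - padding;
}

// Check that the reported sizes match the audio payload (assumed invariant).
function sizesAgree(
  json: Pick<DoubaoTtsJson, "audioData" | "size">,
  audio: Pick<DoubaoTtsBinaryAudio, "fileSize">
): boolean {
  const expected = decodedByteLength(json.audioData);
  return json.size === expected && Number(audio.fileSize) === expected;
}
```

A check like `sizesAgree` can be useful in a Code node placed after this one to catch truncated audio before it is written to disk.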

Dependencies

  • Requires credentials containing:

    • An application ID.
    • An access token for authentication.
    • Optionally, a cluster identifier (defaults to "volcano_tts" if not provided).
  • Makes HTTP POST requests to the Doubao TTS API endpoint at:
    https://openspeech.bytedance.com/api/v1/tts

  • Requires network access to the above API.
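
The credential rules above (two required values, one optional value with a default) can be sketched as a small resolver. The interface and function names are hypothetical and the error messages are illustrative; only the "volcano_tts" default and the required fields come from this page.

```typescript
const DEFAULT_CLUSTER = "volcano_tts";

// Hypothetical credential shape; the node's real property names may differ.
interface DoubaoCredentials {
  appId?: string;
  accessToken?: string;
  cluster?: string;
}

interface ResolvedCredentials {
  appId: string;
  accessToken: string;
  cluster: string;
}

// Validate the required fields and apply the documented cluster default.
function resolveCredentials(raw: DoubaoCredentials): ResolvedCredentials {
  const appId = (raw.appId || "").trim();
  const accessToken = (raw.accessToken || "").trim();
  if (!appId) {
    throw new Error("App ID is missing from the credentials");
  }
  if (!accessToken) {
    throw new Error("Access Token is missing from the credentials");
  }
  const cluster = (raw.cluster || "").trim() || DEFAULT_CLUSTER;
  return { appId: appId, accessToken: accessToken, cluster: cluster };
}
```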

Troubleshooting

  • Empty Text Error: If the "Text" property is empty or whitespace, the node throws an error stating "Text cannot be empty." Ensure valid text input.

  • Missing Credentials: Errors occur if the App ID or Access Token is missing from the credentials. Verify that both are configured correctly.

  • API Errors: If the API returns a non-OK status or error code, the node throws an error with details. Common causes include invalid tokens, quota limits, or malformed requests.

  • Custom Filename Issues: If the custom filename expression is invalid or results in an empty string, the node falls back to an auto-generated filename. Use valid expressions or plain strings without special characters.

  • Network Issues: Connectivity problems to the API endpoint will cause request failures. Check internet connection and firewall settings.
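
The empty-text check and the filename fallback described above might look like the following. This is a sketch under stated assumptions: the set of allowed filename characters and the `tts_<timestamp>` fallback pattern are invented for illustration, since this page does not specify how the node validates names or what the auto-generated name looks like. Only the "Text cannot be empty." message is taken from the text.

```typescript
// Reject empty or whitespace-only text, as described under "Empty Text Error".
function assertTextNotEmpty(text: string): void {
  if (text.trim().length === 0) {
    throw new Error("Text cannot be empty.");
  }
}

// Pick the output filename. The validity rule (letters, digits, "_" and "-")
// and the fallback pattern are assumptions, not the node's actual behavior.
function resolveFileName(
  custom: string | undefined,
  extension: "mp3" | "wav",
  timestamp: number
): string {
  const candidate = (custom || "").trim();
  const isValid = /^[A-Za-z0-9_-]+$/.test(candidate);
  const base = isValid ? candidate : "tts_" + timestamp;
  return base + "." + extension;
}
```

Passing the timestamp in as an argument, rather than reading the clock inside the function, keeps the fallback deterministic and easy to test.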
