Overview
This node converts input text into speech audio using the Doubao TTS (Text-to-Speech) API. It is useful for automating voice content generation, such as creating audio versions of articles, notifications, or interactive voice responses. For example, you can input a product description and generate an MP3 audio file with a female or male voice, adjusting speed, pitch, volume, and emotion to suit your needs.
Properties
| Name | Meaning |
|---|---|
| Text | The text string to convert into speech. |
| Voice Type | Selects the voice style: "BV001 (Female)" or "BV002 (Male)". |
| Audio Encoding | Output audio format: "MP3" or "WAV". |
| Speed Ratio | Speech speed multiplier, from 0.5 (half speed) to 2.0 (double speed). |
| Volume Ratio | Volume level multiplier, from 0.1 (quiet) to 3.0 (loud). |
| Pitch Ratio | Pitch adjustment multiplier, from 0.5 (lower pitch) to 2.0 (higher pitch). |
| Emotion | Emotional tone of the speech: "Normal", "Happy", or "Sad". |
| Custom Filename | Optional custom filename (without extension) for the output audio file. Supports expressions. If empty or invalid, an auto-generated name is used. |
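The ratio ranges in the table can be checked before a request is sent. The sketch below is a minimal illustration, not the node's actual code: the helper name `clampRatio` and the property keys are hypothetical, and clamping (rather than rejecting) out-of-range values is an assumption about how one might handle them.

```typescript
// Allowed ranges copied from the Properties table.
// The key names are hypothetical; the node's internal names may differ.
const RATIO_RANGES = {
  speedRatio: { min: 0.5, max: 2.0 },
  volumeRatio: { min: 0.1, max: 3.0 },
  pitchRatio: { min: 0.5, max: 2.0 },
} as const;

type RatioName = keyof typeof RATIO_RANGES;

// Hypothetical helper: pulls a ratio back into its documented range.
// Non-finite input (NaN, Infinity) is rejected instead of clamped.
function clampRatio(name: RatioName, value: number): number {
  if (!Number.isFinite(value)) {
    throw new RangeError(`${name} must be a finite number`);
  }
  const { min, max } = RATIO_RANGES[name];
  return Math.min(max, Math.max(min, value));
}

// A speed ratio of 3 is above the documented maximum, so it is clamped.
const safeSpeed = clampRatio("speedRatio", 3);
```

Clamping keeps a workflow running when an expression produces a slightly out-of-range value; throwing instead would surface the mistake earlier.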
Output
The node outputs one item per input with the following structure:
- json:
  - success: Boolean indicating whether synthesis succeeded.
  - reqid: Request ID returned by the API.
  - operation: Operation type (always "query").
  - message: Status message from the API.
  - sequence: Sequence number from the API response.
  - audioData: Base64-encoded audio data string.
  - mimeType: MIME type of the audio (e.g., audio/mp3 or audio/wav).
  - size: Size in bytes of the decoded audio buffer.
  - text: Original input text.
  - voiceType: Selected voice type.
  - encoding: Audio encoding format.
  - fileName: Final filename used for the audio file.
  - addition: Additional data from the API response (if any).
- binary:
  - audio: Contains the audio file data with properties:
    - data: Base64-encoded audio content.
    - mimeType: MIME type matching the encoding.
    - fileName: Filename including extension.
    - fileExtension: File extension (mp3 or wav).
    - fileSize: Size in bytes as a string.
This binary data can be used directly in subsequent nodes for saving or playback.
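To show how the binary audio properties relate to the Base64 payload, here is a minimal sketch. The names `buildBinaryAudio` and `base64DecodedLength` are hypothetical and do not come from the node's source. The size calculation assumes standard padded Base64 without line breaks; in Node.js, `Buffer.from(data, "base64").length` yields the same number for such input.

```typescript
type AudioEncoding = "mp3" | "wav";

interface BinaryAudio {
  data: string;          // Base64-encoded audio content
  mimeType: string;      // audio/mp3 or audio/wav
  fileName: string;      // filename including extension
  fileExtension: AudioEncoding;
  fileSize: string;      // size in bytes, as a string
}

// Hypothetical helper: decoded byte length of padded Base64, without decoding.
// Every 4 Base64 characters carry 3 bytes; each trailing "=" removes one byte.
function base64DecodedLength(b64: string): number {
  if (b64.length % 4 !== 0) {
    throw new Error("Expected padded Base64 (length must be a multiple of 4)");
  }
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return (b64.length / 4) * 3 - padding;
}

// Hypothetical helper: assembles the properties listed for the audio binary.
function buildBinaryAudio(
  audioData: string,
  encoding: AudioEncoding,
  baseName: string,
): BinaryAudio {
  return {
    data: audioData,
    mimeType: `audio/${encoding}`,
    fileName: `${baseName}.${encoding}`,
    fileExtension: encoding,
    fileSize: String(base64DecodedLength(audioData)),
  };
}
```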
Dependencies
Requires credentials containing:
- An application ID.
- An access token for authentication.
- Optionally, a cluster identifier (defaults to "volcano_tts" if not provided).
Makes HTTP POST requests to the Doubao TTS API endpoint at:
https://openspeech.bytedance.com/api/v1/tts
Requires network access to the above API.
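The credential handling and POST request described above can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: the helper names are hypothetical, the payload is deliberately incomplete, and the JSON field names (`app`, `audio`, `request`, `appid`, `voice_type`) and the `Bearer;` authorization format are assumptions that should be checked against the official Doubao TTS API documentation. Only the endpoint URL, the default cluster, and the "query" operation come from this document.

```typescript
interface DoubaoCredentials {
  appId?: string;
  accessToken?: string;
  cluster?: string;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const TTS_ENDPOINT = "https://openspeech.bytedance.com/api/v1/tts";
const DEFAULT_CLUSTER = "volcano_tts";

// Hypothetical helper: enforces the required credentials and applies
// the documented default cluster. The error text is illustrative.
function resolveCredentials(c: DoubaoCredentials) {
  if (!c.appId || !c.accessToken) {
    throw new Error("App ID and Access Token are required");
  }
  return {
    appId: c.appId,
    accessToken: c.accessToken,
    cluster: c.cluster || DEFAULT_CLUSTER,
  };
}

// Hypothetical helper: prepares (but does not send) the POST request.
function prepareRequest(
  c: DoubaoCredentials,
  text: string,
  voiceType: string,
  reqid: string,
): PreparedRequest {
  const creds = resolveCredentials(c);
  return {
    url: TTS_ENDPOINT,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // ASSUMPTION: header format; verify against the official API docs.
      Authorization: `Bearer;${creds.accessToken}`,
    },
    // ASSUMPTION: field names and nesting; a partial body for illustration.
    body: JSON.stringify({
      app: { appid: creds.appId, token: creds.accessToken, cluster: creds.cluster },
      audio: { voice_type: voiceType },
      request: { reqid, text, operation: "query" },
    }),
  };
}
```

The prepared object can be handed to any HTTP client; keeping request construction separate from sending makes the credential checks easy to test without network access.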
Troubleshooting
Empty Text Error: If the "Text" property is empty or whitespace, the node throws an error stating "Text cannot be empty." Ensure valid text input.
Missing Credentials: The node throws an error if the App ID or Access Token is missing from the credentials. Verify that both are configured correctly.
API Errors: If the API returns a non-OK status or error code, the node throws an error with details. Common causes include invalid tokens, quota limits, or malformed requests.
Custom Filename Issues: If the custom filename expression is invalid or results in an empty string, the node falls back to an auto-generated filename. Use valid expressions or plain strings without special characters.
Network Issues: Connectivity problems to the API endpoint will cause request failures. Check internet connection and firewall settings.
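The empty-text check and the custom filename fallback described above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: the auto-generated name is shown as `tts_<timestamp>`, and "special characters" is interpreted as the characters commonly disallowed in filenames (`\ / : * ? " < > |`).

```typescript
// Rejects empty or whitespace-only text with the documented message.
function assertNonEmptyText(text: string): string {
  if (text.trim() === "") {
    throw new Error("Text cannot be empty.");
  }
  return text;
}

// ASSUMPTION: characters treated as invalid in a custom filename.
const INVALID_FILENAME_CHARS = /[\\\/:*?"<>|]/;

// Hypothetical helper: returns the final filename including extension.
// Falls back to an auto-generated name when the custom name is empty
// or contains invalid characters.
function resolveFileName(
  customName: string | undefined,
  extension: "mp3" | "wav",
  timestamp: number,
): string {
  const trimmed = (customName ?? "").trim();
  const usable = trimmed !== "" && !INVALID_FILENAME_CHARS.test(trimmed);
  // ASSUMPTION: the real auto-generated naming scheme is not documented here.
  const base = usable ? trimmed : `tts_${timestamp}`;
  return `${base}.${extension}`;
}

// Typical call site: pass the current time for the fallback name.
const fileName = resolveFileName("intro", "wav", Date.now());
```

Passing the timestamp as a parameter keeps the function deterministic, which makes the fallback path straightforward to test.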
Links and References
- Doubao TTS API Documentation (official API endpoint referenced)
- n8n documentation on Creating Custom Nodes