Actions
- Voice Actions
- Synthesize Actions
Overview
This node performs text-to-speech synthesis using Microsoft Edge's capabilities. It converts input text or SSML (Speech Synthesis Markup Language) into spoken audio, allowing users to generate speech output from textual content. This is useful for applications such as creating audio versions of written content, generating voice responses for chatbots, or providing accessibility features.
Use Case Examples
- Converting a plain text message into speech audio for a voice assistant.
- Generating speech from SSML input to control speech prosody and pronunciation.
- Creating audio files from text for podcasts or e-learning materials.
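The second use case above relies on SSML input. As a hedged illustration, the snippet below builds a minimal SSML document of the kind this node accepts when Input Type is set to SSML; the prosody and phoneme elements follow the W3C SSML specification, and the voice name reuses en-US-AriaNeural from the Properties section. Exact element support depends on the synthesis service.

```python
# A minimal SSML document illustrating prosody and pronunciation control.
# The voice name and element set shown here are illustrative; consult the
# service's SSML support for the authoritative list.
ssml = """\
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <prosody rate="-10%" pitch="+2st">
      Welcome to the course.
    </prosody>
    The word <phoneme alphabet="ipa" ph="t&#x259;&#x2C8;me&#x26A;to&#x28A;">tomato</phoneme>
    uses an explicit IPA pronunciation.
  </voice>
</speak>
"""
```

This string would then be passed as the node's Input Text with Input Type set to SSML.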
Properties
| Name | Meaning |
|---|---|
| Input Text | The text or SSML content to be converted into speech. This is the main input for synthesis and is required. |
| Input Type | Specifies the type of input text, whether it is plain text, SSML, or auto-detected. |
| Voice | The voice to use for speech synthesis, e.g., en-US-AriaNeural or es-ES-ElviraNeural. This property is required when input type is auto or plain text. |
| Additional Options | Optional settings to customize the speech output, including audio info inclusion, metadata, pitch, rate, and volume adjustments. |
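The pitch, rate, and volume adjustments under Additional Options are typically given as signed relative values. As a sketch only (the "+10%" / "+2Hz" formats follow common Edge TTS conventions and are an assumption, not this node's documented contract), a pre-flight check might look like:

```python
import re

# Illustrative validators for Additional Options values. The accepted
# formats here ("+10%" for rate/volume, "+2Hz" for pitch) are assumed,
# not taken from this node's specification.
RATE_OR_VOLUME = re.compile(r"^[+-]\d+%$")   # e.g. "+10%", "-25%"
PITCH = re.compile(r"^[+-]\d+Hz$")           # e.g. "+2Hz", "-50Hz"

def check_options(rate: str, volume: str, pitch: str) -> bool:
    """Return True if all adjustment strings look well-formed."""
    return bool(RATE_OR_VOLUME.match(rate)
                and RATE_OR_VOLUME.match(volume)
                and PITCH.match(pitch))
```

Validating these values before running the node helps avoid the out-of-range synthesis failures mentioned under Troubleshooting.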
Output
JSON
- audioData - The synthesized speech audio data, typically encoded in a suitable audio format.
- audioInfo - Optional audio information such as size, duration, and format, included if requested.
- metadata - Optional word-boundary metadata with timestamps, included if requested.
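To illustrate how downstream code might consume this output, the sketch below decodes a sample result. The field names come from the Output section above, but the base64 encoding of audioData and the inner keys of audioInfo and metadata are assumptions for illustration only.

```python
import base64

# Hypothetical node output: audioData is assumed to be base64-encoded
# audio bytes; the audioInfo and metadata shapes are illustrative.
sample_output = {
    "audioData": base64.b64encode(b"\x49\x44\x33fake-mp3-bytes").decode(),
    "audioInfo": {"format": "mp3", "sizeBytes": 17, "durationMs": 1200},
    "metadata": [{"word": "hello", "startMs": 0, "endMs": 400}],
}

# Decode the audio payload back into raw bytes for saving or playback.
audio = base64.b64decode(sample_output["audioData"])
```

The decoded bytes could then be written to a file whose extension matches the reported format.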
Dependencies
- Microsoft Edge Text-to-Speech capabilities or API
Troubleshooting
- Ensure the input text or SSML is correctly formatted to avoid synthesis errors.
- Verify that the selected voice is supported and correctly specified.
- Check that pitch, rate, and volume adjustments are within allowed ranges to prevent synthesis issues.
- If the node fails, review error messages for unsupported operations or invalid parameters and adjust accordingly.