Actions
- Voice Actions
- Synthesize Actions
Overview
This node performs text-to-speech synthesis using Microsoft Edge's capabilities. It converts input text or SSML (Speech Synthesis Markup Language) into spoken audio, allowing users to generate speech output from textual content. This is useful for applications such as creating audio versions of written content, generating voice responses for chatbots, or providing accessibility features.
Use Case Examples
- Converting a plain text message into speech audio for a voice assistant.
- Generating speech from SSML input to control speech prosody and pronunciation.
- Creating audio files from text for podcasts or e-learning materials.
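The second use case above relies on SSML input. As a hedged illustration, the snippet below builds a minimal SSML document of the kind this node accepts when Input Type is set to SSML; the prosody and phoneme elements follow the W3C SSML specification, and the voice name reuses en-US-AriaNeural from the Properties section. Exact element support depends on the synthesis service.

```python
# A minimal SSML document illustrating prosody and pronunciation control.
# The voice name and element set shown here are illustrative; consult the
# service's SSML support for the authoritative list.
ssml = """\
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <prosody rate="-10%" pitch="+2st">
      Welcome to the course.
    </prosody>
    The word <phoneme alphabet="ipa" ph="t&#x259;&#x2C8;me&#x26A;to&#x28A;">tomato</phoneme>
    uses an explicit IPA pronunciation.
  </voice>
</speak>
"""
```

This string would then be passed as the node's Input Text with Input Type set to SSML.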
Properties
| Name | Meaning |
|---|---|
| Input Text | The text or SSML content to be converted into speech. This is the main input for synthesis and is required. |
| Input Type | Specifies the type of input text, whether it is plain text, SSML, or auto-detected. |
| Voice | The voice to use for speech synthesis, e.g., en-US-AriaNeural or es-ES-ElviraNeural. This property is required when input type is auto or plain text. |
| Additional Options | Optional settings to customize the speech output, including audio info inclusion, metadata, pitch, rate, and volume adjustments. |
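The pitch, rate, and volume adjustments under Additional Options are typically given as signed relative values. As a sketch only (the "+10%" / "+2Hz" formats follow common Edge TTS conventions and are an assumption, not this node's documented contract), a pre-flight check might look like:

```python
import re

# Illustrative validators for Additional Options values. The accepted
# formats here ("+10%" for rate/volume, "+2Hz" for pitch) are assumed,
# not taken from this node's specification.
RATE_OR_VOLUME = re.compile(r"^[+-]\d+%$")   # e.g. "+10%", "-25%"
PITCH = re.compile(r"^[+-]\d+Hz$")           # e.g. "+2Hz", "-50Hz"

def check_options(rate: str, volume: str, pitch: str) -> bool:
    """Return True if all adjustment strings look well-formed."""
    return bool(RATE_OR_VOLUME.match(rate)
                and RATE_OR_VOLUME.match(volume)
                and PITCH.match(pitch))
```

Validating these values before running the node helps avoid the out-of-range synthesis failures mentioned under Troubleshooting.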
Output
JSON
- audioData - The synthesized speech audio data, typically encoded in a suitable audio format.
- audioInfo - Optional audio information such as size, duration, and format, included if requested.
- metadata - Optional word-boundary metadata with timestamps, included if requested.
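To illustrate how downstream code might consume this output, the sketch below decodes a sample result. The field names come from the Output section above, but the base64 encoding of audioData and the inner keys of audioInfo and metadata are assumptions for illustration only.

```python
import base64

# Hypothetical node output: audioData is assumed to be base64-encoded
# audio bytes; the audioInfo and metadata shapes are illustrative.
sample_output = {
    "audioData": base64.b64encode(b"\x49\x44\x33fake-mp3-bytes").decode(),
    "audioInfo": {"format": "mp3", "sizeBytes": 17, "durationMs": 1200},
    "metadata": [{"word": "hello", "startMs": 0, "endMs": 400}],
}

# Decode the audio payload back into raw bytes for saving or playback.
audio = base64.b64decode(sample_output["audioData"])
```

The decoded bytes could then be written to a file whose extension matches the reported format.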
Dependencies
- Microsoft Edge Text-to-Speech capabilities or API
Troubleshooting
- Ensure the input text or SSML is correctly formatted to avoid synthesis errors.
- Verify that the selected voice is supported and correctly specified.
- Check that pitch, rate, and volume adjustments are within allowed ranges to prevent synthesis issues.
- If the node fails, review error messages for unsupported operations or invalid parameters and adjust accordingly.