Edge TTS

Text-to-Speech using Microsoft Edge capabilities

Actions: 4

Overview

This node performs text-to-speech synthesis using Microsoft Edge's capabilities. It converts input text or SSML (Speech Synthesis Markup Language) into spoken audio, allowing users to generate speech output from textual content. This is useful for applications such as creating audio versions of written content, generating voice responses for chatbots, or accessibility features.

Use Case Examples

  1. Converting a plain text message into speech audio for a voice assistant.
  2. Generating speech from SSML input to control speech prosody and pronunciation.
  3. Creating audio files from text for podcasts or e-learning materials.
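For the SSML use case, a minimal sketch of wrapping plain text in an SSML document with prosody controls might look like the following. The helper names (`escapeXml`, `buildSsml`) and the `ProsodyOptions` shape are hypothetical and not part of the node; the element names (`speak`, `voice`, `prosody`) come from the W3C SSML specification. Deriving the locale from the voice name prefix is an assumption that holds for names like en-US-AriaNeural.

```typescript
// Hypothetical helpers for illustration; the node does not expose these.
interface ProsodyOptions {
  pitch?: string;  // e.g. "+5Hz"
  rate?: string;   // e.g. "-10%"
  volume?: string; // e.g. "+0%"
}

// Escape characters that would otherwise break the XML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSsml(text: string, voice: string, prosody: ProsodyOptions = {}): string {
  const { pitch = "+0Hz", rate = "+0%", volume = "+0%" } = prosody;
  // Assumption: the first two segments of the voice name are the locale,
  // e.g. "en-US" from "en-US-AriaNeural".
  const lang = voice.split("-").slice(0, 2).join("-");
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">` +
    `<prosody pitch="${escapeXml(pitch)}" rate="${escapeXml(rate)}" volume="${escapeXml(volume)}">` +
    escapeXml(text) +
    `</prosody></voice></speak>`
  );
}

// Usage: slow the speech down slightly for an e-learning clip.
const ssml = buildSsml("Tom & Jerry", "en-US-AriaNeural", { rate: "-10%" });
console.log(ssml);
```

Escaping the text before embedding it keeps characters such as `&` and `<` from producing malformed SSML.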

Properties

  • Input Text - The text or SSML content to be converted into speech. This is the main input for synthesis and is required.
  • Input Type - Specifies the type of input text: plain text, SSML, or auto-detected.
  • Voice - The voice to use for speech synthesis, e.g., en-US-AriaNeural or es-ES-ElviraNeural. This property is required when the input type is auto or plain text.
  • Additional Options - Optional settings to customize the speech output, including audio info inclusion, metadata, pitch, rate, and volume adjustments.
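The relationship between Input Type and Voice can be sketched as a small pre-flight check. Everything here is an assumption for illustration: the parameter names (`inputText`, `inputType`, `voice`), the type values, and in particular the auto-detection heuristic, which simply treats input beginning with a `<speak` element as SSML. The node's actual detection logic is not documented here.

```typescript
// Hypothetical parameter shape; the node's internal names may differ.
type InputType = "auto" | "text" | "ssml";

interface EdgeTtsParams {
  inputText: string;
  inputType: InputType;
  voice?: string;
}

// Assumed heuristic: input that starts with a <speak> element is SSML.
function detectInputType(input: string): "text" | "ssml" {
  return /^\s*<speak[\s>]/i.test(input) ? "ssml" : "text";
}

// Collect human-readable problems instead of throwing on the first one.
function validateParams(params: EdgeTtsParams): string[] {
  const errors: string[] = [];
  if (params.inputText.trim() === "") {
    errors.push("Input Text is required.");
  }
  const needsVoice = params.inputType === "auto" || params.inputType === "text";
  if (needsVoice && !params.voice) {
    errors.push("Voice is required when Input Type is auto or plain text.");
  }
  return errors;
}

// Usage: an auto-typed request without a voice is reported as invalid.
console.log(validateParams({ inputText: "Hello there", inputType: "auto" }));
console.log(detectInputType("<speak version='1.0'>Hello</speak>"));
```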

Output

JSON

  • audioData - The synthesized speech audio data, typically encoded in a suitable audio format.
  • audioInfo - Optional audio information such as size, duration, and format if requested.
  • metadata - Optional word boundaries metadata with timestamps if requested.
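A consumer of this output might model it along the following lines. This is only a sketch: the three top-level field names come from the list above, but the nested field names (`offsetMs`, `durationMs`, and so on) are hypothetical, and the code assumes `audioData` arrives as a base64 string, which this page does not confirm.

```typescript
// Assumed shapes; only audioData, audioInfo and metadata are named in the docs.
interface WordBoundary {
  text: string;       // hypothetical field name
  offsetMs: number;   // hypothetical field name
  durationMs: number; // hypothetical field name
}

interface AudioInfo {
  size: number;
  duration: number;
  format: string;
}

interface EdgeTtsOutput {
  audioData: string; // assumption: base64-encoded audio
  audioInfo?: AudioInfo;
  metadata?: WordBoundary[];
}

// Decoded size of a base64 string, computed without decoding it.
function base64ByteLength(b64: string): number {
  const clean = b64.replace(/\s/g, "");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return (clean.length / 4) * 3 - padding;
}

// Summarize whichever optional parts are present.
function describeOutput(out: EdgeTtsOutput): string {
  const parts = [`audio: ${base64ByteLength(out.audioData)} bytes`];
  if (out.audioInfo) parts.push(`format: ${out.audioInfo.format}`);
  if (out.metadata) parts.push(`words: ${out.metadata.length}`);
  return parts.join(", ");
}

console.log(describeOutput({ audioData: "aGVsbG8=" }));
```

Marking `audioInfo` and `metadata` as optional mirrors the fact that both are only returned when requested.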

Dependencies

  • Microsoft Edge Text-to-Speech capabilities or API

Troubleshooting

  • Ensure the input text or SSML is correctly formatted to avoid synthesis errors.
  • Verify that the selected voice is supported and correctly specified.
  • Check that pitch, rate, and volume adjustments are within allowed ranges to prevent synthesis issues.
  • If the node fails, review error messages for unsupported operations or invalid parameters and adjust accordingly.
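One way to catch malformed adjustments before running the node is to check their format up front. The sketch below assumes pitch is written as a signed value in Hz (such as +5Hz) and rate and volume as signed percentages (such as -10%), which is a convention used by some Edge TTS clients; the node's accepted syntax and numeric limits are not specified on this page, so no range check is attempted. `checkProsody` is a hypothetical helper.

```typescript
// Assumed formats: "+5Hz" for pitch, "-10%" for rate and volume.
const PITCH_PATTERN = /^[+-]\d+Hz$/;
const PERCENT_PATTERN = /^[+-]\d+%$/;

interface ProsodySettings {
  pitch?: string;
  rate?: string;
  volume?: string;
}

// Returns a list of problems; an empty list means the formats look valid.
function checkProsody(settings: ProsodySettings): string[] {
  const problems: string[] = [];
  if (settings.pitch !== undefined && !PITCH_PATTERN.test(settings.pitch)) {
    problems.push(`pitch "${settings.pitch}" should look like "+5Hz"`);
  }
  if (settings.rate !== undefined && !PERCENT_PATTERN.test(settings.rate)) {
    problems.push(`rate "${settings.rate}" should look like "-10%"`);
  }
  if (settings.volume !== undefined && !PERCENT_PATTERN.test(settings.volume)) {
    problems.push(`volume "${settings.volume}" should look like "+0%"`);
  }
  return problems;
}

// Usage: two of the three values below are malformed.
console.log(checkProsody({ pitch: "5", rate: "fast", volume: "+0%" }));
```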
