Actions
- Text to Speech Actions
- Voice Changer Actions
- Translation Actions
- Dubbing Actions
Overview
The "Text to Speech - Generate Speech" operation converts input text into spoken audio using AI voices. It supports advanced features like inserting pauses within the text, selecting different voices and voice styles, adjusting pitch and speed, and customizing pronunciation. This node is useful for automating voiceover creation, generating audio content for accessibility, creating announcements, or producing audio previews from text.
For example, you can input a marketing script with embedded pauses to create a natural-sounding advertisement, or convert long-form articles into audio podcasts by selecting an appropriate voice and language locale.
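As a rough illustration of the embedded-pause idea, the sketch below joins script segments with pause tags before the text is passed to the node. The helper name `join_with_pauses` is hypothetical and not part of the node. Only the `[pause 1s]` form is documented here, so treating other whole-second durations as valid is an assumption.

```python
def join_with_pauses(segments, pause_seconds=1):
    """Join text segments, inserting a [pause Ns] tag between each pair."""
    if not isinstance(pause_seconds, int) or pause_seconds <= 0:
        raise ValueError("pause_seconds must be a positive whole number")
    # Tag syntax follows the documented "[pause 1s]" example;
    # durations other than 1s are an assumption.
    tag = f"[pause {pause_seconds}s]"
    # Drop empty segments so no two tags end up back to back.
    cleaned = [s.strip() for s in segments if s.strip()]
    return f" {tag} ".join(cleaned)


script = join_with_pauses(
    ["Introducing our new product.", "Available now.", "Order today!"],
    pause_seconds=1,
)
print(script)
```

The resulting string can be supplied as the Text property.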
Properties
| Name | Meaning |
|---|---|
| Text | The text to be synthesized into speech. Supports special tags like [pause 1s] to control timing in the audio. |
| Voice | The AI voice used for synthesis. Choose from a searchable list of available voices or specify a voice ID directly. |
| Output Locale | Language/locale for the generated audio. Select from supported locales for the chosen voice or enter a locale code manually (e.g., en-US). |
| Voice Style | The style of the voiceover (e.g., formal, casual). Available styles depend on the selected voice and locale. |
| Audio Format | Format of the output audio file. Options include ALAW, FLAC, MP3, OGG, PCM, ULAW, WAV. Default is WAV. |
| Encode as Base64 | Whether to receive the audio data as a Base64-encoded string instead of a URL. Useful if you want to embed audio directly in JSON output. |
| Additional Options | Collection of optional parameters, listed in the rows below. |
| - Audio Duration (Seconds) | Target length of the generated audio in seconds; set to 0 to ignore. Applies only to the Gen2 model. |
| - Channel Type | Audio channel configuration: Mono or Stereo. |
| - Pitch | Adjust voice pitch from -50 (lower) to 50 (higher). |
| - Pronunciation Dictionary | Custom pronunciations for specific words. Define word, type (IPA or SAY_AS), and pronunciation text to override default pronunciation. |
| - Rate (Speed) | Adjust speaking speed from -50 (slower) to 50 (faster). |
| - Sample Rate | Audio sample rate in Hz. Options: 8000, 24000, 44100, 48000. Default is 44100 Hz. |
| - Variation | Adds variation in pause, pitch, and speed (0-5). Only for Gen2 model. |
| - Word Durations as Original Text | Whether the returned word durations should list words exactly as they appear in the original input text (English only). |
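The ranges and allowed values in the table can be checked before a workflow runs. The following is a minimal sketch of such a check, not the node's own validation logic. The dictionary keys (`format`, `sample_rate`, `pitch`, `rate`, `variation`, `pronunciation_dictionary`) are hypothetical names chosen for illustration; the limits themselves are taken from the table above.

```python
ALLOWED_FORMATS = {"ALAW", "FLAC", "MP3", "OGG", "PCM", "ULAW", "WAV"}
ALLOWED_SAMPLE_RATES = {8000, 24000, 44100, 48000}
ALLOWED_PRONUNCIATION_TYPES = {"IPA", "SAY_AS"}


def validate_options(opts):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []

    # Defaults (WAV, 44100 Hz) are the ones stated in the properties table.
    if opts.get("format", "WAV") not in ALLOWED_FORMATS:
        problems.append("format must be one of " + ", ".join(sorted(ALLOWED_FORMATS)))
    if opts.get("sample_rate", 44100) not in ALLOWED_SAMPLE_RATES:
        problems.append("sample_rate must be 8000, 24000, 44100, or 48000")

    # Pitch and rate share the same documented range; skip them when unset.
    for name in ("pitch", "rate"):
        value = opts.get(name)
        if value is not None and not -50 <= value <= 50:
            problems.append(f"{name} must be between -50 and 50")

    variation = opts.get("variation")
    if variation is not None and not 0 <= variation <= 5:
        problems.append("variation must be between 0 and 5")

    # Each pronunciation entry needs a word, a supported type, and a pronunciation.
    for entry in opts.get("pronunciation_dictionary", []):
        if entry.get("type") not in ALLOWED_PRONUNCIATION_TYPES:
            problems.append(f"unsupported pronunciation type for {entry.get('word')!r}")
        if not entry.get("word") or not entry.get("pronunciation"):
            problems.append("pronunciation entries need a word and a pronunciation")

    return problems


print(validate_options({"pitch": 80, "sample_rate": 16000}))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid value in a single pass.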
Output
The node outputs JSON data containing details about the generated speech audio. This typically includes:
- A URL to the generated audio file (unless Base64 encoding is enabled).
- Metadata such as audio format, duration, sample rate, and possibly word timing information.
- If "Encode as Base64" is enabled, the audio content is returned as a Base64 encoded string instead of a URL.
The node does not appear to emit binary data; the audio is delivered either as a URL or as a Base64 string.
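A downstream step usually needs to handle both delivery modes. The sketch below shows one way to do that in Python. The field names `encodedAudio` and `audioFile` are assumptions about the shape of the output JSON and should be checked against an actual execution result; `save_or_link` is a hypothetical helper.

```python
import base64
import tempfile
from pathlib import Path


def save_or_link(result, out_path):
    """Write Base64 audio to out_path if present; otherwise return the audio URL."""
    # "encodedAudio" and "audioFile" are assumed field names, not confirmed ones.
    encoded = result.get("encodedAudio")
    if encoded:
        Path(out_path).write_bytes(base64.b64decode(encoded))
        return str(out_path)
    url = result.get("audioFile")
    if not url:
        raise ValueError("result contains neither Base64 audio nor a URL")
    return url


# Demo with stand-in bytes instead of real audio.
sample = {"encodedAudio": base64.b64encode(b"RIFF").decode("ascii")}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "speech.wav"
    saved = save_or_link(sample, target)
    saved_bytes = target.read_bytes()
```

When only a URL is returned, the file still has to be downloaded separately if the audio itself is needed.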
Dependencies
- Requires an API key credential for accessing the Murf AI service.
- The node makes HTTP requests to Murf AI's endpoints to fetch available voices, locales, styles, and to generate speech.
- No other external dependencies are indicated.
Troubleshooting
Common issues:
- Invalid or missing API key will cause authentication errors.
- Selecting a voice that does not support the chosen locale or style may return an empty response or an error.
- Providing unsupported audio formats or invalid parameter values (e.g., pitch out of range) may cause request failures.
- Network connectivity issues can prevent API calls.
Error messages:
- Authentication errors: Check that the API key credential is correctly configured.
- "Voice Not Found" or "No Styles Available": Verify voice and locale selections.
- Parameter validation errors: Ensure numeric values are within allowed ranges.
Resolution tips:
- Use the built-in searchable lists to select valid voices, locales, and styles.
- Validate all numeric inputs against their min/max constraints.
- Enable "Continue On Fail" to handle individual item errors gracefully during batch processing.
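The effect of "Continue On Fail" can be pictured as per-item error capture: a failing item is recorded and the rest of the batch still runs. The sketch below illustrates that pattern only and is not n8n's implementation. `process_items` and `fake_generate` are hypothetical stand-ins, with `fake_generate` simulating a range error instead of calling the service.

```python
def process_items(items, generate):
    """Run generate on each item, recording failures instead of aborting the batch."""
    results = []
    for index, item in enumerate(items):
        try:
            results.append({"index": index, "ok": True, "data": generate(item)})
        except Exception as exc:
            # Keep the error next to the item index so it can be inspected later.
            results.append({"index": index, "ok": False, "error": str(exc)})
    return results


def fake_generate(item):
    """Stand-in for the speech request; rejects out-of-range pitch values."""
    if not -50 <= item["pitch"] <= 50:
        raise ValueError("pitch out of range")
    return {"text": item["text"]}


batch = [
    {"text": "Hello", "pitch": 0},
    {"text": "Oops", "pitch": 90},
    {"text": "Bye", "pitch": 10},
]
outcome = process_items(batch, fake_generate)
failed = [r["index"] for r in outcome if not r["ok"]]
print(failed)
```

Collecting the failed indexes makes it straightforward to retry or report only the items that need attention.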