DeepInfra

Use DeepInfra API for AI operations

Overview

This node integrates with the DeepInfra API to perform text-to-speech (TTS) generation. It converts input text into spoken audio using one of several AI speech synthesis models. This is useful for automating voice content creation, such as audio narration, voice assistant responses, accessibility features, or any application that needs natural-sounding speech output from text.

For example, you can input a product description and generate an MP3 audio file of that description read aloud in a chosen voice at a chosen speed. The node supports multiple TTS models and allows customization of voice and speech speed.

Properties

Name | Meaning
Model | The TTS model to use for speech generation. Options: "Hexgrad Kokoro-82M", "Zyphra Zonos-V0.1-Hybrid", "Zyphra Zonos-V0.1-Transformer".
Text | The text string that will be converted into speech audio.
Options | Collection of additional options:
  - Voices: One or more voice identifiers to use for speech synthesis (model-specific). Example values include "af_bella" or "default".
  - Speed: A number controlling the speed of the generated speech, ranging from 0.5 (half speed) to 2 (double speed). The default is 1.
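
The constraints on these properties can be checked before any request is sent. The following is a minimal sketch, not the node's actual code: the function name `build_tts_options` and the body keys `text`, `voices`, and `speed` are hypothetical and chosen only to mirror the property names above. The field names a given model really expects may differ.

```python
from typing import Any, Dict, List, Optional

# Speed bounds taken from the Speed property description.
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def build_tts_options(
    text: str,
    voices: Optional[List[str]] = None,
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Validate node-style inputs and return a request body (hypothetical keys)."""
    if not text or not text.strip():
        raise ValueError("Text must not be empty")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"Speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )

    body: Dict[str, Any] = {"text": text, "speed": speed}
    if voices:
        # Assumed key name; check the selected model's schema before relying on it.
        body["voices"] = list(voices)
    return body
```

Rejecting an out-of-range speed locally gives a clearer message than whatever the API would return for the same input.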

Output

The node outputs an array of items where each item contains:

  • json:

    • success: Boolean indicating if the audio was successfully generated.
    • model: The name of the model used.
    • If unsuccessful, an error field with a message explaining the failure.
  • binary:

    • audio: Contains the generated speech audio data encoded in base64 format.
    • mimeType: Set to "audio/mp3" indicating the audio format.

This binary audio data can be used downstream in workflows for playback, storage, or further processing.
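
As an illustration of downstream use, the sketch below decodes the base64 audio from one output item. It assumes the item is a plain dictionary laid out exactly as the list above reads, with `binary["audio"]` holding the base64 string; the real item may nest these fields differently, and `extract_audio` is a hypothetical helper name.

```python
import base64
from typing import Any, Dict


def extract_audio(item: Dict[str, Any]) -> bytes:
    """Return raw audio bytes from one output item, or raise if generation failed."""
    info = item.get("json", {})
    if not info.get("success"):
        # Surface the error message the node placed in the JSON output.
        raise RuntimeError(info.get("error", "audio generation failed"))
    # Assumed layout: item["binary"]["audio"] is the base64-encoded audio.
    return base64.b64decode(item["binary"]["audio"])
```

The returned bytes can then be written to a file opened in binary mode, for example with a `.mp3` extension to match the reported MIME type.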

Dependencies

  • Requires an active API key credential for the DeepInfra API.
  • Makes HTTP POST requests to the https://api.deepinfra.com/v1/inference/{model} endpoint.
  • Uses Axios library internally for HTTP requests.
  • No additional environment variables are explicitly required beyond the API key credential.
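
For readers who want to see the shape of such a call outside the node, here is a rough sketch that prepares (but does not send) a POST request using Python's standard library rather than Axios. The bearer-token `Authorization` header, the JSON body, and the helper name `build_request` are assumptions, not details confirmed by this page; only the URL pattern comes from the list above.

```python
import json
import urllib.request
from typing import Any, Dict

# URL pattern from the Dependencies list; {model} is appended per request.
BASE_URL = "https://api.deepinfra.com/v1/inference"


def build_request(model: str, api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST request for the given model without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed auth scheme: the API key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the actual call. Keeping construction separate from sending makes the URL and headers easy to inspect when debugging authentication problems.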

Troubleshooting

  • No audio data returned from the API: This error occurs if the API response does not contain audio data. Check that the input text is valid and not empty, and verify that the selected model supports the requested voices and options.
  • Invalid voice identifiers: Using unsupported or misspelled voice identifiers may cause the API to fail or return no audio. Use only documented voice IDs per model.
  • API authentication errors: Ensure the API key credential is correctly configured and has sufficient permissions.
  • Speed out of range: The speed parameter must be between 0.5 and 2. Values outside this range may cause errors or unexpected behavior.
  • Network or connectivity issues: Since the node relies on external API calls, network problems can cause failures. Verify internet connectivity and API availability.

If the node is set to continue on failure, errors will be returned in the JSON output instead of stopping the workflow.
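
The continue-on-failure behaviour can be pictured with the following sketch. It is an illustration of the pattern only, not the node's source: `run_items`, `synthesize`, and `continue_on_fail` are hypothetical names, and `synthesize` stands in for whatever performs the API call and returns extra JSON fields.

```python
from typing import Any, Callable, Dict, List


def run_items(
    texts: List[str],
    synthesize: Callable[[str], Dict[str, Any]],
    continue_on_fail: bool = False,
) -> List[Dict[str, Any]]:
    """Process each text; on error either re-raise or record it in the JSON output."""
    results: List[Dict[str, Any]] = []
    for text in texts:
        try:
            fields = synthesize(text)
            results.append({"json": {"success": True, **fields}})
        except Exception as exc:
            if not continue_on_fail:
                # Default behaviour: the first failure stops the whole run.
                raise
            # Continue-on-fail: report the error as data and move on.
            results.append({"json": {"success": False, "error": str(exc)}})
    return results
```

With `continue_on_fail=True`, a failing item produces a result whose `success` is false and whose `error` carries the message, while later items are still processed.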
