DeepInfra

Use the DeepInfra API for AI operations

Actions (6)

Chat Actions
- Completion
Embedding Actions
- Embed
Image Actions
- Generate
Speech Recognition Actions
- Transcribe
- Translate
Text to Speech Actions
- Generate

Overview

This node integrates with the DeepInfra API to perform text-to-speech (TTS) generation: it converts input text into spoken audio using one of several AI speech synthesis models. It is useful for automating voice content creation, such as audio narration, voice assistant responses, accessibility features, or any application that needs natural-sounding speech generated from text.

For example, you can input a product description and generate an MP3 audio file of that description being read aloud in a chosen voice and speed. The node supports multiple TTS models and allows customization of voice and speech speed.

Properties

Name	Meaning
Model	The TTS model to use for speech generation. Options: "Hexgrad Kokoro-82M", "Zyphra Zonos-V0.1-Hybrid", "Zyphra Zonos-V0.1-Transformer".
Text	The text string that will be converted into speech audio.
Options	Collection of additional options:
- Voices: One or more voice identifiers to use for speech synthesis (model-specific). Example values include "af_bella" or "default".
- Speed: A number controlling the speed of the generated speech, ranging from 0.5 (half speed) to 2 (double speed). Default is 1.
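
The properties above can be sketched as a small validation step that turns them into a request body. This is only an illustration, not the node's actual code: the function name `build_tts_body` and the body field names `text`, `preset_voice`, and `speed` are assumptions, so check the DeepInfra model page for the exact parameters each model accepts.

```python
from typing import Any, Dict, Optional, Sequence

# Speed limits taken from the Options description above.
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def build_tts_body(
    text: str,
    voices: Optional[Sequence[str]] = None,
    speed: float = 1.0,
) -> Dict[str, Any]:
    """Validate the node properties and return a request body (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("Text must not be empty")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"Speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")

    # Field names below are assumed for illustration, not confirmed by this page.
    body: Dict[str, Any] = {"text": text, "speed": speed}
    if voices:
        body["preset_voice"] = list(voices)
    return body


if __name__ == "__main__":
    print(build_tts_body("Welcome to our store.", ["af_bella"], 1.25))
```

Validating the speed range before sending the request surfaces a clear local error instead of relying on whatever the API returns for an out-of-range value.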

Output

The node outputs an array of items where each item contains:

json:
- success: Boolean indicating if the audio was successfully generated.
- model: The name of the model used.
- error: Present only if generation failed; a message explaining the failure.
binary:
- audio: Contains the generated speech audio data encoded in base64 format.
- mimeType: Set to "audio/mp3" indicating the audio format.

This binary audio data can be used downstream in workflows for playback, storage, or further processing.
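
As a rough sketch of downstream processing, the base64 audio described above can be decoded back to raw bytes and written to a file. The item layout used here is a simplified stand-in for the output structure listed in this section, and the helper names `decode_audio` and `save_audio` are hypothetical. Stripping a `data:` URI prefix is an assumption about how some APIs wrap base64 payloads; it may not be needed for this node.

```python
import base64
from typing import Any, Dict


def decode_audio(encoded: str) -> bytes:
    """Decode base64 audio, tolerating an optional data-URI prefix (assumed)."""
    if encoded.startswith("data:"):
        # Keep only the part after the first comma, e.g. "data:audio/mp3;base64,<payload>".
        _, _, encoded = encoded.partition(",")
    return base64.b64decode(encoded)


def save_audio(item: Dict[str, Any], path: str) -> int:
    """Write the item's audio to `path` and return the number of bytes written."""
    audio_bytes = decode_audio(item["binary"]["audio"])
    with open(path, "wb") as handle:
        return handle.write(audio_bytes)


if __name__ == "__main__":
    # Fake payload for demonstration only; a real item would carry MP3 data.
    demo_item = {
        "json": {"success": True, "model": "example-model"},
        "binary": {
            "audio": base64.b64encode(b"not real audio").decode("ascii"),
            "mimeType": "audio/mp3",
        },
    }
    print(save_audio(demo_item, "speech.mp3"), "bytes written")
```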

Dependencies

- Requires an active API key credential for the DeepInfra API.
- Makes HTTP POST requests to the https://api.deepinfra.com/v1/inference/{model} endpoint.
- Uses the Axios library internally for HTTP requests.
- Requires no additional environment variables beyond the API key credential.
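
For readers who want to reproduce the call outside n8n, the request described above might look roughly like the following. The node itself uses Axios; this sketch uses Python's standard library instead and only builds the request object without sending it. The `Bearer` authorization scheme, the example model path, and the helper name `build_request` are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.deepinfra.com/v1/inference"


def build_request(model: str, body: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the given model (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed auth scheme: the API key sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # The model path here is an assumed example, not taken from this page.
    request = build_request("hexgrad/Kokoro-82M", {"text": "Hello"}, "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request, timeout=60)
```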

Troubleshooting

- No audio data returned from the API: This error occurs if the API response does not contain audio data. Check that the input text is valid and not empty, and verify that the selected model supports the requested voices and options.
- Invalid voice identifiers: Using unsupported or misspelled voice identifiers may cause the API to fail or return no audio. Use only documented voice IDs per model.
- API authentication errors: Ensure the API key credential is correctly configured and has sufficient permissions.
- Speed out of range: The speed parameter must be between 0.5 and 2. Values outside this range may cause errors or unexpected behavior.
- Network or connectivity issues: Since the node relies on external API calls, network problems can cause failures. Verify internet connectivity and API availability.

If the node is set to continue on failure, errors will be returned in the JSON output instead of stopping the workflow.
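
The continue-on-failure behavior can be pictured with a minimal sketch like the one below. It is not the node's source: `run_item` and its `generate` callback are hypothetical, and including `model` in the failure output is an assumption. The idea is simply that an error either stops processing or is converted into a JSON result with `success` set to false.

```python
from typing import Any, Callable, Dict


def run_item(
    generate: Callable[[], str],
    model: str,
    continue_on_fail: bool,
) -> Dict[str, Any]:
    """Run one TTS call and shape the result like the node's output (illustrative)."""
    try:
        audio = generate()  # expected to return base64-encoded audio
        if not audio:
            raise RuntimeError("No audio data returned from the API")
        return {
            "json": {"success": True, "model": model},
            "binary": {"audio": audio, "mimeType": "audio/mp3"},
        }
    except Exception as exc:
        if not continue_on_fail:
            raise  # stop the workflow by propagating the error
        # Report the failure in the JSON output instead of stopping.
        return {"json": {"success": False, "model": model, "error": str(exc)}}


if __name__ == "__main__":
    print(run_item(lambda: "", "example-model", continue_on_fail=True))
```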

Links and References

DeepInfra API Documentation (general reference for API usage)
Text-to-Speech Models and Voices (for available models and voice identifiers)
n8n Documentation on Binary Data Handling
