AIConnect

Use OpenAI-compatible API functions

Overview

This node generates speech audio from input text using AI audio models exposed through an OpenAI-compatible API. It converts text into spoken audio in a choice of voices and formats, producing custom audio for applications such as voice assistants, audiobooks, accessibility tools, or multimedia content.

Typical use cases include:

  • Generating voiceovers for videos or presentations.
  • Creating audio versions of written content for accessibility.
  • Producing dynamic speech responses in chatbots or virtual assistants.
  • Experimenting with different voices and audio formats for creative projects.

Properties

Name | Meaning
Simplify Output | Whether to return a simplified version of the response instead of the raw data.
Model | The AI model used for generating the audio. Models are dynamically loaded via getAudioModels.
Input Text | The text string (up to 4096 characters) to be converted into speech audio.
Voice | The voice style to use for speech generation. Options: Alloy, Echo, Fable, Onyx, Nova, Shimmer.
Response Format | The audio file format of the generated speech. Options: MP3, Opus, AAC, FLAC, WAV, PCM.
Speed | Playback speed multiplier for the generated audio, ranging from 0.25x to 4x.
Additional Options | Collection of extra options; currently supports specifying a unique user identifier ("User").
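
To make the properties above more concrete, here is a minimal sketch of how they might map onto an OpenAI-style speech request body. The field names (model, input, voice, response_format, speed, user) follow the OpenAI speech endpoint convention; whether this node sends exactly these fields is an assumption, as are the NodeProperties interface, the buildSpeechRequest helper, the placeholder model name, and the defaults of "mp3" and 1.

```typescript
// Sketch only: field names follow the OpenAI-style speech request and are
// assumed, not confirmed, to be what this node sends.
type Voice = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";
type ResponseFormat = "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";

interface SpeechRequest {
  model: string;
  input: string;
  voice: Voice;
  response_format: ResponseFormat;
  speed: number;
  user?: string;
}

// Hypothetical shape mirroring the node's properties table.
interface NodeProperties {
  model: string;
  inputText: string;
  voice: Voice;
  responseFormat?: ResponseFormat;
  speed?: number;
  user?: string;
}

// Translate the node-style properties into a request body.
// The defaults ("mp3", speed 1) are assumptions for this sketch.
function buildSpeechRequest(props: NodeProperties): SpeechRequest {
  const body: SpeechRequest = {
    model: props.model,
    input: props.inputText,
    voice: props.voice,
    response_format: props.responseFormat ?? "mp3",
    speed: props.speed ?? 1,
  };
  // Only include the optional user identifier when one was supplied.
  if (props.user !== undefined) {
    body.user = props.user;
  }
  return body;
}

const example = buildSpeechRequest({
  model: "example-tts-model", // placeholder; real models come from getAudioModels
  inputText: "Hello from n8n.",
  voice: "nova",
  responseFormat: "wav",
  speed: 1.25,
});
console.log(JSON.stringify(example, null, 2));
```

A body like this would typically be sent as JSON to the provider's speech endpoint, with the API key from the n8n credential in the Authorization header. The exact URL depends on the provider.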

Output

The node outputs JSON data containing the result of the speech generation operation. When simplified output is enabled, this typically includes essential information such as:

  • The generated audio content encoded in the selected format (likely as a base64 string or a URL).
  • Metadata about the audio generation (e.g., duration, model used).

If the node also emits binary data, that data is the audio file itself in the chosen format, ready for further processing or download.

Dependencies

  • Requires an API key credential for the underlying OpenAI-compatible audio generation service, configured in n8n credentials.
  • Depends on external AI models, which are fetched dynamically via the getAudioModels method.

Troubleshooting

  • Common Issues:

    • Invalid or missing API credentials will cause authentication failures.
    • Exceeding the maximum input text length (4096 characters) may result in errors.
    • Selecting unsupported combinations of voice and model might lead to unexpected results or errors.
    • Network connectivity issues can prevent successful API calls.
  • Error Messages:

    • 'The resource "audio" is not supported!' indicates an invalid resource parameter; ensure "audio" is selected.
    • For API errors about quota limits or invalid parameters, check your account limits and the input values.
    • If the node returns an error object in the output JSON, enabling "Continue On Fail" allows workflow continuation while capturing error details.
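
Several of the issues above can be caught before the API call is made. The following is a minimal pre-flight check, assuming the limits stated in the properties (4096 characters, speed between 0.25 and 4) and lowercase voice identifiers; the validateSpeechInput helper and its messages are hypothetical and not part of the node.

```typescript
// Limits taken from the node's documented properties.
const MAX_INPUT_CHARS = 4096;
const MIN_SPEED = 0.25;
const MAX_SPEED = 4;
// Lowercase identifiers are an assumption; the node UI lists capitalized names.
const KNOWN_VOICES: readonly string[] = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

interface SpeechInput {
  inputText: string;
  voice: string;
  speed: number;
}

// Returns a list of human-readable problems; an empty list means the input
// passed every check in this sketch.
function validateSpeechInput(params: SpeechInput): string[] {
  const problems: string[] = [];

  if (params.inputText.length === 0) {
    problems.push("Input Text is empty.");
  }
  // Note: .length counts UTF-16 code units, which can differ from the
  // character count the provider enforces.
  if (params.inputText.length > MAX_INPUT_CHARS) {
    problems.push(
      `Input Text has ${params.inputText.length} characters; the limit is ${MAX_INPUT_CHARS}.`
    );
  }
  if (!Number.isFinite(params.speed) || params.speed < MIN_SPEED || params.speed > MAX_SPEED) {
    problems.push(`Speed must be between ${MIN_SPEED} and ${MAX_SPEED}.`);
  }
  if (KNOWN_VOICES.indexOf(params.voice) === -1) {
    problems.push(`Unknown voice "${params.voice}".`);
  }
  return problems;
}

const issues = validateSpeechInput({ inputText: "Hello", voice: "robot", speed: 5 });
console.log(issues);
```

In a workflow, a check like this could run in a preceding Code node so that invalid items are routed away before they reach the API. It does not cover authentication, quota, or network failures, which only surface when the request is made.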
