ComfyUI Text to Audio

Generate audio from text using ComfyUI workflows

Overview

This node generates audio from text using ComfyUI workflows. It accepts a JSON definition of a ComfyUI workflow designed for text-to-audio generation, sends the text input to this workflow via the ComfyUI API, and waits for the audio output. Once the audio is generated, it retrieves the audio file(s) and outputs them in binary form along with metadata.

Common scenarios include:

- Converting text prompts into speech or sound effects with custom ComfyUI workflows.
- Automating audio content creation in multimedia pipelines.
- Integrating AI-driven audio synthesis into larger automation workflows.

For example, you can provide a ComfyUI workflow JSON that runs a text-to-speech model, enter some text, and receive the generated audio clip as output.
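This page does not say how the node places the text into the workflow. One plausible approach, sketched below as an assumption rather than the node's actual logic, is to parse the API-format workflow (a JSON object keyed by node id, where each node has a `class_type` and an `inputs` map) and overwrite every literal string input with a chosen name. The `inject_text` function and its `input_name` parameter are hypothetical, and `HypotheticalTTS` in the usage part is a placeholder class name.

```python
import json


def inject_text(workflow_json: str, text: str, input_name: str = "text") -> dict:
    """Parse an API-format ComfyUI workflow and write `text` into every node
    input called `input_name` (hypothetical injection strategy)."""
    workflow = json.loads(workflow_json)
    replaced = 0
    for node in workflow.values():
        inputs = node.get("inputs", {})
        # Literal values are plain strings; linked inputs are [node_id, output_index]
        # lists, so only string values are overwritten.
        if isinstance(inputs.get(input_name), str):
            inputs[input_name] = text
            replaced += 1
    if replaced == 0:
        raise ValueError(f"No node has a string input named {input_name!r}")
    return workflow


if __name__ == "__main__":
    # Illustrative workflow; "HypotheticalTTS" is a placeholder class name.
    example = json.dumps({
        "1": {"class_type": "HypotheticalTTS", "inputs": {"text": "", "seed": 0}},
        "2": {"class_type": "SaveAudio",
              "inputs": {"audio": ["1", 0], "filename_prefix": "audio/tts"}},
    })
    print(json.dumps(inject_text(example, "Hello from n8n"), indent=2))
```

Raising when nothing was replaced surfaces a workflow that has no text input early, instead of silently generating audio from the workflow's built-in prompt.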

Properties

Name	Meaning
Workflow JSON	The JSON configuration of the ComfyUI workflow used to generate audio from the input text.
Text	The text string to convert into audio.
Timeout (Minutes)	Maximum time (1 to 60 minutes) to wait for the audio generation before timing out.
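The constraints in the table can be checked before anything is sent to ComfyUI. The sketch below mirrors the checks listed under Troubleshooting (empty text, unparseable JSON) plus the 1 to 60 minute timeout range. `validate_params` and `ValidatedParams` are hypothetical names, and treating whitespace-only text as empty is an assumption about how strict the node is.

```python
import json
from dataclasses import dataclass


@dataclass
class ValidatedParams:
    """Hypothetical container for the node's checked inputs."""
    workflow: dict
    text: str
    timeout_seconds: int


def validate_params(workflow_json: str, text: str, timeout_minutes: int) -> ValidatedParams:
    # Assumption: whitespace-only text counts as empty.
    if not text.strip():
        raise ValueError('The "Text" property must not be empty')
    try:
        workflow = json.loads(workflow_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid workflow JSON: {exc}") from exc
    # An API-format workflow is a JSON object keyed by node id.
    if not isinstance(workflow, dict):
        raise ValueError("Workflow JSON must be an object keyed by node id")
    if not 1 <= timeout_minutes <= 60:
        raise ValueError("Timeout must be between 1 and 60 minutes")
    return ValidatedParams(workflow=workflow, text=text, timeout_seconds=timeout_minutes * 60)
```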

Output

The node outputs an array of items, each containing:

json:
- success: Boolean indicating if the audio generation succeeded.
- fileName: The name of the generated audio file.
- fileExtension: The audio file extension (e.g., wav, mp3).
- mimeType: The MIME type corresponding to the audio format.
- text: The original input text.
- downloadUrl: URL to download the generated audio file.
- If no audio output is found, the item contains a message and debug information about the workflow outputs instead of the file fields above.
binary:
- data: The base64-encoded audio data.
- mimeType: The MIME type of the audio.
- fileName: The audio file name.
- fileExtension: The audio file extension.

This allows downstream nodes to access both metadata and raw audio data for further processing or storage.
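As a rough illustration of how one output item could be assembled, the sketch below maps the file extension to a MIME type and base64-encodes the audio bytes. It assumes n8n's convention of storing the file under a binary property named `data`; the `build_output_item` helper and the extension map are illustrative, not taken from the node's source. `audio/flac` is included because ComfyUI's SaveAudio node typically writes FLAC files.

```python
import base64

# Illustrative extension-to-MIME map; extend it to match the formats your workflow saves.
AUDIO_MIME_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "flac": "audio/flac",
    "ogg": "audio/ogg",
}


def build_output_item(audio_bytes: bytes, file_name: str, text: str, download_url: str) -> dict:
    """Assemble an n8n-style item with JSON metadata and base64 binary data."""
    extension = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
    # Fall back to a generic binary type for unknown extensions.
    mime_type = AUDIO_MIME_TYPES.get(extension, "application/octet-stream")
    return {
        "json": {
            "success": True,
            "fileName": file_name,
            "fileExtension": extension,
            "mimeType": mime_type,
            "text": text,
            "downloadUrl": download_url,
        },
        "binary": {
            "data": {
                "data": base64.b64encode(audio_bytes).decode("ascii"),
                "mimeType": mime_type,
                "fileName": file_name,
                "fileExtension": extension,
            }
        },
    }
```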

Dependencies

Requires access to a running ComfyUI server API endpoint (default: http://127.0.0.1:8188).
An API key credential is optional and is needed only if the ComfyUI server requires authentication.
The node uses HTTP requests to communicate with the ComfyUI API (/prompt to queue jobs, /history/{id} to poll status, /view to download files).
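The three endpoints fit together roughly as follows. This sketch builds the `/prompt` request body, polls `/history/{id}` until the job appears (ComfyUI returns an empty object while the prompt is still pending), collects audio entries from the node outputs, and builds a `/view` download URL. It assumes audio-saving nodes report files under an `audio` key with `filename`, `subfolder`, and `type` fields, as ComfyUI's SaveAudio node does. It omits API-key handling, and every function name is hypothetical. `get_json` is passed in so the polling logic can be exercised without a server.

```python
import json
import time
import urllib.request
import uuid
from typing import Callable, Optional
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8188"  # default ComfyUI address


def build_prompt_payload(workflow: dict, client_id: Optional[str] = None) -> dict:
    """Request body for POST /prompt; ComfyUI replies with a "prompt_id"."""
    return {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}


def http_get_json(path: str, base_url: str = BASE_URL) -> dict:
    """Plain GET returning parsed JSON (no authentication handling)."""
    with urllib.request.urlopen(f"{base_url}{path}") as response:
        return json.load(response)


def wait_for_history(
    prompt_id: str,
    get_json: Callable[[str], dict],
    timeout_seconds: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll /history/{prompt_id} until the entry appears or the timeout expires."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        history = get_json(f"/history/{prompt_id}")
        if prompt_id in history:  # empty dict while the job is still queued or running
            return history[prompt_id]
        sleep(poll_interval)
    raise TimeoutError(f"Prompt {prompt_id} did not finish within {timeout_seconds} s")


def find_audio_files(history_entry: dict) -> list:
    """Collect audio file descriptors from every node's outputs."""
    files = []
    for node_output in history_entry.get("outputs", {}).values():
        files.extend(node_output.get("audio", []))
    return files


def view_url(file_info: dict, base_url: str = BASE_URL) -> str:
    """Build the /view URL that downloads one generated file."""
    query = urlencode({
        "filename": file_info["filename"],
        "subfolder": file_info.get("subfolder", ""),
        "type": file_info.get("type", "output"),
    })
    return f"{base_url}/view?{query}"
```

Against a real server, `http_get_json` could be passed as `get_json`, and the dictionary from `build_prompt_payload` would be POSTed as JSON to `/prompt` to obtain the `prompt_id`.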

Troubleshooting

Text input missing: The node throws an error if the "Text" property is empty. Ensure text is provided.
Invalid workflow JSON: If the workflow JSON is malformed or not parseable, an error is raised. Validate your JSON before use.
Timeouts: If audio generation takes longer than the specified timeout, the node fails with a timeout error. Increase the timeout or optimize the workflow.
API connection issues: Requests fail if the ComfyUI server is unreachable, and authorization errors occur if the credentials are incorrect. Check that the server is running at the configured address and that the API key is valid.
No audio output found: The workflow completed but did not produce any recognized audio files. Check the workflow design and outputs.
ComfyUI execution failure: Errors returned by the ComfyUI server are surfaced with their details. Review the ComfyUI server logs and check that the workflow runs correctly in ComfyUI itself.
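When a queued job fails, ComfyUI records the failure in the history entry's `status` object, with `status_str` set to `"error"` and an `execution_error` event in `messages`. The sketch below shows one way to turn that into a readable error. The field names are assumed from ComfyUI's history format and should be checked against your server version; `raise_for_execution_error` is a hypothetical helper.

```python
def raise_for_execution_error(history_entry: dict) -> None:
    """Raise RuntimeError if a ComfyUI history entry reports a failed run.

    Assumes the entry looks like:
    {"status": {"status_str": "error", "messages": [[event_name, data], ...]}}
    """
    status = history_entry.get("status", {})
    if status.get("status_str") != "error":
        return
    for event_name, data in status.get("messages", []):
        if event_name == "execution_error":
            raise RuntimeError(
                "ComfyUI execution failed in node "
                f"{data.get('node_id')} ({data.get('node_type')}): "
                f"{data.get('exception_message')}"
            )
    # Error status without a detailed event: report what is known.
    raise RuntimeError("ComfyUI execution failed (no execution_error details found)")
```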

Links and References

ComfyUI GitHub Repository — For understanding workflow JSON structure and server setup.
n8n Documentation — General guidance on creating and using custom nodes.
Audio MIME types reference: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types
