Pollinations.AI Audio
Generate audio (text-to-speech) or transcribe audio (speech-to-text) using Pollinations.AI
Overview
This node integrates with Pollinations.AI to provide audio processing capabilities, specifically converting text to speech (TTS) and transcribing speech to text (STT). It is useful for automating audio content generation from text or extracting text from audio files, which can be applied in content creation, accessibility tools, and voice-enabled applications.
Use Case Examples
- Convert a blog post or article text into spoken audio for podcasts or audiobooks using the Text to Speech operation.
- Transcribe recorded interviews or meetings into text for documentation or analysis using the Speech to Text operation.
Properties
| Name | Meaning |
|---|---|
| Text | The text input to be converted into speech (used in Text to Speech operation). |
| Voice | The voice style to use for speech synthesis, such as neutral, deep, storyteller, warm, bright, or melodic (used in Text to Speech operation). |
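As an illustration of how these two properties might be checked before they reach the node, here is a minimal sketch. The function name `validate_tts_input` and the constant `KNOWN_VOICES` are hypothetical and not part of the node. The voice list is copied from the table above, and treating it as exhaustive is an assumption.

```python
# Voice labels taken from the Properties table. Assumption: this list is
# complete; the node itself may accept additional voices.
KNOWN_VOICES = ("neutral", "deep", "storyteller", "warm", "bright", "melodic")


def validate_tts_input(text: str, voice: str) -> dict:
    """Return cleaned Text to Speech parameters, or raise ValueError.

    Hypothetical helper for illustration; not part of the node's API.
    """
    # Reject empty or whitespace-only text, which would produce no audio.
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text must be a non-empty string")

    # Compare voices case-insensitively against the documented labels.
    normalized = voice.strip().lower()
    if normalized not in KNOWN_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of {', '.join(KNOWN_VOICES)}"
        )

    return {"text": text.strip(), "voice": normalized}
```

Running a check like this in an upstream step surfaces bad input as a clear error instead of a failed or silent Text to Speech request.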
Output
Binary
Contains the generated audio file in MP3 format when using Text to Speech.
JSON
- text - The original input text converted to speech.
- voice - The voice style used for the speech synthesis.
- audioUrl - The URL from which the generated audio can be accessed.
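A downstream step that consumes the JSON output could read it roughly as follows. This is a sketch only: the field names `text`, `voice`, and `audioUrl` come from the description above, while `TtsResult` and `parse_tts_json` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TtsResult:
    """Typed view of the Text to Speech JSON output (hypothetical helper)."""

    text: str
    voice: str
    audio_url: str


def parse_tts_json(item: dict) -> TtsResult:
    """Build a TtsResult from one JSON output item, failing on missing fields."""
    required = ("text", "voice", "audioUrl")
    missing = [key for key in required if key not in item]
    if missing:
        raise KeyError(f"Missing field(s) in TTS output: {', '.join(missing)}")
    return TtsResult(
        text=item["text"],
        voice=item["voice"],
        audio_url=item["audioUrl"],
    )
```

Checking for the three fields up front gives a single descriptive error when an item has an unexpected shape.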
Dependencies
- Pollinations.AI API with an API token credential
Troubleshooting
- Ensure the Pollinations.AI API token is correctly set in the node credentials; otherwise, the node will throw an error indicating the missing token.
- If the text input is empty or invalid, the Text to Speech operation may fail or produce no audio output.
- For Speech to Text, ensure the audio input is correctly provided either as binary data or a valid URL, and the audio format matches the actual file format to avoid transcription errors.
- Requests time out after 120 seconds. If a request fails with a network or timeout error, check network connectivity and retry.
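The retry advice for timeouts can be sketched as a small wrapper with exponential backoff. Everything here is an assumption for illustration: `call_with_retry`, its parameters, and the choice of which exceptions count as transient are hypothetical, and the node itself may surface failures differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on timeout or connection errors.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The sleep function is injectable so the helper can be tested quickly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    for attempt in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            # Give up after the final attempt and let the error propagate.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

    # Not reached: the loop either returns a result or re-raises.
    raise RuntimeError("unreachable")
```

Only transient errors are retried here. Problems such as a missing API token or empty text would fail the same way on every attempt, so they are better fixed at the source.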
Links
- Pollinations.AI - Official website for Pollinations.AI, the service used for text-to-speech and speech-to-text processing.