
Gemini

Interact with Google Gemini AI

Overview

This node integrates with the Google Gemini AI platform to perform AI-powered tasks. Its Generate TTS operation, described here, converts input text into spoken audio using Gemini's Text-to-Speech (TTS) models. Typical uses include voiceovers for videos, audio notifications, accessibility features such as reading text aloud, and conversational agents with natural-sounding voices.

For example, you can enter a cheerful greeting, choose a voice and a model, and generate an audio file that can be played back or saved.

Properties

  • Text to Speak: The text to convert into speech audio.
  • TTS Model: The Gemini TTS model to use:

    • Gemini 2.5 Flash Preview TTS (fast, low latency)
    • Gemini 2.5 Pro Preview TTS (high quality, enhanced control)
  • Voice: The prebuilt voice used for speech synthesis. Each voice is labeled with its characteristic style. Options: Achernar (Soft), Achird (Friendly), Algenib (Gravelly), Algieba (Smooth), Alnilam (Firm), Aoede (Breezy), Autonoe (Bright), Callirrhoe (Light), Charon (Informative), Despina (Smooth), Enceladus (Breathy), Erinome (Clean), Fenrir (Enthusiastic), Gacrux (Mature), Iapetus (Clear), Kore (Firm), Laomedeia (Upbeat), Leda (Young), Orus (Friendly), Puck (Upbeat), Pulcherrima (Expressive), Rasalgethi (Informative), Sadachbia (Energetic), Sadaltager (Expert), Schedar (Even), Sulafat (Warm), Umbriel (Calm), Vindemiatrix (Gentle), Zephyr (Bright), Zubenelgenubi (Casual).
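The following minimal Python sketch shows how the three properties could map onto a single API call. It builds a request body in the shape the public Gemini API documentation describes for `generateContent` with audio output. The model IDs, the JSON field names, and the helper `build_tts_body` are assumptions taken from those public docs, not from this node's source, so the node's actual request may differ.

```python
import json

# Assumed mapping from the node's model labels to Gemini API model IDs
# (from the public Gemini API docs; the node's internal values may differ).
MODEL_IDS = {
    "Gemini 2.5 Flash Preview TTS": "gemini-2.5-flash-preview-tts",
    "Gemini 2.5 Pro Preview TTS": "gemini-2.5-pro-preview-tts",
}

# Voice names from the Voice property, without the style labels.
VOICES = {
    "Achernar", "Achird", "Algenib", "Algieba", "Alnilam", "Aoede",
    "Autonoe", "Callirrhoe", "Charon", "Despina", "Enceladus", "Erinome",
    "Fenrir", "Gacrux", "Iapetus", "Kore", "Laomedeia", "Leda", "Orus",
    "Puck", "Pulcherrima", "Rasalgethi", "Sadachbia", "Sadaltager",
    "Schedar", "Sulafat", "Umbriel", "Vindemiatrix", "Zephyr",
    "Zubenelgenubi",
}


def build_tts_body(text: str, model_label: str, voice: str) -> tuple[str, dict]:
    """Hypothetical helper: validate the inputs and return (model_id, body)."""
    if not text.strip():
        raise ValueError("Text to Speak must not be empty")
    if model_label not in MODEL_IDS:
        raise ValueError(f"Unknown TTS model: {model_label}")
    if voice not in VOICES:
        raise ValueError(f"Unknown voice: {voice}")
    body = {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            # Ask for audio instead of text output.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }
    return MODEL_IDS[model_label], body


if __name__ == "__main__":
    model_id, body = build_tts_body(
        "Say cheerfully: Have a wonderful day!", "Gemini 2.5 Flash Preview TTS", "Kore"
    )
    print(model_id)
    print(json.dumps(body, indent=2))
```

In this sketch the model is not part of the body. In the public API it appears in the endpoint path, which is why the helper returns it separately.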

Output

The output contains both JSON data and binary audio data:

  • JSON fields:

    • textToSpeak: The original input text.
    • ttsModel: The TTS model used.
    • voice: The selected voice.
    • audioFileName: Generated filename for the audio (e.g., tts_output_<timestamp>.wav).
    • audioSize: Size in bytes of the final WAV audio file.
    • originalPcmSize: Size in bytes of the raw PCM audio data before WAV header wrapping.
  • Binary data:

    • Property name: data
    • Contains the generated audio encoded as a base64 string representing a WAV file.
    • MIME type: audio/wav
    • Filename matches audioFileName.
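To check what a downstream step actually received, the base64 string in the `data` binary property can be decoded and inspected. The following is a minimal Python sketch using only the standard library. The function name `inspect_wav_base64` is hypothetical, and it assumes the payload is a complete WAV file, as described above.

```python
import base64
import io
import wave


def inspect_wav_base64(b64_data: str) -> dict:
    """Decode a base64 WAV payload and report its format parameters."""
    raw = base64.b64decode(b64_data, validate=True)
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("Binary data is not a RIFF/WAVE file")
    with wave.open(io.BytesIO(raw), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "frames": wav.getnframes(),
            "file_size": len(raw),  # should equal audioSize
        }
```

For this node's output, the expected values are a `sample_rate` of 24000, 1 channel, and 16 bits per sample. `file_size` should match the `audioSize` JSON field.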

The node generates a standard WAV audio file with a 24 kHz sample rate, a single (mono) channel, and 16-bit samples, suitable for playback or further processing.
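The difference between `originalPcmSize` and `audioSize` is the WAV header that wraps the raw PCM samples. The following is a minimal Python sketch of that wrapping, using a canonical 44-byte RIFF header for 24 kHz, mono, 16-bit audio. Whether the node writes exactly this header is an assumption, as is the little-endian sample byte order. The helpers `pcm_to_wav` and `pcm_duration_seconds` are hypothetical.

```python
import struct

SAMPLE_RATE = 24_000  # Hz, as stated for the node's output
CHANNELS = 1          # mono
BITS_PER_SAMPLE = 16  # assumed signed little-endian PCM


def pcm_to_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE,
               channels: int = CHANNELS, bits: int = BITS_PER_SAMPLE) -> bytes:
    """Prepend a canonical 44-byte RIFF/WAVE header to raw PCM samples."""
    block_align = channels * bits // 8     # bytes per frame
    byte_rate = sample_rate * block_align  # bytes per second
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",   # RIFF chunk; size excludes first 8 bytes
        b"fmt ", 16, 1, channels,          # fmt chunk: size 16, format 1 = PCM
        sample_rate, byte_rate, block_align, bits,
        b"data", len(pcm),                 # data chunk header
    )
    return header + pcm


def pcm_duration_seconds(pcm_size: int) -> float:
    """Playback length: bytes / (sample_rate * channels * bytes_per_sample)."""
    return pcm_size / (SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE // 8)
```

At these settings, audio uses 48,000 bytes per second. An `originalPcmSize` of 48,000 therefore means one second of speech. Under the 44-byte-header assumption, `audioSize` would then be 48,044.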

Dependencies

  • Requires an API key credential to authenticate requests to the Google Gemini (Generative Language) API.
  • The node sends HTTP POST requests to Gemini endpoints to generate speech.
  • In n8n, configure the API key as a generic "API authentication token" credential.
  • No external dependencies are required beyond configured Gemini API access.
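The following minimal Python sketch shows what an authenticated POST of this kind could look like. It prepares the request with the standard library but does not send it. The base URL, the `:generateContent` path suffix, and the `x-goog-api-key` header follow the public Gemini API documentation and are assumptions about what the node sends. `build_tts_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint base, per the public Gemini API docs (not the node's source).
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_tts_request(api_key: str, model_id: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the model's generateContent endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # API key taken from the credential
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and parse the JSON reply with `json.load`. Keeping the key in a header rather than in the URL avoids it appearing in logged request URLs.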

Troubleshooting

  • No candidates returned from Gemini TTS API:
    This indicates the API did not return any audio data. Check your API credentials, network connectivity, and ensure the input text is valid and non-empty.

  • TTS generation failed with finish reason:
    Possible reasons include:

    • SAFETY: The input text was blocked due to safety policies. Modify the text to remove sensitive or disallowed content.
    • RECITATION: The text triggered recitation restrictions. Try changing the phrasing.
    • OTHER: A general failure, possibly caused by overly long text or an unsupported voice/model combination. Shorten the text or try a different voice or model.

  • No audio data found in response:
    The API response structure may have changed, or the request was malformed. Verify the node parameters and API version compatibility.

  • Binary data errors:
    If the node cannot find or decode binary data and you are chaining operations, make sure the previous node outputs valid binary data.
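The following minimal Python sketch shows how the first three error conditions above can arise when reading a response. The response shape (`candidates`, `finishReason`, `content.parts[].inlineData.data`) is assumed from the public Gemini API documentation. The error messages mirror the headings in this section, but the node's exact checks and their order are an assumption. `extract_pcm` and `TTSError` are hypothetical names.

```python
import base64

# Finish reasons that the troubleshooting list treats as failures.
FAILURE_REASONS = {"SAFETY", "RECITATION", "OTHER"}


class TTSError(Exception):
    """Raised when a response does not contain usable audio."""


def extract_pcm(response: dict) -> bytes:
    """Return raw PCM bytes from an (assumed) generateContent response."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise TTSError("No candidates returned from Gemini TTS API")
    candidate = candidates[0]
    reason = candidate.get("finishReason")
    if reason in FAILURE_REASONS:
        raise TTSError(f"TTS generation failed with finish reason: {reason}")
    # Audio is expected as base64 in the first part that has inlineData.
    for part in candidate.get("content", {}).get("parts", []):
        inline = part.get("inlineData")
        if inline and inline.get("data"):
            return base64.b64decode(inline["data"])
    raise TTSError("No audio data found in response")
```

Reading the failure reason before searching for audio means that a blocked request reports why it was blocked rather than the less specific "No audio data" message.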
