Google Speech

Use Google Speech API

Overview

This node uses the Google Cloud Text-to-Speech API to convert input text into spoken audio. It is useful for automating voice generation in workflows, such as creating audio narrations, voice alerts, or accessibility features. For example, you can input a product description and generate an MP3 audio file that reads it aloud, or create personalized voice messages in different languages and voice styles.

Properties

  • Text: The text to convert into speech.
  • Output Format: The audio encoding of the generated speech. Options: MP3, LINEAR16 (WAV), OGG Opus.
  • Language: The language code of the text to synthesize. Options include Italian (it-IT), English (US and UK), French, German, and Spanish.
  • Voice Type: The voice model to use. Options: Standard (lowest cost), WaveNet (higher quality), Neural2 (highest quality).
  • Specific Voice: The voice variant to use, Voice A through Voice F, each labeled with its gender (Female or Male). Which variants exist depends on the selected language and voice type.
  • Speaking Rate: Speech speed, from 0.25 (slowest) to 4.0 (fastest); 1.0 is the normal rate.
  • Pitch: Pitch adjustment in semitones, from -20.0 (lowest) to 20.0 (highest); 0.0 keeps the voice's normal pitch.
  • Additional Options: A collection of optional settings. Enable Automatic Punctuation, Number of Channels, Separate Recognition Per Channel, and Boost per Specific Words mirror Google Cloud Speech-to-Text (speech recognition) settings and have no counterpart in a Text-to-Speech synthesis request, so they are unlikely to affect the generated audio.

    • Enable Automatic Punctuation: Requests automatic punctuation, a speech-recognition feature.
    • Number of Channels: Number of audio channels: 1 for mono, 2 for stereo.
    • Separate Recognition Per Channel: Recognizes each channel separately; relevant only for stereo audio.
    • Save To Tmp Directory: If enabled, also saves the generated audio file to the /tmp directory on the host.
    • Filename: The file name, without extension, used when saving to /tmp.
    • Boost per Specific Words: Raises the likelihood that specific words or phrases are recognized. Each context contains phrases and a boost value (0-20).
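The core properties map fairly directly onto the request accepted by Google's `text:synthesize` method, whose `input`, `voice`, and `audioConfig` fields are part of the public API. The TypeScript sketch below assumes the node builds the voice name from the language code, voice type, and letter (Google voice names follow the pattern `<language>-<type>-<letter>`, with WaveNet spelled `Wavenet`). `NodeParams`, `buildSynthesizeRequest`, and the range checks are hypothetical illustrations, not the node's source; the Additional Options are omitted.

```typescript
// Hypothetical sketch: map the node's core properties onto a Google Cloud
// Text-to-Speech synthesize request. The request field names follow the public
// API; the NodeParams shape and the voice-name construction are assumptions.

type AudioEncoding = "MP3" | "LINEAR16" | "OGG_OPUS";
type VoiceType = "Standard" | "Wavenet" | "Neural2"; // "WaveNet" in the UI

interface NodeParams {
  text: string;
  outputFormat: AudioEncoding;
  languageCode: string; // e.g. "it-IT"
  voiceType: VoiceType;
  voiceLetter: "A" | "B" | "C" | "D" | "E" | "F";
  speakingRate: number;
  pitch: number;
}

interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string; name: string };
  audioConfig: { audioEncoding: AudioEncoding; speakingRate: number; pitch: number };
}

// Reject values outside the documented range (NaN is rejected too).
function checkRange(name: string, value: number, min: number, max: number): number {
  if (Number.isNaN(value) || value < min || value > max) {
    throw new RangeError(`${name} must be between ${min} and ${max}, got ${value}`);
  }
  return value;
}

function buildSynthesizeRequest(p: NodeParams): SynthesizeRequest {
  if (p.text.trim() === "") {
    throw new Error("Text must not be empty");
  }
  return {
    input: { text: p.text },
    voice: {
      languageCode: p.languageCode,
      // Assumed pattern "<language>-<type>-<letter>", e.g. "it-IT-Wavenet-A".
      name: `${p.languageCode}-${p.voiceType}-${p.voiceLetter}`,
    },
    audioConfig: {
      audioEncoding: p.outputFormat,
      speakingRate: checkRange("speakingRate", p.speakingRate, 0.25, 4.0),
      pitch: checkRange("pitch", p.pitch, -20.0, 20.0),
    },
  };
}
```

Called with it-IT, Wavenet, and letter A, the sketch yields the voice name `it-IT-Wavenet-A`. Checking the rate and pitch locally makes out-of-range values fail with a clear message before anything is sent to the API.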

Output

The node outputs JSON and binary data:

  • json:

    • success: Boolean indicating if synthesis was successful.
    • audioFormat: The selected audio encoding format (e.g., MP3, LINEAR16, OGG_OPUS).
    • mimeType: MIME type corresponding to the audio format (e.g., audio/mpeg for MP3).
    • tempFilePath (optional): Path to the saved audio file in /tmp if saving is enabled.
  • binary:

    • audio: Contains the base64-encoded audio content (data) and its MIME type (mimeType).

Downstream nodes can therefore read the audio from the binary property audio or, when Save To Tmp Directory is enabled, from the file at tempFilePath.
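To make the output structure concrete, here is a TypeScript sketch that assembles one output item from base64 audio content, the form in which Google's REST API returns `audioContent`. The `FORMAT_INFO` table (MIME types and extensions), the `/tmp/<filename>.<extension>` path pattern, and `buildOutputItem` are assumptions for illustration. LINEAR16 is mapped to audio/wav because Google returns LINEAR16 audio with a WAV header.

```typescript
// Hypothetical sketch of the output item described above. MIME types,
// extensions, and the temp-path pattern are assumptions, not the node's code.

type AudioEncoding = "MP3" | "LINEAR16" | "OGG_OPUS";

const FORMAT_INFO: Record<AudioEncoding, { mimeType: string; extension: string }> = {
  MP3: { mimeType: "audio/mpeg", extension: "mp3" },
  LINEAR16: { mimeType: "audio/wav", extension: "wav" }, // LINEAR16 output carries a WAV header
  OGG_OPUS: { mimeType: "audio/ogg", extension: "ogg" },
};

interface OutputItem {
  json: { success: boolean; audioFormat: AudioEncoding; mimeType: string; tempFilePath?: string };
  binary: { audio: { data: string; mimeType: string } };
}

// audioBase64: base64 audio content (as in the REST API's audioContent field).
// saveFilename: set only when "Save To Tmp Directory" is enabled.
function buildOutputItem(audioBase64: string, format: AudioEncoding, saveFilename?: string): OutputItem {
  if (audioBase64 === "") {
    throw new Error("Failed to synthesize speech: No audio content returned");
  }
  const { mimeType, extension } = FORMAT_INFO[format];
  const item: OutputItem = {
    json: { success: true, audioFormat: format, mimeType },
    binary: { audio: { data: audioBase64, mimeType } },
  };
  if (saveFilename !== undefined) {
    // Only the path is computed here; actually writing the file is out of scope.
    item.json.tempFilePath = `/tmp/${saveFilename}.${extension}`;
  }
  return item;
}
```

Keeping the binary property unconditional and adding tempFilePath only when saving is enabled lets downstream nodes rely on the binary data regardless of the save setting.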

Dependencies

  • Requires a valid Google Cloud service account key with permissions for the Text-to-Speech API.
  • The node reads this key from an n8n credential of its API key type; enter the full service account key JSON in that credential rather than a plain Google API key.
  • Uses the official @google-cloud/text-to-speech Node.js client library.
  • Optional file saving requires write access to the /tmp directory on the host machine.
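A pre-check along these lines catches a malformed key before any API call. The TypeScript sketch below is hypothetical: `parseServiceAccountKey` is an illustrative name, it checks only the three fields named under Troubleshooting (client_email, private_key, project_id), and it reuses the node's documented error message.

```typescript
// Hypothetical sketch: validate the service account key JSON before use.

interface ServiceAccountKey {
  client_email: string;
  private_key: string;
  project_id: string;
}

const INVALID_KEY_MESSAGE =
  "Invalid service account key JSON. Please provide a valid service account key.";

function parseServiceAccountKey(raw: string): ServiceAccountKey {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(INVALID_KEY_MESSAGE); // not valid JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error(INVALID_KEY_MESSAGE); // valid JSON, but not an object
  }
  const key = parsed as Record<string, unknown>;
  // Each required field must be present as a non-empty string.
  for (const field of ["client_email", "private_key", "project_id"]) {
    if (typeof key[field] !== "string" || key[field] === "") {
      throw new Error(INVALID_KEY_MESSAGE);
    }
  }
  return {
    client_email: key["client_email"] as string,
    private_key: key["private_key"] as string,
    project_id: key["project_id"] as string,
  };
}
```

With the official client, such fields are typically passed as `new TextToSpeechClient({ credentials: { client_email, private_key }, projectId: project_id })`; whether this node constructs its client exactly that way is an assumption.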

Troubleshooting

  • Common issues:

    • Invalid or incomplete service account key JSON will cause authentication errors.
    • If no audio content is returned, check that the input text is not empty and does not exceed Google's per-request input limit (5,000 bytes at the time of writing).
    • A voice configuration that Google does not offer, for example a specific voice variant that does not exist for the selected language and voice type, can cause synthesis to fail.
    • Saving to /tmp may fail if the directory is not writable or disk space is insufficient.
  • Error messages:

    • "Invalid service account key JSON. Please provide a valid service account key." means the key could not be parsed or is incomplete; make sure your service account JSON contains client_email, private_key, and project_id.
    • "Failed to synthesize speech: No audio content returned" means the API returned no audio; verify the input text and voice parameters.
    • File save errors such as "Errore durante il salvataggio del file in /tmp" (Italian for "Error while saving the file to /tmp") indicate permission or filesystem issues.
  • Suggestions:

    • Double-check that all required properties are set correctly.
    • Use shorter texts for testing.
    • When saving files, ensure the filename does not contain invalid characters.
    • Review Google Cloud IAM permissions for the service account.
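The text and filename suggestions above can be turned into a quick pre-flight check. This TypeScript sketch assumes a 5,000-byte input limit (Google's documented per-request limit at the time of writing; check the current quotas) and a conservative filename whitelist of letters, digits, dot, dash, and underscore. `utf8ByteLength`, `preflightProblems`, and the whitelist are illustrative and do not come from the node itself.

```typescript
// Hypothetical pre-flight checks; the limit and whitelist are assumptions.

const MAX_INPUT_BYTES = 5000;

// The API limit is in bytes, so accented or non-Latin characters use up the
// budget faster than ASCII. Iterating a string with for...of yields code points.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Returns a list of human-readable problems; an empty list means "looks OK".
function preflightProblems(text: string, filename?: string): string[] {
  const problems: string[] = [];
  const size = utf8ByteLength(text);
  if (text.trim() === "") {
    problems.push("Text is empty.");
  } else if (size > MAX_INPUT_BYTES) {
    problems.push(`Text is ${size} bytes; the limit is ${MAX_INPUT_BYTES}.`);
  }
  // Whitelist rejects path separators, spaces, and other risky characters.
  if (filename !== undefined && !/^[A-Za-z0-9._-]+$/.test(filename)) {
    problems.push(`Filename "${filename}" contains characters outside [A-Za-z0-9._-].`);
  }
  return problems;
}
```

A whitelist is stricter than what Linux filenames allow, but it rules out separators such as `/` that would make the node write outside /tmp.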
