Google Speech

Use Google Speech API

Actions

Speech to Text Actions
- Recognize
Text to Speech Actions
- Synthesize

Overview

This node uses the Google Cloud Text-to-Speech API to convert input text into spoken audio. It is useful for automating voice generation in workflows, such as creating audio narrations, voice alerts, or accessibility features. For example, you can input a product description and generate an MP3 audio file that reads it aloud, or create personalized voice messages in different languages and voice styles.

Properties

Name	Meaning
Text	The text string to be converted into speech.
Output Format	The audio file format of the generated speech. Options: MP3, LINEAR16 (WAV), OGG Opus.
Language	The language code of the text to synthesize. Options include Italian (it-IT), English US/UK, French, German, Spanish.
Voice Type	The type of voice quality to use. Options: Standard (economical), WaveNet (high quality), Neural2 (best quality).
Specific Voice	Selects a specific voice variant. Options: Voice A-F with gender indicated (Female/Male).
Speaking Rate	Speed of speech from 0.25 (slow) to 4.0 (fast), where 1.0 is normal speaking rate.
Pitch	Voice pitch adjustment from -20.0 (low) to 20.0 (high), where 0.0 is normal pitch.
Additional Options	Collection of optional settings:
Enable Automatic Punctuation	Adds automatic punctuation to the synthesized speech output.
Number of Channels	Number of audio channels: 1 for mono, 2 for stereo.
Separate Recognition Per Channel	Recognizes each channel separately (relevant for stereo audio).
Save To Tmp Directory	If enabled, saves the generated audio file to the `/tmp` directory on the host.
Filename	Filename (without extension) used when saving to `/tmp`.
Boost per Specific Words	Allows boosting the probability of certain words or phrases appearing in the speech synthesis. Each context includes phrases and a boost value (0-20).
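
The core properties above could be mapped to a synthesis request roughly as follows. This is a minimal sketch, not the node's actual code: `SynthesizeParams` and `buildSynthesisRequest` are hypothetical names, and composing the voice name as `<language>-<type>-<letter>` is an assumption about how Voice Type and Specific Voice are combined. The `input`, `voice`, and `audioConfig` fields follow the public Text-to-Speech request format.

```typescript
// Hypothetical shape of the node's core properties; names are illustrative only.
interface SynthesizeParams {
  text: string;
  outputFormat: "MP3" | "LINEAR16" | "OGG_OPUS";
  language: string; // e.g. "it-IT"
  voiceType: "Standard" | "Wavenet" | "Neural2";
  voiceLetter: string; // "A" to "F"
  speakingRate: number; // 0.25 to 4.0
  pitch: number; // -20.0 to 20.0
}

// Request body using the field names of the Text-to-Speech API.
interface SynthesisRequest {
  input: { text: string };
  voice: { languageCode: string; name: string };
  audioConfig: { audioEncoding: string; speakingRate: number; pitch: number };
}

function buildSynthesisRequest(p: SynthesizeParams): SynthesisRequest {
  // Reject values outside the ranges documented in the properties table.
  if (p.text.trim() === "") {
    throw new Error("Text must not be empty");
  }
  if (!Number.isFinite(p.speakingRate) || p.speakingRate < 0.25 || p.speakingRate > 4.0) {
    throw new Error("Speaking Rate must be between 0.25 and 4.0");
  }
  if (!Number.isFinite(p.pitch) || p.pitch < -20.0 || p.pitch > 20.0) {
    throw new Error("Pitch must be between -20.0 and 20.0");
  }
  return {
    input: { text: p.text },
    voice: {
      languageCode: p.language,
      // Assumed naming pattern, e.g. "it-IT-Wavenet-A".
      name: `${p.language}-${p.voiceType}-${p.voiceLetter}`,
    },
    audioConfig: {
      audioEncoding: p.outputFormat,
      speakingRate: p.speakingRate,
      pitch: p.pitch,
    },
  };
}
```

Validating the ranges before calling the API keeps out-of-range values from surfacing later as a generic synthesis failure.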

Output

The node outputs JSON and binary data:

json:
- success: Boolean indicating if synthesis was successful.
- audioFormat: The selected audio encoding format (e.g., MP3, LINEAR16, OGG_OPUS).
- mimeType: MIME type corresponding to the audio format (e.g., audio/mpeg for MP3).
- tempFilePath (optional): Path to the saved audio file in /tmp if saving is enabled.
binary:
- audio: Contains the base64-encoded audio content (data) and its MIME type (mimeType).

This structure allows downstream nodes to access the audio either as binary data or as a file path.
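
As an illustration of the output structure described above, the following sketch assembles one output item. `buildOutput` and `MIME_TYPES` are hypothetical names; only the MP3 to `audio/mpeg` mapping is stated in this document, and the other two MIME types are assumptions.

```typescript
type AudioFormat = "MP3" | "LINEAR16" | "OGG_OPUS";

// Only the MP3 entry is confirmed by the text; the other two are assumed values.
const MIME_TYPES: Record<AudioFormat, string> = {
  MP3: "audio/mpeg",
  LINEAR16: "audio/wav",
  OGG_OPUS: "audio/ogg",
};

interface NodeOutput {
  json: {
    success: boolean;
    audioFormat: AudioFormat;
    mimeType: string;
    tempFilePath?: string;
  };
  binary: { audio: { data: string; mimeType: string } };
}

// Builds one output item from base64-encoded audio content.
function buildOutput(
  audioBase64: string,
  format: AudioFormat,
  tempFilePath?: string
): NodeOutput {
  const mimeType = MIME_TYPES[format];
  const json: NodeOutput["json"] = { success: true, audioFormat: format, mimeType };
  // tempFilePath is only present when the file was saved to /tmp.
  if (tempFilePath !== undefined) {
    json.tempFilePath = tempFilePath;
  }
  return { json, binary: { audio: { data: audioBase64, mimeType } } };
}
```

A downstream node would read `binary.audio.data` for the audio itself and check for `json.tempFilePath` before trying to open the saved file.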

Dependencies

Requires a valid Google Cloud service account key with permissions for the Text-to-Speech API.
The node expects this credential to be configured in n8n as an API key credential.
Uses the official @google-cloud/text-to-speech Node.js client library.
Optional file saving requires write access to the /tmp directory on the host machine.

Troubleshooting

Common issues:
- Invalid or incomplete service account key JSON will cause authentication errors.
- If no audio content is returned, check that the input text is neither empty nor too long.
- Incorrect voice configuration (language, voice type, or specific voice) may cause synthesis failure.
- Saving to /tmp may fail if the directory is not writable or disk space is insufficient.
Error messages:
- "Invalid service account key JSON. Please provide a valid service account key." — Ensure your service account JSON contains client_email, private_key, and project_id.
- "Failed to synthesize speech: No audio content returned" — Verify the input text and voice parameters.
- File save errors such as "Errore durante il salvataggio del file in /tmp" (Italian for "Error while saving the file to /tmp") indicate permission or filesystem issues.
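
To check a key before pasting it into the credential, a small pre-flight helper along these lines can report which of the three required fields are missing. This is a sketch under the assumption that each field must be a non-empty string; `findMissingKeyFields` is a hypothetical name and not part of the node.

```typescript
// Fields named as required by the "Invalid service account key JSON" error.
const REQUIRED_KEY_FIELDS = ["client_email", "private_key", "project_id"];

// Returns the names of missing fields; an empty array means the key looks complete.
function findMissingKeyFields(rawJson: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch (e) {
    // Not valid JSON at all, so every required field counts as missing.
    return REQUIRED_KEY_FIELDS.slice();
  }
  if (typeof parsed !== "object" || parsed === null) {
    return REQUIRED_KEY_FIELDS.slice();
  }
  const key = parsed as Record<string, unknown>;
  return REQUIRED_KEY_FIELDS.filter((field) => {
    const value = key[field];
    return typeof value !== "string" || value.length === 0;
  });
}
```

The helper only checks that the fields are present. It cannot tell whether the key is still active or has the right IAM permissions.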
Suggestions:
- Double-check that all required properties are set correctly.
- Use shorter texts for testing.
- When saving files, ensure the filename does not contain invalid characters.
- Review Google Cloud IAM permissions for the service account.
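
For the filename suggestion above, one conservative approach is to restrict the name to a small set of safe characters before saving. The sketch below is an assumption about what counts as "invalid", not the node's own rule. `sanitizeFilename` and the `speech` fallback are hypothetical.

```typescript
// Hypothetical helper: keeps ASCII letters, digits, dot, underscore and hyphen,
// replaces every other character with an underscore, and strips leading dots
// so the result cannot start with "." or "..".
function sanitizeFilename(name: string, fallback: string = "speech"): string {
  const cleaned = name
    .trim()
    .replace(/[^A-Za-z0-9._-]/g, "_")
    .replace(/^\.+/, "");
  // Fall back to a default name when nothing usable remains.
  return cleaned.length > 0 ? cleaned : fallback;
}
```

Because path separators are replaced, a value such as `../etc/passwd` cannot point outside `/tmp` after sanitizing.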