Google Gemini - FCI

Interact with Google Gemini AI models using direct URL and API Key

Actions (10)

Audio Actions
- Analyze Audio
- Transcribe a Recording
Document Actions
- Analyze Document
File Actions
- Upload File
Image Actions
- Analyze Image
- Generate an Image
Text Actions
- Message a Model
Video Actions

Overview

This node transcribes audio recordings through a Google Gemini API endpoint. It accepts either audio URLs or binary audio files as input. You can choose the model used for transcription and optionally set start and end times to transcribe only a segment of the audio.

Common scenarios include:

- Transcribing podcast episodes or interviews hosted online by providing their URLs.
- Processing locally uploaded audio files (e.g., voice memos, meeting recordings) in binary form.
- Extracting text from specific segments of longer audio files by specifying start and end times.

Practical examples:

- A user wants to convert an online lecture recording into text. They provide the URL of the audio file, select a suitable speech-to-text model, and receive the transcription as output.
- Another user passes several audio files to n8n as binary data, and the node transcribes them in batch.
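How this node assembles its request is not documented on this page. As a rough sketch only, the following shows how a binary audio file and an optional time range could be packaged in the request format of Google's public Gemini REST API (the generateContent method). The function name `build_transcription_request` and the prompt wording are assumptions made for illustration, not the node's actual implementation.

```python
import base64


def build_transcription_request(audio_bytes, mime_type, start=None, end=None):
    """Build a generateContent-style JSON body for one audio clip.

    Hypothetical helper: the prompt text is an assumption, and the time
    range is passed to the model as part of the prompt.
    """
    prompt = "Generate a transcript of the speech."
    if start and end:
        prompt = f"Generate a transcript of the speech from {start} to {end}."
    elif start:
        prompt = f"Generate a transcript of the speech starting at {start}."
    elif end:
        prompt = f"Generate a transcript of the speech up to {end}."

    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        # Audio is sent inline as base64-encoded data.
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


body = build_transcription_request(b"abc", "audio/mpeg", start="00:30", end="01:00")
```

In the public Gemini REST API, a body like this is posted to a model's generateContent method with the API key in the `x-goog-api-key` header. Whether this node sends audio inline or uses another mechanism for URL inputs is not stated here.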

Properties

Name	Meaning
Server URL	The base URL of the Google Gemini API endpoint to send transcription requests to.
API Key	The API key credential used to authenticate requests to the Google Gemini service.
Model	The identifier of the speech-to-text model to use for transcription. Can be selected from a list or entered manually.
Input Type	Specifies whether the audio input is provided as URLs ("Audio URL(s)") or as binary file data ("Binary File(s)").
URL(s)	One or more comma-separated URLs pointing to audio files to transcribe. Only shown if Input Type is "Audio URL(s)".
Input Data Field Name(s)	The name(s) of the binary fields containing audio data to process. Multiple field names can be comma-separated. Only shown if Input Type is "Binary File(s)".
Simplify Output	Boolean flag indicating whether to simplify the transcription response output for easier consumption.
Options - Start Time	Optional start time offset within the audio to begin transcription, formatted as MM:SS or HH:MM:SS.
Options - End Time	Optional end time offset within the audio to stop transcription, formatted as MM:SS or HH:MM:SS.
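
The URL(s) and Input Data Field Name(s) properties both accept comma-separated values. A minimal sketch of how such a value might be split into individual entries is shown below. The helper name `split_comma_list` is hypothetical, and the node's real parsing rules (for example, how it treats stray whitespace or empty entries) are an assumption.

```python
def split_comma_list(value):
    """Split a comma-separated property value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


urls = split_comma_list("https://a.example/x.mp3, https://b.example/y.wav,")
fields = split_comma_list("data, data_1")
```

Note that a URL containing a literal comma would be split in two by this approach, so such URLs would need to be percent-encoded first.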

Output

The node outputs JSON data representing the transcription results returned by the Google Gemini API. If the "Simplify Output" option is enabled, the response is transformed into a more concise format focusing on the main transcription text.

If binary input is used, the node processes the binary audio data but does not output binary data itself; the output remains textual transcription results.
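
The exact shape of the simplified output is not documented on this page. As an illustration only, the sketch below reduces a response in the public Gemini generateContent format (a `candidates` list whose entries hold `content.parts`) to its transcription text. The function name `simplify_response` and the `text` output key are assumptions.

```python
def simplify_response(response):
    """Join the text parts of the first candidate into a single string."""
    candidates = response.get("candidates") or []
    if not candidates:
        return {"text": ""}
    parts = candidates[0].get("content", {}).get("parts", [])
    # Parts without a "text" key (for example, non-text parts) are skipped.
    text = "".join(part.get("text", "") for part in parts)
    return {"text": text}


raw = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [{"text": "Hello "}, {"text": "world."}],
            },
            "finishReason": "STOP",
        }
    ]
}
simplified = simplify_response(raw)
```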

Dependencies

- Access to the Google Gemini API endpoint specified by the Server URL.
- An API key credential with permissions to use the Google Gemini speech-to-text models.
- Network connectivity from the n8n instance to the Google Gemini API.

Troubleshooting

- Invalid API Key or Authentication Errors: Ensure the API key is valid, active, and has appropriate permissions.
- Incorrect Server URL: Verify the Server URL is correct and reachable.
- Model Not Found: Confirm the selected model ID exists and is accessible under your account.
- Malformed Audio URLs: Check that audio URLs are publicly accessible and correctly formatted.
- Binary Field Names Incorrect: When using binary input, ensure the specified field names match those in the incoming data.
- Time Format Errors: Start and end times must be in MM:SS or HH:MM:SS format. An invalid format may cause an error or cause the offset to be ignored.
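
To catch time format problems before they reach the node, the Start Time and End Time values can be checked in an earlier step. The following is a minimal sketch of such a check. The helper name `parse_offset` is hypothetical, and the strictness of the pattern (minutes and seconds limited to 0-59) is an assumption, since this page does not say how the node itself validates offsets.

```python
import re

# Optional hours, then minutes and seconds, each limited to 0-59.
_TIME_PATTERN = re.compile(r"(?:(\d{1,2}):)?([0-5]?\d):([0-5]\d)")


def parse_offset(value):
    """Convert 'MM:SS' or 'HH:MM:SS' to seconds; raise ValueError otherwise."""
    match = _TIME_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"expected MM:SS or HH:MM:SS, got {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


start_seconds = parse_offset("02:30")
end_seconds = parse_offset("1:02:30")
if start_seconds >= end_seconds:
    raise ValueError("Start Time must be earlier than End Time")
```

Converting both offsets to seconds also makes it easy to confirm that the start precedes the end, which the format check alone does not guarantee.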
