Google Gemini - FCI

Interact with Google Gemini AI models using a direct server URL and an API key

Overview

The node analyzes audio content using the Google Gemini API. Audio can be supplied either as URLs pointing to audio files or as binary audio files uploaded directly. The node sends the audio data to the selected model on the Google Gemini service, which processes it and returns an analysis of the audio content.

This node is beneficial in scenarios such as:

  • Extracting descriptions or summaries from audio files.
  • Transcribing or understanding spoken content in audio.
  • Automating audio content analysis workflows without manual intervention.

For example, you could use this node to analyze podcast episodes by URL to generate episode summaries or to process uploaded voice recordings for transcription or sentiment analysis.
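The flow described above (a prompt plus audio data sent to a model) can be sketched as follows. This is only an illustration under stated assumptions, not the node's actual implementation: it assumes the public Gemini `generateContent` REST endpoint with inline base64 audio, and the helper name `build_audio_request`, the model ID `gemini-2.0-flash`, and the `audio/mp3` MIME type are placeholders chosen for the example. The sketch builds the request without sending it.

```python
import base64
import json

# Default base URL, taken from the Server URL property.
DEFAULT_SERVER_URL = "https://generativelanguage.googleapis.com"


def build_audio_request(model, prompt, audio_bytes, mime_type="audio/mp3",
                        server_url=DEFAULT_SERVER_URL, max_tokens=300):
    """Return (url, body) for a generateContent call with inline audio.

    Hypothetical helper: the endpoint path and body layout are assumed
    from the public Gemini REST API, not read from the node's source.
    """
    url = f"{server_url.rstrip('/')}/v1beta/models/{model}:generateContent"
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},  # the Text Input property
                {"inline_data": {
                    "mime_type": mime_type,
                    # Audio bytes are base64-encoded for JSON transport.
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }],
        # Mirrors the "Length of Description (Max Tokens)" option.
        "generationConfig": {"maxOutputTokens": max_tokens},
    }
    return url, body


# Placeholder bytes stand in for a real audio file.
url, body = build_audio_request("gemini-2.0-flash", "What's in this audio?", b"\x00\x01")
print(url)
print(json.dumps(body, indent=2))
```

To actually send the request, the API key would typically be passed in an `x-goog-api-key` header with any HTTP client.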

Properties

| Name | Meaning |
| --- | --- |
| Server URL | The base URL of the Google Gemini API endpoint to send requests to (default: https://generativelanguage.googleapis.com). |
| API Key | The API key credential required to authenticate requests to the Google Gemini API. |
| Model | The specific audio analysis model to use, selectable from a list or by entering a model ID manually. |
| Text Input | A prompt or question about the audio that guides the analysis, e.g., "What's in this audio?". |
| Input Type | Specifies whether the audio input is provided as URLs (Audio URL(s)) or as binary file uploads (Binary File(s)). |
| URL(s) | One or more comma-separated URLs pointing to audio files to be analyzed (used if Input Type is URL). |
| Input Data Field Name(s) | Name(s) of the binary fields containing audio data to analyze, comma-separated if multiple (used if Input Type is binary). |
| Simplify Output | Boolean flag indicating whether to simplify the response output for easier consumption. |
| Options | Additional options, including Length of Description (Max Tokens), which limits the maximum number of tokens in the description output (default: 300). |
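Both URL(s) and Input Data Field Name(s) accept comma-separated values. A minimal sketch of how such a value might be split is shown below. The function name `split_list` is hypothetical, and trimming whitespace and dropping empty entries are assumptions about reasonable behavior rather than documented behavior of the node.

```python
def split_list(value):
    """Split a comma-separated property value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Example values for the URL(s) and Input Data Field Name(s) properties.
print(split_list("https://example.com/a.mp3, https://example.com/b.wav"))
print(split_list("data, data_1,"))
```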

Output

The node outputs JSON data representing the analysis results returned by the Google Gemini audio model. This typically includes descriptive information about the audio content, transcriptions, or other metadata depending on the model's capabilities.

If the input was binary audio files, the node processes these and returns the corresponding analysis in JSON format. There is no direct binary output; the binary input is only used as source data for analysis.
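What Simplify Output returns is not specified here. As a rough illustration of what a simplified result could look like, the sketch below pulls only the generated text out of a response shaped like the public Gemini `generateContent` response. The `sample` dictionary is hand-written for the example, not real model output, and `extract_text` is a hypothetical helper.

```python
def extract_text(response):
    """Collect the text parts from every candidate in a Gemini-style response."""
    texts = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if "text" in part:
                texts.append(part["text"])
    return "\n".join(texts)


# Hand-written illustration of the assumed response shape.
sample = {
    "candidates": [{
        "content": {"role": "model", "parts": [{"text": "A short piano melody."}]},
        "finishReason": "STOP",
    }]
}
print(extract_text(sample))
```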

Dependencies

  • Requires access to the Google Gemini API endpoint.
  • An API key credential for authenticating with the Google Gemini service must be configured.
  • Network connectivity to the specified server URL.
  • The node depends on routing and version description modules that are bundled with it; no separate installation is needed.

Troubleshooting

  • Invalid API Key or Authentication Errors: Ensure the API key is valid, active, and has permissions for the Google Gemini API.
  • Incorrect Server URL: Verify the server URL is correct and reachable.
  • Model Not Found: Confirm the selected model ID exists and is accessible under your account.
  • Input Data Issues: If using binary input, ensure the binary field names are correctly specified and contain valid audio data.
  • Rate Limits or Quotas: Be aware of API usage limits imposed by Google Gemini that might cause request failures.
  • Malformed URLs: When using URL input, ensure URLs are properly formatted and publicly accessible.
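For the malformed URL case above, a quick format check before running the workflow can catch obvious mistakes. The sketch below is one possible check using Python's standard library. The function name `is_well_formed_url` is hypothetical, and restricting schemes to `http` and `https` is an assumption. It only checks the format and cannot tell whether a URL is publicly accessible.

```python
from urllib.parse import urlparse


def is_well_formed_url(url):
    """Return True if the URL has an http(s) scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


for candidate in ["https://example.com/a.mp3", "example.com/a.mp3"]:
    print(candidate, is_well_formed_url(candidate))
```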
