Overview
This node is designed to call the Gemini API and convert a text prompt into audio, using a model you select and the parameters you specify. It is intended for scenarios such as creating voiceovers, generating speech for accessibility, or producing audio responses dynamically within workflows.
Practical examples include:
- Generating spoken versions of chatbot replies.
- Creating audio narration for articles or documents.
- Producing customized audio alerts or messages in automation processes.
Properties
| Name | Meaning |
|---|---|
| API Key | The API key credential required to authenticate requests to the Gemini API. |
| Model Name or ID | The specific model to use for audio generation. You can select from a list of available models or specify a custom model ID. |
| Simplify Output | Whether to return only the text portion of the response. This is rarely relevant for audio, but the option is available. |
| JSON Output | Whether to request the response in JSON format instead of raw audio data. |
| Options | A collection of advanced settings to customize the audio generation: |
| - Frequency Penalty | Penalizes repeated tokens in the generated response to reduce repetition. |
| - Max Output Tokens | Maximum number of tokens to generate in the response, controlling length. |
| - Presence Penalty | Encourages the model to use new tokens by penalizing tokens that have already appeared. |
| - Safety Settings | Multiple safety filters to block unsafe content in the response. Each setting includes: |
| • Harm Category | Categories of harmful content to protect against (e.g., hate speech, harassment, dangerous content). |
| • Harm Block Threshold | Threshold level for blocking content (e.g., block low and above, block none). |
| - System Instruction | Instructions to guide the model's behavior, e.g., "Do not claim to have self-awareness." |
| - Temperature | Controls randomness in token selection; lower values produce more deterministic results, higher values increase creativity. |
| - Thinking Config | Configuration for including model "thoughts" in the response and setting a thinking budget in tokens. |
| - Top K | Limits token sampling to the top K probable tokens at each step, affecting randomness. |
| - Top P | Limits token sampling to tokens whose cumulative probability reaches this value, further controlling randomness. |
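
To make the option list above more concrete, the following sketch shows one way these settings could be translated into a Gemini-style request body. The camelCase field names (`generationConfig`, `topK`, `thinkingConfig`, `safetySettings`, `systemInstruction`) follow the public Gemini REST API; the function name, the snake_case option keys, and the mapping itself are assumptions made for illustration, not the node's actual code.

```python
from typing import Any


def build_request_body(prompt: str, options: dict[str, Any]) -> dict[str, Any]:
    """Map node-style options to a Gemini-style request body (illustrative only)."""
    # Hypothetical option keys on the left, Gemini generationConfig keys on the right.
    config_keys = {
        "frequency_penalty": "frequencyPenalty",
        "max_output_tokens": "maxOutputTokens",
        "presence_penalty": "presencePenalty",
        "temperature": "temperature",
        "top_k": "topK",
        "top_p": "topP",
    }
    generation_config: dict[str, Any] = {
        api_name: options[name]
        for name, api_name in config_keys.items()
        if name in options
    }

    # Thinking Config: whether to include "thoughts" and the token budget for them.
    thinking = options.get("thinking_config")
    if thinking:
        generation_config["thinkingConfig"] = {
            "includeThoughts": thinking.get("include_thoughts", False),
            "thinkingBudget": thinking.get("thinking_budget", 0),
        }

    body: dict[str, Any] = {"contents": [{"parts": [{"text": prompt}]}]}
    if generation_config:
        body["generationConfig"] = generation_config

    # System Instruction is sent as its own content block, not as a sampling setting.
    if "system_instruction" in options:
        body["systemInstruction"] = {"parts": [{"text": options["system_instruction"]}]}

    # Each safety setting pairs a harm category with a block threshold.
    if "safety_settings" in options:
        body["safetySettings"] = [
            {"category": s["category"], "threshold": s["threshold"]}
            for s in options["safety_settings"]
        ]
    return body
```

Only the options that are actually set end up in the body, so unset fields fall back to the API defaults.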
Output
The node outputs JSON data describing the generated audio. Depending on the JSON Output property, the response is either raw audio data or structured JSON describing the audio content. Binary output, if supported, would contain the audio file generated by the Gemini API, suitable for playback or saving.
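
As a rough illustration of handling such output downstream, the sketch below pulls base64-encoded audio out of a Gemini-style response and wraps it in a WAV container. It assumes the audio arrives as `inlineData` inside `candidates[].content.parts[]` and that the payload is 16-bit mono PCM at 24 kHz; both are assumptions about the API response, not documented behavior of this node, and the function names are hypothetical.

```python
import base64
import io
import wave
from typing import Any, Optional


def extract_audio_bytes(response: dict[str, Any]) -> Optional[bytes]:
    """Return the first inline audio payload in the response, or None if absent."""
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and "data" in inline:
                return base64.b64decode(inline["data"])
    return None


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV header (sample rate is an assumption)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to a `.wav` file or passed on as binary data for playback.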
Dependencies
- Requires an active Gemini API key for authentication.
- The node depends on the Gemini API service being accessible.
- No additional environment variables or n8n-specific credentials beyond the API key are needed.
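
For reference, a direct call to the Gemini API outside the node needs only the key and a model name. The sketch below builds such a request without sending it; the base URL and the `x-goog-api-key` header follow the public Gemini REST API, while the function name is hypothetical and whether the node uses this exact endpoint is an assumption.

```python
import json
import urllib.request
from typing import Any

GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(
    api_key: str, model: str, body: dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) an authenticated generateContent request."""
    if not api_key:
        # Fail early instead of letting the API reject the call.
        raise ValueError("A Gemini API key is required")
    url = f"{GEMINI_BASE_URL}/models/{model}:generateContent"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # the key travels in a header, not the URL
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; a missing or invalid key then surfaces as an HTTP error from the service.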
Troubleshooting
Audio resource not supported error: The bundled code explicitly throws an error stating that the Audio resource is not yet supported. Although the node properties define audio generation options, the execution logic for audio generation is not implemented, so running the node with the Audio resource selected fails with this error.
API Key issues: Invalid or missing API keys will cause authentication failures. Ensure the API key is correctly provided and valid.
Model selection errors: Selecting an unsupported or invalid model ID may result in API errors. Use the provided model list or verify custom IDs.
Parameter validation: Incorrect parameter types or out-of-range values (e.g., a temperature outside the range 0 to 2) may cause errors. Validate inputs before execution.
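
A pre-flight check along the lines suggested above might look like the following sketch. The temperature range of 0 to 2 comes from this page; the other limits (Top P between 0 and 1, Top K and Max Output Tokens as positive integers) and the option key names are assumptions chosen for illustration.

```python
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_positive_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_options(options: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: list[str] = []

    temperature = options.get("temperature")
    if temperature is not None and not (_is_number(temperature) and 0 <= temperature <= 2):
        errors.append("temperature must be a number between 0 and 2")

    top_p = options.get("top_p")
    if top_p is not None and not (_is_number(top_p) and 0 <= top_p <= 1):
        errors.append("top_p must be a number between 0 and 1")

    for name in ("top_k", "max_output_tokens"):
        value = options.get(name)
        if value is not None and not _is_positive_int(value):
            errors.append(f"{name} must be a positive integer")

    return errors
```

Collecting all problems in one pass, rather than raising on the first, lets a workflow report every invalid setting at once.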
Links and References
Note: Although the node defines detailed properties for audio generation, the current implementation does not support executing the audio generation operation and will throw an error if attempted. This suggests the feature is planned but not yet functional.