Overview
This node parses documents and images through the "海螺Ai" (Hailuo AI) resource. It analyzes images supplied either as URLs or as binary data, together with a text prompt that asks a question about the image content or requests a description of it. Optionally, it can return the result as speech using text-to-speech (TTS) with a selectable voice.
Common scenarios where this node is beneficial include:
- Automatically extracting information or descriptions from images in workflows.
- Integrating AI-powered image recognition or understanding into automation processes.
- Generating spoken responses from image analysis results using TTS voices.
Practical examples:
- Given an image URL of a product, the node can answer questions like "What is in this image?" or "Describe the contents."
- Processing multiple image URLs at once to batch analyze visual content.
- Using binary image data from previous nodes to perform on-the-fly image recognition.
- Enabling TTS to convert the textual analysis result into speech output with a chosen voice.
Properties
| Name | Meaning |
|---|---|
| 模型 / Model (assistantId) | The model or intelligent agent used for processing. Free-form string; if you do not know which value to use, any value is accepted. |
| 文本输入 / Text input (text) | The text prompt or question about the image(s), e.g., "What is in this image?". Required. |
| 输入类型 / Input type (inputType) | Type of image input: "图片链接" (image URL) or "二进制文件" (binary file, base64-encoded). |
| URL链接 / URL links (imageUrls) | Comma-separated list of image URLs to analyze. Required if inputType is "图片链接" (image URL). |
| 输入数据字段名称 / Input data field name (binaryPropertyName) | Name of the binary property that contains the image data when inputType is "二进制文件" (binary file). Default is "data". |
| 简化输出 / Simplify output (simplify) | Boolean flag indicating whether to simplify the response output. Defaults to true. |
| 开启语音 / Enable speech (use_tts) | Boolean flag that enables text-to-speech output. Only shown for the "海螺Ai" resource. Defaults to true. |
| 语音列表 / Voice list (voice) | The TTS voice to use: either an official voice from a predefined list or a custom cloned voice. Required if TTS is enabled. |
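
The conditional requirements in the table can be summarized as a small validation sketch. The `HailuoImageParams` interface and the `splitImageUrls` and `validateParams` functions are hypothetical names used only for illustration; just the property identifiers come from the table. The `"url"` and `"binary"` literals are assumed stand-ins for the node's internal option values, which this page does not list.

```typescript
// Hypothetical shape of the node parameters; field names follow the table above.
// The "url" / "binary" literals are assumptions, not the node's actual option values.
interface HailuoImageParams {
  assistantId: string;
  text: string;
  inputType: "url" | "binary";
  imageUrls?: string;
  binaryPropertyName?: string;
  simplify?: boolean;
  use_tts?: boolean;
  voice?: string;
}

// Split the comma-separated URL field into a clean list, dropping empty entries.
function splitImageUrls(imageUrls: string): string[] {
  return imageUrls
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
}

// Return a list of problems; an empty list means the required fields are present.
function validateParams(p: HailuoImageParams): string[] {
  const problems: string[] = [];
  if (p.text.trim() === "") {
    problems.push("text is required");
  }
  if (p.inputType === "url" && splitImageUrls(p.imageUrls ?? "").length === 0) {
    problems.push("imageUrls is required when inputType is URL");
  }
  // use_tts defaults to true, so a voice is needed unless TTS is explicitly disabled.
  if ((p.use_tts ?? true) && !p.voice) {
    problems.push("voice is required when TTS is enabled");
  }
  return problems;
}
```

For example, a prompt with two URLs and `use_tts: false` passes with no problems, while an empty prompt with no URLs and no voice reports all three.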
Output
The node outputs JSON data representing the parsed results from the image(s) based on the input text prompt. If simplification is enabled, the output will be a more concise summary of the analysis.
If TTS is enabled, the node may also return the analysis result as synthesized speech in the selected voice. This audio is delivered as binary output, separate from the JSON data.
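A downstream step may need to tell whether an item carries audio alongside its JSON result. The sketch below assumes the usual n8n item layout (a `json` object plus an optional `binary` map whose entries hold base64 `data` and a `mimeType`). The `OutputItem` and `BinaryEntry` types and the `findAudioProperty` function are hypothetical, and because this page does not name the binary property that holds the audio, the function looks for any entry with an `audio/` MIME type.

```typescript
// Minimal local stand-ins for an n8n item: JSON data plus an optional binary map.
interface BinaryEntry {
  data: string; // base64-encoded payload
  mimeType: string;
  fileName?: string;
}

interface OutputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Return the name of the first binary property that looks like audio, if any.
// The property name used by the node is not documented, so we match on MIME type.
function findAudioProperty(item: OutputItem): string | undefined {
  const binary = item.binary ?? {};
  for (const name of Object.keys(binary)) {
    if (/^audio\//.test(binary[name].mimeType)) {
      return name;
    }
  }
  return undefined;
}
```

An item without a `binary` map, such as one produced with TTS disabled, yields `undefined`.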
Dependencies
- Requires access to the 海螺Ai service API for image parsing and TTS functionalities.
- Needs appropriate API authentication credentials configured in n8n (e.g., an API key or token).
- For TTS voice selection, the node fetches available voices dynamically via a search method.
- Requires network access to the image URLs when the URL input type is used.
Troubleshooting
- Invalid Image URL: Ensure that the provided image URLs are accessible and valid. Invalid URLs will cause failures in image retrieval.
- Missing Binary Data: When using the binary file (base64) input type, verify that the specified binary property name exists on the incoming item and contains valid image data.
- API Authentication Errors: Confirm that the API credentials are correctly set up and have sufficient permissions.
- Unsupported Voice Selection: If TTS is enabled but no voice is selected or the voice is invalid, the node may fail to generate speech.
- Simplify Output Issues: If simplified output is enabled but the response seems incomplete, try disabling simplification to get full details for debugging.
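The first two checks above can be run before the node executes. The following is a minimal sketch under assumptions: `InputItem`, `findMalformedUrls`, and `checkBinaryInput` are hypothetical helpers, not part of the node, and the URL test is purely syntactic (it cannot confirm that a URL is reachable).

```typescript
// Minimal stand-in for an incoming n8n item with an optional binary map.
interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, { data: string; mimeType: string }>;
}

// Return the URLs that are not absolute http(s) links.
// This is a syntax check only; reachability has to be verified at run time.
function findMalformedUrls(urls: string[]): string[] {
  return urls.filter((u) => !/^https?:\/\/[^\s\/]+/i.test(u.trim()));
}

// Check that the named binary property exists and holds non-empty image data.
// Returns a description of the problem, or null when the input looks usable.
function checkBinaryInput(item: InputItem, binaryPropertyName = "data"): string | null {
  const entry = item.binary ? item.binary[binaryPropertyName] : undefined;
  if (!entry) {
    return `binary property "${binaryPropertyName}" is missing`;
  }
  if (!/^image\//.test(entry.mimeType)) {
    return `binary property "${binaryPropertyName}" is not an image (${entry.mimeType})`;
  }
  if (entry.data.length === 0) {
    return `binary property "${binaryPropertyName}" is empty`;
  }
  return null;
}
```

Running these against the items from the previous node narrows a failure down to the input side before API credentials or voice selection are investigated.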
Links and References
- No direct external links were found in the source code.
- For more information about 海螺Ai services and TTS voices, consult the official documentation of the respective API provider integrated with this node.