FreeLim

逆向LLM

Overview

This node implements the document/image parsing operation of the "智谱清言" (Zhipu Qingyan) resource. It accepts images either as URLs or as binary data and submits them for analysis, for example to identify objects in a picture, extract text from it, or answer a question about it. The node suits workflows that need automated image understanding.

Practical examples include:

Passing the URL of a product image to get a description of the product or details about it.
Sending binary image data from an uploaded file to recognize its contents or extract metadata.
Using the node in workflows that require automated image content analysis for categorization or tagging.

Properties

Name	Meaning
模型 / Model (assistantId)	The model or agent that processes the request. Accepts any string; if you do not know a specific value, an arbitrary one can be entered.
文本输入 / Text input (text)	Text that accompanies the image, such as the question "What is in this picture?". Required for some resources.
输入类型 / Input type (inputType)	How the image is supplied: "图片链接" (image URL) or "二进制文件" (binary file). The rows below refer to these options as "url" and "base64", presumably their internal values.
URL链接 / URL link (imageUrls)	One or more image URLs to analyze, separated by commas. Required when inputType is "url" (image URL).
输入数据字段名称 / Input data field name (binaryPropertyName)	Name of the binary property that holds the image data when inputType is "base64" (binary file). Defaults to "data".
简化输出 / Simplify output (simplify)	Boolean flag that controls whether the response is returned in simplified form. Defaults to true.
语音列表 / Voice list (voice)	Voice used by text-to-speech features. The list includes official and cloned voices.
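
The conditional requirements in the table can be expressed as a small check. The following is a minimal sketch in Python, not the node's actual implementation: the parameter keys come from the table, while the function names `split_image_urls` and `check_parameters`, the sample values, and the assumption that inputType takes the values "url" and "base64" are hypothetical.

```python
from typing import Any


def split_image_urls(image_urls: str) -> list[str]:
    """Split a comma-separated URL string, dropping blanks and surrounding spaces."""
    return [part.strip() for part in image_urls.split(",") if part.strip()]


def check_parameters(params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the parameter set looks complete."""
    problems: list[str] = []
    input_type = params.get("inputType")

    if input_type == "url":
        # imageUrls is only required for URL input.
        if not split_image_urls(params.get("imageUrls") or ""):
            problems.append("imageUrls must contain at least one URL when inputType is 'url'")
    elif input_type == "base64":
        # The table gives "data" as the default binary property name.
        if not params.get("binaryPropertyName", "data"):
            problems.append("binaryPropertyName must not be empty when inputType is 'base64'")
    else:
        problems.append(f"unexpected inputType: {input_type!r}")

    return problems


# Hypothetical parameter set for URL input; all values are placeholders.
example = {
    "assistantId": "any-model-id",
    "text": "What is in this picture?",
    "inputType": "url",
    "imageUrls": "https://example.com/a.png, https://example.com/b.png",
    "simplify": True,
}
print(split_image_urls(example["imageUrls"]))
print(check_parameters(example))
```

Returning a list of problems instead of raising on the first one lets a workflow report every missing field at once.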

Output

The node outputs JSON data containing the result of the image/document parsing operation, typically the recognized content, a description, or metadata extracted from the processed images. If the text-to-speech feature is used, the output may also include audio data for the parsed text.

Any binary data in the output generally represents audio streams or other processed media produced by the voice synthesis feature.
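
A downstream step may want to treat the JSON result and any binary audio separately. The sketch below assumes an n8n-style item layout, in which each item carries a `json` object and an optional `binary` object keyed by property name; the function name `split_outputs` and the field names inside the sample `json` objects are placeholders, not the node's real schema.

```python
from typing import Any


def split_outputs(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate JSON payloads from binary attachments in a list of output items."""
    json_results: list[dict[str, Any]] = []
    attachments: list[dict[str, Any]] = []
    for item in items:
        json_results.append(item.get("json", {}))
        # "binary" is optional; treat a missing or empty value as no attachments.
        for name, entry in (item.get("binary") or {}).items():
            attachments.append({"property": name, "mimeType": entry.get("mimeType")})
    return json_results, attachments


# Made-up sample items; "result" is a placeholder key.
sample_items = [
    {"json": {"result": "a red bicycle"}},
    {
        "json": {"result": "spoken answer"},
        "binary": {"data": {"mimeType": "audio/mpeg", "data": "<base64>"}},
    },
]
results, audio = split_outputs(sample_items)
print(results)
print(audio)
```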

Dependencies

Requires access to external AI services capable of image analysis and text-to-speech synthesis.
Needs configuration of API credentials or authentication tokens to connect with these services.
The voice list is dynamically fetched via a search method, implying network connectivity and proper API setup.

Troubleshooting

Invalid Image URLs: Ensure URLs are accessible and correctly formatted. Invalid or unreachable URLs will cause failures.
Binary Data Issues: When using base64 input, verify that the binary property name matches the actual input data field.
API Authentication Errors: Missing or incorrect API keys will prevent successful calls to external services.
Unsupported Models or Voices: Selecting a model or voice not supported by the backend service may result in errors or empty responses.
Simplify Output Flag: Setting simplify to false might produce complex nested JSON that could be harder to parse downstream.
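
The first two issues in this list can often be caught before the node runs. The following is a minimal pre-flight sketch, assuming an n8n-style item with an optional `binary` object; the helper names `malformed_urls` and `missing_binary_property` are hypothetical. It checks only the form of each URL, not whether the URL is reachable.

```python
from urllib.parse import urlparse


def malformed_urls(urls: list[str]) -> list[str]:
    """Return the entries that are not syntactically valid http(s) URLs."""
    bad: list[str] = []
    for url in urls:
        try:
            parsed = urlparse(url.strip())
        except ValueError:
            # urlparse rejects some inputs outright, e.g. a broken IPv6 host.
            bad.append(url)
            continue
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(url)
    return bad


def missing_binary_property(item: dict, property_name: str = "data") -> bool:
    """Return True if the item has no binary entry under the given property name."""
    return property_name not in (item.get("binary") or {})


print(malformed_urls(["https://example.com/a.png", "example.com/b.png"]))
print(missing_binary_property({"json": {}, "binary": {"image": {}}}, "data"))
```

In the second call the binary data sits under "image" while the check looks for "data", which is the mismatch described under Binary Data Issues.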

Links and References

No direct links provided in the source code. For more information on AI image parsing and TTS services, consult the documentation of the respective external APIs integrated with this node.