Overview
This node integrates with the Mistral OCR API to extract text and structured data from various document formats including PDFs, images, Word documents, PowerPoint presentations, RTF, EPUB, LaTeX, and Jupyter Notebooks. It is designed for scenarios where automated text extraction and document analysis are needed, such as digitizing scanned documents, processing forms, or extracting data for further automation workflows. The node supports multiple OCR model versions with high accuracy and multilingual capabilities, making it suitable for diverse document processing tasks.
Use Case Examples
- Extract text from scanned invoices in PDF format for automated accounting workflows.
- Process images of handwritten notes to convert them into editable text.
- Analyze PowerPoint presentations to extract slide content for summarization or indexing.
Properties
| Name | Meaning |
|---|---|
| Binary Property | Specifies the name of the binary property containing the document to process. Supported formats include PDF, images (PNG/JPEG/GIF), Word (.docx), PowerPoint (.pptx), RTF, EPUB, LaTeX, and Jupyter Notebooks (.ipynb). This is required to identify the input file for OCR processing. |
| Model | Selects the Mistral OCR model version to use for processing. Options include the latest recommended model, a specific May 2025 version, and a legacy March 2025 version, each offering different performance and compatibility characteristics. |
| Options | Additional optional settings for the OCR process, including whether to include the base64 encoded image in the response and the expiry time in hours for the uploaded file (1-168 hours). |
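The constraints in the table above can be checked before an item reaches the node. The following is a minimal sketch, not the node's own validation logic: the function name `validate_input` and the exact extension set (for example `.jpg` alongside `.jpeg`, and `.tex` for LaTeX) are assumptions made for illustration; only the format families and the 1-168 hour range come from the table.

```python
import os
from typing import List

# Extensions derived from the formats listed in the Properties table.
# The precise set the node accepts is an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".gif",
    ".docx", ".pptx", ".rtf", ".epub", ".tex", ".ipynb",
}

# Expiry range for the uploaded file, as stated for the Options property.
MIN_EXPIRY_HOURS = 1
MAX_EXPIRY_HOURS = 168


def validate_input(file_name: str, expiry_hours: int) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if not MIN_EXPIRY_HOURS <= expiry_hours <= MAX_EXPIRY_HOURS:
        problems.append(
            f"expiry must be {MIN_EXPIRY_HOURS}-{MAX_EXPIRY_HOURS} hours, "
            f"got {expiry_hours}"
        )
    return problems
```

Returning a list of problems rather than raising on the first one lets a workflow report every issue with an item in a single pass.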
Output
JSON
- operation - The OCR operation performed (e.g., Basic OCR or OCR with annotations).
- uploadedFileId - Identifier of the file uploaded to the Mistral API for processing.
- signedUrl - Signed URL to access the uploaded document for OCR processing.
- processedAt - Timestamp indicating when the document was processed.
- documentTemplate - (If applicable) The document template used for OCR with annotations.
- includeBboxAnnotations - (If applicable) Indicates if bounding box annotations were included in the OCR response.
- advancedMode - (If applicable) Indicates if advanced mode was used for document annotation schema.
- text - Extracted text content from the document (part of the OCR API response).
- annotations - Structured data and annotations extracted from the document (if OCR with annotations was used).
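A downstream step usually needs only a few of the output fields listed above. The sketch below shows one way to reduce an output item to a small summary. It assumes that `text` is a plain string and that optional fields are simply absent when they do not apply; the helper name `summarize_item` and every value in `sample` are placeholders for illustration, not real node output.

```python
from typing import Any, Dict


def summarize_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields a later workflow step is likely to need."""
    # Assumption: "text" is a string; treat a missing value as empty.
    text = item.get("text") or ""
    return {
        "operation": item.get("operation"),
        "file_id": item.get("uploadedFileId"),
        "processed_at": item.get("processedAt"),
        "char_count": len(text),
        # "annotations" is only present when OCR with annotations was used.
        "has_annotations": item.get("annotations") is not None,
    }


# Placeholder item using the documented field names; values are invented.
sample = {
    "operation": "Basic OCR",
    "uploadedFileId": "file-id-placeholder",
    "signedUrl": "https://example.invalid/signed",
    "processedAt": "2025-01-01T00:00:00Z",
    "text": "Invoice total: 100",
}

summary = summarize_item(sample)
```

Using `dict.get` throughout keeps the helper tolerant of the "(If applicable)" fields, which may be missing from a given item.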
Dependencies
- Mistral OCR API
- An API key credential for Mistral API authentication
Troubleshooting
- Ensure the binary property name matches the input data property containing the document; otherwise, the node will throw an error indicating no binary data found.
- The file must not exceed the maximum allowed size (e.g., 50 MB); larger files will cause an error.
- Unsupported file formats will cause the node to throw an error listing supported formats; convert unsupported files to a supported format before processing.
- Rate limit errors (HTTP 429) from the Mistral API indicate too many requests. The node retries with exponential backoff, but if you hit the limit frequently, consider upgrading your API plan.
- Invalid JSON in custom document annotation schema will cause errors; ensure JSON is well-formed before input.
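Two of the items above, rate-limit retries and malformed schema JSON, can be illustrated in a few lines. This is a sketch under stated assumptions, not the node's actual implementation: `call_with_backoff`, `parse_schema`, the retry count, and the base delay are all hypothetical, and `send` stands in for whatever function performs the request and returns a `(status_code, body)` pair.

```python
import json
import time
from typing import Any, Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_retries: int = 5,      # assumed value, not taken from the node
    base_delay: float = 1.0,   # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send(); on HTTP 429, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    # Retries exhausted: hand the last 429 response back to the caller.
    return status, body


def parse_schema(raw: str) -> Any:
    """Parse a custom annotation schema, failing early with a clear message."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"schema is not valid JSON: {err.msg} "
            f"(line {err.lineno}, column {err.colno})"
        ) from err
```

Passing `sleep` as a parameter makes the retry schedule easy to inspect in a test without actually waiting, and running `parse_schema` on a schema before pasting it into the node surfaces the position of a syntax error.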
Links
- Mistral OCR API Documentation - Official documentation for the Mistral OCR API, detailing usage, supported formats, and model options.