Mistral OCR

Extract text and structured data from documents using the Mistral OCR API.

Actions

- Basic OCR
- OCR with Annotations

Overview

This node integrates with the Mistral OCR API to extract text and structured data from various document formats including PDFs, images, Word, PowerPoint, RTF, EPUB, LaTeX, and Jupyter Notebooks. It supports OCR with annotations, allowing users to define custom or pre-configured document templates for data extraction, and optionally include bounding box annotations for visual elements like charts and tables. The node handles file uploads, manages rate limits with retries, and returns detailed OCR results with metadata. It is useful for automating data extraction from invoices, contracts, letters, receipts, ID documents, research papers, and other document types.
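
The retry behavior mentioned above is not specified in detail. As a rough illustration only, the sketch below retries a request that returns HTTP 429 with exponential backoff. The names `withRateLimitRetry`, `backoffDelayMs`, and `HttpResult`, and the defaults of 3 retries and a 1000 ms base delay, are assumptions made for this example and are not taken from the node's source.

```typescript
// Hypothetical sketch: names and defaults are assumptions, not the node's real code.
interface HttpResult<T> {
  status: number;
  body?: T;
}

// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, attempt);
}

async function withRateLimitRetry<T>(
  request: () => Promise<HttpResult<T>>,
  maxRetries: number = 3, // assumed default
  baseDelayMs: number = 1000, // assumed default
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<HttpResult<T>> {
  let attempt = 0;
  while (true) {
    const result = await request();
    // Anything other than 429 (a success or a different error) is returned as-is.
    // A 429 is also returned once the retry budget is used up.
    if (result.status !== 429 || attempt >= maxRetries) {
      return result;
    }
    await sleep(backoffDelayMs(attempt, baseDelayMs));
    attempt += 1;
  }
}
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A caller that still receives a 429 after the final attempt can surface it as a persistent rate-limit error.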

Use Case Examples

Extract structured invoice data such as amounts, dates, and customer info from PDF invoices.
Process scanned contracts to extract parties, dates, and terms with bounding box annotations for tables and figures.
Analyze research papers to extract titles, authors, abstracts, and keywords using a custom template.
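
For the research-paper case, a custom template amounts to a list of field definitions. The sketch below shows one way such definitions (name, type, description, required, as listed for the Custom Fields property) could be turned into a JSON Schema object. The types `CustomField` and `ObjectSchema`, the function `buildSchema`, and the mappings for `date` and `array` are assumptions for illustration; how the node builds its schema internally is not documented here.

```typescript
// Hypothetical types: the node's internal representation is not documented here.
type CustomFieldType = "number" | "string" | "date" | "array" | "boolean";

interface CustomField {
  name: string;
  type: CustomFieldType;
  description: string;
  required: boolean;
}

interface JsonSchemaProperty {
  type: string;
  description: string;
  format?: string;
  items?: { type: string };
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, JsonSchemaProperty>;
  required: string[];
}

function toProperty(field: CustomField): JsonSchemaProperty {
  switch (field.type) {
    case "date":
      // JSON Schema has no date type; assume a string with the "date" format.
      return { type: "string", format: "date", description: field.description };
    case "array":
      // Assumption: arrays hold strings (for example author names or keywords).
      return { type: "array", items: { type: "string" }, description: field.description };
    default:
      // "number", "string", and "boolean" map directly to JSON Schema types.
      return { type: field.type, description: field.description };
  }
}

function buildSchema(fields: CustomField[]): ObjectSchema {
  const schema: ObjectSchema = { type: "object", properties: {}, required: [] };
  for (const field of fields) {
    schema.properties[field.name] = toProperty(field);
    if (field.required) schema.required.push(field.name);
  }
  return schema;
}

// Example field list for the research-paper use case (field names are illustrative).
const researchPaperSchema = buildSchema([
  { name: "title", type: "string", description: "Paper title", required: true },
  { name: "authors", type: "array", description: "Author names", required: true },
  { name: "abstract", type: "string", description: "Abstract text", required: false },
  { name: "keywords", type: "array", description: "Keywords", required: false },
]);
```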

Properties

Name	Meaning
Binary Property	Name of the binary property containing the document to process. Supported formats include PDF, images (PNG/JPEG/GIF), Word (.docx), PowerPoint (.pptx), RTF, EPUB, LaTeX, and Jupyter Notebooks (.ipynb). This is required to provide the document data for OCR processing.
Model	Select the Mistral OCR model version to use for processing. Options include the latest recommended version, a specific May 2025 version, or a legacy March 2025 version, each offering different accuracy and compatibility.
Document Template	Choose a pre-configured document template or define custom fields for data extraction. Templates include invoice, letter, contract, receipt, ID document, research paper, or custom fields.
Custom Fields	Define multiple custom fields with names, types (number, string, date, array, boolean), descriptions, and whether they are required. Used only if Document Template is set to custom.
Include Element Analysis	Boolean option to include analysis of visual elements such as charts, figures, and tables in the document.
Advanced: Custom JSON Schema	Enable advanced mode to manually define JSON schemas for document and bounding box annotations.
BBox Annotation Schema	JSON schema defining the structure for visual element annotations when advanced mode and element analysis are enabled.
Document Annotation Schema	JSON schema defining the structure for document-level annotations when advanced mode is enabled.
Pages to Process	Specify pages to process for document annotations, e.g., '0-7' or '0,1,2,3'. Maximum 8 pages allowed.
Options	Additional options: whether to include base64-encoded images in the response, and the uploaded file's expiry time in hours (1-168).
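
The Pages to Process format can be illustrated with a small parser. This is a sketch under stated assumptions: the function name `parsePages` is hypothetical, page numbers are taken to be zero-based (as the '0-7' example suggests), and mixing ranges with single pages in one value is assumed to be acceptable. Only the two example forms and the 8-page limit come from the table above.

```typescript
// Hypothetical parser; only the 8-page limit comes from the property description.
const MAX_PAGES = 8;

function parsePages(spec: string): number[] {
  const pages: number[] = [];
  for (const part of spec.split(",")) {
    const token = part.trim();
    const range = /^(\d+)-(\d+)$/.exec(token);
    if (range) {
      const start = parseInt(range[1], 10);
      const end = parseInt(range[2], 10);
      if (start > end) throw new Error(`Invalid range: ${token}`);
      // Reject oversized ranges before expanding them.
      if (end - start + 1 > MAX_PAGES) {
        throw new Error(`Range ${token} exceeds ${MAX_PAGES} pages`);
      }
      for (let p = start; p <= end; p++) pages.push(p);
    } else if (/^\d+$/.test(token)) {
      pages.push(parseInt(token, 10));
    } else {
      throw new Error(`Invalid page token: "${token}"`);
    }
  }
  // Drop duplicates and sort ascending.
  const unique = pages
    .filter((p, i) => pages.indexOf(p) === i)
    .sort((a, b) => a - b);
  if (unique.length > MAX_PAGES) {
    throw new Error(`At most ${MAX_PAGES} pages are allowed, got ${unique.length}`);
  }
  return unique;
}
```

With this sketch, `parsePages("0-7")` yields the eight pages 0 through 7, `parsePages("0,1,2,3")` yields the first four pages, and `parsePages("0-8")` throws because it names nine pages.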

Output

JSON

operation - The OCR operation performed (e.g., 'ocrWithAnnotations').
uploadedFileId - ID of the uploaded file in the Mistral API.
signedUrl - Signed URL to access the uploaded document.
processedAt - Timestamp when the document was processed.
documentTemplate - The document template used for extraction (if applicable).
includeBboxAnnotations - Boolean indicating if bounding box annotations were included.
advancedMode - Boolean indicating if advanced mode was enabled.
(other fields) - Additional OCR results and metadata returned by the Mistral OCR API.
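
When consuming this output in downstream code, it can help to give it a type. The interface below is a sketch: the field names come from the list above, but the field types (for example `processedAt` as a string timestamp) and the helper `describeRun` are assumptions, and the sample values are made up for illustration.

```typescript
// Field names follow the list above; the types are assumptions.
interface MistralOcrOutput {
  operation: string;
  uploadedFileId: string;
  signedUrl: string;
  processedAt: string; // assumed to be a timestamp string
  documentTemplate?: string; // present only when a template applies
  includeBboxAnnotations: boolean;
  advancedMode: boolean;
  [extra: string]: unknown; // remaining OCR results and metadata
}

// Hypothetical helper that summarizes one output item for logging.
function describeRun(output: MistralOcrOutput): string {
  const template =
    output.documentTemplate !== undefined ? output.documentTemplate : "none";
  const bbox = output.includeBboxAnnotations ? "on" : "off";
  const advanced = output.advancedMode ? "on" : "off";
  return `${output.operation} (template: ${template}, bbox: ${bbox}, advanced: ${advanced})`;
}

// Made-up sample values, for illustration only.
const sample: MistralOcrOutput = {
  operation: "ocrWithAnnotations",
  uploadedFileId: "file-123",
  signedUrl: "https://example.com/signed",
  processedAt: "2025-01-01T00:00:00Z",
  documentTemplate: "invoice",
  includeBboxAnnotations: true,
  advancedMode: false,
  pages: [],
};
```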

Dependencies

Mistral OCR API
An API key credential for Mistral OCR API authentication

Troubleshooting

Ensure the specified binary property contains valid document data in a supported format; the node raises an error if the data is missing or the format is unsupported.
File size must not exceed the maximum allowed size (e.g., 50 MB); larger files cause errors.
Rate limit errors (HTTP 429) are handled with retries, but persistent rate limits indicate exceeding API plan capacity; consider upgrading the plan or reducing request frequency.
Invalid JSON in custom schemas will cause errors; validate JSON syntax before use.
If the node throws errors about missing or corrupted binary data, verify the input data source and format.
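
Two of the checks above can be run before the node executes, for example in a preceding Code node. The sketch below is illustrative only: `checkSchemaJson`, `checkFileSize`, and `CheckResult` are hypothetical names, and the 50 MB limit is taken from the example figure above rather than from a confirmed API limit.

```typescript
// Assumed limit: the text above gives 50 MB only as an example.
const MAX_FILE_BYTES = 50 * 1024 * 1024;

interface CheckResult {
  ok: boolean;
  error?: string;
}

// Checks that a custom schema string is syntactically valid JSON and is an object.
function checkSchemaJson(text: string): CheckResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid JSON: ${message}` };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "Schema must be a JSON object" };
  }
  return { ok: true };
}

// Checks that binary data is present and within the assumed size limit.
function checkFileSize(bytes: number, maxBytes: number = MAX_FILE_BYTES): CheckResult {
  if (bytes <= 0) {
    return { ok: false, error: "Binary data is missing or empty" };
  }
  if (bytes > maxBytes) {
    return { ok: false, error: `File is ${bytes} bytes; limit is ${maxBytes} bytes` };
  }
  return { ok: true };
}
```

Note that `checkSchemaJson` only catches syntax problems and non-object values; it does not verify that the text is a well-formed JSON Schema.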
