CAPIVISION OCR

Multi-engine OCR with the keen eyesight of a capybara

Overview

This node performs optical character recognition (OCR) on images or PDF documents using one of three selectable OCR engines, and can optionally enhance the extracted text with AI analysis. The supported engines are Tesseract.js (images only; best for simple images), OCR.space (supports PDFs and multiple languages), and AWS Textract (ideal for complex documents and forms). Input can be supplied either as a binary file or as a base64 string.

The node is useful in scenarios such as digitizing scanned documents, extracting text from images or PDFs for further processing, automating data entry from forms, or enriching OCR results with AI-powered insights.

Practical examples:

Extracting text from scanned invoices or receipts.
Processing multi-page PDFs to retrieve structured text.
Using AI to summarize or highlight key information from OCR output.

Properties

Name	Meaning
Mecanismo OCR (OCR engine)	OCR engine to use: - Tesseract.js (images only; best for simple, clearly defined text) - OCR.space (images and PDFs; supports multiple languages) - AWS Textract (images and PDFs; best for complex documents and forms)
Tipo de Entrada (input type)	Input format: - Binário (binary file input) - Base64 (base64-encoded string input)
Input Binário (binary input)	Name of the binary property that holds the image or PDF data (used when the input type is Binário)
String Base64 (base64 string)	Base64 string of the image, with or without the data URI header (used when the input type is Base64)
Método de Tratamento (treatment method)	Processing method: - Apenas OCR (Sem IA) ("OCR only, no AI"): OCR extraction only - OCR + IA ("OCR + AI"): OCR extraction followed by AI analysis
Mecanismo IA (AI engine)	AI engine used for the analysis; shown only when the treatment method is OCR + IA. Currently the only option is ChatGPT.
Modelo IA (AI model)	AI model to use; shown only when the AI engine is ChatGPT. Options: GPT-4o-mini or GPT-4o
Formato de Saída (output format)	Output format of the extracted data: - Texto Puro (plain text) - JSON Estruturado (structured JSON) - CSV
Preset de Layout (opcional) (optional layout preset)	Optional JSON defining a coordinate structure for targeted extraction (not actively used in the main logic)
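The Base64 input may arrive with or without a data URI header, and the output reports whether the file was an image or a PDF. A minimal sketch of how that normalization and detection could work, assuming detection by the PDF signature "%PDF" at the start of the file; the helper names are hypothetical and this is not taken from the node's source.

```typescript
// Hypothetical helpers (not the node's actual code): strip an optional
// data URI header and classify the payload as "pdf" or "image".

type FileType = "image" | "pdf";

// PDF files start with the ASCII bytes "%PDF" (0x25 0x50 0x44 0x46).
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46];

/** Remove a leading "data:<mime>;base64," header if present. */
function stripDataUriHeader(input: string): string {
  const trimmed = input.trim();
  if (!trimmed.startsWith("data:")) return trimmed;
  const comma = trimmed.indexOf(",");
  // Without a comma the header is malformed; return the string unchanged.
  return comma === -1 ? trimmed : trimmed.slice(comma + 1);
}

/** Classify raw bytes (binary input) by the PDF signature. */
function detectFromBytes(bytes: Uint8Array): FileType {
  return PDF_MAGIC.every((b, i) => bytes[i] === b) ? "pdf" : "image";
}

/** Classify a base64 string: data beginning with "%PDF" always encodes to "JVBER...". */
function detectFromBase64(input: string): FileType {
  return stripDataUriHeader(input).startsWith("JVBER") ? "pdf" : "image";
}
```

Falling back to "image" for anything that is not a PDF mirrors the two values the fileType field can take; a stricter variant could also check PNG or JPEG signatures and reject unknown files.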

Output

The node outputs a JSON object per input item with the following structure:

success: boolean indicating if the operation succeeded.
timestamp: ISO timestamp of processing.
engine: selected OCR engine.
fileType: detected file type ("image" or "pdf").
treatmentMethod: chosen treatment method ("ocr_only" or "ocr_ai").
outputFormat: requested output format ("text", "json", or "csv").
data: the extracted text or processed data, which varies depending on output format and treatment:
- For plain text output: a string with the recognized text.
- For JSON output: an object containing the original text and, if applicable, the AI analysis.
- For CSV output: a CSV-formatted string combining the original text and the AI analysis.
metadata: processing timestamps and the versions of the OCR and AI engines used.
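The success item can be modeled as a TypeScript type. The sketch below assumes the key names originalText and aiAnalysis for JSON output, a two-column CSV layout, and a processedAt metadata key; this document specifies none of these, so treat them and the helper names as hypothetical.

```typescript
// Hypothetical model of one success item. Top-level field names follow the
// Output section; everything inside `data` and `metadata` is assumed.

type OutputFormat = "text" | "json" | "csv";

interface JsonData {
  originalText: string;
  aiAnalysis?: string; // present only when the treatment method is "ocr_ai"
}

interface OcrSuccess {
  success: true;
  timestamp: string; // ISO 8601
  engine: string;
  fileType: "image" | "pdf";
  treatmentMethod: "ocr_only" | "ocr_ai";
  outputFormat: OutputFormat;
  data: string | JsonData;
  metadata: Record<string, string>;
}

/** Quote a CSV field RFC 4180 style: wrap in quotes, double inner quotes. */
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

function formatData(format: OutputFormat, text: string, aiAnalysis?: string): string | JsonData {
  if (format === "text") return text;
  if (format === "json") {
    return aiAnalysis === undefined ? { originalText: text } : { originalText: text, aiAnalysis };
  }
  // "csv": one header row plus one data row (assumed layout).
  return `original_text,ai_analysis\n${csvField(text)},${csvField(aiAnalysis ?? "")}`;
}

function buildSuccess(
  base: Omit<OcrSuccess, "success" | "timestamp" | "data" | "metadata">,
  text: string,
  aiAnalysis?: string,
): OcrSuccess {
  const timestamp = new Date().toISOString();
  return {
    success: true,
    timestamp,
    ...base,
    data: formatData(base.outputFormat, text, aiAnalysis),
    metadata: { processedAt: timestamp }, // hypothetical metadata key
  };
}
```

Quoting every CSV field keeps line breaks and commas from the OCR text from corrupting the row structure.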

If an error occurs during processing, the output JSON contains:

success: false
error: object with message, type, stack trace, and context details about the failure.

The node does not output binary data.
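When processing fails, the item carries success: false and an error object instead. A minimal sketch of turning a caught exception into that shape; the context fields, the helper name, and the use of Error.name as the error type are assumptions rather than documented behavior.

```typescript
// Hypothetical conversion of a caught exception into the error item.
// Context keys and using Error.name as "type" are assumptions.

interface ErrorContext {
  itemIndex: number;
  engine?: string;
  inputType?: string;
}

interface OcrFailure {
  success: false;
  error: {
    message: string;
    type: string;
    stack: string | undefined;
    context: ErrorContext;
  };
}

function toErrorOutput(err: unknown, context: ErrorContext): OcrFailure {
  // Non-Error throws (e.g. a string) are wrapped so the shape stays stable;
  // their stack trace then points at this helper, not the original throw.
  const e = err instanceof Error ? err : new Error(String(err));
  return {
    success: false,
    error: { message: e.message, type: e.name, stack: e.stack, context },
  };
}
```

Used inside a per-item try/catch, this keeps one failing file from hiding the results of the other items.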

Dependencies

External Services / APIs:
- OCR.space API (requires API key credential) for OCR on images and PDFs.
- AWS Textract service (requires AWS credentials) for OCR on images and PDFs, especially complex documents.
- OpenAI API (requires API key credential) for AI-based text analysis when enabled.
Node.js Libraries:
- tesseract.js for local OCR on images.
- @aws-sdk/client-textract for AWS Textract integration.
- openai SDK for AI completions.
- axios for HTTP requests to OCR.space.
- pdf.js-extract for extracting text content from PDFs locally (used internally).
n8n Configuration:
- Credentials must be configured for the selected OCR engine and AI engine.
- Proper permissions and API keys must be set up for external services.
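Which credentials are needed depends on the selected OCR engine and treatment method. A small sketch of that mapping, derived from the dependency list above; the engine identifiers and returned labels are illustrative, not the node's internal option values or n8n credential type names.

```typescript
// Hypothetical mapping from node settings to the external credentials needed.
// Engine identifiers are illustrative; "ocr_only"/"ocr_ai" match the
// treatmentMethod values listed in the Output section.

type OcrEngine = "tesseract" | "ocrspace" | "textract";
type TreatmentMethod = "ocr_only" | "ocr_ai";

function requiredCredentials(engine: OcrEngine, treatment: TreatmentMethod): string[] {
  const needed: string[] = [];
  switch (engine) {
    case "tesseract":
      break; // tesseract.js runs locally and needs no external credential
    case "ocrspace":
      needed.push("OCR.space API key");
      break;
    case "textract":
      needed.push("AWS credentials (Textract)");
      break;
  }
  // The AI step calls the OpenAI API regardless of the OCR engine.
  if (treatment === "ocr_ai") needed.push("OpenAI API key");
  return needed;
}
```

For example, requiredCredentials("textract", "ocr_ai") returns ["AWS credentials (Textract)", "OpenAI API key"], while Tesseract.js with OCR only needs none.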

Troubleshooting

Common Issues:
- Missing binary data or base64 string input will cause errors.
- Using Tesseract.js with PDF input throws an error because it does not support PDFs directly.
- Invalid or missing API credentials for OCR.space, AWS Textract, or OpenAI will cause authentication failures.
- Empty or unreadable images or PDFs may yield no extracted text.
- Network issues can affect calls to external APIs.
Error Messages and Resolutions:
- "Nenhum dado binário encontrado!" ("No binary data found!") or "Nenhum dado binário encontrado no campo ..." ("No binary data found in field ..."): Check that the Input Binário property name matches the binary field of the incoming data.
- "String base64 não fornecida!" ("Base64 string not provided!"): Provide a valid base64 string.
- "Tesseract.js não suporta PDF diretamente no n8n. Use OCR.space ou AWS Textract para PDFs." ("Tesseract.js does not support PDF directly in n8n. Use OCR.space or AWS Textract for PDFs."): Switch to OCR.space or AWS Textract for PDF inputs.
- "Erro na análise com IA: ..." ("Error in AI analysis: ..."): Verify the OpenAI API key and usage limits, and check network connectivity.
- "Erro ao validar arquivo: ..." ("Error validating file: ..."): Make sure the input file is a supported image or PDF format.
- "OCR.space não retornou resultados válidos" ("OCR.space did not return valid results"): Confirm that the API key is valid and that the file is being sent correctly.
- Other unexpected errors include the stack trace in the output to help with debugging.
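Several of these messages come from checks that can run before any OCR call is made. A minimal sketch of such checks, reusing the message strings quoted above; the function, its parameters, the engine identifiers, and the order of the checks are hypothetical.

```typescript
// Hypothetical pre-OCR validation. Message strings are copied from the list
// above. The field-specific "... no campo ..." variant is not reproduced
// because its exact wording is elided; the generic message covers both cases.

type Engine = "tesseract" | "ocrspace" | "textract";
type FileType = "image" | "pdf";

interface ItemInput {
  inputType: "binary" | "base64";
  binaryPropertyName: string; // value of the "Input Binário" property
  binary?: Record<string, unknown>; // binary properties attached to the item
  base64?: string; // value of the "String Base64" property
}

function validateInput(engine: Engine, fileType: FileType, item: ItemInput): void {
  if (item.inputType === "binary") {
    // No binary data at all, or nothing under the configured property name.
    if (item.binary?.[item.binaryPropertyName] === undefined) {
      throw new Error("Nenhum dado binário encontrado!");
    }
  } else if (!item.base64 || item.base64.trim() === "") {
    throw new Error("String base64 não fornecida!");
  }

  // Tesseract.js handles images only, so PDFs are rejected up front.
  if (engine === "tesseract" && fileType === "pdf") {
    throw new Error(
      "Tesseract.js não suporta PDF diretamente no n8n. Use OCR.space ou AWS Textract para PDFs.",
    );
  }
}
```

Failing fast here avoids spending an OCR.space or Textract request on input that cannot succeed.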
