Google Document AI OCR

Extract text from documents using Google Document AI OCR

Overview

This node extracts text from documents using Google Document AI OCR (via the Google Cloud Vision API). It accepts each input document either as binary data or as a file path, runs optical character recognition (OCR) on it, and returns the detected text together with detailed annotations.

Common scenarios where this node is beneficial include:

Automating data extraction from scanned documents, PDFs, or images.
Digitizing paper forms or receipts for further processing.
Extracting structured text data for indexing or search purposes.

For example, you can feed a scanned invoice image as a binary file, and the node will output all detected text segments with their positions and confidence scores.

Properties

Name	Meaning
Input Type	How the document data will be provided. Options: "Binary File" (upload file data), "File Path" (path to local file).
Binary Property	Name of the binary property containing the document file. Used only if Input Type is "Binary File". Default is "data".
File Path	Path to the document file on disk. Used only if Input Type is "File Path".
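
As an illustration of how the two input types relate, the sketch below resolves a document to a Buffer from either source. It is not the node's actual implementation: the `OcrParams` and `InputItem` shapes and the `resolveDocument` helper are hypothetical names, and the binary payload is assumed to be a base64 string stored under the configured binary property.

```typescript
import { readFileSync } from "fs";

// Hypothetical shapes, for illustration only (not the node's real types).
type InputType = "binary" | "filePath";

interface OcrParams {
  inputType: InputType;
  binaryProperty?: string; // falls back to "data", matching the documented default
  filePath?: string;
}

interface InputItem {
  // Assumption: binary payloads are base64-encoded strings.
  binary?: Record<string, { data: string }>;
}

function resolveDocument(item: InputItem, params: OcrParams): Buffer {
  if (params.inputType === "binary") {
    const name = params.binaryProperty ?? "data";
    const entry = item.binary?.[name];
    if (!entry) {
      throw new Error(`No binary data found in property '${name}'`);
    }
    return Buffer.from(entry.data, "base64");
  }
  if (!params.filePath) {
    throw new Error("A file path is required for the 'File Path' input type");
  }
  // Throws if the path does not exist or is not readable.
  return readFileSync(params.filePath);
}
```

Failing early with a message that names the missing property or path makes the "Binary Property Missing" and "File Not Found" cases easier to tell apart.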

Output

The node outputs an array of items, each containing a json object with a textAnnotations field. This field is an array of detected text annotation objects, each including:

mid: Machine-generated identifier (if available).
locale: Language locale of the detected text.
description: The actual recognized text string.
score, confidence, topicality: Confidence and relevance metrics reported by the API; these may be unset for some annotations.
boundingPoly: Coordinates outlining the position of the text in the document.
locations: Geographical or positional metadata (if any).
properties: Additional properties related to the text annotation.
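
To make the output shape concrete, here is a sketch of a type for one annotation plus two helpers that pull out the recognized text. The interface mirrors the fields listed above; the `Vertex` shape and the `fullText` and `words` helpers are assumptions for illustration. With Vision API text detection, the first annotation usually carries the entire recognized text and the remaining entries cover individual words, so the helpers rely on that ordering.

```typescript
// Assumed vertex shape for boundingPoly; coordinates may be omitted by the API.
interface Vertex {
  x?: number;
  y?: number;
}

// Mirrors the documented textAnnotations fields; all are optional here.
interface TextAnnotation {
  mid?: string;
  locale?: string;
  description?: string;
  score?: number;
  confidence?: number;
  topicality?: number;
  boundingPoly?: { vertices?: Vertex[] };
  locations?: unknown[];
  properties?: unknown[];
}

interface OcrItem {
  json: { textAnnotations?: TextAnnotation[] };
}

// Hypothetical helper: the first annotation is assumed to hold the full text.
function fullText(item: OcrItem): string {
  const annotations = item.json.textAnnotations ?? [];
  return annotations[0]?.description ?? "";
}

// Hypothetical helper: remaining annotations are assumed to be individual words.
function words(item: OcrItem): string[] {
  return (item.json.textAnnotations ?? [])
    .slice(1)
    .map((a) => a.description ?? "")
    .filter((t) => t.length > 0);
}
```

Both helpers return an empty result rather than throwing when `textAnnotations` is missing, which keeps downstream steps simple for pages with no detectable text.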

If the node encounters an error during processing and "Continue On Fail" is enabled, it outputs an item with an error field describing the issue.

The node does not output binary data.

Dependencies

Requires a valid Google Cloud service account credential with access to the Vision API.
The Google Cloud Vision client library (@google-cloud/vision) is used internally.
The node expects the service account key JSON to be provided via credentials configured in n8n.

Troubleshooting

Invalid Document or Text: If the document is empty or the OCR fails to detect text, the node throws an error stating "Document or document text is invalid". Ensure the input file is a supported image or document format and contains readable text.
Credential Issues: Errors related to authentication usually indicate missing or incorrect Google service account credentials. Verify that the credentials are correctly set up and have the necessary permissions.
File Not Found: When using the "File Path" input type, ensure the specified path is correct and readable by the n8n process.
Binary Property Missing: For "Binary File" input type, confirm that the binary property name matches the actual binary data property in the input.
Enabling "Continue On Fail" allows the workflow to proceed even if some items fail, with errors reported in the output.
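
With "Continue On Fail" enabled, downstream steps need to tell failed items from successful ones. A minimal sketch, assuming the error is reported at `json.error` (the exact location of the field is an assumption, and `partitionResults` is a hypothetical helper):

```typescript
interface ResultItem {
  json: { error?: unknown; [key: string]: unknown };
}

// Splits node output into successful items and items carrying an error field.
function partitionResults(items: ResultItem[]): {
  ok: ResultItem[];
  failed: ResultItem[];
} {
  const ok: ResultItem[] = [];
  const failed: ResultItem[] = [];
  for (const item of items) {
    if (item.json.error !== undefined) {
      failed.push(item);
    } else {
      ok.push(item);
    }
  }
  return { ok, failed };
}
```

The failed items can then be logged or retried while the successful ones continue through the workflow.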
