
CloudConvert

Use CloudConvert to convert files, create thumbnails, merge files, add watermarks and more!

Overview

This node integrates with CloudConvert to add an OCR (Optical Character Recognition) text layer to scanned PDF files. A scanned PDF is essentially a set of images, so its text cannot be searched or selected. Adding an OCR layer makes that text searchable and selectable, which helps when text must be extracted from scanned documents for editing, searching, or archiving. For example, a user can upload a scanned PDF and have the node automatically add an OCR layer so the text in the document becomes searchable.

Use Case Examples

  1. Adding an OCR layer to scanned PDFs to enable text search and selection.
  2. Automating the processing of scanned documents to convert them into searchable PDFs for digital archiving.

Properties

  • Authentication: Method of authenticating with the CloudConvert API, either OAuth2 or API Key.
  • Binary Input Data: Whether the input file to upload should be taken from a binary field.
  • Input File Contents: The text content of the file to upload; used when Binary Input Data is false.
  • Binary Property: Name of the binary property containing the file data to be processed; used when Binary Input Data is true.
  • Auto Orient: Whether to automatically detect and correct page orientation before performing OCR.
  • Languages: Comma-separated list of language codes to use for OCR, e.g., 'eng,deu'.
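The properties above end up in a CloudConvert request. As a rough sketch, assuming the node uploads the file with an `import/upload` task, runs an OCR task on it, and exports the result with `export/url`, the input selection and job payload could be assembled as below. The `pdf/ocr` operation name and its `auto_orient` / `language` parameters are assumptions to check against the CloudConvert API reference, the default binary property name `data` is assumed, and `resolve_input`, `parse_languages`, and `build_ocr_job` are hypothetical helpers, not part of the node.

```python
from __future__ import annotations

import base64
from typing import Any


def resolve_input(item: dict[str, Any], binary_input: bool,
                  binary_property: str = "data",
                  file_contents: str = "") -> bytes:
    """Pick the file bytes the way the Binary Input Data toggle describes.

    ASSUMPTION: n8n's in-memory binary mode, where the file is stored
    base64-encoded at item["binary"][binary_property]["data"].
    """
    if binary_input:
        return base64.b64decode(item["binary"][binary_property]["data"])
    # Binary Input Data is false: upload the Input File Contents text as-is.
    return file_contents.encode("utf-8")


def parse_languages(languages: str) -> list[str]:
    """Split the Languages value (e.g. 'eng,deu') into a list of codes,
    trimming whitespace and dropping empty entries."""
    return [code.strip() for code in languages.split(",") if code.strip()]


def build_ocr_job(auto_orient: bool, languages: str) -> dict[str, Any]:
    """Build a hypothetical CloudConvert v2 job: upload -> OCR -> export.

    ASSUMPTION: the OCR step is a 'pdf/ocr' task that accepts
    'auto_orient' and 'language' parameters.
    """
    return {
        "tasks": {
            "import-file": {"operation": "import/upload"},
            "add-ocr": {
                "operation": "pdf/ocr",                  # assumed operation name
                "input": "import-file",
                "auto_orient": auto_orient,              # assumed parameter name
                "language": parse_languages(languages),  # assumed parameter name
            },
            "export-file": {"operation": "export/url", "input": "add-ocr"},
        }
    }
```

Keeping the language parsing tolerant of stray spaces and trailing commas means a value such as `'eng, deu'` produces the same list as `'eng,deu'`.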

Output

JSON

  • data - The resulting PDF file with the added OCR text layer.

Dependencies

  • CloudConvert API

Troubleshooting

  • Ensure the input PDF is a scanned document suitable for OCR; otherwise, the OCR layer may not be added correctly.
  • Verify that the correct authentication method and credentials are provided to avoid authorization errors.
  • If the OCR language codes are missing or invalid, the OCR process may fail or produce inaccurate results.
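Some of these problems can be caught before calling the API. The sketch below is a hypothetical pre-flight check, not part of the node: it looks for the `%PDF-` header within the first 1024 bytes and flags language codes that do not look like the examples above. The code pattern (three lowercase letters with an optional suffix such as `_sim`) is an assumed heuristic; it cannot tell whether a well-formed code is actually supported, nor whether a PDF is scanned or already contains text.

```python
from __future__ import annotations

import re

PDF_MAGIC = b"%PDF-"
# ASSUMPTION: codes resemble 'eng' or 'deu', optionally with a suffix like '_sim'.
LANGUAGE_CODE = re.compile(r"[a-z]{3}(?:_[a-z]+)?")


def preflight(file_bytes: bytes, languages: str) -> list[str]:
    """Return a list of likely problems; an empty list means none were found."""
    problems: list[str] = []
    # Many PDF readers tolerate a few leading bytes, so search the first 1024.
    if PDF_MAGIC not in file_bytes[:1024]:
        problems.append("no %PDF- header found in the first 1024 bytes")
    codes = [c.strip() for c in languages.split(",") if c.strip()]
    if not codes:
        problems.append("no OCR language codes given")
    for code in codes:
        if not LANGUAGE_CODE.fullmatch(code):
            problems.append(f"unexpected language code: {code!r}")
    return problems
```

For example, `preflight(pdf_bytes, "English")` would report the unexpected code `'English'`, pointing to a value that should probably be `'eng'`.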
