Actions (80)
- Extract Text From Word
- Find And Replace Text
- Convert PDF To Editable PDF Using OCR
- Create Swiss QR Bill
- Split PDF By Barcode
- Split PDF By Swiss QR
- Split PDF By Text
- Split PDF Regular
- Create PDF/A
- Convert HTML To PDF
- Convert Markdown To PDF
- Upload File To PDF4me
- Add Attachment To PDF
- Add Barcode To PDF
- Add Form Fields To PDF
- Fill PDF Form
- Add HTML Header Footer
- Add Image Stamp To PDF
- Add Margin To PDF
- Add Page Number To PDF
- Add Text Stamp To PDF
- AI-Invoice Parser
- AI-Process HealthCard
- AI-Process Contract
- Generate Barcode
- Classify Document
- Parse Document
- Linearize PDF
- Flatten PDF
- Convert To PDF
- Json To Excel
- Convert PDF To Excel
- Convert PDF To Word
- Convert PDF To PowerPoint
- Convert VISIO
- Crop Image
- Delete Blank Pages From PDF
- Delete Unwanted Pages From PDF
- Extract Pages
- Merge Multiple PDFs
- Overlay PDFs
- Rotate Document
- Rotate Page
- Sign PDF
- URL to PDF
- Add Image Watermark To Image
- Add Text Watermark To Image
- Compress Image
- Convert Image Format
- Create Images From PDF
- Flip Image
- Get Image Metadata
- Image Extract Text
- Remove EXIF Tags From Image
- Replace Text With Image
- Replace Text With Image In Word
- Resize Image
- Rotate Image
- Rotate Image By EXIF Data
- Compress PDF
- Get PDF Metadata
- Repair PDF Document
- Get Document From Pdf4me
- Update Hyperlinks Annotation
- Protect Document
- Unlock PDF
- Disable Tracking Changes In Word
- Enable Tracking Changes In Word
- Generate Document Single
- Generate Documents Multiple
- Get Tracking Changes In Word
- Read Barcode From Image
- Read Barcode From PDF
- Read SwissQR Code
- Extract Form Data From PDF
- Extract Pages From PDF
- Extract Attachment From PDF
- Extract Text By Expression
- Extract Table From PDF
- Extract Resources
Overview
This node operation converts a PDF file into an editable PDF using Optical Character Recognition (OCR). It is useful when you have scanned documents or image-based PDFs that do not contain selectable or searchable text, and you want to transform them into editable, searchable PDFs. Typical use cases include digitizing paper documents, making scanned contracts or forms editable, and enabling text extraction from image-only PDFs.
For example, if you receive a scanned invoice as a PDF, this node can convert it into an editable format so you can easily copy text, fill in fields, or make corrections.
Properties
| Name | Meaning |
|---|---|
| PDF Input Data Type | How the PDF file is provided. Binary Data: use the PDF file from a previous node. Base64 String: supply the PDF content as a base64-encoded string. URL: supply a URL pointing to the PDF file. |
| PDF Binary Field | Name of the binary property containing the PDF file (used only if PDF Input Data Type is Binary Data) |
| PDF Base64 Content | Base64-encoded PDF content (used only if PDF Input Data Type is Base64 String) |
| PDF URL | URL to the PDF file (used only if PDF Input Data Type is URL) |
| Quality Type | Quality level for OCR processing. Draft: suitable for normal PDFs; consumes 1 API call per file. High: suitable for image-based or scanned PDFs; consumes 2 API calls per page. |
| OCR Only When Needed | Whether to skip OCR when the text is already searchable. True: skip recognition if text exists. False: always perform OCR. |
| Language | Language of the text in the source file. Set it explicitly when the recognized output text is unreadable. |
| Output Format | Output format of the resulting file (string format) |
| Merge All Sheets | Whether to merge all sheets if applicable (boolean) |
| Output File Name | Name for the output editable PDF file |
| Async | Enable asynchronous processing (recommended for large files) |
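
The three source properties are mutually exclusive: only the one matching PDF Input Data Type is read. A minimal sketch of that rule is shown here, assuming hypothetical internal key names (`inputDataType`, `binaryPropertyName`, `base64Content`, `pdfUrl`); the node's real parameter names are not documented on this page and may differ.

```python
from typing import Dict

# Hypothetical mapping from each "PDF Input Data Type" option to the
# property that must be filled in. Key names are illustrative only.
REQUIRED_FIELD = {
    "binaryData": "binaryPropertyName",
    "base64": "base64Content",
    "url": "pdfUrl",
}


def check_source(params: Dict[str, str]) -> str:
    """Return the name of the field that supplies the PDF, or raise ValueError."""
    input_type = params.get("inputDataType", "")
    if input_type not in REQUIRED_FIELD:
        raise ValueError(f"unknown input data type: {input_type!r}")
    field = REQUIRED_FIELD[input_type]
    # The other two source fields are ignored, so only this one is checked.
    if not params.get(field, "").strip():
        raise ValueError(f"{field} is required when input data type is {input_type}")
    return field
```

Running a check like this before the node executes can surface a missing URL or empty base64 field early, before an API call is spent.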
Output
The node outputs JSON data describing the result of the conversion. The editable PDF generated by OCR is typically attached to the output item as binary data under a binary property, so subsequent nodes can process or save it.
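As an illustration of consuming that output, the sketch below reads the PDF bytes from an item shaped like a standard n8n item (`json` plus `binary`, with the file content base64-encoded under `data`). It assumes the default in-memory binary mode and a binary property named `data`; both are assumptions, and the helper name `pdf_bytes_from_item` is hypothetical.

```python
import base64
from typing import Any, Dict


def pdf_bytes_from_item(item: Dict[str, Any], prop: str = "data") -> bytes:
    """Decode the base64 payload stored under item["binary"][prop]["data"]."""
    entry = item.get("binary", {}).get(prop)
    if entry is None:
        raise KeyError(f"no binary property named {prop!r}")
    raw = base64.b64decode(entry["data"], validate=True)
    # Every PDF file starts with the "%PDF-" marker; use it as a sanity check.
    if not raw.startswith(b"%PDF-"):
        raise ValueError("decoded content does not start with a PDF header")
    return raw
```

The header check is a cheap way to notice that an error payload, rather than a PDF, ended up in the binary property.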
Dependencies
- Requires an external OCR service or API capable of converting PDFs to editable PDFs.
- Needs appropriate API credentials or authentication tokens configured in n8n to access the OCR service.
- Network access to the PDF's URL when the URL input type is used.
- Proper configuration of the node with correct input data and parameters.
Troubleshooting
Common issues:
- Providing an incorrect or inaccessible PDF URL may cause the request to fail.
- Supplying invalid base64 content leads to decoding errors.
- Choosing "High" quality OCR on very large PDFs can consume many API calls (2 per page) and increase processing time.
- If the Language parameter is incorrect or missing, OCR results may be poor or unreadable.
- If asynchronous processing is enabled but downstream nodes do not wait for the result, timing issues can occur.
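
Two of the issues above can be checked before the node runs. The following is a rough sketch, not part of the node itself: `estimate_api_calls` applies the costs stated in the Properties table (Draft: 1 call per file, High: 2 calls per page), and `is_valid_base64` performs a strict decode. Both function names are hypothetical.

```python
import base64
import binascii


def estimate_api_calls(quality: str, page_count: int) -> int:
    """Estimate API calls from the documented costs for Draft and High quality."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if quality == "Draft":
        return 1  # 1 call per file, regardless of length
    if quality == "High":
        return 2 * page_count  # 2 calls per page
    raise ValueError(f"unknown quality type: {quality!r}")


def is_valid_base64(content: str) -> bool:
    """Return True if content is non-empty and decodes as strict base64."""
    if not content:
        return False
    try:
        base64.b64decode(content, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False
```

For a 40-page scan, `estimate_api_calls("High", 40)` gives 80 calls, against 1 call for Draft.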
Error messages:
- Errors about missing or malformed PDF input: check the PDF Input Data Type and the content supplied for it.
- Authentication or permission errors: verify the API credentials.
- Timeouts or rate limit errors: adjust the quality setting or split large files into smaller ones.
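
When splitting a large file, the page ranges for each part can be computed up front. This is a minimal sketch with a hypothetical helper, `page_chunks`; how the ranges are passed to a splitting action such as Extract Pages is not covered on this page, so that step is left out.

```python
from typing import List, Tuple


def page_chunks(total_pages: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (start, end) page ranges covering the document."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


# A 25-page document in parts of at most 10 pages:
# page_chunks(25, 10) -> [(1, 10), (11, 20), (21, 25)]
```

Smaller parts keep each OCR request shorter, which may help with timeouts; the total number of High-quality API calls stays the same because that cost is counted per page.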