PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4ME API

Actions: 80

Overview

This node operation converts a PDF file into an editable PDF using Optical Character Recognition (OCR). It is useful when you have scanned documents or image-based PDFs that do not contain selectable or searchable text, and you want to transform them into editable, searchable PDF files. This can be beneficial for digitizing paper documents, enabling text extraction, editing, or further processing.

Typical use cases include:

  • Converting scanned contracts, invoices, or forms into editable PDFs.
  • Preparing image-only PDFs for text search and copy-paste.
  • Automating document workflows where editable content is required from non-editable PDFs.

Properties

  • PDF Input Data Type: how to provide the PDF file.
    • Binary Data (from previous node)
    • Base64 String
    • URL to PDF file
  • PDF Binary Field: name of the binary property containing the PDF file (used when the input type is Binary Data).
  • PDF Base64 Content: Base64-encoded string of the PDF content (used when the input type is Base64 String).
  • PDF URL: URL pointing to the PDF file (used when the input type is URL).
  • Quality Type: OCR quality level.
    • Draft: suitable for normal PDFs; consumes 1 API call per file.
    • High: suitable for scanned or image-based PDFs; consumes 2 API calls per page.
  • OCR Only When Needed: whether to skip OCR when the text is already searchable.
    • True: skip OCR if searchable text exists.
    • False: always perform OCR.
  • Language: language of the text in the source file; used when the output text is not recognizable.
  • Output Format: output format as a string (likely the desired output file format or encoding).
  • Merge All Sheets: whether to merge all sheets, if applicable.
  • Output File Name: filename for the resulting editable PDF file.
  • Binary Data Output Name: custom name for the binary data field in the node's output.
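As a rough budgeting aid, the per-call rates listed under Quality Type can be turned into a small estimate. This is a sketch that assumes the rates are exactly as stated above (1 call per file for Draft, 2 calls per page for High); the names QualityType, estimateApiCalls and estimateBatch are hypothetical and are not part of the node or the PDF4me API.

```typescript
// Hypothetical names: QualityType, estimateApiCalls and estimateBatch are
// illustrative only, not part of the PDF4me node or its API.
type QualityType = "Draft" | "High";

// Assumed rates, taken from the Quality Type description:
// Draft = 1 API call per file, High = 2 API calls per page.
function estimateApiCalls(quality: QualityType, pageCount: number): number {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  return quality === "Draft" ? 1 : 2 * pageCount;
}

// Sum the estimate over a batch of files, each with its own page count.
function estimateBatch(quality: QualityType, pageCounts: number[]): number {
  return pageCounts.reduce((total, pages) => total + estimateApiCalls(quality, pages), 0);
}

// Example: three scanned files with 12, 3 and 1 pages.
console.log(estimateBatch("Draft", [12, 3, 1])); // 3
console.log(estimateBatch("High", [12, 3, 1]));  // 32
```

The estimate ignores the OCR Only When Needed setting, because the description above does not say how a skipped OCR pass is counted.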

Output

The node outputs the converted editable PDF file as binary data under the specified binary data output name (default "data"). The JSON output typically contains metadata about the processed file, while the binary output holds the actual editable PDF content ready for download or further workflow steps.

When the source has multiple pages or sheets, the "Merge All Sheets" option controls whether they are merged into a single output PDF.
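A downstream step may want to confirm that the output field really holds a PDF before passing it on. The sketch below checks for the "%PDF-" header that PDF files normally begin with. It assumes the binary content is available inline as a base64 string, as in n8n's default in-memory binary mode; if your instance stores binary data on disk, read the bytes through n8n's getBinaryDataBuffer helper instead. The BinaryEntry interface is a simplified, assumed shape and looksLikePdf is a hypothetical helper.

```typescript
import { Buffer } from "node:buffer";

// Assumed, simplified shape of one binary entry on an n8n item.
interface BinaryEntry {
  data: string;      // base64-encoded file content (inline mode assumed)
  mimeType: string;
  fileName?: string;
}

// Hypothetical helper: true if the decoded bytes start with "%PDF-".
function looksLikePdf(entry: BinaryEntry): boolean {
  const bytes = Buffer.from(entry.data, "base64");
  return bytes.subarray(0, 5).toString("latin1") === "%PDF-";
}

// Example: output read from the default "data" binary field.
// "JVBERi0xLjQK" is the base64 encoding of "%PDF-1.4\n".
const output: Record<string, BinaryEntry> = {
  data: { data: "JVBERi0xLjQK", mimeType: "application/pdf", fileName: "contract.pdf" },
};
console.log(looksLikePdf(output["data"])); // true
```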

Dependencies

  • Requires access to the PDF4me API, which performs the OCR conversion to editable PDF.
  • Needs proper authentication via an API key or token configured in n8n credentials (generic API authentication).
  • Internet access may be necessary if providing PDF input via URL.
  • The node uses its own bundled action modules internally; no other external npm packages are explicitly required.

Troubleshooting

  • Common issues:

    • Providing incorrect input data type or missing the corresponding input property (e.g., binary field name when using binary data).
    • Using Draft quality on scanned or image-based PDFs, resulting in poor text recognition.
    • Incorrect language setting causing unrecognized characters.
    • Network errors when fetching PDF from URL.
    • Output format string invalid or unsupported.
  • Error messages:

    • Errors related to missing or invalid input data usually indicate misconfiguration of input properties.
    • Authentication failures suggest missing or invalid API credentials.
    • OCR processing errors might occur if the PDF is corrupted or unsupported.
  • Resolutions:

    • Verify input data matches the selected input type.
    • Adjust OCR quality based on document type.
    • Confirm language code or name is correct.
    • Check network connectivity and URL validity.
    • Ensure API credentials are correctly set up in n8n.
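The first resolution, verifying that the input matches the selected input type, can be automated as a pre-flight check in your own tooling. This is a minimal sketch under stated assumptions: the PdfInput type, its field names and checkPdfInput are hypothetical and do not reflect the node's internal parameter names; Base64 content is expected to decode to bytes starting with "%PDF-"; and URLs are expected to use http or https.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical parameter shape mirroring the three input types above.
type PdfInput =
  | { inputDataType: "binaryData"; binaryField: string; itemBinaryKeys: string[] }
  | { inputDataType: "base64"; base64Content: string }
  | { inputDataType: "url"; pdfUrl: string };

// Returns configuration problems; an empty array means the input looks usable.
function checkPdfInput(input: PdfInput): string[] {
  const problems: string[] = [];
  switch (input.inputDataType) {
    case "binaryData":
      if (input.binaryField.trim() === "") {
        problems.push("PDF Binary Field is empty");
      } else if (!input.itemBinaryKeys.includes(input.binaryField)) {
        problems.push(`No binary property named "${input.binaryField}" on the item`);
      }
      break;
    case "base64": {
      // Decoded PDF bytes should begin with the "%PDF-" header.
      const decoded = Buffer.from(input.base64Content, "base64");
      if (decoded.subarray(0, 5).toString("latin1") !== "%PDF-") {
        problems.push("PDF Base64 Content does not decode to a PDF");
      }
      break;
    }
    case "url": {
      try {
        const url = new URL(input.pdfUrl); // throws on malformed URLs
        if (url.protocol !== "http:" && url.protocol !== "https:") {
          problems.push("PDF URL must use http or https");
        }
      } catch {
        problems.push("PDF URL is not a valid URL");
      }
      break;
    }
  }
  return problems;
}

// Example: an FTP link is rejected before any API call is made.
const problems = checkPdfInput({ inputDataType: "url", pdfUrl: "ftp://example.com/a.pdf" });
console.log(problems);
```

A passing check does not guarantee success (the URL may still be unreachable, or the PDF corrupted), but it catches the misconfigurations listed under common issues early.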
