PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Actions: 80

Overview

This node converts PDF documents into editable Word files (.docx). The source PDF can be supplied as binary data from a previous node, as a base64-encoded string, or as a direct URL to a PDF file. Quality settings and OCR language options let you tune the conversion for scanned or image-based PDFs.

Use it in automated workflows that need editable content from PDFs for editing, archiving, or further processing in a word processor. For example, it can convert contracts received as PDFs into Word documents for review and modification, or digitize scanned reports by applying OCR during conversion.

Properties

Name: Meaning
Input Data Type: Choose how to provide the PDF file to convert. Options: Binary Data (from previous node), Base64 String (directly provide encoded content), URL (link to PDF file).
Input Binary Field: Name of the binary property containing the PDF file when using Binary Data input type (usually "data").
Base64 PDF Content: Base64-encoded string representing the PDF document content, used if Input Data Type is Base64 String.
PDF URL: URL pointing to the PDF file to convert, used if Input Data Type is URL.
Output File Name: Desired name for the output Word document file (e.g., "converted_document.docx").
Document Name: Name of the source PDF file for reference purposes (e.g., "document.pdf").
Quality Type: Conversion quality setting. Options: Draft (faster, suitable for simple PDFs with clear text), Quality (slower but more accurate, better for complex layouts).
OCR Language: Language used for Optical Character Recognition when converting scanned or image-based PDFs. Supported languages include Arabic, Chinese (Simplified/Traditional), Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Norwegian, Portuguese, Russian, Spanish, Swedish.
Advanced Options: Optional additional settings, listed below.
• Custom Profiles: JSON string to adjust custom API call properties.
• Max Retries: Maximum polling attempts for async processing.
• Merge All Sheets: Combine multiple pages into a single document flow.
• Preserve Output Format: Whether to keep original formatting.
• Retry Delay (Seconds): Base delay between polling attempts.
• Use Async Processing: Enable asynchronous handling for large files.
• Use OCR When Needed: Enable OCR for scanned PDFs.
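
The three Input Data Type options all end up describing one PDF. The following is a minimal Python sketch of how such inputs could be normalized and sanity-checked before a conversion request. It is not the node's actual implementation: the helper name `resolve_pdf_input`, the option identifiers (`"binaryData"`, `"base64"`, `"url"`), and the returned dictionary shape are assumptions made for illustration.

```python
import base64
import binascii
from urllib.parse import urlparse

# A PDF file normally begins with this header; used here as a cheap sanity check.
PDF_MAGIC = b"%PDF-"


def resolve_pdf_input(input_type, binary=None, base64_content=None, url=None):
    """Normalize the three input modes into one descriptor (hypothetical helper)."""
    if input_type == "binaryData":
        if not binary:
            raise ValueError("binary input is empty; check the Input Binary Field name")
        data = bytes(binary)
    elif input_type == "base64":
        try:
            # validate=True rejects characters outside the base64 alphabet.
            data = base64.b64decode(base64_content or "", validate=True)
        except binascii.Error as exc:
            raise ValueError("Base64 PDF Content is not valid base64") from exc
    elif input_type == "url":
        parsed = urlparse(url or "")
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("PDF URL must be an absolute http(s) URL")
        # The file is not downloaded here; only the URL is passed along.
        return {"source": "url", "url": url}
    else:
        raise ValueError(f"unknown input type: {input_type!r}")

    if not data.startswith(PDF_MAGIC):
        raise ValueError("input does not look like a PDF (missing %PDF- header)")
    return {"source": "content", "content": base64.b64encode(data).decode("ascii")}
```

Checking the input this early surfaces a wrong binary field name or a malformed base64 string before any API quota is spent.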

Output

The node outputs an array of items. Each item contains a json field with metadata and a binary field holding the converted Word document (.docx).

  • json: Metadata about the conversion.
  • Binary data field (default name usually "data"): The Word document file content, ready for download or further processing.
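
As a rough illustration of consuming this output, the sketch below pulls the document bytes out of one item. It assumes the item has been exported in n8n's in-memory shape, where each binary property carries base64 content under `data` plus an optional `fileName`. The helper name `extract_docx` and the fallback file name are hypothetical.

```python
import base64


def extract_docx(item, field="data"):
    """Return (file_name, raw_bytes) for the converted document in one output item."""
    binary = item.get("binary", {})
    if field not in binary:
        raise KeyError(f"binary field {field!r} not found; available: {sorted(binary)}")
    entry = binary[field]
    raw = base64.b64decode(entry["data"])
    # A .docx file is a ZIP container, so it starts with the ZIP signature "PK".
    if not raw.startswith(b"PK"):
        raise ValueError("binary content does not look like a .docx (ZIP) file")
    return entry.get("fileName", "converted_document.docx"), raw
```

The returned bytes can then be written to disk or passed to another tool. If n8n is configured to keep binary data outside the item, the `data` value may be a reference rather than the file content, and this sketch would not apply.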

Dependencies

  • Requires access to the PDF4me API, which performs the PDF to Word conversion and OCR.
  • Needs API authentication credentials configured in n8n to authorize requests to the conversion service.
  • Network access to fetch PDF files if using URL input type.
  • Optional configuration of advanced options may require familiarity with the external API's profile settings.
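
Because Custom Profiles is entered as a JSON string, a malformed value is easy to produce. The sketch below shows one way to validate such a string before it is sent. The helper name `parse_custom_profiles` is hypothetical, and the requirement that the value be a JSON object is an assumption; the accepted keys depend on the external API and are not shown here.

```python
import json


def parse_custom_profiles(raw):
    """Parse the Custom Profiles string; empty input means no overrides."""
    if not raw or not raw.strip():
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"Custom Profiles is not valid JSON: {exc.msg} at position {exc.pos}"
        ) from exc
    # Assumption: the API expects a JSON object, not a list or a bare value.
    if not isinstance(parsed, dict):
        raise ValueError("Custom Profiles must be a JSON object")
    return parsed
```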

Troubleshooting

  • Common Issues:

    • Invalid or inaccessible PDF URL leading to failed downloads.
    • Incorrect binary property name causing missing input data.
    • Unsupported PDF formats or encrypted PDFs that cannot be converted.
    • Insufficient API quota or invalid API credentials causing authorization errors.
    • Large or complex PDFs exceeding default retry limits or timeouts.
  • Error Messages & Resolutions:

    • "File not found" or "Unable to download PDF": Verify the URL is correct and accessible.
    • "Invalid binary data": Ensure the binary property name matches the actual input binary field.
    • "Conversion failed due to unsupported format": Check if the PDF is corrupted or encrypted; try preprocessing or unlocking the PDF.
    • "Authentication error": Confirm API key or token is correctly set up in n8n credentials.
    • Timeout or polling exceeded max retries: Increase "Max Retries" and "Retry Delay" in advanced options for large files.
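
To see why raising "Max Retries" and "Retry Delay" helps with large files, consider the polling loop sketched below. It uses a fixed delay between attempts, so the total waiting time is bounded by roughly (max retries - 1) × delay. This is an illustration only: whether the node uses a fixed delay or some form of backoff is not stated, and the function name `poll_until_done`, the `check_status` callback, and the default values are assumptions.

```python
import time


def poll_until_done(check_status, max_retries=10, retry_delay=5.0, sleep=time.sleep):
    """Call check_status() until it returns a result, at most max_retries times."""
    for attempt in range(1, max_retries + 1):
        result = check_status()
        if result is not None:
            return result
        # No point in sleeping after the final attempt.
        if attempt < max_retries:
            sleep(retry_delay)
    raise TimeoutError(
        f"no result after {max_retries} attempts "
        f"(~{(max_retries - 1) * retry_delay:.0f}s of waiting)"
    )
```

With these example defaults the loop gives up after about 45 seconds of waiting. A conversion that needs two minutes would require, for instance, 25 attempts at a 5 second delay, or 13 attempts at 10 seconds.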
