PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API

Overview

This node converts PDF documents into editable Word (.docx) files. It supports multiple input methods for the source PDF, including binary data from a previous node, base64-encoded content, or a URL pointing to the PDF file. The conversion process can be customized with quality settings and OCR language options to handle scanned or image-based PDFs. This node is useful in workflows where automated extraction and editing of PDF content are needed, such as document processing, legal contract management, or digitizing scanned forms.

Practical examples:

  • Automatically convert uploaded PDF contracts into Word documents for easy editing.
  • Convert scanned invoices (PDF images) into editable Word files using OCR.
  • Fetch a PDF from a URL and transform it into a Word document for further processing.
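
The three input methods can be pictured as a small tagged union. The sketch below is illustrative only: the type `PdfInput`, its field names, and the function `validatePdfInput` are hypothetical and are not the node's internal parameter names. The checks it performs (binary field present, base64 alphabet, http/https scheme) are assumptions about what a sensible pre-flight validation might look like.

```typescript
// Hypothetical model of the three input methods (names are illustrative).
type PdfInput =
  | { kind: "binaryData"; binaryField: string }
  | { kind: "base64"; content: string }
  | { kind: "url"; pdfUrl: string };

// Returns a list of problems with the chosen input; an empty list means it looks usable.
function validatePdfInput(input: PdfInput, availableBinaryFields: string[]): string[] {
  const problems: string[] = [];
  switch (input.kind) {
    case "binaryData":
      // The named binary property must exist on the incoming item.
      if (availableBinaryFields.indexOf(input.binaryField) === -1) {
        problems.push(`Binary field "${input.binaryField}" not found on the incoming item`);
      }
      break;
    case "base64": {
      // Standard base64 uses A-Z, a-z, 0-9, + and /, with up to two = padding characters.
      const compact = input.content.replace(/\s+/g, "");
      if (!/^[A-Za-z0-9+/]+={0,2}$/.test(compact)) {
        problems.push("Base64 PDF content contains characters outside the base64 alphabet");
      }
      break;
    }
    case "url":
      // Assumption: only http and https links are accepted.
      if (!/^https?:\/\//i.test(input.pdfUrl)) {
        problems.push("PDF URL must start with http:// or https://");
      }
      break;
  }
  return problems;
}
```

Running such a check before the conversion step makes a wrong binary field name or a malformed URL visible early, rather than as a generic failure from the conversion service.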

Properties

Name | Meaning
Input Data Type | Method to provide the PDF file: Binary Data (from previous node); Base64 String (directly encoded PDF content); URL (link to the PDF file).
Input Binary Field | Name of the binary property containing the PDF file when using the Binary Data input type (usually "data").
Base64 PDF Content | Base64-encoded string of the PDF document content (used if Input Data Type is Base64 String).
PDF URL | URL of the PDF file to convert (used if Input Data Type is URL).
Output File Name | Desired filename for the output Word document (e.g., "converted_document.docx").
Document Name | Reference name for the source PDF file (e.g., "document.pdf").
Quality Type | Conversion quality setting. Draft: faster conversion, suitable for simple PDFs with clear text. Quality: slower but more accurate, better for complex layouts.
OCR Language | Language used for OCR text recognition in scanned/image PDFs. Options include Arabic, Chinese (Simplified/Traditional), Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Norwegian, Portuguese, Russian, Spanish, Swedish.
Advanced Options | Collection of additional settings. Custom Profiles: JSON string to adjust API call properties. Max Retries: maximum polling attempts for async processing. Merge All Sheets: combine pages into one flow. Preserve Output Format: keep original formatting if possible. Retry Delay (Seconds): base delay between polling attempts. Use OCR When Needed: enable OCR for scanned PDFs.
Binary Data Output Name | Custom name for the binary data field in the node's output (default "data").
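
As a rough illustration of how these properties fit together, the sketch below models a subset of them as a typed settings object. The interface `PdfToWordSettings`, its field names, and the helper `normalizeOutputFileName` are hypothetical. Only the example values ("converted_document.docx", "document.pdf", "data") come from the table above, and whether the node itself appends a missing ".docx" extension is not documented here.

```typescript
// The two documented quality levels.
type QualityType = "Draft" | "Quality";

// Hypothetical shape for a subset of the node's properties (names are illustrative).
interface PdfToWordSettings {
  outputFileName: string;
  documentName: string;
  qualityType: QualityType;
  ocrLanguage: string;
  binaryDataOutputName: string;
}

// Illustrative helper: make sure the output name carries a .docx extension.
function normalizeOutputFileName(name: string): string {
  const trimmed = name.trim();
  return /\.docx$/i.test(trimmed) ? trimmed : `${trimmed}.docx`;
}

const exampleSettings: PdfToWordSettings = {
  outputFileName: normalizeOutputFileName("converted_document"),
  documentName: "document.pdf",
  qualityType: "Quality", // slower, better suited to complex layouts
  ocrLanguage: "English",
  binaryDataOutputName: "data", // documented default
};
```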

Output

The node outputs the converted Word document as binary data under the configured binary data output name (default "data"), ready for download, storage, or further workflow steps. The accompanying JSON output carries metadata about the conversion result, such as the output filename.

If the node processes multiple items, each will have its own corresponding Word document output.
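
A downstream step might read the converted file roughly as sketched below. The interfaces `BinaryEntry` and `WorkflowItem` are simplified local stand-ins for n8n's item structure (a `json` object plus an optional `binary` map keyed by property name), and the `fileName` key inside `json` is an assumption, since the exact metadata fields are not listed above. The sample payload is a placeholder, not a real document.

```typescript
// Simplified stand-ins for the shape of an n8n item (not the real n8n types).
interface BinaryEntry {
  data: string; // base64-encoded file content
  mimeType: string;
  fileName?: string;
}

interface WorkflowItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Standard MIME type for .docx files.
const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Look up the converted document under the configured binary output name.
function getConvertedFile(item: WorkflowItem, binaryName = "data"): BinaryEntry {
  const entry = item.binary ? item.binary[binaryName] : undefined;
  if (!entry) {
    throw new Error(`No binary data found under "${binaryName}"`);
  }
  return entry;
}

// Placeholder item: "UEsDBA==" is only the 4-byte ZIP signature, not a full document.
const sampleItem: WorkflowItem = {
  json: { fileName: "converted_document.docx" },
  binary: {
    data: { data: "UEsDBA==", mimeType: DOCX_MIME, fileName: "converted_document.docx" },
  },
};
```

If the Binary Data Output Name was changed from its default, the same name has to be passed as `binaryName` when reading the item.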

Dependencies

  • Requires access to the PDF4me API, the external service that performs the PDF-to-Word conversion.
  • Needs proper API authentication configured in n8n credentials (an API key or token).
  • Network access to URLs if using the URL input method.
  • Optional OCR support depends on the external service capabilities and selected OCR language.

Troubleshooting

  • Common issues:

    • Invalid or inaccessible PDF URL: Ensure the URL is reachable and points directly to a valid PDF file.
    • Incorrect binary property name: Verify the binary field name matches the actual input binary data property.
    • Unsupported PDF format or corrupted file: Confirm the PDF is not damaged and is supported by the conversion service.
    • OCR language mismatch: Select the correct OCR language matching the PDF content to improve text recognition accuracy.
    • API rate limits or authentication errors: Check API credentials and usage quotas.
  • Error messages:

    • "File not found" or "Invalid input": Usually caused by wrong input data or missing binary content.
    • "Conversion failed" or timeout errors: May indicate complex PDFs requiring higher max retries or longer retry delays.
    • Authentication errors: Verify that the API key/token is correctly set up in n8n credentials.

Adjust advanced options like max retries and retry delay for large or complex PDFs to reduce timeouts.
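
To reason about how Max Retries and Retry Delay interact, the sketch below shows a generic polling loop. It is not the node's actual implementation: `pollUntilDone`, `JobStatus`, and `worstCaseWaitSeconds` are hypothetical names, and a fixed delay between attempts is assumed (the node describes its delay as a "base" delay, so the real schedule may differ).

```typescript
// Hypothetical status returned by one polling attempt.
type JobStatus = { done: boolean; result?: string };

// Poll until the job reports completion or the attempt budget is used up.
async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  maxRetries: number,
  retryDelaySeconds: number,
): Promise<string> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const status = await checkStatus();
    if (status.done && status.result !== undefined) {
      return status.result;
    }
    // Wait before the next attempt, but not after the final one.
    if (attempt < maxRetries) {
      await new Promise<void>((resolve) => setTimeout(resolve, retryDelaySeconds * 1000));
    }
  }
  throw new Error(`Conversion not finished after ${maxRetries} polling attempts`);
}

// Total time spent waiting between attempts if every attempt reports "not done".
function worstCaseWaitSeconds(maxRetries: number, retryDelaySeconds: number): number {
  return Math.max(0, maxRetries - 1) * retryDelaySeconds;
}
```

Under this fixed-delay assumption, 10 retries with a 5-second delay allow for at most 45 seconds of waiting between attempts, so a conversion that regularly takes longer would need a higher retry count or a longer delay.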
