Overview
This node converts PDF documents into editable Word (.docx) files. The source PDF can be supplied as binary data from a previous node, a base64-encoded string, a URL, or a local file path. Options control the conversion quality and the OCR language used for scanned PDFs or images within the document. The node fits workflows that need automated extraction and editing of PDF content, such as digitizing scanned contracts, converting reports for further editing, or integrating PDF content into document management systems.
Practical examples:
- Automatically convert uploaded PDF invoices into Word documents for downstream processing.
- Convert scanned PDF forms using OCR to editable Word format for data extraction.
- Fetch a PDF from a URL and convert it to Word for content repurposing.
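For the base64 input method mentioned in the overview, the PDF has to be encoded before it reaches the node. The following is a minimal sketch of that step in Python. The helper name `pdf_bytes_to_base64` and the sample bytes are illustrative assumptions, not part of the node.

```python
import base64


def pdf_bytes_to_base64(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a single-line ASCII base64 string."""
    return base64.b64encode(pdf_bytes).decode("ascii")


# Tiny stand-in for real PDF content, used only for illustration.
sample = b"%PDF-1.4\n%%EOF\n"
encoded = pdf_bytes_to_base64(sample)

# Decoding the string returns the original bytes unchanged.
round_trip = base64.b64decode(encoded)
```

In a real workflow the bytes would come from reading the PDF file in binary mode rather than from a literal.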
Properties
| Name | Meaning |
|---|---|
| Input Data Type | Method to provide the PDF file: Binary Data (from previous node), Base64 String (encoded PDF content), URL (link to PDF), or File Path (local file system path). |
| Input Binary Field | Name of the binary property containing the PDF file when using Binary Data input type (default is "data"). |
| Base64 PDF Content | Base64 encoded string of the PDF document content (used if Input Data Type is Base64 String). |
| PDF URL | URL pointing to the PDF file to convert (used if Input Data Type is URL). |
| Local File Path | Local file system path to the PDF file (used if Input Data Type is File Path). |
| Output File Name | Desired filename for the resulting Word document (default: "converted_document.docx"). |
| Document Name | Reference name for the source PDF file (default: "document.pdf"). |
| Quality Type | Conversion quality setting: "Draft" (faster, suitable for simple PDFs with clear text) or "Quality" (slower but more accurate, better for complex layouts). |
| OCR Language | Language used for Optical Character Recognition on scanned PDFs or images. Options include Arabic, Chinese (Simplified/Traditional), Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Norwegian, Portuguese, Russian, Spanish, Swedish. |
| Advanced Options | Collection of additional settings: |
| - Custom Profiles | JSON string to customize API call properties for advanced configuration. |
| - Max Retries | Maximum number of polling attempts for asynchronous processing (higher for complex PDFs). |
| - Merge All Sheets | Whether to combine multiple pages into a single continuous document flow (true/false). |
| - Preserve Output Format | Whether to preserve original formatting when possible (true/false). |
| - Retry Delay (Seconds) | Base delay in seconds between polling attempts; actual delay increases exponentially. |
| - Use Async Processing | Enable asynchronous processing for better handling of large files (true/false). |
| - Use OCR When Needed | Enable OCR automatically when needed for scanned PDFs (true/false). |
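The Retry Delay (Seconds) and Max Retries options together determine how long the node waits for an asynchronous conversion. The table only states that the delay grows exponentially. The sketch below assumes a doubling factor and no upper cap, both of which are assumptions, to estimate the resulting schedule. `polling_delays` is a hypothetical helper, not the node's code.

```python
def polling_delays(base_delay: float, max_retries: int, factor: float = 2.0) -> list[float]:
    """Delay before each polling attempt, assuming base_delay * factor**attempt.

    The doubling factor is an assumption; the node's exact formula may differ.
    """
    return [base_delay * factor ** attempt for attempt in range(max_retries)]


# Example: Retry Delay (Seconds) = 2, Max Retries = 5.
delays = polling_delays(base_delay=2, max_retries=5)
total_wait = sum(delays)
print(delays, total_wait)  # [2.0, 4.0, 8.0, 16.0, 32.0] 62.0
```

Under these assumptions, each extra retry roughly doubles the total waiting budget, so a small increase in Max Retries can be enough for large or complex PDFs.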
Output
The node outputs one item with two parts:
- json: Metadata about the conversion result.
- binary: The Word document file in binary form, named according to the specified output file name.
The binary data is the .docx counterpart of the input PDF, ready for download or further processing in the workflow.
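To make the json and binary parts more concrete, here is a hedged sketch of reading the converted file out of an output item in Python. It assumes the item is available as a dictionary, that the binary property is named `data`, and that the file content is stored as a base64 string next to `fileName` and `mimeType`. These details are assumptions, as are the metadata fields and the `extract_docx` helper.

```python
import base64

# Standard MIME type for .docx files.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def extract_docx(item: dict, binary_property: str = "data") -> tuple[str, bytes]:
    """Return (file name, raw bytes) from an item dict shaped as assumed above."""
    entry = item["binary"][binary_property]
    return entry["fileName"], base64.b64decode(entry["data"])


# Hypothetical output item; the json metadata fields are made up for illustration.
item = {
    "json": {"success": True},
    "binary": {
        "data": {
            "fileName": "converted_document.docx",
            "mimeType": DOCX_MIME,
            # A .docx file is a ZIP archive, so real content starts with b"PK\x03\x04".
            "data": base64.b64encode(b"PK\x03\x04").decode("ascii"),
        }
    },
}

name, content = extract_docx(item)
```

The returned bytes can then be written to disk under `name` or passed to another service.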
Dependencies
- Requires access to an external PDF-to-Word conversion service API.
- Needs appropriate API authentication credentials configured in n8n (e.g., an API key).
- Network access to fetch PDFs from URLs if that input method is used.
- For OCR functionality, the service must support the selected OCR languages.
Troubleshooting
Common issues:
- Invalid or inaccessible PDF URL leading to download failures.
- Incorrect binary property name causing missing input data errors.
- Unsupported or corrupted PDF files causing conversion errors.
- Insufficient API quota or invalid API credentials resulting in authorization errors.
- Long processing times for large or complex PDFs; consider enabling Use Async Processing and raising Max Retries or Retry Delay (Seconds).
Error messages and resolutions:
- "Input binary data not found": Verify the binary property name matches the actual input.
- "Failed to fetch PDF from URL": Check URL accessibility and network connectivity.
- "Conversion failed due to unsupported format": Ensure the input file is a valid PDF.
- "API authentication error": Confirm API credentials are correctly set up in n8n.
- "Timeout waiting for conversion": Increase Max Retries or Retry Delay (Seconds) under Advanced Options.
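Several of the errors above come from passing something that is not actually a PDF, such as an HTML error page returned by a URL. A cheap pre-flight check is to look for the `%PDF-` marker that opens a PDF file. The sketch below is an illustration only. `looks_like_pdf` is a hypothetical helper, and passing it does not guarantee the file is undamaged or convertible.

```python
def looks_like_pdf(data: bytes) -> bool:
    """True if the data begins with the PDF header marker '%PDF-'."""
    return data.startswith(b"%PDF-")


# A PDF-like header passes; an HTML error page does not.
ok = looks_like_pdf(b"%PDF-1.7\n...")
bad = looks_like_pdf(b"<html><body>404 Not Found</body></html>")
```

Running a check like this before the conversion step lets a workflow fail early with a clear message instead of waiting for the service to reject the file.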
Links and References
- PDF4me API Documentation