pdf-page-split

n8n nodes to split PDF documents into individual pages and convert DOCX to PDF with page splitting

Package Information

Downloads: 25 weekly / 58 monthly
Latest Version: 0.2.0
Author: Matheus Kindrazki

Documentation

n8n-nodes-pdf-page-split

License: MIT

Powerful n8n community nodes for PDF document processing: split PDFs into pages and convert DOCX to PDF with page splitting

🌟 Features

PDF Page Split Node

  • 📄 PDF Splitting: Split multi-page PDFs into individual single-page files
  • 🎯 Page Selection: Process specific page ranges with flexible selection options
  • 📝 Custom Naming: Configure output file names with prefixes and page numbers
  • 🔄 Batch Processing: Handle multiple PDFs in a single workflow

DOCX to PDF Split Node (New in v0.2.0)

  • 📑 DOCX Conversion: Convert DOCX documents to PDF format
  • 📄 Automatic Splitting: Split converted PDFs into individual pages
  • 🎯 Page Selection: Extract specific pages from converted documents
  • 📝 Custom Naming: Configure output file names with prefixes and page numbers
  • 💾 Optional Full PDF: Keep the complete converted PDF alongside individual pages

General Features

  • 🚀 Pure JavaScript: No native dependencies, so the nodes run wherever n8n runs
  • 🐳 Docker Ready: Works seamlessly in containerized environments

📋 Prerequisites

  • n8n version 0.147.0 or newer
  • Node.js version 16 or newer

💻 Installation

Via n8n Interface

  1. Open your n8n instance
  2. Go to Settings > Community Nodes
  3. Click on Install
  4. Enter n8n-nodes-pdf-page-split in the Name field
  5. Click Install

Via npm

npm install n8n-nodes-pdf-page-split

Via yarn

yarn add n8n-nodes-pdf-page-split

🔧 Configuration

PDF Page Split Node

Input Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| Binary Property | string | Name of the binary property containing the PDF file | data |
| Page Range | string | Range of pages to process (e.g., "1-5,8,11-13") | (all pages) |
| File Name Prefix | string | Prefix for output file names | page_ |
| Start Number | number | Starting number for page numbering | 1 |
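
The Page Range syntax can be illustrated with a small parser. This is a minimal sketch, not the node's actual implementation: the function name `parsePageRange`, its validation rules, and the treatment of an empty string as "all pages" are assumptions made for illustration.

```typescript
// Hypothetical helper: expands a range such as "1-5,8,11-13" into page numbers.
// The validation rules below are assumptions, not the node's documented behavior.
function parsePageRange(range: string, totalPages: number): number[] {
  const trimmed = range.trim();
  // Assumption: an empty range means "all pages", mirroring the documented default.
  if (trimmed === "") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accept either a single page ("8") or an inclusive span ("11-13").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(
        `Page range "${part.trim()}" is not valid for a ${totalPages}-page document`
      );
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  // Return unique page numbers in ascending order.
  return Array.from(pages).sort((a, b) => a - b);
}
```

For a 15-page document, `parsePageRange("1-5,8,11-13", 15)` returns `[1, 2, 3, 4, 5, 8, 11, 12, 13]`.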

DOCX to PDF Split Node

Input Parameters

| Parameter | Type | Description | Default |
|---|---|---|---|
| Binary Property | string | Name of the binary property containing the DOCX file | data |
| Page Range | string | Range of pages to process (e.g., "1-5,8,11-13") | (all pages) |
| File Name Prefix | string | Prefix for output file names | page_ |
| Start Number | number | Starting number for page numbering | 1 |
| Keep Original PDF | boolean | Include the full converted PDF in output | false |
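
As a rough illustration of how these parameters and their defaults fit together, the sketch below models them as a typed object. The key names `binaryPropertyName`, `pageRange`, `fileNamePrefix`, and `keepOriginalPdf` appear in the usage examples in this README; `startNumber`, the `"data"` default for the binary property, and the empty string standing for "all pages" are assumptions.

```typescript
// Parameter shape for the DOCX to PDF Split node, as a sketch.
interface DocxToPdfSplitParams {
  binaryPropertyName: string;
  pageRange: string; // assumption: "" stands for "all pages"
  fileNamePrefix: string;
  startNumber: number; // assumption: this key name is not shown in the README
  keepOriginalPdf: boolean;
}

// Defaults taken from the parameter table; "data" and "" are assumed values.
const DOCX_DEFAULTS: DocxToPdfSplitParams = {
  binaryPropertyName: "data",
  pageRange: "",
  fileNamePrefix: "page_",
  startNumber: 1,
  keepOriginalPdf: false,
};

// Hypothetical helper: fills in any parameter the caller leaves out.
function withDefaults(
  overrides: Partial<DocxToPdfSplitParams> = {}
): DocxToPdfSplitParams {
  return { ...DOCX_DEFAULTS, ...overrides };
}
```

Calling `withDefaults({ keepOriginalPdf: true })` keeps every other default and only switches on the full-PDF output.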

Output

Each processed page generates an item with:

  • Binary Data: The PDF page as a binary file
  • JSON Data:
    • pageNumber: Current page number
    • totalPages: Total pages in original/converted document
    • fileName: Generated file name
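
The JSON fields listed above can be modeled as a small type. The sketch below also shows one plausible way the file names could be derived from the prefix and start number. The naming scheme (prefix, then a running number, then `.pdf`) and the helper name `buildPageItems` are assumptions; the node's real naming may differ.

```typescript
// JSON data attached to each output item, per the list above.
interface PageItemJson {
  pageNumber: number;
  totalPages: number;
  fileName: string;
}

// Hypothetical helper: builds the JSON part of each output item.
// Assumption: files are numbered consecutively from startNumber,
// regardless of which source pages were selected.
function buildPageItems(
  pages: number[],
  totalPages: number,
  fileNamePrefix: string = "page_",
  startNumber: number = 1
): PageItemJson[] {
  return pages.map((pageNumber, index) => ({
    pageNumber,
    totalPages,
    fileName: `${fileNamePrefix}${startNumber + index}.pdf`,
  }));
}
```

Under these assumptions, selecting pages 1, 2, 3, and 5 of a 10-page document with the prefix `extract_` yields four items, the last of which has `pageNumber` 5 and `fileName` `extract_4.pdf`.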

📚 Usage Examples

Basic PDF Splitting

// Split a PDF into individual pages
[
  {
    "node": "PDF Page Split",
    "parameters": {
      "binaryPropertyName": "data",
      "fileNamePrefix": "page_"
    }
  }
]

Extract Specific Pages from PDF

// Extract pages 1-3 and 5
[
  {
    "node": "PDF Page Split",
    "parameters": {
      "binaryPropertyName": "data",
      "pageRange": "1-3,5",
      "fileNamePrefix": "extract_"
    }
  }
]

Convert DOCX to PDF and Split

// Convert DOCX to PDF and split into pages
[
  {
    "node": "DOCX to PDF Split",
    "parameters": {
      "binaryPropertyName": "data",
      "fileNamePrefix": "converted_page_",
      "keepOriginalPdf": true
    }
  }
]

Convert DOCX and Extract Specific Pages

// Convert DOCX and extract specific pages
[
  {
    "node": "DOCX to PDF Split",
    "parameters": {
      "binaryPropertyName": "data",
      "pageRange": "1-5,10",
      "fileNamePrefix": "doc_page_",
      "keepOriginalPdf": false
    }
  }
]

🔍 Example Workflows

1. Split and Save PDF Pages

  1. HTTP Request → Download PDF from URL
  2. PDF Page Split → Split into pages
  3. Write Binary File → Save pages locally

2. Process Selected Pages

  1. Read Binary File → Load local PDF
  2. PDF Page Split → Extract specific pages
    • Set "Page Range" to "1-3,5,10-12"
  3. Google Drive → Upload selected pages

3. Convert DOCX and Process

  1. Read Binary File → Load DOCX document
  2. DOCX to PDF Split → Convert and split
    • Enable "Keep Original PDF" for complete document
  3. Email Send → Send individual pages as attachments

4. Batch DOCX Processing

  1. Google Drive → Download DOCX files
  2. DOCX to PDF Split → Convert each to PDF pages
  3. Compress → Create ZIP with all pages
  4. S3 → Upload to storage

⚠️ Troubleshooting

Common Issues

| Issue | Solution |
|---|---|
| "No binary data found" | Ensure previous node outputs binary data |
| Empty PDF output | Verify input PDF is valid and not corrupted |
| Memory errors | Process fewer pages at once for large PDFs |
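
To make the "No binary data found" case concrete, the sketch below shows the kind of check that produces it. `SketchItem` is a simplified stand-in for an n8n item, and `requireBinary` is a hypothetical helper, not part of this package or of n8n.

```typescript
// Simplified stand-in for an n8n item; the real n8n types carry more fields.
interface SketchBinary {
  data: string; // base64-encoded file content
  mimeType: string;
  fileName?: string;
}

interface SketchItem {
  json: Record<string, unknown>;
  binary?: Record<string, SketchBinary>;
}

// Hypothetical guard: returns the named binary entry or fails with a message
// that lists which binary properties the item actually carries.
function requireBinary(item: SketchItem, propertyName: string = "data"): SketchBinary {
  const entry = item.binary?.[propertyName];
  if (!entry) {
    const available = Object.keys(item.binary ?? {});
    throw new Error(
      `No binary data found in property "${propertyName}". ` +
        `Available properties: ${available.join(", ") || "none"}`
    );
  }
  return entry;
}
```

If the previous node stores the file under a different property name, such as `attachment_0`, the Binary Property parameter needs to match that name.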

Best Practices

  • Verify that the PDF is not password-protected
  • Use page ranges for large documents
  • Monitor memory usage in production
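
One way to apply the page-range advice to a large document is to generate several smaller Page Range values and process them one after another. This is a sketch under that assumption; `batchPageRanges` is a hypothetical helper, and a suitable batch size depends on the memory available to the n8n instance.

```typescript
// Hypothetical helper: splits pages 1..totalPages into consecutive
// Page Range strings that each cover at most batchSize pages.
function batchPageRanges(totalPages: number, batchSize: number): string[] {
  if (
    !Number.isInteger(totalPages) ||
    !Number.isInteger(batchSize) ||
    totalPages < 1 ||
    batchSize < 1
  ) {
    throw new Error("totalPages and batchSize must be positive integers");
  }
  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    // A one-page batch is written as a single number rather than "n-n".
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
  }
  return ranges;
}
```

For a 25-page document and a batch size of 10, this produces `"1-10"`, `"11-20"`, and `"21-25"`, each of which can be used as a Page Range value in a separate run.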

🔧 Technical Details

Libraries Used

  • pdf-lib: PDF manipulation and splitting
  • mammoth: DOCX text and structure extraction
  • pdfkit: PDF generation from extracted content

Key Features

  • Pure JavaScript implementation
  • No native dependencies
  • Cross-platform compatibility
  • Docker-friendly operation

Limitations

  • Does not support password-protected PDFs or DOCX files
  • Cannot extract text or metadata
  • Maximum file size depends on available memory
  • DOCX conversion rebuilds the PDF from extracted text and structure, so layout and formatting may differ from native Office rendering

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📝 License

MIT

🙏 Acknowledgments

  • n8n - For the amazing workflow automation platform
  • pdf-lib - For reliable PDF manipulation
  • All our contributors and users
