pdf-to-csv

n8n community node to convert PDF documents to CSV format with flexible parsing options

Package Information

Released: 9/2/2025
Downloads: 409 weekly / 409 monthly
Latest Version: 1.3.0
Author: jkong0221

Documentation

n8n-nodes-pdf-to-csv

An n8n community node for converting PDF documents to CSV format. This node provides flexible PDF parsing capabilities with multiple output formats and parsing methods.

Features

  • 📄 Convert PDF documents to CSV format
  • 🔗 Support for both binary data and URL inputs
  • 🎯 Multiple parsing methods (auto-detect tables, smart pattern detection, line-by-line, custom delimiters)
  • 📊 Flexible output formats (CSV string, JSON array, binary data)
  • ⚙️ Configurable CSV delimiters and headers
  • 🚫 Skip empty lines option
  • 🔧 Built-in error handling and validation

Installation

Community Nodes (Recommended)

  1. Go to Settings > Community Nodes in your n8n instance
  2. Select Install
  3. Enter n8n-nodes-pdf-to-csv
  4. Agree to the risks and select Install

Manual Installation

  1. Clone this repository or download the source code
  2. Install dependencies:
    pnpm install
    
  3. Build the node:
    pnpm build
    
  4. Link the node to your n8n installation:
    pnpm link
    cd ~/.n8n/custom
    pnpm link n8n-nodes-pdf-to-csv
    
  5. Restart your n8n instance

Docker Installation

If you run n8n with Docker, install this node as follows:

  1. Create a Dockerfile extending the n8n image:

    FROM n8nio/n8n
    USER root
    RUN npm install -g n8n-nodes-pdf-to-csv
    USER node
    
  2. Build and run your custom image:

    docker build -t n8n-custom .
    docker run -it --rm --name n8n -p 5678:5678 n8n-custom
    

Usage

Basic PDF to CSV Conversion

  1. Add the PDF to CSV node to your workflow

  2. Configure the input type:

    • Binary Data: Use when PDF comes from a previous node (e.g., HTTP Request, Google Drive)
    • URL: Provide a direct URL to the PDF file
  3. Choose parsing method:

    • Auto Detect Tables: Automatically identifies table structures (recommended for simple tables)
    • Smart Pattern Detection: Advanced pattern recognition for structured reports (recommended for complex tabular data)
    • Line by Line: Treats each line as a single CSV row
    • Custom Delimiter: Uses regex patterns to split text
  4. Configure output format:

    • CSV String: Returns formatted CSV text
    • JSON Array: Returns structured JSON data
    • Binary Data: Returns downloadable CSV file
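
To illustrate how the CSV String and JSON Array outputs relate, here is a minimal sketch that serializes the same parsed rows both ways. The helper names (`escapeCell`, `toCsvString`, `toJsonArray`) are hypothetical and not part of the node's API. The quoting rule follows the common RFC 4180 convention, which is an assumption about the node's behavior rather than something this README states.

```typescript
// Hypothetical helpers for illustration; not taken from the node's source.
type Row = string[];

// Quote a cell when it contains the delimiter, a double quote, or a line break.
function escapeCell(cell: string, delimiter: string): string {
  const needsQuotes =
    cell.includes(delimiter) || cell.includes('"') || /[\r\n]/.test(cell);
  return needsQuotes ? `"${cell.replace(/"/g, '""')}"` : cell;
}

// "CSV String" style output: one text line per row.
function toCsvString(rows: Row[], delimiter = ","): string {
  return rows
    .map((row) => row.map((cell) => escapeCell(cell, delimiter)).join(delimiter))
    .join("\n");
}

// "JSON Array" style output: the first row supplies the keys when
// includeHeaders is true, otherwise generic column names are used.
function toJsonArray(rows: Row[], includeHeaders = true): Array<Record<string, string>> {
  const first = rows[0];
  if (first === undefined) return [];
  const headers = includeHeaders ? first : first.map((_, i) => `column${i + 1}`);
  const body = includeHeaders ? rows.slice(1) : rows;
  return body.map((row) => {
    const record: Record<string, string> = {};
    headers.forEach((key, i) => {
      record[key] = row[i] ?? "";
    });
    return record;
  });
}

const sampleRows: Row[] = [
  ["name", "note"],
  ["Widget", 'He said "hi", twice'],
];
const csvText = toCsvString(sampleRows);
const jsonRecords = toJsonArray(sampleRows);
```

Here `csvText` holds two lines, with the second cell of the data row quoted because it contains both a comma and double quotes, and `jsonRecords` holds one object keyed by `name` and `note`.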

Input Configuration

Binary Data Input

{
  "inputType": "binaryData",
  "binaryPropertyName": "data"
}

URL Input

{
  "inputType": "url",
  "pdfUrl": "https://example.com/document.pdf"
}

Parsing Methods

Auto Detect Tables

Best for PDFs containing simple tabular data. Automatically detects columns based on spacing.
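
As a rough idea of what spacing-based column detection can look like, the sketch below treats any run of two or more whitespace characters as a column boundary. The function `splitBySpacing` and its `minGap` parameter are hypothetical; the node's actual detection logic is not documented here and may differ.

```typescript
// Hypothetical sketch of spacing-based column detection; the node's real logic may differ.
function splitBySpacing(line: string, minGap = 2): string[] {
  // A run of `minGap` or more whitespace characters marks a column boundary,
  // so single spaces inside a cell ("New York") are preserved.
  const gap = new RegExp(`\\s{${minGap},}`);
  return line.trim().split(gap);
}

const spacedCells = splitBySpacing("New York    2024-01-15   1,204.50");
// spacedCells is ["New York", "2024-01-15", "1,204.50"]
```

This also shows why the method suits simple tables only: a cell that happens to contain a double space would be split in two.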

Smart Pattern Detection

Advanced algorithm that identifies repeating data patterns in structured reports. Excellent for:

  • Sales reports with dates, codes, and IDs
  • Financial statements with structured data
  • Multi-language documents with consistent patterns
  • Complex tabular data that spans multiple pages
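
One way to picture "repeating data patterns" is to reduce every line to a signature of token types and keep the lines that share the most frequent signature. The sketch below does exactly that. It is an illustration under that assumption only; `tokenShape`, `lineSignature`, and `extractRepeatingRows` are hypothetical names, and the node's actual algorithm is not described in this README.

```typescript
// Hypothetical illustration only; not the node's algorithm.
function tokenShape(token: string): string {
  if (/^\d{4}-\d{2}-\d{2}$/.test(token)) return "DATE";
  if (/^-?\d+(\.\d+)?$/.test(token)) return "NUM";
  return "TEXT";
}

// Reduce a line to its sequence of token types, e.g. "DATE TEXT NUM".
function lineSignature(line: string): string {
  return line.trim().split(/\s+/).map(tokenShape).join(" ");
}

// Keep only the lines whose signature is the most frequent one.
function extractRepeatingRows(lines: string[]): string[][] {
  const counts = new Map<string, number>();
  for (const line of lines) {
    if (line.trim() === "") continue;
    const signature = lineSignature(line);
    counts.set(signature, (counts.get(signature) ?? 0) + 1);
  }
  let best = "";
  let bestCount = 0;
  counts.forEach((count, signature) => {
    if (count > bestCount) {
      best = signature;
      bestCount = count;
    }
  });
  return lines
    .filter((line) => line.trim() !== "" && lineSignature(line) === best)
    .map((line) => line.trim().split(/\s+/));
}

const reportLines = [
  "Monthly Sales Report",
  "2025-01-03 A100 42",
  "2025-01-04 B200 17",
  "Page 1 of 3",
  "2025-01-05 A100 8",
];
const patternRows = extractRepeatingRows(reportLines);
```

In this sample the title and the page footer each have a unique signature, so `patternRows` contains only the three date, code, and quantity lines.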

Line by Line

Suitable for simple text documents where each line should become a CSV row.
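
Conceptually, this method maps each line of extracted text to a row with a single cell. A minimal sketch, assuming the text has already been extracted from the PDF; `linesToRows` is a hypothetical name, and its `skipEmptyLines` flag only mirrors the node option of the same name.

```typescript
// Hypothetical sketch: each line of extracted text becomes a one-cell row.
function linesToRows(text: string, skipEmptyLines = true): string[][] {
  return text
    .split(/\r?\n/) // accept both Unix and Windows line endings
    .filter((line) => !skipEmptyLines || line.trim() !== "")
    .map((line) => [line]);
}

const lineRows = linesToRows("Invoice 1001\n\nTotal due: 250.00\n");
// lineRows is [["Invoice 1001"], ["Total due: 250.00"]]
```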

Custom Delimiter

Use regex patterns to split text. Examples:

  • \\s+ - Split on runs of whitespace (spaces, tabs)
  • \\t - Split by tabs
  • \\| - Split by pipe characters
  • , - Split by commas
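
The sketch below shows regex-based splitting in general terms. `splitByPattern` is a hypothetical helper, not the node's code. It also shows one possible reason the patterns above are written with doubled backslashes: inside a TypeScript or JSON string literal the backslash must itself be escaped. Whether the node's input field expects the single or the doubled form is an assumption to verify against the node itself.

```typescript
// Hypothetical sketch of regex-based splitting; not the node's source.
function splitByPattern(line: string, pattern: string): string[] {
  // `pattern` is the regex source text; surrounding whitespace is trimmed from each cell.
  return line.split(new RegExp(pattern)).map((cell) => cell.trim());
}

// In a string literal, "\\s+" is the regex \s+ and "\\|" is the regex \| (a literal pipe).
const bySpaces = splitByPattern("alpha   beta gamma", "\\s+");
const byPipes = splitByPattern("alpha | beta | gamma", "\\|");
// Both results are ["alpha", "beta", "gamma"]
```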

Example Workflow

{
  "nodes": [
    {
      "parameters": {
        "url": "https://example.com/report.pdf",
        "responseFormat": "file",
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 1,
      "position": [250, 300],
      "id": "http-request",
      "name": "Download PDF"
    },
    {
      "parameters": {
        "operation": "convert",
        "inputType": "binaryData",
        "binaryPropertyName": "data",
        "parsingMethod": "autoDetect",
        "outputFormat": "csvString",
        "csvDelimiter": ",",
        "includeHeaders": true,
        "skipEmptyLines": true
      },
      "type": "n8n-nodes-pdf-to-csv.pdfToCsv",
      "typeVersion": 1,
      "position": [450, 300],
      "id": "pdf-to-csv",
      "name": "PDF to CSV"
    }
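  ],
  "connections": {
    "Download PDF": {
      "main": [[{ "node": "PDF to CSV", "type": "main", "index": 0 }]]
    }
  }
}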

Configuration Options

Parameter           Type     Default     Description
inputType           Options  binaryData  Source of PDF file (binaryData/url)
binaryPropertyName  String   data        Name of binary property containing PDF
pdfUrl              String   -           URL of PDF file to convert
parsingMethod       Options  autoDetect  Method for parsing PDF content
customDelimiter     String   \\s+        Regex pattern for custom delimiter parsing
csvDelimiter        String   ,           Delimiter for CSV output
includeHeaders      Boolean  true        Treat first row as headers
skipEmptyLines      Boolean  true        Skip empty lines in PDF
outputFormat        Options  csvString   Format of output data
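
For readers who generate workflow JSON from code, the parameters above can be sketched as a TypeScript type with the listed defaults. The interface name `PdfToCsvParameters` is hypothetical. Only option values that appear in this README are spelled out; `parsingMethod` and `outputFormat` stay plain strings because their other internal values are not listed here.

```typescript
// Hypothetical typing of the documented parameters; not exported by the package.
interface PdfToCsvParameters {
  inputType: "binaryData" | "url";
  binaryPropertyName: string;
  pdfUrl?: string; // only meaningful when inputType is "url"
  parsingMethod: string; // "autoDetect" is the documented default
  customDelimiter: string;
  csvDelimiter: string;
  includeHeaders: boolean;
  skipEmptyLines: boolean;
  outputFormat: string; // "csvString" is the documented default
}

// Defaults copied from the table above. "\\s+" is a string literal for the regex \s+;
// treating that as the intended default is an assumption.
const defaultParameters: PdfToCsvParameters = {
  inputType: "binaryData",
  binaryPropertyName: "data",
  parsingMethod: "autoDetect",
  customDelimiter: "\\s+",
  csvDelimiter: ",",
  includeHeaders: true,
  skipEmptyLines: true,
  outputFormat: "csvString",
};

// Override only what differs from the defaults, e.g. for a URL input.
const urlParameters: PdfToCsvParameters = {
  ...defaultParameters,
  inputType: "url",
  pdfUrl: "https://example.com/document.pdf",
};
```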

Supported File Types

  • PDF documents (.pdf)
  • Password-protected PDFs are not currently supported

Error Handling

The node includes comprehensive error handling for:

  • Invalid PDF files
  • Network errors when fetching URLs
  • Parsing failures
  • Memory limitations for large files

Handle these errors with n8n's built-in mechanisms, such as the node's error-handling settings or an error workflow started by the Error Trigger node.
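
As an example of the kind of check behind "Invalid PDF files", a PDF normally begins with the bytes `%PDF-`. The sketch below tests for that signature. It is a hypothetical pre-check (`looksLikePdf` is not part of the node), and it is useful, for instance, when a URL returns an HTML error page instead of a document.

```typescript
// Hypothetical pre-check, not the node's own validation.
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // ASCII for "%PDF-"
  if (bytes.length < magic.length) return false;
  return magic.every((value, i) => bytes[i] === value);
}

const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]); // "%PDF-1.7"
const htmlPage = new Uint8Array([0x3c, 0x68, 0x74, 0x6d, 0x6c]); // "<html"

const pdfOk = looksLikePdf(pdfHeader); // true
const htmlOk = looksLikePdf(htmlPage); // false
```

A check like this only rules out obviously wrong inputs; a file can pass it and still fail to parse.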

Limitations

  • Large PDF files may consume significant memory
  • Complex PDF layouts may not parse perfectly with auto-detection
  • Scanned PDFs (image-only pages) contain no extractable text and require OCR preprocessing before conversion
  • Password-protected PDFs are not supported

Development

Prerequisites

  • Node.js 18.10 or higher
  • pnpm 7.18 or higher

Setup

git clone https://github.com/your-username/n8n-nodes-pdf-to-csv.git
cd n8n-nodes-pdf-to-csv
pnpm install

Build

pnpm build

Development Mode

pnpm dev

Linting

pnpm lint
pnpm lintfix

Testing

pnpm test

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Run tests and linting: pnpm test && pnpm lint
  5. Commit your changes: git commit -m 'Add amazing feature'
  6. Push to the branch: git push origin feature/amazing-feature
  7. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Changelog

v1.0.0

  • Initial release
  • Basic PDF to CSV conversion
  • Multiple parsing methods
  • Flexible output formats
  • Comprehensive error handling
