n8n-nodes-pdf-to-csv
An n8n community node for converting PDF documents to CSV format. This node provides flexible PDF parsing capabilities with multiple output formats and parsing methods.
Features
- 📄 Convert PDF documents to CSV format
- 🔗 Support for both binary data and URL inputs
- 🎯 Multiple parsing methods (auto-detect tables, smart pattern detection, line-by-line, custom delimiters)
- 📊 Flexible output formats (CSV string, JSON array, binary data)
- ⚙️ Configurable CSV delimiters and headers
- 🚫 Skip empty lines option
- 🔧 Built-in error handling and validation
Installation
Community Nodes (Recommended)
- Go to Settings > Community Nodes in your n8n instance
- Select Install
- Enter n8n-nodes-pdf-to-csv as the package name
- Agree to the risks and select Install
Manual Installation
- Clone this repository or download the source code
- Install dependencies:
  pnpm install
- Build the node:
  pnpm build
- Link the node to your n8n installation:
  pnpm link
  cd ~/.n8n/custom
  pnpm link n8n-nodes-pdf-to-csv
- Restart your n8n instance
Docker Installation
If you're using n8n with Docker, you can install this node as follows:
Create a Dockerfile extending the n8n image:
  FROM n8nio/n8n
  USER root
  RUN npm install -g n8n-nodes-pdf-to-csv
  USER node
Build and run your custom image:
  docker build -t n8n-custom .
  docker run -it --rm --name n8n -p 5678:5678 n8n-custom
Usage
Basic PDF to CSV Conversion
Add the PDF to CSV node to your workflow
Configure the input type:
- Binary Data: Use when PDF comes from a previous node (e.g., HTTP Request, Google Drive)
- URL: Provide a direct URL to the PDF file
Choose parsing method:
- Auto Detect Tables: Automatically identifies table structures (recommended for simple tables)
- Smart Pattern Detection: Advanced pattern recognition for structured reports (recommended for complex tabular data)
- Line by Line: Treats each line as a single CSV row
- Custom Delimiter: Uses regex patterns to split text
Configure output format:
- CSV String: Returns formatted CSV text
- JSON Array: Returns structured JSON data
- Binary Data: Returns downloadable CSV file
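To make the difference between the CSV String and JSON Array outputs concrete, here is a minimal TypeScript sketch. It is not the node's implementation: the helper names (quoteField, rowsToCsvString, rowsToJsonArray) are hypothetical, and the quoting rule follows the common RFC 4180 convention, which the node may or may not use.

```typescript
// Hypothetical helpers for illustration; not taken from the node's source code.
type Row = string[];

// Quote a field if it contains the delimiter, a double quote, or a line break.
function quoteField(field: string, delimiter: string): string {
  const needsQuotes =
    field.includes(delimiter) || field.includes('"') || /[\r\n]/.test(field);
  return needsQuotes ? `"${field.replace(/"/g, '""')}"` : field;
}

// "CSV String" style output: one delimited line per row.
function rowsToCsvString(rows: Row[], delimiter = ","): string {
  return rows
    .map((row) => row.map((f) => quoteField(f, delimiter)).join(delimiter))
    .join("\n");
}

// "JSON Array" style output: the first row supplies the keys for every later row.
function rowsToJsonArray(rows: Row[]): Record<string, string>[] {
  const header = rows[0] ?? [];
  return rows.slice(1).map((row) => {
    const record: Record<string, string> = {};
    header.forEach((key, i) => {
      record[key] = row[i] ?? "";
    });
    return record;
  });
}
```

The Binary Data output would carry the same CSV text, packaged as a file for downstream nodes.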
Input Configuration
Binary Data Input
{
  "inputType": "binaryData",
  "binaryPropertyName": "data"
}
URL Input
{
  "inputType": "url",
  "pdfUrl": "https://example.com/document.pdf"
}
Parsing Methods
Auto Detect Tables
Best for PDFs containing simple tabular data. Automatically detects columns based on spacing.
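One common way to detect columns from spacing is to treat a run of two or more spaces (or a tab) as a column boundary. The sketch below illustrates only that heuristic. It is an assumption about the general approach, not the node's actual algorithm, and splitColumnsBySpacing is a hypothetical name.

```typescript
// Hypothetical heuristic: two or more whitespace characters, or a single tab,
// separate columns. Single spaces inside a cell are preserved.
function splitColumnsBySpacing(line: string): string[] {
  return line
    .trim()
    .split(/\s{2,}|\t/)
    .map((cell) => cell.trim());
}

const tableLines = [
  "Item        Qty   Unit Price",
  "Blue widget   4        2.50",
];
const tableRows = tableLines.map(splitColumnsBySpacing);
// tableRows[1] is ["Blue widget", "4", "2.50"]
```

A heuristic like this breaks down when cells are separated by a single space, which is one reason complex layouts may need a different parsing method.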
Smart Pattern Detection
Advanced algorithm that identifies repeating data patterns in structured reports. Excellent for:
- Sales reports with dates, codes, and IDs
- Financial statements with structured data
- Multi-language documents with consistent patterns
- Complex tabular data that spans multiple pages
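As a rough illustration of what "repeating data patterns" can mean, the sketch below treats any line that starts with an ISO date as the beginning of a new record and appends other lines to the previous record. This is one plausible reading, not the node's algorithm; RECORD_START and groupRecords are hypothetical names, and the date format is an assumption chosen for the example.

```typescript
// Hypothetical example: a record starts with a date such as 2024-01-05.
// Lines that do not match are treated as continuations of the previous record.
const RECORD_START = /^\d{4}-\d{2}-\d{2}\b/;

function groupRecords(lines: string[]): string[] {
  const records: string[] = [];
  for (const line of lines) {
    const text = line.trim();
    if (text === "") continue; // ignore blank lines
    if (RECORD_START.test(text) || records.length === 0) {
      records.push(text);
    } else {
      // Merge a wrapped line into the record it belongs to.
      const last = records.pop() ?? "";
      records.push(last + " " + text);
    }
  }
  return records;
}
```

Each grouped record could then be split into columns with whichever delimiter rule fits the report.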
Line by Line
Suitable for simple text documents where each line should become a CSV row.
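A minimal sketch of this mode, assuming each line becomes a row with exactly one field and that empty lines are dropped when skipEmptyLines is enabled. The function name linesToCsvRows is hypothetical and the node's real behavior may differ.

```typescript
// Hypothetical sketch: each line of extracted text becomes a one-field CSV row.
function linesToCsvRows(text: string, skipEmptyLines = true): string[] {
  const lines = text.split(/\r?\n/);
  const kept = skipEmptyLines ? lines.filter((l) => l.trim() !== "") : lines;
  // Quote every line so embedded commas or quotes stay inside a single field.
  return kept.map((l) => `"${l.replace(/"/g, '""')}"`);
}

const csvRows = linesToCsvRows('Hello, world\n\nSay "hi"');
// csvRows is ['"Hello, world"', '"Say ""hi"""']
```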
Custom Delimiter
Use regex patterns to split text. Examples:
- \\s+ : Split on one or more whitespace characters
- \\t : Split on tabs
- \\| : Split on pipe characters
- , : Split on commas
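The sketch below shows how a regex delimiter of this kind splits a line of text. The helper splitByPattern is hypothetical and not part of the node. Note that inside a TypeScript or JSON string literal the backslash is doubled, so "\\s+" is the escaped spelling of the regex \s+.

```typescript
// Hypothetical helper: split one line of text with a user-supplied regex pattern.
function splitByPattern(line: string, pattern: string): string[] {
  const delimiter = new RegExp(pattern);
  return line.split(delimiter).map((cell) => cell.trim());
}

// "\\s+" in source code is the regex \s+ (one or more whitespace characters).
const bySpaces = splitByPattern("alpha   beta gamma", "\\s+");
// "\\|" is the regex \| (a literal pipe character).
const byPipes = splitByPattern("alpha | beta | gamma", "\\|");
// Both results are ["alpha", "beta", "gamma"]
```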
Example Workflow
{
  "nodes": [
    {
      "parameters": {
        "url": "https://example.com/report.pdf",
        "options": {}
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 1,
      "position": [250, 300],
      "id": "http-request",
      "name": "Download PDF"
    },
    {
      "parameters": {
        "operation": "convert",
        "inputType": "binaryData",
        "binaryPropertyName": "data",
        "parsingMethod": "autoDetect",
        "outputFormat": "csvString",
        "csvDelimiter": ",",
        "includeHeaders": true,
        "skipEmptyLines": true
      },
      "type": "n8n-nodes-pdf-to-csv.pdfToCsv",
      "typeVersion": 1,
      "position": [450, 300],
      "id": "pdf-to-csv",
      "name": "PDF to CSV"
    }
  ]
}
Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| inputType | Options | binaryData | Source of PDF file (binaryData/url) |
| binaryPropertyName | String | data | Name of binary property containing PDF |
| pdfUrl | String | - | URL of PDF file to convert |
| parsingMethod | Options | autoDetect | Method for parsing PDF content |
| customDelimiter | String | \\s+ | Regex pattern for custom delimiter parsing |
| csvDelimiter | String | , | Delimiter for CSV output |
| includeHeaders | Boolean | true | Treat first row as headers |
| skipEmptyLines | Boolean | true | Skip empty lines in PDF |
| outputFormat | Options | csvString | Format of output data |
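For readers who build workflow JSON programmatically, the options table can be mirrored as a TypeScript type with the documented defaults. This is a sketch only: PdfToCsvParameters, defaultParameters, and withDefaults are hypothetical names that the package does not export, and only the option values that appear in this README are shown.

```typescript
// Hypothetical type mirroring the options table; not exported by the package.
interface PdfToCsvParameters {
  inputType: "binaryData" | "url";
  binaryPropertyName: string;
  pdfUrl?: string; // only meaningful when inputType is "url"
  parsingMethod: string; // e.g. "autoDetect"
  customDelimiter: string;
  csvDelimiter: string;
  includeHeaders: boolean;
  skipEmptyLines: boolean;
  outputFormat: string; // e.g. "csvString"
}

// Defaults as listed in the table above.
const defaultParameters: PdfToCsvParameters = {
  inputType: "binaryData",
  binaryPropertyName: "data",
  parsingMethod: "autoDetect",
  customDelimiter: "\\s+", // the regex \s+ written as a string literal
  csvDelimiter: ",",
  includeHeaders: true,
  skipEmptyLines: true,
  outputFormat: "csvString",
};

// Merge user overrides onto the documented defaults.
function withDefaults(
  overrides: Partial<PdfToCsvParameters>,
): PdfToCsvParameters {
  return { ...defaultParameters, ...overrides };
}

const urlParameters = withDefaults({
  inputType: "url",
  pdfUrl: "https://example.com/document.pdf",
});
```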
Supported File Types
- PDF documents (.pdf)
- Password-protected PDFs are not currently supported
Error Handling
The node includes comprehensive error handling for:
- Invalid PDF files
- Network errors when fetching URLs
- Parsing failures
- Memory limitations for large files
Errors can be handled using n8n's built-in error handling mechanisms.
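n8n lets a workflow either stop at the first failing item or continue and carry the error along with that item. The sketch below shows that general pattern in plain TypeScript. It is a simplified illustration under stated assumptions, not the node's code: convertAll, ItemResult, and the continueOnFail flag are hypothetical stand-ins for the behavior the workflow settings control.

```typescript
// Hypothetical sketch of per-item error handling; the node's real logic may differ.
type ItemResult = { csv: string } | { error: string };

function convertAll(
  inputs: string[],
  convert: (input: string) => string,
  continueOnFail: boolean,
): ItemResult[] {
  const results: ItemResult[] = [];
  for (const input of inputs) {
    try {
      results.push({ csv: convert(input) });
    } catch (err) {
      // Stop the whole run unless the caller asked to continue past failures.
      if (!continueOnFail) throw err;
      const message = err instanceof Error ? err.message : String(err);
      results.push({ error: message });
    }
  }
  return results;
}
```

With continueOnFail enabled, a downstream step can filter on the error field to route failed documents separately.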
Limitations
- Large PDF files may consume significant memory
- Complex PDF layouts may not parse perfectly with auto-detection
- Scanned PDFs (images) require OCR preprocessing
- Password-protected PDFs are not supported
Development
Prerequisites
- Node.js 18.10 or higher
- pnpm 7.18 or higher
Setup
git clone https://github.com/your-username/n8n-nodes-pdf-to-csv.git
cd n8n-nodes-pdf-to-csv
pnpm install
Build
pnpm build
Development Mode
pnpm dev
Linting
pnpm lint
pnpm lintfix
Testing
pnpm test
Contributing
- Fork the repository
- Create a feature branch:
  git checkout -b feature/amazing-feature
- Make your changes
- Run tests and linting:
  pnpm test && pnpm lint
- Commit your changes:
  git commit -m 'Add amazing feature'
- Push to the branch:
  git push origin feature/amazing-feature
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- 📧 Email: your.email@example.com
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
Changelog
v1.0.0
- Initial release
- Basic PDF to CSV conversion
- Multiple parsing methods
- Flexible output formats
- Comprehensive error handling