n8n-nodes-docx-converter-enhanced
Enhanced fork of n8n-nodes-docx-converter with advanced RAG capabilities!
This n8n community node converts DOCX files to text and adds page-aware chunking for RAG (Retrieval-Augmented Generation) systems and metadata extraction for AI/ML workflows.
New Features (Enhanced Version)
- Page-Aware Chunking: Intelligent text chunking that preserves page boundaries
- RAG-Ready Output: Optimized for AI/ML and RAG systems
- Metadata Extraction: Document properties, word count, estimated pages
- Structure Analysis: Heading detection and document structure mapping
- Multiple Output Modes: Legacy text-only, enhanced metadata, or RAG chunks
- Backward Compatible: Works with existing workflows
n8n is a fair-code licensed workflow automation platform.
Table of Contents
- Installation
- Operations
- Enhanced Features
- Credentials
- Compatibility
- Usage
- Attribution
- Resources
- Version History
Installation
Follow the installation guide in the n8n community nodes documentation.
Operations
DOCX to Text (Legacy)
- Convert DOCX file to plain text (backward compatible)
DOCX to Text Enhanced
- Convert DOCX with metadata extraction
- Page-aware chunking for RAG systems
- Document structure analysis
- Multiple output formats
Enhanced Features
Output Modes
- Text Only (Legacy): Simple text extraction for backward compatibility
- Enhanced with Metadata: Text + document metadata + structure analysis
- RAG-Ready Chunks: Page-aware chunks optimized for AI/ML workflows
Metadata Extraction
- Document title, author, creation/modification dates
- Word count and estimated page count
- Subject and description fields
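As a rough illustration of how a word count and an estimated page count can be derived from extracted text, here is a minimal sketch. The function names and the default of 250 words per page are assumptions made for this example (250 happens to be consistent with the sample output in this README, 1250 words and 5 pages); the node's actual heuristic may differ.

```typescript
// Hypothetical helpers, not part of the node's API.

// Count whitespace-separated words in a block of text.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Estimate a page count from a word count.
// Assumption: 250 words per page; a document always has at least one page.
function estimatePages(wordCount: number, wordsPerPage = 250): number {
  if (wordsPerPage <= 0) {
    throw new Error("wordsPerPage must be positive");
  }
  return Math.max(1, Math.ceil(wordCount / wordsPerPage));
}
```

Because DOCX files do not store a fixed page layout, any page count derived this way is an estimate and will vary with fonts, margins, and images.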
Page-Aware Chunking
- Configurable chunk size (words)
- Overlapping chunks for context preservation
- Page boundary preservation
- Section and heading awareness
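The overlap behaviour listed above can be sketched as a sliding window over words. This is a simplified illustration, not the node's implementation: `chunkWords` and `WordChunk` are hypothetical names, the defaults mirror the documented 300-word chunk size and 50-word overlap, and page and heading boundaries are deliberately ignored here.

```typescript
// Hypothetical sketch of word-based chunking with overlap.
interface WordChunk {
  content: string;
  chunkIndex: number;
  start: number; // index of the first word in the chunk (inclusive)
  end: number; // index just past the last word in the chunk (exclusive)
}

function chunkWords(text: string, chunkSize = 300, overlap = 50): WordChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: WordChunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    const end = Math.min(start + chunkSize, words.length);
    chunks.push({
      content: words.slice(start, end).join(" "),
      chunkIndex: chunks.length,
      start,
      end,
    });
    if (end === words.length) break; // the window has reached the end of the text
  }
  return chunks;
}
```

With the defaults, each chunk repeats the last 50 words of the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.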
Structure Analysis
- Heading detection and hierarchy
- Section counting
- Document outline extraction
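One way to picture heading detection is as a pass over the HTML produced from the DOCX file. The sketch below is an assumption about the general approach, not the node's code: `extractHeadings` is a hypothetical name, and a regular expression is used only to keep the example dependency-free (a real implementation would more likely use an HTML parser).

```typescript
// Hypothetical sketch: collect h1-h6 headings from an HTML string.
interface Heading {
  level: number; // 1 for <h1>, 2 for <h2>, and so on
  text: string; // heading text with inline tags removed
}

function extractHeadings(html: string): Heading[] {
  const headings: Heading[] = [];
  // Match an opening h1-h6 tag, its content, and the matching closing tag.
  const pattern = /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const text = match[2]
      .replace(/<[^>]+>/g, "") // drop inline tags such as <em> or <a>
      .replace(/\s+/g, " ")
      .trim();
    if (text !== "") {
      headings.push({ level: Number(match[1]), text });
    }
  }
  return headings;
}
```

The resulting list doubles as a simple outline: the `level` values give the hierarchy, and the number of entries gives a section count.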
Credentials
No credentials are required for this node.
Compatibility
This node requires n8n version 1.0.0 or higher. It has been tested with the latest version of n8n.
Usage
Basic Usage (Legacy Mode)
- Add the "DOCX to Text" or "DOCX to Text Enhanced" node to your workflow
- Configure the input binary field containing your DOCX file
- Choose "Text Only (Legacy)" output mode for simple text extraction
Enhanced Usage (RAG Mode)
- Add the "DOCX to Text Enhanced" node
- Set output mode to "RAG-Ready Chunks"
- Configure chunk size (default: 300 words)
- Set chunk overlap (default: 50 words)
- Enable HTML conversion for better structure preservation
Output Examples
Enhanced Mode Output:
{
  "text": "Full document text...",
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "wordCount": 1250,
    "pageCount": 5
  },
  "structure": {
    "headings": ["Introduction", "Methods", "Results"],
    "sections": 3,
    "estimatedPages": 5
  }
}
RAG Chunks Output:
{
  "chunks": [
    {
      "content": "Chunk text content...",
      "pageStart": 1,
      "pageEnd": 1,
      "section": "Introduction",
      "chunkIndex": 0,
      "position": { "start": 0, "end": 300 }
    }
  ],
  "metadata": { ... },
  "totalChunks": 15
}
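For downstream steps such as embedding or indexing, it can help to give the RAG output a type and flatten it into one record per chunk. The sketch below models its interfaces on the sample output above. The `EmbeddingRecord` shape, the `toEmbeddingRecords` function, and the `docId` parameter are hypothetical, and `metadata` is left loosely typed because its fields are not listed in the sample.

```typescript
// Types modeled on the sample RAG output; field names come from the example above.
interface RagChunk {
  content: string;
  pageStart: number;
  pageEnd: number;
  section: string;
  chunkIndex: number;
  position: { start: number; end: number };
}

interface RagOutput {
  chunks: RagChunk[];
  metadata: Record<string, unknown>;
  totalChunks: number;
}

// Hypothetical record shape for a vector store or search index.
interface EmbeddingRecord {
  id: string;
  text: string;
  pages: string;
  section: string;
}

// Flatten the node output into one record per chunk.
function toEmbeddingRecords(output: RagOutput, docId: string): EmbeddingRecord[] {
  return output.chunks.map((chunk) => ({
    id: `${docId}-${chunk.chunkIndex}`,
    text: chunk.content,
    // Single page: "1"; page range: "1-2".
    pages:
      chunk.pageStart === chunk.pageEnd
        ? `${chunk.pageStart}`
        : `${chunk.pageStart}-${chunk.pageEnd}`,
    section: chunk.section,
  }));
}
```

Keeping the page range and section on each record lets a retrieval step cite where in the original document a passage came from.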
Attribution
This project is a fork of n8n-nodes-docx-converter by Blake Martin.
Original Repository: https://github.com/cre8tiv/n8n-docx-converter
Original Author: Blake Martin (info@cre8tivsystems.com)
License: MIT
We extend our gratitude to the original author for creating the foundation that made these enhancements possible.
Resources
- n8n community nodes documentation
- Mammoth documentation
- Original repository
- RAG Enhancement Documentation
Version History
1.0.0 (Enhanced Fork)
- Major Enhancement Release
- Added RAG-ready chunking with page awareness
- Comprehensive metadata extraction
- Document structure analysis
- Multiple output modes (legacy, enhanced, RAG chunks)
- Page boundary preservation in chunks
- Optimized for AI/ML workflows
- Maintained backward compatibility
- Added new dependencies: jszip, cheerio
- Enhanced documentation and examples
0.1.3 (Original)
- Use input and output destinations
0.1.0 (Original)
- Initial release by Blake Martin