docx-converter-enhanced

Enhanced n8n community node for DOCX to text conversion with RAG capabilities, page-aware chunking, and metadata extraction. Fork of n8n-nodes-docx-converter with advanced features for AI/ML workflows.

Package Information

Released: 8/31/2025
Downloads: 10 weekly / 28 monthly
Latest Version: 1.0.0
Author: widjis

Documentation

n8n-nodes-docx-converter-enhanced

🚀 Enhanced fork of n8n-nodes-docx-converter with advanced RAG capabilities!

This is an enhanced n8n community node that provides powerful DOCX to text conversion with RAG (Retrieval-Augmented Generation) capabilities, page-aware chunking, and comprehensive metadata extraction for AI/ML workflows.

✨ New Features (Enhanced Version)

  • 📄 Page-Aware Chunking: Intelligent text chunking that preserves page boundaries
  • 🧠 RAG-Ready Output: Optimized for AI/ML and RAG systems
  • 📊 Metadata Extraction: Document properties, word count, estimated pages
  • 🏗️ Structure Analysis: Heading detection and document structure mapping
  • 🔄 Multiple Output Modes: Legacy text-only, enhanced metadata, or RAG chunks
  • ⚔ Backward Compatible: Works with existing workflows

n8n is a fair-code licensed workflow automation platform.

📋 Table of Contents

Installation
Operations
Enhanced Features
Credentials
Compatibility
Usage
Attribution
Resources
Version History

Installation

Follow the installation guide in the n8n community nodes documentation, using the package name n8n-nodes-docx-converter-enhanced.

Operations

DOCX to Text (Legacy)

  • Convert DOCX file to plain text (backward compatible)

DOCX to Text Enhanced

  • Convert DOCX with metadata extraction
  • Page-aware chunking for RAG systems
  • Document structure analysis
  • Multiple output formats

Enhanced Features

🎯 Output Modes

  1. Text Only (Legacy): Simple text extraction for backward compatibility
  2. Enhanced with Metadata: Text + document metadata + structure analysis
  3. RAG-Ready Chunks: Page-aware chunks optimized for AI/ML workflows
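Because the three modes produce differently shaped items, downstream steps sometimes need a single text string no matter which mode was used. The following is a minimal sketch under stated assumptions. The type names, the `extractText` helper, and the assumption that legacy mode emits an object with a `text` field are hypothetical. The other field names come from the output examples later in this README.

```typescript
// Hypothetical shapes for the three output modes (field names taken from the
// README's output examples; only the fields needed here are modelled).
interface TextOnlyOutput {
  text: string; // assumed shape of legacy "Text Only" output
}

interface EnhancedOutput {
  text: string;
  metadata: Record<string, unknown>;
  structure: Record<string, unknown>;
}

interface RagChunksOutput {
  chunks: { content: string; chunkIndex: number }[];
  totalChunks: number;
}

type DocxOutput = TextOnlyOutput | EnhancedOutput | RagChunksOutput;

// Only RAG mode carries a `chunks` array, so use it to tell the modes apart.
function isRagOutput(output: DocxOutput): output is RagChunksOutput {
  return "chunks" in output && Array.isArray((output as RagChunksOutput).chunks);
}

// Return one text string for any mode. In RAG mode the chunks are re-joined in
// chunkIndex order; overlapping chunks repeat words at their boundaries, so the
// result is not byte-identical to the original document text.
function extractText(output: DocxOutput): string {
  if (isRagOutput(output)) {
    return output.chunks
      .slice()
      .sort((a, b) => a.chunkIndex - b.chunkIndex)
      .map((c) => c.content)
      .join("\n\n");
  }
  return output.text;
}
```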

📊 Metadata Extraction

  • Document title, author, creation/modification dates
  • Word count and estimated page count
  • Subject and description fields
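A DOCX file is a ZIP archive, and its core document properties are stored as Dublin Core elements in docProps/core.xml. The sketch below shows how the fields listed above could be derived, assuming the XML has already been read out of the archive (for example with jszip, which this package lists as a dependency). The 250-words-per-page figure is an assumption. It is consistent with the 1250-word, 5-page example output, but the node does not document it. The helper names are hypothetical.

```typescript
// Sketch: build the metadata fields from docProps/core.xml plus the extracted text.
// Assumptions: `coreXml` is the raw XML string; XML entities (e.g. &amp;) are not
// decoded; ~250 words per page is used for the page estimate.

interface DocMetadata {
  title?: string;
  author?: string;
  subject?: string;
  description?: string;
  created?: string;
  modified?: string;
  wordCount: number;
  estimatedPages: number;
}

const WORDS_PER_PAGE = 250; // assumed value, not documented by the node

// Return the trimmed text of the first <qualifiedName ...>...</qualifiedName> element.
function readElement(xml: string, qualifiedName: string): string | undefined {
  const re = new RegExp(`<${qualifiedName}(?:\\s[^>]*)?>([\\s\\S]*?)</${qualifiedName}>`);
  const match = xml.match(re);
  return match ? match[1].trim() : undefined;
}

// Count whitespace-separated words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function extractMetadata(coreXml: string, text: string): DocMetadata {
  const wordCount = countWords(text);
  return {
    title: readElement(coreXml, "dc:title"),
    author: readElement(coreXml, "dc:creator"), // Word stores the author as dc:creator
    subject: readElement(coreXml, "dc:subject"),
    description: readElement(coreXml, "dc:description"),
    created: readElement(coreXml, "dcterms:created"),
    modified: readElement(coreXml, "dcterms:modified"),
    wordCount,
    estimatedPages: Math.max(1, Math.ceil(wordCount / WORDS_PER_PAGE)),
  };
}
```

A page count estimated this way is only an approximation. The real pagination depends on fonts, margins, and layout, none of which is visible in the text.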

🧩 Page-Aware Chunking

  • Configurable chunk size (words)
  • Overlapping chunks for context preservation
  • Page boundary preservation
  • Section and heading awareness
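One way to combine word-based chunk sizes, overlap, and page tracking is a sliding window over a flattened word list that remembers which page each word came from. The sketch below is an illustration, not the node's actual implementation. It assumes the text is already split into pages (this README does not say how the node detects page breaks), that pages are numbered from 1, and that `position` holds word offsets. Section tracking is left out. The output fields mirror the RAG chunk example in this README.

```typescript
// Sliding-window chunker that records the page range each chunk spans.
interface Chunk {
  content: string;
  pageStart: number;
  pageEnd: number;
  chunkIndex: number;
  position: { start: number; end: number }; // word offsets, end exclusive
}

function chunkPages(pages: string[], chunkSize = 300, overlap = 50): Chunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }

  // Flatten all pages into one word list, remembering each word's page number.
  const words: string[] = [];
  const pageOfWord: number[] = [];
  pages.forEach((page, i) => {
    for (const w of page.split(/\s+/).filter(Boolean)) {
      words.push(w);
      pageOfWord.push(i + 1);
    }
  });

  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // each window starts `step` words after the previous one
  for (let start = 0; start < words.length; start += step) {
    const end = Math.min(start + chunkSize, words.length);
    chunks.push({
      content: words.slice(start, end).join(" "),
      pageStart: pageOfWord[start],
      pageEnd: pageOfWord[end - 1],
      chunkIndex: chunks.length,
      position: { start, end },
    });
    if (end === words.length) break; // this window reached the end of the text
  }
  return chunks;
}
```

In this sketch, a chunk that crosses a page break reports different `pageStart` and `pageEnd` values rather than being cut at the break. That lets a retrieval step cite the pages a passage came from.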

šŸ—ļø Structure Analysis

  • Heading detection and hierarchy
  • Section counting
  • Document outline extraction
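Heading detection becomes straightforward once the document is available as HTML in which Word heading styles appear as `<h1>` to `<h6>` elements. That conversion is an assumption here, and this README does not say which converter the node uses. The sketch below uses a regular expression to stay self-contained. The package lists cheerio as a dependency, and a real HTML parser would be more robust. The function names and the "one section per heading" rule are hypothetical.

```typescript
// Sketch: detect headings, count sections, and build an indented outline from HTML.
interface Heading {
  level: number; // 1 for <h1>, 2 for <h2>, ...
  text: string;
}

function extractHeadings(html: string): Heading[] {
  const headings: Heading[] = [];
  const re = /<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi; // matching open/close heading tags
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const text = m[2].replace(/<[^>]+>/g, "").trim(); // drop inline tags such as <em>
    if (text) headings.push({ level: Number(m[1]), text });
  }
  return headings;
}

// One heading per line, indented two spaces per level below 1.
function toOutline(headings: Heading[]): string {
  return headings.map((h) => `${"  ".repeat(h.level - 1)}${h.text}`).join("\n");
}

function analyzeStructure(html: string) {
  const headings = extractHeadings(html);
  return {
    headings: headings.map((h) => h.text),
    sections: headings.length, // assumption: one section per heading
    outline: toOutline(headings),
  };
}
```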

Credentials

No credentials are required for this node.

Compatibility

This node requires n8n version 1.0.0 or higher. It has been tested with the latest n8n release available at the time of publication.

Usage

Basic Usage (Legacy Mode)

  1. Add the "DOCX to Text" or "DOCX to Text Enhanced" node to your workflow
  2. Configure the input binary field containing your DOCX file
  3. Choose "Text Only (Legacy)" output mode for simple text extraction

Enhanced Usage (RAG Mode)

  1. Add the "DOCX to Text Enhanced" node
  2. Set output mode to "RAG-Ready Chunks"
  3. Configure chunk size (default: 300 words)
  4. Set chunk overlap (default: 50 words)
  5. Enable HTML conversion for better structure preservation
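With a 300-word chunk size and a 50-word overlap, each new chunk would start 250 words after the previous one. The helper below estimates how many chunks a document of a given length produces. It assumes the chunker uses exactly this sliding-window rule with a shorter final chunk, and the node's actual rule may differ. `expectedChunkCount` is a hypothetical name.

```typescript
// Estimate the number of sliding-window chunks: the first window covers
// chunkSize words and each further window adds (chunkSize - overlap) new words.
function expectedChunkCount(totalWords: number, chunkSize = 300, overlap = 50): number {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  if (totalWords <= 0) return 0;
  if (totalWords <= chunkSize) return 1;
  const step = chunkSize - overlap;
  return 1 + Math.ceil((totalWords - chunkSize) / step);
}

// A 1,250-word document with the defaults: 1 + ceil((1250 - 300) / 250) = 5
console.log(expectedChunkCount(1250)); // 5
```

Raising the overlap preserves more context across chunk boundaries, but it increases the chunk count and therefore the number of embeddings to compute and store.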

Output Examples

Enhanced Mode Output:

{
  "text": "Full document text...",
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "wordCount": 1250,
    "pageCount": 5
  },
  "structure": {
    "headings": ["Introduction", "Methods", "Results"],
    "sections": 3,
    "estimatedPages": 5
  }
}

RAG Chunks Output:

{
  "chunks": [
    {
      "content": "Chunk text content...",
      "pageStart": 1,
      "pageEnd": 1,
      "section": "Introduction",
      "chunkIndex": 0,
      "position": { "start": 0, "end": 300 }
    }
  ],
  "metadata": { ... },
  "totalChunks": 15
}
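A typical next step is to send each chunk to an embedding model and a vector store. The page and section fields are kept as metadata so that answers can cite their source. The sketch below shows that mapping, for example for a Code node or a separate service. The chunk fields follow the example above. The `VectorRecord` shape, the `docId` parameter, and the `docId#chunkIndex` ID format are assumptions, not part of the node.

```typescript
// Sketch: map the node's RAG output to generic vector-store records.
interface RagChunk {
  content: string;
  pageStart: number;
  pageEnd: number;
  section?: string;
  chunkIndex: number;
  position: { start: number; end: number };
}

interface RagOutput {
  chunks: RagChunk[];
  metadata: Record<string, unknown>;
  totalChunks: number;
}

// Hypothetical record shape; adapt it to the vector store actually in use.
interface VectorRecord {
  id: string;
  text: string;
  metadata: { docId: string; pageStart: number; pageEnd: number; section?: string; chunkIndex: number };
}

function toVectorRecords(docId: string, output: RagOutput): VectorRecord[] {
  // Guard against a truncated item: totalChunks should match the array length.
  if (output.chunks.length !== output.totalChunks) {
    throw new Error(`expected ${output.totalChunks} chunks, got ${output.chunks.length}`);
  }
  return output.chunks.map((chunk) => ({
    id: `${docId}#${chunk.chunkIndex}`, // stable per-chunk ID, so re-ingestion can overwrite
    text: chunk.content,
    metadata: {
      docId,
      pageStart: chunk.pageStart,
      pageEnd: chunk.pageEnd,
      section: chunk.section,
      chunkIndex: chunk.chunkIndex,
    },
  }));
}
```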

Attribution

šŸ™ This project is a fork of n8n-nodes-docx-converter by Blake Martin.

Original Repository: https://github.com/cre8tiv/n8n-docx-converter
Original Author: Blake Martin (info@cre8tivsystems.com)
License: MIT

We extend our gratitude to the original author for creating the foundation that made these enhancements possible.

Resources

Version History

1.0.0 (Enhanced Fork)

  • 🚀 Major Enhancement Release
  • ✨ Added RAG-ready chunking with page awareness
  • 📊 Comprehensive metadata extraction
  • 🏗️ Document structure analysis
  • 🔄 Multiple output modes (legacy, enhanced, RAG chunks)
  • 📄 Page boundary preservation in chunks
  • 🧠 Optimized for AI/ML workflows
  • ⚔ Maintained backward compatibility
  • 🛠️ Added new dependencies: jszip, cheerio
  • 📝 Enhanced documentation and examples

0.1.3 (Original)

  • Use input and output destinations

0.1.0 (Original)

  • Initial release by Blake Martin
