html-to-docx-via-pandoc

n8n community node to convert HTML to DOCX using a local pandoc executable

Package Information

Downloads: 569 weekly / 1,156 monthly
Latest Version: 0.2.2

Documentation

n8n-nodes-html-to-docx-via-pandoc

Convert an HTML string to a DOCX binary using a locally installed pandoc on the worker.

Installation

  • Ensure pandoc is installed and available on PATH on the n8n worker host.
  • Install this community node package into your n8n instance, following the n8n documentation for community nodes.

Node options

  • HTML Source: Direct string or JSON field path
  • Output Binary Property: name of the binary property (default data)
  • File Name: output file name (default document.docx)
  • Pandoc Path: path to pandoc executable (default pandoc)
  • Embed Resources: include local resources with --embed-resources --standalone when supported
  • Resource Paths: directories for pandoc to search for resources
  • Reference DOCX Source: None | Filesystem Path | Built-in Minimal Reference
  • Reference DOCX Path: path to a reference DOCX template (when Filesystem Path)
  • Clean Output Mode: enable cleanup (keeps only bold/italic, preserves lists/heading styles, removes bookmarks)
  • Punctuation Normalization: Off | Conservative (default Conservative)
  • Sanitize via CommonMark: round-trip the content through CommonMark to simplify its structure
  • Cleanup toggles: Strip Formatting Except Bold/Italic, Remove Bookmarks, Collapse Empty Runs/Paragraphs, Ensure xml:space="preserve"
  • Whitespace Policy: Collapse | Preserve Breaks (applies to matching/sanitization only)
  • Normalization Profile (JSON): advanced profile to share across nodes
  • Timeout: seconds to wait for pandoc
  • Additional Pandoc Arguments: advanced array of extra arguments, one token per entry (tokens only, not a shell command string)
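
To make the mapping from options to a pandoc invocation concrete, here is a minimal sketch of how such options could translate into argument tokens. It is not the node's actual implementation: the ConvertOptions shape, its field names, and buildPandocArgs are hypothetical. Only the pandoc flags themselves (--from, --to, --output, --embed-resources, --standalone, --resource-path, --reference-doc) are real.

```typescript
// Hypothetical option shape for illustration; the node's real parameter names may differ.
interface ConvertOptions {
  outputFile: string;
  embedResources?: boolean;
  resourcePaths?: string[];
  referenceDocx?: string;
  extraArgs?: string[];
  // pandoc joins search paths with ":" on Linux/macOS and ";" on Windows.
  pathDelimiter?: string;
}

// Build the argument tokens for an HTML -> DOCX pandoc run.
function buildPandocArgs(opts: ConvertOptions): string[] {
  const args: string[] = ["--from", "html", "--to", "docx", "--output", opts.outputFile];
  if (opts.embedResources) {
    args.push("--embed-resources", "--standalone");
  }
  if (opts.resourcePaths && opts.resourcePaths.length > 0) {
    const delimiter = opts.pathDelimiter ?? ":";
    args.push(`--resource-path=${opts.resourcePaths.join(delimiter)}`);
  }
  if (opts.referenceDocx) {
    args.push(`--reference-doc=${opts.referenceDocx}`);
  }
  // Extra arguments stay separate tokens; they are never joined into a shell string.
  return args.concat(opts.extraArgs ?? []);
}
```

Keeping every argument as its own token means the list can be handed to a process-spawning API without going through a shell, which avoids quoting problems in paths and file names.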

NormalizationProfile

A JSON-serializable structure shared with the DOCX diff node.

Default:

{
  "punctuation": "conservative",
  "whitespacePolicy": "collapse",
  "collapseNBSP": true,
  "normalizeQuotes": true,
  "unicodeNormalization": "NFC"
}

Precedence:

  • If Normalization Profile (JSON) is provided, its fields override defaults.
  • Aggressive punctuation normalization is used for matching only and never mutates output.
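
The override rule can be sketched as a shallow merge of the provided profile over the defaults. This is an illustrative sketch only: resolveProfile is a hypothetical helper, and the field types are inferred from the default shown above rather than taken from the package source.

```typescript
// Field names and default values come from the default profile above; types are inferred.
interface NormalizationProfile {
  punctuation: string;
  whitespacePolicy: string;
  collapseNBSP: boolean;
  normalizeQuotes: boolean;
  unicodeNormalization: string;
}

const DEFAULT_PROFILE: NormalizationProfile = {
  punctuation: "conservative",
  whitespacePolicy: "collapse",
  collapseNBSP: true,
  normalizeQuotes: true,
  unicodeNormalization: "NFC",
};

// Hypothetical helper: fields present in the JSON profile override the defaults,
// and fields that are absent keep their default values.
function resolveProfile(profileJson?: string): NormalizationProfile {
  if (!profileJson || profileJson.trim() === "") {
    return { ...DEFAULT_PROFILE };
  }
  const overrides = JSON.parse(profileJson) as Partial<NormalizationProfile>;
  return { ...DEFAULT_PROFILE, ...overrides };
}
```

For example, resolveProfile('{"collapseNBSP": false}') changes only collapseNBSP and leaves the other four fields at their defaults.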

Minimal DOCX constraints

When cleanup is enabled, output is sanitized to:

  • Runs: retain only w:b and w:i
  • Paragraph props: retain only w:pStyle and w:numPr
  • Remove bookmarks
  • Collapse empty runs/paragraphs (unless paragraph has pStyle/numPr)
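
The rules above can be illustrated on a simplified in-memory model. This is a sketch under strong assumptions: the Run and Paragraph types and both functions are hypothetical, real cleanup would operate on the WordprocessingML XML inside the DOCX, and bookmark removal is not modeled here.

```typescript
// Simplified, hypothetical model: each element lists the names of its property children.
interface Run {
  props: string[];
  text: string;
}

interface Paragraph {
  props: string[];
  runs: Run[];
}

const KEPT_RUN_PROPS = new Set<string>(["w:b", "w:i"]);
const KEPT_PARA_PROPS = new Set<string>(["w:pStyle", "w:numPr"]);

// Returns the cleaned paragraph, or null when the paragraph should be dropped.
function cleanParagraph(p: Paragraph): Paragraph | null {
  const props = p.props.filter((name) => KEPT_PARA_PROPS.has(name));
  const runs = p.runs
    .filter((r) => r.text.length > 0) // collapse empty runs
    .map((r) => ({ text: r.text, props: r.props.filter((name) => KEPT_RUN_PROPS.has(name)) }));
  // An empty paragraph survives only if it still carries a style or numbering.
  if (runs.length === 0 && props.length === 0) {
    return null;
  }
  return { props, runs };
}

function cleanDocument(paragraphs: Paragraph[]): Paragraph[] {
  return paragraphs
    .map(cleanParagraph)
    .filter((p): p is Paragraph => p !== null);
}
```

In this model, an empty paragraph with only w:jc is removed, while an empty paragraph carrying w:numPr is kept so that list structure is not disturbed.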

Notes

  • The --embed-resources flag requires pandoc 2.19 or newer (the release that introduced it). On older versions the node continues without this flag.
  • For remote resources referenced by HTML, behavior may vary by pandoc setup. Prefer embedding or ensuring resources are available locally.
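
A version check of this kind could be sketched as follows. Both helpers are hypothetical and not taken from the package; the sketch assumes the first line of `pandoc --version` has the form "pandoc X.Y.Z", and the minimum version is passed in by the caller rather than hard-coded. Obtaining the output by running pandoc is left out.

```typescript
// Parse the version from the output of `pandoc --version`.
// Returns the numeric components, or null if no version line is found.
function parsePandocVersion(versionOutput: string): number[] | null {
  const match = /^pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)/m.exec(versionOutput);
  return match ? match[1].split(".").map(Number) : null;
}

// Compare versions component by component, treating missing components as 0.
// A plain string comparison would wrongly rank "2.9" above "2.19".
function versionAtLeast(actual: number[], minimum: number[]): boolean {
  const len = Math.max(actual.length, minimum.length);
  for (let i = 0; i < len; i++) {
    const a = actual[i] ?? 0;
    const m = minimum[i] ?? 0;
    if (a !== m) {
      return a > m;
    }
  }
  return true;
}
```

A caller would add --embed-resources only when parsePandocVersion returns a version and versionAtLeast confirms it meets the minimum, and otherwise run pandoc without the flag.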

Development

  • Node.js >= 20.15
  • Install deps: npm ci
  • Build: npm run build
  • Dev compile: npm run dev
  • Lint: npm run lint
  • Tests:
    • Unit: npm run test:unit
    • Integration: npm run test:integration (requires pandoc installed)

License

MIT