DOCX → Markdown

Convert a .docx document to Markdown

Overview

This node converts .docx files into Markdown format. It is useful when you want to extract and transform Word documents into a lightweight, plain-text markup language for easier editing, publishing, or integration with other systems that support Markdown.

Common scenarios include:

  • Automating the conversion of Word reports or documentation into Markdown for static site generators.
  • Extracting content from .docx files to store in databases or CMSs that use Markdown.
  • Preparing documents for platforms that do not support .docx but accept Markdown input.

The node outputs either JSON containing the Markdown text (and, optionally, HTML) or a binary Markdown file.

Properties

| Name | Meaning |
| --- | --- |
| Binary Property | The name of the input binary property containing the .docx file to convert. |
| Output Mode | Whether to output JSON (Markdown stored in a JSON field) or a binary .md file. Options: json, binary. |
| Markdown Field | (JSON output only) The name of the JSON field where the Markdown text will be stored. |
| Include HTML | (JSON output only) Whether to also include an HTML version of the document in the JSON output. |
| Preserve Structure | Whether to try to preserve document structure such as headings, lists, and tables during conversion. |
| Output Binary Property | (Binary output only) The binary property name where the resulting .md file will be written. |
| Output Filename | (Binary output only) The filename to assign to the output Markdown file. Defaults to document.md. |

Output

  • When Output Mode is set to json:

    • The node outputs JSON data with a field containing the Markdown text (field name configurable).
    • Optionally, it can also include an HTML representation of the document.
    • A warnings array may be included containing any messages generated during conversion.
  • When Output Mode is set to binary:

    • The node outputs a binary file containing the Markdown content.
    • The binary property name and filename are configurable.
    • The JSON part includes any warnings but no Markdown text.
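
The two output modes can be sketched as a small TypeScript function. This is only an illustration of the shapes described above, not the node's actual source: the type names, the `html` field name, the `text/markdown` MIME type, and the simplified binary entry (plain text instead of encoded file data) are all assumptions.

```typescript
// Hypothetical option types; the node's internal parameter names may differ.
type OutputOptions =
  | { mode: 'json'; markdownField: string; includeHtml: boolean }
  | { mode: 'binary'; binaryProperty: string; fileName?: string };

interface BinaryEntry {
  fileName: string;
  mimeType: string;
  content: string; // simplified: a real node would store encoded file data
}

interface OutputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

function buildOutputItem(
  markdown: string,
  html: string,
  warnings: string[],
  options: OutputOptions,
): OutputItem {
  if (options.mode === 'json') {
    // JSON mode: Markdown goes into the configurable field.
    // Assumption: warnings are always attached, even when empty.
    const json: Record<string, unknown> = {
      [options.markdownField]: markdown,
      warnings,
    };
    if (options.includeHtml) {
      json['html'] = html; // the field name "html" is an assumption
    }
    return { json };
  }

  // Binary mode: the JSON part carries warnings only, and the Markdown
  // is written to the configured binary property.
  return {
    json: { warnings },
    binary: {
      [options.binaryProperty]: {
        fileName: options.fileName ?? 'document.md',
        mimeType: 'text/markdown',
        content: markdown,
      },
    },
  };
}
```

Modelling the options as a discriminated union on `mode` keeps the JSON-only and binary-only settings from being mixed up at compile time.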

Dependencies

  • Uses the mammoth library to convert .docx files to HTML.
  • Uses the turndown library to convert HTML to Markdown.
  • Requires the input .docx file to be provided as binary data in the specified binary property.
  • No external API keys or services are required.
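
The two-stage conversion described above (.docx to HTML, then HTML to Markdown) might be structured roughly as follows. To keep the sketch self-contained, the two library calls are injected as functions; the `Converters` interface, `convertDocx`, and the result shape are hypothetical names chosen for illustration and are not taken from the node's source.

```typescript
// One message reported by the .docx -> HTML step.
interface ConversionMessage {
  type: string;
  message: string;
}

// Hypothetical adapters around the two libraries.
interface Converters {
  docxToHtml(docx: Uint8Array): Promise<{ html: string; messages: ConversionMessage[] }>;
  htmlToMarkdown(html: string): string;
}

interface ConversionResult {
  markdown: string;
  html: string;
  warnings: string[];
}

async function convertDocx(
  docx: Uint8Array,
  converters: Converters,
): Promise<ConversionResult> {
  // Step 1: .docx -> HTML (the role mammoth plays in the node).
  const { html, messages } = await converters.docxToHtml(docx);

  // Step 2: HTML -> Markdown (the role turndown plays in the node).
  const markdown = converters.htmlToMarkdown(html);

  // Surface the converter's messages as plain warning strings.
  return { markdown, html, warnings: messages.map((m) => m.message) };
}
```

In a real setup, `docxToHtml` would presumably wrap mammoth's `convertToHtml`, whose result exposes the HTML as `value` alongside a `messages` array, and `htmlToMarkdown` would wrap the `turndown` method of a `TurndownService` instance. Injecting both also makes the pipeline easy to test with stubs.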

Troubleshooting

  • Error: Binary property not found
    Occurs if the specified binary property does not exist on the input item.
    Resolution: Verify the binary property name matches the actual input data.

  • Error: Expected a .docx file
    Happens if the input binary data is not recognized as a .docx file by MIME type or filename extension.
    Resolution: Ensure the input binary data is a valid .docx file.

  • Warnings in output
    Conversion warnings are included in the output under warnings. These do not stop execution but indicate potential issues in the source document or conversion process.

  • Markdown output looks incorrect or incomplete
    Resolution: Toggle the Preserve Structure option and run the conversion again; enabling it makes the node try to keep headings, lists, and tables intact.
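
For the "Expected a .docx file" error, a check by MIME type or filename extension could look roughly like the sketch below. The function name and the exact matching rules are assumptions; the node's own check may be stricter or looser. The MIME type shown is the standard one for .docx files.

```typescript
// Standard MIME type for Word .docx documents.
const DOCX_MIME =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document';

// Hypothetical helper: accepts the input if either the MIME type or the
// filename extension identifies it as a .docx file.
function looksLikeDocx(mimeType?: string, fileName?: string): boolean {
  const mimeMatches = mimeType?.toLowerCase() === DOCX_MIME;
  const extensionMatches = fileName?.toLowerCase().endsWith('.docx') ?? false;
  return mimeMatches || extensionMatches;
}
```

If an upstream node delivers the file with a generic MIME type such as application/octet-stream, a check like this still passes as long as the filename ends in .docx, so setting the filename correctly is often the quickest fix.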
