Overview
This node converts .docx files into Markdown format. It is useful when you want to extract and transform Word documents into a lightweight, plain-text markup language for easier editing, publishing, or integration with other systems that support Markdown.
Common scenarios include:
- Automating the conversion of Word reports or documentation into Markdown for static site generators.
- Extracting content from .docx files to store in databases or CMSs that use Markdown.
- Preparing documents for platforms that do not support .docx but accept Markdown input.
The node supports outputting either JSON with Markdown text (and optionally HTML) or a binary Markdown file.
Properties
| Name | Meaning |
|---|---|
| Binary Property | The name of the input binary property containing the .docx file to convert. |
| Output Mode | Choose between outputting JSON (Markdown stored in a JSON field) or a binary .md file. Options: json, binary. |
| Markdown Field | (JSON output only) The name of the JSON field where the Markdown text will be stored. |
| Include HTML | (JSON output only) Whether to also include an HTML version of the document in the JSON output. |
| Preserve Structure | Whether to try to preserve document structure such as headings, lists, and tables during conversion. |
| Output Binary Property | (Binary output only) The binary property name where the resulting .md file will be written. |
| Output Filename | (Binary output only) The filename to assign to the output Markdown file. Defaults to document.md. |
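
To show how these properties relate to each other, the sketch below models them as a TypeScript type with a small helper that fills in defaults. The key names (`binaryProperty`, `outputMode`, and so on) and every default value except `document.md` are assumptions chosen for readability; they are not taken from the node's source.

```typescript
// Hypothetical settings shape; key names are illustrative, not the node's real parameter names.
type OutputMode = "json" | "binary";

interface DocxToMarkdownOptions {
  binaryProperty: string;       // input binary property holding the .docx file
  outputMode: OutputMode;       // "json" or "binary"
  markdownField: string;        // JSON output only
  includeHtml: boolean;         // JSON output only
  preserveStructure: boolean;   // try to keep headings, lists, and tables
  outputBinaryProperty: string; // binary output only
  outputFilename: string;       // binary output only
}

// Assumed defaults. Only outputFilename ("document.md") is documented in the table above.
const assumedDefaults: DocxToMarkdownOptions = {
  binaryProperty: "data",
  outputMode: "json",
  markdownField: "markdown",
  includeHtml: false,
  preserveStructure: true,
  outputBinaryProperty: "data",
  outputFilename: "document.md",
};

// Merge user-supplied values over the assumed defaults.
function resolveOptions(
  overrides: Partial<DocxToMarkdownOptions> = {},
): DocxToMarkdownOptions {
  return { ...assumedDefaults, ...overrides };
}
```

For example, `resolveOptions({ outputMode: "binary" })` would switch to binary output while keeping the documented `document.md` filename.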
Output
When Output Mode is set to json:
- The node outputs JSON data with a field containing the Markdown text (field name configurable).
- Optionally, it can also include an HTML representation of the document.
- A warnings array may be included containing any messages generated during conversion.
When Output Mode is set to binary:
- The node outputs a binary file containing the Markdown content.
- The binary property name and filename are configurable.
- The JSON part includes any warnings but no Markdown text.
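
The two output modes described above can be sketched as a pair of builder functions. This is a minimal illustration under stated assumptions: the `OutputItem` shape, the `html` key, the `content` field, and the `text/markdown` MIME type are hypothetical choices for this example, and a real node would likely encode the binary payload rather than store plain text.

```typescript
// Result of a conversion; field names are assumptions for this sketch.
interface ConversionResult {
  markdown: string;
  html: string;
  warnings: string[];
}

// Hypothetical item shape with a JSON part and an optional binary part.
interface OutputItem {
  json: Record<string, unknown>;
  binary?: Record<string, { content: string; mimeType: string; fileName: string }>;
}

// JSON mode: Markdown goes into a configurable field, HTML is optional.
function buildJsonOutput(
  result: ConversionResult,
  markdownField: string,
  includeHtml: boolean,
): OutputItem {
  const json: Record<string, unknown> = { [markdownField]: result.markdown };
  if (includeHtml) json["html"] = result.html;
  if (result.warnings.length > 0) json["warnings"] = result.warnings;
  return { json };
}

// Binary mode: Markdown goes into a named binary property; JSON carries only warnings.
function buildBinaryOutput(
  result: ConversionResult,
  propertyName: string,
  fileName: string,
): OutputItem {
  const json: Record<string, unknown> = {};
  if (result.warnings.length > 0) json["warnings"] = result.warnings;
  return {
    json,
    binary: {
      [propertyName]: { content: result.markdown, mimeType: "text/markdown", fileName },
    },
  };
}
```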
Dependencies
- Uses the mammoth library to convert .docx files to HTML.
- Uses the turndown library to convert HTML to Markdown.
- Requires the input .docx file to be provided as binary data in the specified binary property.
- No external API keys or services are required.
Troubleshooting
Error: Binary property not found
Occurs if the specified binary property does not exist on the input item.
Resolution: Verify that the binary property name matches the actual input data.
Error: Expected a .docx file
Happens if the input binary data is not recognized as a .docx file by MIME type or filename extension.
Resolution: Ensure the input binary data is a valid .docx file.
Warnings in output
Conversion warnings are included in the output under warnings. These do not stop execution but indicate potential issues in the source document or conversion process.
If the Markdown output looks incorrect or incomplete, try toggling the "Preserve Structure" option to better maintain headings, lists, and tables.
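
The two input errors above can be reproduced with a short validation sketch. It assumes the file is accepted when either the MIME type or the filename extension matches; the helper names, the `BinaryEntry` shape, and the exact error wording are hypothetical. The MIME type string is the standard one registered for .docx files.

```typescript
// Standard MIME type for .docx (Office Open XML word processing documents).
const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// Hypothetical metadata attached to a binary property.
interface BinaryEntry {
  mimeType?: string;
  fileName?: string;
}

// Accept the file if either the MIME type or the extension identifies it as .docx.
function looksLikeDocx(mimeType?: string, fileName?: string): boolean {
  const mimeOk = mimeType?.toLowerCase() === DOCX_MIME;
  const nameOk = fileName?.toLowerCase().endsWith(".docx") ?? false;
  return mimeOk || nameOk;
}

// Throw the two documented errors when the input is missing or not a .docx file.
function validateInput(
  binary: Record<string, BinaryEntry> | undefined,
  propertyName: string,
): BinaryEntry {
  const entry = binary?.[propertyName];
  if (entry === undefined) {
    throw new Error(`Binary property "${propertyName}" not found`);
  }
  if (!looksLikeDocx(entry.mimeType, entry.fileName)) {
    throw new Error("Expected a .docx file");
  }
  return entry;
}
```

Checking both signals is a deliberate choice in this sketch: upstream nodes sometimes set a generic MIME type, so the extension acts as a fallback.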
Links and References
- Mammoth.js GitHub – Library used for .docx to HTML conversion
- Turndown GitHub – Library used for HTML to Markdown conversion