Overview
This node converts DOCX files (Microsoft Word documents) into Markdown text format. It is useful when you want to extract and transform rich text content from DOCX files into a lightweight, plain-text markup language that is widely used for documentation, blogging, and content management systems.
Common scenarios include:
- Automating the conversion of uploaded DOCX reports or articles into Markdown for publishing on static site generators.
- Extracting textual content from DOCX attachments in workflows for further processing or storage.
- Removing images from DOCX content while preserving text formatting in Markdown.
For example, you can pass a DOCX file as binary data into this node and get clean Markdown output, optionally without images, which can then be saved or sent to other services.
Properties
| Name | Meaning |
|---|---|
| Input Binary Field | The name of the input binary field containing the DOCX file to convert (default: "data"). |
| Destination Output Field | The name of the output JSON field where the converted Markdown text will be stored (default: "text"). |
| Remove Images | Boolean option to remove all images from the converted Markdown output (true/false). |
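The three properties in the table can be pictured as a small options object. The TypeScript sketch below is illustrative only: `DocxToMarkdownOptions`, its field names, and `resolveOptions` are hypothetical and not taken from the node's source, and the `false` default for Remove Images is an assumption, since the table does not state one.

```typescript
// Hypothetical shape of the node's settings; names are illustrative,
// not the node's actual API.
interface DocxToMarkdownOptions {
  inputBinaryField: string;       // "Input Binary Field"
  destinationOutputField: string; // "Destination Output Field"
  removeImages: boolean;          // "Remove Images"
}

// Defaults as listed in the table; the removeImages default is an assumption.
const DEFAULT_OPTIONS: DocxToMarkdownOptions = {
  inputBinaryField: "data",
  destinationOutputField: "text",
  removeImages: false,
};

// Merge user-supplied values over the defaults.
function resolveOptions(
  partial: Partial<DocxToMarkdownOptions>
): DocxToMarkdownOptions {
  return { ...DEFAULT_OPTIONS, ...partial };
}
```

Calling `resolveOptions({ removeImages: true })` keeps the default field names and only switches image removal on.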
Output
The node outputs an array of JSON objects, each containing one property with the key specified by the "Destination Output Field" property. This property holds the Markdown text converted from the DOCX file.
Example output JSON structure:
    [
      {
        "text": "# Converted Markdown content here..."
      }
    ]
If "Remove Images" is enabled, the Markdown output will not contain any image elements.
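To make the effect of "Remove Images" concrete, here is a minimal sketch that strips inline image syntax from a Markdown string. It is an assumption that the removal can be modeled this way: the source does not say whether the node drops images at the HTML stage or the Markdown stage, and `stripInlineImages` is a hypothetical helper name.

```typescript
// Illustrative only: removes inline Markdown images such as ![alt](url).
// Reference-style images (![alt][ref]) and raw <img> tags are not handled.
function stripInlineImages(markdown: string): string {
  return markdown.replace(/!\[[^\]]*\]\([^)]*\)/g, "");
}
```

Ordinary links such as `[docs](https://example.com)` are left untouched because the pattern requires the leading `!`.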
The node does not output binary data; it only outputs the Markdown text as JSON.
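The output shape described above can be sketched as a function that wraps each converted document in an object keyed by the configured destination field. `buildOutput` is a hypothetical name used for illustration, not part of the node.

```typescript
// Wrap each Markdown string in an object with a single, configurable key.
// No binary data is attached, matching the description above.
function buildOutput(
  markdownDocs: string[],
  destinationField: string
): Array<Record<string, string>> {
  return markdownDocs.map((markdown) => ({ [destinationField]: markdown }));
}
```

With the default field name, `buildOutput(["# Title"], "text")` yields a one-element array whose object has a single `text` property.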
Dependencies
- Uses the `mammoth` library to convert DOCX binary data to HTML.
- Uses `@joplin/turndown` and `@joplin/turndown-plugin-gfm` to convert HTML to GitHub-Flavored Markdown.
- Uses `markdownlint` to lint and auto-fix the generated Markdown for style consistency.
- Uses `node-html-parser` to manipulate HTML before conversion (e.g., adjusting table headers).
No external API keys or credentials are required. All processing is done locally within the node.
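The order in which these libraries are described suggests a four-stage pipeline: DOCX to HTML, HTML adjustments, HTML to Markdown, then lint and fix. The sketch below shows only that ordering. Each stage is injected as a plain function so that no library API is assumed; `ConversionStages` and `convert` are hypothetical names, and the stages are written synchronously for brevity even though real conversions (mammoth's, for instance) are promise-based.

```typescript
// Each stage is injected, so the sketch does not depend on any library API.
interface ConversionStages {
  docxToHtml: (docx: Uint8Array) => string; // role filled by mammoth
  adjustHtml: (html: string) => string;     // role filled by node-html-parser
  htmlToMarkdown: (html: string) => string; // role filled by turndown + GFM plugin
  lintAndFix: (markdown: string) => string; // role filled by markdownlint
}

// Run the stages in the order the dependency list implies.
function convert(docx: Uint8Array, stages: ConversionStages): string {
  const html = stages.docxToHtml(docx);
  const adjusted = stages.adjustHtml(html);
  const markdown = stages.htmlToMarkdown(adjusted);
  return stages.lintAndFix(markdown);
}
```

Keeping the stages separate like this makes it easy to see which library is responsible when a conversion goes wrong at a particular step.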
Troubleshooting
- Error: No binary data found for field "..."
  This error occurs if the specified input binary field does not exist or contains no data. Ensure the input binary field name matches exactly the field containing the DOCX file in the incoming data.
- Malformed or unsupported DOCX files
  If the DOCX file is corrupted or uses features not supported by the underlying libraries, the conversion may fail or produce incomplete Markdown. Verify the DOCX file integrity and try simplifying its content.
- Markdown output missing expected content
  Check if "Remove Images" is enabled, which strips out all images. Also, complex DOCX formatting might not fully translate to Markdown due to limitations in the conversion libraries.
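The first error above comes down to a lookup that fails when the named field is absent or empty. A minimal sketch of such a check follows, assuming a simplified item shape: `IncomingItem` and `getDocxBytes` are hypothetical, and the node's real item structure may differ.

```typescript
// Hypothetical item shape: binary payloads keyed by field name.
interface IncomingItem {
  binary?: Record<string, Uint8Array>;
}

// Return the bytes stored under fieldName, or throw the documented error
// when the field is missing or holds no data.
function getDocxBytes(item: IncomingItem, fieldName: string): Uint8Array {
  const bytes = item.binary ? item.binary[fieldName] : undefined;
  if (!bytes || bytes.length === 0) {
    throw new Error(`No binary data found for field "${fieldName}"`);
  }
  return bytes;
}
```

A typical trigger is a name mismatch: the file arrives under `data`, but Input Binary Field is set to something else, so the lookup finds nothing.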
Links and References
- Mammoth.js GitHub - DOCX to HTML conversion library used internally
- Turndown GitHub - HTML to Markdown converter
- GitHub Flavored Markdown Spec - Markdown flavor targeted by the plugin
- Markdownlint - Markdown style linter and fixer