Overview
This node converts DOCX files (Microsoft Word documents) into Markdown text format. It is useful when you want to extract and transform rich text content from DOCX files into a lightweight, plain-text markup language that is widely used for documentation, blogging, and note-taking.
Common scenarios include:
- Automating the conversion of uploaded DOCX reports or documents into Markdown for further processing or publishing.
- Integrating with workflows that require Markdown input but receive DOCX files.
- Extracting textual content from DOCX files while preserving formatting such as headings, lists, tables, and links in Markdown syntax.
For example, you could use this node to convert meeting notes saved as DOCX files into Markdown to publish on a wiki or version control system.
Properties
| Name | Meaning |
|---|---|
| Input Binary Field | The name of the binary input field containing the DOCX file to be converted (e.g., "data"). |
| Destination Output Field | The name of the output JSON field where the converted Markdown text will be stored (e.g., "text"). |
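As a rough illustration of how the "Input Binary Field" property might be resolved, the sketch below looks up a binary entry by name and fails with the error message documented under Troubleshooting. The `Item` and `BinaryEntry` shapes and the `getBinaryEntry` helper are hypothetical names introduced for this example, and base64 storage is an assumption; none of this is taken from the node's source.

```typescript
// Hypothetical item shape: a JSON payload plus named binary attachments.
interface BinaryEntry {
  data: string; // file content, assumed here to be base64-encoded
  fileName?: string;
}

interface Item {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Look up the DOCX payload under the configured "Input Binary Field".
// Throws when the field is missing or empty.
function getBinaryEntry(item: Item, inputBinaryField: string): BinaryEntry {
  const entry = item.binary?.[inputBinaryField];
  if (!entry || entry.data.length === 0) {
    throw new Error(`No binary data found for field "${inputBinaryField}"`);
  }
  return entry;
}
```

With "Input Binary Field" set to "data", an item whose binary section has a `data` entry resolves normally, while any other field name raises the error.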
Output
The node outputs an array of JSON objects, each containing one property named as specified by the "Destination Output Field" property. This property holds the Markdown string converted from the input DOCX file.
Example output JSON structure:
[
  {
    "text": "# Heading\n\nSome paragraph text converted from DOCX."
  }
]
No binary data is output by this node; it only produces Markdown text in JSON form.
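The output shape described above can be sketched as follows: each converted Markdown string is wrapped in an object whose single key is the configured "Destination Output Field". The `toOutput` helper is a hypothetical name used only for illustration; the node's own implementation may differ.

```typescript
// Wrap each converted Markdown string in a JSON object keyed by the
// configured destination field. No binary property is attached.
function toOutput(
  markdownDocs: string[],
  destinationField: string,
): Array<Record<string, string>> {
  return markdownDocs.map((markdown) => ({ [destinationField]: markdown }));
}

const output = toOutput(
  ["# Heading\n\nSome paragraph text converted from DOCX."],
  "text",
);
console.log(JSON.stringify(output, null, 2));
```

Using a computed key keeps the field name fully configurable, so downstream steps read the Markdown from whatever field name was chosen.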
Dependencies
- Uses the `mammoth` library to convert DOCX files to HTML.
- Uses `@joplin/turndown` with the GitHub Flavored Markdown plugin to convert HTML to Markdown.
- Uses `markdownlint` to lint and auto-fix the generated Markdown for style consistency.
- Parses and modifies HTML tables to ensure proper Markdown table conversion.
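The list above does not say how the HTML tables are modified. One plausible reason for such a step is that GitHub Flavored Markdown pipe tables require a header row, while HTML tables derived from DOCX may contain only `td` cells. The sketch below shows one possible normalisation under that assumption: promoting the first row of a header-less table to `th` cells. The function name `promoteFirstRowToHeader` is hypothetical, and the regex-based approach is a simplification that does not handle nested tables; it is not necessarily what this node does.

```typescript
// Promote the first row of each header-less table to header cells so that
// a Markdown table converter has a header row to work with.
// Assumption: tables are not nested; a real implementation would more
// likely use an HTML parser than regular expressions.
function promoteFirstRowToHeader(html: string): string {
  return html.replace(/<table\b[^>]*>[\s\S]*?<\/table>/gi, (table) => {
    if (/<th\b/i.test(table)) {
      return table; // already has header cells, leave untouched
    }
    // Rewrite only the first <tr>...</tr> block in this table.
    return table.replace(/<tr\b[^>]*>[\s\S]*?<\/tr>/i, (row) =>
      row.replace(/<td\b/gi, "<th").replace(/<\/td>/gi, "</th>"),
    );
  });
}
```

Tables that already contain `th` cells pass through unchanged, so the function can be applied more than once without altering the result.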
No external API keys or credentials are required. The node operates entirely on the provided binary DOCX data.
Troubleshooting
- Error: No binary data found for field "X"
  This error occurs if the specified input binary field does not exist or contains no data. Ensure the binary input field name matches exactly the field containing the DOCX file.
- Malformed Markdown output or missing content
  If the DOCX file contains complex elements unsupported by the converter, some formatting might be lost or altered. Review the original DOCX content and consider simplifying it if necessary.
- Performance issues with very large DOCX files
  Large documents may take longer to process. Consider splitting large files before conversion.
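To decide when a file is large enough to be worth splitting, a size check can run before conversion. The following is a minimal sketch, assuming the binary content is available as a padded base64 string without line breaks; `estimateDecodedBytes`, `exceedsLimit`, and the threshold are hypothetical and should be tuned to the environment.

```typescript
// Estimate the decoded size of a padded base64 string without decoding it.
// Every 4 base64 characters encode 3 bytes; '=' padding marks missing bytes.
function estimateDecodedBytes(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return Math.floor((base64.length * 3) / 4) - padding;
}

// Return true when the decoded payload would exceed the given byte limit.
function exceedsLimit(base64: string, maxBytes: number): boolean {
  return estimateDecodedBytes(base64) > maxBytes;
}
```

For example, a workflow could skip or reroute any item for which `exceedsLimit(content, limit)` returns true, where `limit` is a threshold chosen for that workflow.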
Links and References
- Mammoth.js GitHub - DOCX to HTML conversion library used internally.
- Turndown GitHub - HTML to Markdown converter.
- Markdownlint GitHub - Markdown linter and fixer.