Docx To Text

Converts DOCX files to text or HTML, depending on the selected output format.

Overview

This node converts DOCX files into text or HTML format. It is useful when you need to extract readable content from Word documents for further processing, such as indexing, searching, or displaying in web applications. For example, you can use this node to convert uploaded DOCX reports into plain text for a search index or into HTML snippets for email templates.

Properties

Name | Meaning
Input Binary Field | Name of the input field containing the DOCX binary file to be converted.
Output Field | Name of the output field where the extracted text or HTML will be stored.
Output Format | Format of the output content: either raw unformatted text ("Text") or converted HTML ("HTML").

Output

The node outputs a JSON array where each item contains a single field (named as specified by the "Output Field" property) holding the extracted content from the DOCX file. The content is either plain text or HTML depending on the selected output format.

Example output JSON structure:

[
  {
    "text": "Extracted plain text content here"
  }
]

or if HTML is chosen:

[
  {
    "html": "<p>Extracted HTML content here</p>"
  }
]

No binary data is output by this node; it only produces textual content.
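
As a rough illustration of the output shape described above, the following TypeScript sketch builds one item per converted document, keyed by the configured "Output Field". The names `OutputItem` and `buildOutputItem` are hypothetical and are not taken from the node's source.

```typescript
// Hypothetical type: an output item is a single string field.
type OutputItem = Record<string, string>;

// Build one output item whose only key is the configured "Output Field".
function buildOutputItem(outputField: string, content: string): OutputItem {
  return { [outputField]: content };
}

// One item per converted document, mirroring the HTML example above.
const items: OutputItem[] = [
  buildOutputItem("html", "<p>Extracted HTML content here</p>"),
];
```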

Dependencies

  • Uses the mammoth library to perform DOCX to text/HTML conversion.
  • Requires the input DOCX file to be provided as binary data in the specified input field.
  • No external API keys or credentials are needed.
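
A minimal sketch of how the conversion step might look with mammoth, assuming the node calls `extractRawText` for the "Text" format and `convertToHtml` for "HTML". Both are mammoth functions that resolve to an object with a `value` string, but the node's actual code is not shown in this document. The `DocxConverter` interface, the `convertDocx` function, and the format values `"text"` and `"html"` are hypothetical; the converter is passed in as a parameter so the sketch stays self-contained.

```typescript
// Subset of mammoth's API that this sketch relies on.
interface ConversionResult {
  value: string;       // the generated text or HTML
  messages: unknown[]; // warnings reported during conversion
}

interface DocxConverter {
  extractRawText(input: { buffer: Buffer }): Promise<ConversionResult>;
  convertToHtml(input: { buffer: Buffer }): Promise<ConversionResult>;
}

// Assumed internal values for the "Output Format" property.
type OutputFormat = "text" | "html";

// Hypothetical helper: pick the mammoth call that matches the output format.
async function convertDocx(
  converter: DocxConverter,
  docx: Buffer,
  format: OutputFormat,
): Promise<string> {
  const result =
    format === "html"
      ? await converter.convertToHtml({ buffer: docx })
      : await converter.extractRawText({ buffer: docx });
  return result.value;
}
```

In a real project the mammoth module itself would be passed as `converter`, and `docx` would be the decoded content of the input binary field.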

Troubleshooting

  • Error: No binary data found for field "XYZ"
    This error occurs when the specified input binary field contains no binary data at runtime. Ensure that the input field name exactly matches the name of the field holding the DOCX file and that the previous node provides valid binary data.

  • If the output is empty or incorrect, verify that the input DOCX file is valid and not corrupted.

  • Selecting "HTML" output format may produce HTML with limited styling since the conversion focuses on semantic content rather than exact formatting.
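
The missing-binary-data check from the first item above can be sketched as follows. This is an assumption about how such a check might be written, not the node's actual code: the `Item` and `BinaryEntry` types are simplified stand-ins for n8n's item structure, `requireBinary` is a hypothetical name, and the sketch assumes the file content is stored as a base64 string.

```typescript
// Simplified stand-in for one binary property on an n8n item.
interface BinaryEntry {
  data: string;      // file content, base64-encoded in this sketch
  mimeType: string;
  fileName?: string;
}

// Simplified stand-in for an n8n item: JSON data plus optional binary properties.
interface Item {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Hypothetical helper: return the DOCX bytes or fail with the documented error.
function requireBinary(item: Item, fieldName: string): Buffer {
  const entry = item.binary?.[fieldName];
  if (!entry) {
    throw new Error(`No binary data found for field "${fieldName}"`);
  }
  return Buffer.from(entry.data, "base64");
}
```

If this check fails, compare the "Input Binary Field" value with the binary property name produced by the previous node; the two must be identical, including case.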
