DOCX to Text

Converts a DOCX file to plain text or HTML

Overview

This node converts DOCX files into plain text or HTML format. It is useful when you need to extract readable content from DOCX documents within an automation workflow, such as processing uploaded reports, extracting text for indexing, or converting documents for display in web applications.

Practical examples:

  • Extracting raw text from DOCX resumes submitted via a form to analyze candidate information.
  • Converting DOCX meeting notes into HTML to embed them directly into a webpage or email.
  • Automating the extraction of contract text for further processing or storage in a database.

Properties

Name                      Meaning
Input Binary Field        The name of the input binary field containing the DOCX file to be converted.
Destination Output Field  The name of the output JSON field where the extracted text or HTML will be stored.
Output Format             The format of the output: either raw unformatted text ("Text") or HTML formatted text ("HTML").

Output

The node outputs a JSON array where each item corresponds to one input item processed. Each output item contains a single field (named as specified by the "Destination Output Field" property) holding the extracted content:

  • If "Text" is selected, the field contains the raw, unformatted text extracted from the DOCX file.
  • If "HTML" is selected, the field contains the HTML representation of the DOCX content.

No binary data is output by this node.

Example output JSON structure if destinationOutputField is "text":

[
  {
    "text": "Extracted plain text from the DOCX document."
  }
]
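
The shape above can be sketched in TypeScript as follows. This is a minimal illustration, not the node's actual source: the function name `toOutputItems` is hypothetical, and the extracted strings are assumed to be available already.

```typescript
// Hypothetical helper: wraps each extracted string in an object whose only key
// is the configured destination field name.
function toOutputItems(
  contents: string[],
  destinationOutputField: string,
): Array<Record<string, string>> {
  return contents.map((content) => ({ [destinationOutputField]: content }));
}

// One input item yields one output item.
const items = toOutputItems(["Extracted plain text from the DOCX document."], "text");
console.log(JSON.stringify(items, null, 2));
```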

Dependencies

  • Uses the mammoth library to perform DOCX to text/HTML conversion.
  • Requires input binary data containing the DOCX file.
  • No external API keys or services are needed.
  • The "Input Binary Field" property must match the name of the binary field on the incoming item.
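
The conversion step might look roughly like the sketch below. It assumes mammoth's `extractRawText` and `convertToHtml` functions, which accept an input such as `{ buffer }` and resolve to an object with a `value` string. To keep the sketch runnable without the library installed, the converter is passed in through a hypothetical `DocxConverter` interface. The names `convertDocx`, `OutputFormat`, and `fakeConverter` are illustrative and not taken from the node's code.

```typescript
// Hypothetical interface mirroring the two mammoth functions used here.
interface DocxConverter {
  extractRawText(input: { buffer: Uint8Array }): Promise<{ value: string }>;
  convertToHtml(input: { buffer: Uint8Array }): Promise<{ value: string }>;
}

type OutputFormat = "Text" | "HTML";

// Picks the conversion matching the "Output Format" property and returns the string result.
async function convertDocx(
  converter: DocxConverter,
  buffer: Uint8Array,
  format: OutputFormat,
): Promise<string> {
  const result =
    format === "HTML"
      ? await converter.convertToHtml({ buffer })
      : await converter.extractRawText({ buffer });
  return result.value;
}

// Stand-in converter so the sketch runs without mammoth installed.
const fakeConverter: DocxConverter = {
  extractRawText: async () => ({ value: "Hello" }),
  convertToHtml: async () => ({ value: "<p>Hello</p>" }),
};

convertDocx(fakeConverter, new Uint8Array(), "HTML").then((html) => console.log(html));
```

In a real workflow the `converter` argument would be backed by mammoth itself, for example by forwarding each call to `mammoth.extractRawText({ buffer })` or `mammoth.convertToHtml({ buffer })` with a Node.js `Buffer` holding the DOCX bytes.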

Troubleshooting

  • Error: No binary data found for field "X"
    This occurs if the specified input binary field does not exist or contains no data on the current item. Ensure that the binary data is correctly passed into the node and that the field name matches exactly.

  • Empty or incorrect output
    Verify that the input DOCX file is valid and not corrupted. Also, check that the binary data is properly loaded before this node.

  • Unexpected output format
    Confirm that the "Output Format" property is set correctly to either "Text" or "HTML" depending on your needs.
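
The first two checks above can be sketched as small guards. The item shape (`binary` as a map from field name to raw bytes) and the names `requireBinary` and `looksLikeZipArchive` are assumptions for illustration; the node's real data structures may differ. The second guard relies on the fact that a DOCX file is a ZIP package, so a valid file starts with the bytes `PK` (0x50 0x4B). It is a quick sanity check, not full validation.

```typescript
// Assumed item shape: binary fields keyed by name, each holding the raw file bytes.
interface InputItem {
  binary?: Record<string, Uint8Array>;
}

// Returns the bytes for the configured field, or throws the error described above.
function requireBinary(item: InputItem, fieldName: string): Uint8Array {
  const data = item.binary?.[fieldName];
  if (data === undefined || data.length === 0) {
    throw new Error(`No binary data found for field "${fieldName}"`);
  }
  return data;
}

// Quick sanity check only: DOCX files are ZIP archives, which start with "PK".
function looksLikeZipArchive(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x50 && bytes[1] === 0x4b;
}

const item: InputItem = { binary: { data: new Uint8Array([0x50, 0x4b, 0x03, 0x04]) } };
console.log(looksLikeZipArchive(requireBinary(item, "data")));
```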
