DOCX to Text

Converts a DOCX file to plain text.

Overview

This node converts DOCX files into plain text. It is useful when you have DOCX documents as binary data and want to extract their textual content for further processing in a workflow, such as indexing, searching, or transforming the text. For example, you might use this node to extract the text of uploaded DOCX reports so that a later step can summarize them, or to extract text from DOCX attachments in emails.

Properties

Name                      Meaning
Input Binary Field        The name of the input binary field containing the DOCX file. Example: "data".
Destination Output Field  The name of the output JSON field where the extracted plain text will be stored. Example: "text".
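
How the two properties relate to an item can be sketched in TypeScript. This is a minimal illustration, not the node's actual source: the simplified InputItem shape (a base64 string stored under binary[field].data) and the helper names readBinaryField and buildOutput are assumptions made for this example.

```typescript
// Hypothetical, simplified item shape. Real workflow items carry more
// metadata, and the binary payload may be stored differently.
interface InputItem {
  binary?: Record<string, { data: string }>; // assumed: base64-encoded file content
}

type OutputItem = Record<string, string>;

// Decode the bytes held in the field named by "Input Binary Field".
function readBinaryField(item: InputItem, fieldName: string): Buffer {
  const entry = item.binary?.[fieldName];
  if (!entry || entry.data.length === 0) {
    throw new Error(`No binary data found for field "${fieldName}"`);
  }
  return Buffer.from(entry.data, "base64");
}

// Build the JSON object keyed by "Destination Output Field".
function buildOutput(destinationField: string, text: string): OutputItem {
  return { [destinationField]: text };
}
```

With the example values from the table, readBinaryField(item, "data") would return the DOCX bytes, and buildOutput("text", extractedText) would produce the object the node emits for that item.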

Output

The node outputs an array of JSON objects, each containing a single property whose key is the configured destination output field name and whose value is the extracted plain text from the corresponding DOCX file.

Example output JSON structure:

[
  {
    "text": "Extracted plain text from the DOCX file"
  }
]

The node does not output any binary data; it only outputs the extracted text as JSON.

Dependencies

  • Uses the mammoth library to extract raw text from DOCX files.
  • Requires the input DOCX file to be provided as binary data in the specified input binary field.
  • No external API keys or services are required.
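
For orientation, the extraction step can be sketched as below. The mammoth library exposes extractRawText, which accepts an input such as { buffer } and resolves to an object whose value property holds the text. Whether the node calls it in exactly this way is an assumption; docxToText and RawTextExtractor are hypothetical names, and the extractor is passed in as a parameter so the sketch stays self-contained.

```typescript
// Shape of the single mammoth call this sketch relies on.
type RawTextExtractor = (input: { buffer: Buffer }) => Promise<{ value: string }>;

// Extract plain text from DOCX bytes using the supplied extractor.
async function docxToText(docx: Buffer, extract: RawTextExtractor): Promise<string> {
  const result = await extract({ buffer: docx });
  return result.value;
}

// Usage with the real library (requires `npm install mammoth`):
//   import * as mammoth from "mammoth";
//   const text = await docxToText(fileBuffer, (input) => mammoth.extractRawText(input));
```

Injecting the extractor also makes the conversion easy to exercise with a stub when mammoth is not installed.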

Troubleshooting

  • Error: No binary data found for field "X"
    This error occurs if the specified input binary field does not exist or contains no data on the current item. Ensure that the binary data is correctly passed into the node and that the field name matches exactly.

  • Empty or incorrect text output
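    If the output text is empty or missing expected content, verify that the input DOCX file is valid and not corrupted, and confirm that the preceding node actually attached the file's binary data to the item. Because the node extracts plain text only, formatting and images do not appear in the output.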

  • Performance considerations
    Large DOCX files may take longer to process. Consider splitting large documents before conversion if performance is critical.
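
One inexpensive way to catch an obviously invalid input before conversion is to check the file signature. A DOCX file is a ZIP archive, so its bytes normally begin with the ZIP local file header signature "PK\x03\x04". The helper below is a hypothetical pre-check, not part of the node, and passing it does not prove that the file is a well-formed DOCX document.

```typescript
// ZIP local file header signature: "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the bytes start like a ZIP archive (the DOCX container).
// A false result means the data is almost certainly not a DOCX file.
function looksLikeDocx(data: Uint8Array): boolean {
  if (data.length < ZIP_SIGNATURE.length) {
    return false;
  }
  return ZIP_SIGNATURE.every((byte, i) => data[i] === byte);
}
```

A workflow could run such a check on the decoded binary data and route failing items to an error branch instead of passing them to the converter.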
