Word2Text

Convert Word documents (.docx) to plain text

Overview

This node reads a Microsoft Word document (.docx) from a binary input property and outputs its content as plain text. It is useful when you need to extract readable text from Word documents for further processing in a workflow, such as indexing, searching, or transforming the text.

Common scenarios include:

  • Extracting text from uploaded Word files to analyze or store in a database.
  • Converting .docx attachments in emails to plain text for automated processing.
  • Preparing document content for natural language processing or sentiment analysis.

Properties

  • Binary Property: The name of the binary property that contains the Word document data. Default is "data".
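
As a rough illustration of what the Binary Property setting refers to, the sketch below looks up a named entry in an item's binary section, falling back to "data" when no name is given. The `BinaryEntry` and `InputItem` shapes and the `resolveBinaryEntry` helper are hypothetical simplifications for this example, not the node's actual source or n8n's full interfaces.

```typescript
// Hypothetical, simplified shapes; n8n's real interfaces carry more fields.
interface BinaryEntry {
  data: string; // file content, typically base64-encoded
  mimeType?: string;
  fileName?: string;
}

interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Look up the binary entry by property name, defaulting to "data".
function resolveBinaryEntry(item: InputItem, propertyName: string = "data"): BinaryEntry {
  const entry = item.binary?.[propertyName];
  if (entry === undefined) {
    // List the property names that do exist to make a typo easy to spot.
    const available = Object.keys(item.binary ?? {}).join(", ") || "none";
    throw new Error(`No binary data found under "${propertyName}" (available: ${available})`);
  }
  return entry;
}
```

If a preceding node stores the file under a different name, for example "attachment_0", that name is what the Binary Property field should contain.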

Output

The node outputs an array of items where each item has a json field containing:

  • text: A string representing the extracted plain text content from the Word document.

No binary output is produced by this node; it only outputs the extracted text in JSON format.

Example output JSON structure:

{
  "json": {
    "text": "Extracted plain text from the Word document"
  }
}
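
To show how a later step might consume this output, here is a minimal sketch that reads the text field of each item and computes simple statistics. The `Word2TextItem` type and the `summarizeItems` function are names assumed for this example only.

```typescript
// Shape of one output item, as described above.
interface Word2TextItem {
  json: { text: string };
}

// Hypothetical downstream step: count characters and whitespace-separated words.
function summarizeItems(items: Word2TextItem[]): { characters: number; words: number }[] {
  return items.map((item) => {
    const text = item.json.text;
    const words = text.trim().split(/\s+/).filter((word) => word.length > 0);
    return { characters: text.length, words: words.length };
  });
}
```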

Dependencies

  • This node depends on the external library mammoth, which is used to extract raw text from .docx files.
  • Requires the input data to contain valid binary data representing a Word document under the specified binary property.
  • No additional API keys or external services are required.
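
For orientation, the sketch below shows one way the extraction step could look. It relies on mammoth's `extractRawText` call, which accepts an object with a `buffer` and resolves to an object whose `value` holds the text. The extractor is passed in as a parameter so the example stays self-contained; `RawTextExtractor` and `docxToTextItem` are hypothetical names, and the node's real implementation may differ.

```typescript
// Minimal structural type for the single mammoth call used here.
interface RawTextExtractor<B> {
  extractRawText(input: { buffer: B }): Promise<{ value: string; messages: unknown[] }>;
}

// Extract the raw text and wrap it in the node's output shape.
async function docxToTextItem<B>(
  extractor: RawTextExtractor<B>,
  buffer: B,
): Promise<{ json: { text: string } }> {
  const result = await extractor.extractRawText({ buffer });
  return { json: { text: result.value } };
}

// Possible usage in Node.js, assuming mammoth is installed and
// "report.docx" is a placeholder path:
//   const mammoth = require("mammoth");
//   const fs = require("node:fs/promises");
//   const item = await docxToTextItem(mammoth, await fs.readFile("report.docx"));
```

Injecting the extractor also makes the step easy to test with a stub instead of a real document.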

Troubleshooting

  • Common issues:
    • If the specified binary property does not exist or does not contain valid .docx binary data, the node will throw an error.
    • Providing non-Word document binary data will result in extraction failure or empty text output.
  • Error messages:
    • Errors related to missing binary data usually indicate that the binary property name is incorrect or the input data lacks the expected binary content.
    • Errors from the mammoth library typically mean the file is corrupted or not a valid .docx file.
  • Resolution:
    • Ensure the binary property name matches the actual binary data property in the input.
    • Verify that the input binary data is a valid .docx file.
    • Use preceding nodes to correctly load or download the Word document into binary form.
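
One inexpensive check before extraction: a .docx file is a ZIP archive, so its first four bytes should be the ZIP local file header signature (50 4B 03 04 in hexadecimal). The helper below, `looksLikeZipContainer`, is a hypothetical sketch of such a pre-check. It only confirms the ZIP container, not that the archive is a Word document, so it rules out obviously wrong inputs such as PDFs or legacy .doc files but does not guarantee that extraction succeeds.

```typescript
// ZIP local file header signature: the bytes "PK" followed by 0x03 0x04.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];

// Return true when the data starts with the ZIP signature.
function looksLikeZipContainer(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_MAGIC.length) {
    return false;
  }
  return ZIP_MAGIC.every((value, index) => bytes[index] === value);
}
```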
