Overview
This node converts Microsoft Word document files (.docx) from binary input into plain text output. It is useful when you need to extract readable text content from Word documents for further processing, such as indexing, searching, or transforming the text in workflows.
Common scenarios include:
- Extracting text from uploaded Word files to analyze or store in a database.
- Converting .docx attachments in emails to plain text for automated processing.
- Preparing document content for natural language processing or sentiment analysis.
Properties
| Name | Meaning |
|---|---|
| Binary Property | The name of the binary property that contains the Word document data. Default is "data". |
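As a rough illustration of how the Binary Property setting is used, the sketch below looks up a named binary entry on an input item and falls back to "data" when no name is given. The `InputItem` and `BinaryEntry` shapes and the `getBinaryEntry` helper are hypothetical simplifications written for this example; they are not the node's actual source.

```typescript
// Simplified, hypothetical shapes for illustration; real n8n items carry more fields.
interface BinaryEntry {
  data: string; // file content (assumed here to be base64-encoded)
  mimeType?: string;
  fileName?: string;
}

interface InputItem {
  json: Record<string, unknown>;
  binary?: Record<string, BinaryEntry>;
}

// Look up the binary entry named by the "Binary Property" setting.
// Falls back to "data", the documented default.
function getBinaryEntry(item: InputItem, binaryProperty: string = "data"): BinaryEntry {
  const entry = item.binary?.[binaryProperty];
  if (entry === undefined) {
    throw new Error(`No binary data found in property "${binaryProperty}"`);
  }
  return entry;
}
```

Failing early with the property name in the message makes a misconfigured Binary Property easier to spot than a later extraction error.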
Output
The node outputs an array of items where each item has a json field containing:
- text: A string containing the plain text extracted from the Word document.
No binary output is produced by this node; it only outputs the extracted text in JSON format.
Example output JSON structure:
{
  "json": {
    "text": "Extracted plain text from the Word document"
  }
}
Dependencies
- This node depends on the external library mammoth, which is used to extract raw text from .docx files.
- Requires the input data to contain valid binary data representing a Word document under the specified binary property.
- No additional API keys or external services are required.
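The extraction step can be sketched roughly as follows, assuming the node relies on mammoth's `extractRawText`, which resolves to an object with a `value` string and a `messages` array. To keep the example self-contained, the extractor is passed in as a parameter; `RawTextExtractor`, `OutputItem`, and `docxToTextItem` are hypothetical names used only for illustration.

```typescript
// Shape of the result mammoth's extractRawText resolves to:
// the extracted text plus any messages reported during conversion.
interface RawTextResult {
  value: string;
  messages: unknown[];
}

// Injected so the sketch stays self-contained. In a Node.js environment this
// could be wired up as something like:
//   (bytes) => mammoth.extractRawText({ buffer: Buffer.from(bytes) })
type RawTextExtractor = (bytes: Uint8Array) => Promise<RawTextResult>;

// Output item shape described in the Output section.
interface OutputItem {
  json: { text: string };
}

// Convert the raw .docx bytes into an output item that carries only the text.
async function docxToTextItem(
  bytes: Uint8Array,
  extract: RawTextExtractor,
): Promise<OutputItem> {
  const result = await extract(bytes);
  return { json: { text: result.value } };
}
```

Passing the extractor in as an argument also makes the conversion easy to exercise with a stub, without needing a real Word document.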
Troubleshooting
- Common issues:
- If the specified binary property does not exist or does not contain valid .docx binary data, the node will throw an error.
- Providing non-Word document binary data will result in extraction failure or empty text output.
- Error messages:
- Errors related to missing binary data usually indicate that the binary property name is incorrect or the input data lacks the expected binary content.
- Errors from the mammoth library typically mean the file is corrupted or not a valid .docx file.
- Resolution:
- Ensure the binary property name matches the actual binary data property in the input.
- Verify that the input binary data is a valid .docx file.
- Use preceding nodes to correctly load or download the Word document into binary form.
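One inexpensive way to catch obviously wrong input before extraction is to check the file signature. A .docx file is a ZIP archive, so its first bytes should be the ZIP local file header signature `PK\x03\x04`. The sketch below is a minimal pre-check under that assumption; `looksLikeZip` is a hypothetical helper and not part of the node. It cannot tell a .docx apart from other ZIP-based formats, so it only rules out clearly invalid data.

```typescript
// ZIP local file header signature: "P", "K", 0x03, 0x04.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the data starts with the ZIP signature.
// This is a necessary condition for a .docx file, not a sufficient one.
function looksLikeZip(bytes: Uint8Array): boolean {
  if (bytes.length < ZIP_SIGNATURE.length) {
    return false;
  }
  return ZIP_SIGNATURE.every((expected, index) => bytes[index] === expected);
}
```

A check like this can be run on the decoded binary data to produce a clearer message (for example, "input is not a ZIP-based document") instead of a lower-level library error.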
Links and References
- Mammoth.js GitHub Repository – Library used for extracting text from Word documents.
- n8n Documentation on Binary Data – Guide on handling binary data in n8n nodes.