Overview
This operation extracts text from Word documents. The Word file can be supplied as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the document. A page range limits extraction to specific pages, and extraction options control whether comments and headers/footers are removed and whether tracked changes are accepted.
This node is beneficial in scenarios where automated processing of Word documents is needed, such as extracting textual data for indexing, analysis, or integration with other systems without manual intervention. For example, it can be used to extract contract clauses from uploaded contracts, parse reports for key information, or convert Word documents into plain text for further processing.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | How the Word file is provided: Binary Data (from a previous node), Base64 String (base64-encoded content supplied directly), or URL (link to the Word file). |
| Input Binary Field | Name of the binary property containing the Word file (default "data"). Used when Input Data Type is Binary Data. |
| Base64 Word Content | Base64 encoded content of the Word document. Used when Input Data Type is Base64 String. |
| Word URL | URL to the Word file to extract text from. Used when Input Data Type is URL. |
| Document Name | Name assigned to the document during processing (default "document.docx"). |
| Start Page Number | Starting page number for text extraction (default 1). |
| End Page Number | Ending page number for text extraction (default 3). |
| Extraction Options | Collection of boolean options that customize extraction: Remove Comments (remove comments from the extracted text, default true); Remove Header/Footer (remove headers and footers, default true); Accept Changes (accept tracked changes, default true). |
| Advanced Options | JSON string of custom profiles that adjusts properties of the API call, e.g., output format customization. |
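To illustrate how the properties above fit together, the following Python sketch models a parameter set as a plain dictionary and picks the one field that applies to the chosen Input Data Type. The key names (`inputDataType`, `binaryPropertyName`, `base64Content`, `wordUrl`, and so on) and the helper `select_input` are hypothetical, chosen for readability; they are not confirmed to match the node's internal parameter names. Only the default values are taken from the table.

```python
from typing import Any, Dict, Tuple

# Hypothetical parameter keys; the default values mirror the Properties table.
DEFAULTS: Dict[str, Any] = {
    "inputDataType": "binaryData",
    "binaryPropertyName": "data",
    "docName": "document.docx",
    "startPageNumber": 1,
    "endPageNumber": 3,
    "removeComments": True,
    "removeHeaderFooter": True,
    "acceptChanges": True,
}

# Which (hypothetical) field carries the document for each input type.
SOURCE_FIELD = {
    "binaryData": "binaryPropertyName",
    "base64": "base64Content",
    "url": "wordUrl",
}


def select_input(params: Dict[str, Any]) -> Tuple[str, str]:
    """Return the input type and the value of the field that applies to it."""
    merged = {**DEFAULTS, **params}
    input_type = merged["inputDataType"]
    if input_type not in SOURCE_FIELD:
        raise ValueError(f"Unknown input data type: {input_type!r}")
    field = SOURCE_FIELD[input_type]
    value = merged.get(field)
    if not value:
        raise ValueError(f"{field} is required when input type is {input_type!r}")
    return input_type, value


if __name__ == "__main__":
    # URL input: only the URL field is consulted.
    print(select_input({"inputDataType": "url",
                        "wordUrl": "https://example.com/contract.docx"}))
    # No overrides: falls back to binary data in the property named "data".
    print(select_input({}))
```

The point of the sketch is that exactly one of the three source fields is relevant per run; the other two are ignored.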
Output
The node outputs JSON data containing the text extracted from the specified page range, after the selected extraction options have been applied. When the range spans multiple pages, the text may be returned either concatenated or segmented by page.
No binary output is produced by this operation; the focus is on textual content extraction.
Dependencies
- Requires access to an external document processing API service capable of handling Word document parsing and text extraction.
- An API authentication token or key must be configured in n8n credentials to authorize requests to the external service.
- Network access is required if using URL input type to fetch the Word document.
Troubleshooting
Common Issues:
- Invalid or inaccessible URL when using URL input type results in failure to download the document.
- Incorrect binary property name leads to missing input data errors.
- Providing malformed base64 content causes decoding errors.
- Specifying invalid page ranges (e.g., start page greater than end page) may cause unexpected results or errors.
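Several of the issues above can be caught before the request is sent. The following is a minimal Python sketch of such pre-flight checks, for example in a step that runs ahead of this node. The function names are hypothetical, and treating page numbers as 1-based is an assumption drawn from the default start page of 1. The URL check is purely syntactic and does not test whether the document is actually reachable.

```python
import base64
from urllib.parse import urlparse


def check_page_range(start: int, end: int) -> None:
    """Reject ranges that start before page 1 or end before they start."""
    if start < 1:
        raise ValueError("Start page must be 1 or greater")
    if end < start:
        raise ValueError("End page must not be smaller than start page")


def check_base64(content: str) -> bytes:
    """Decode strictly, so stray characters or bad padding fail early."""
    try:
        return base64.b64decode(content, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("Invalid base64 content") from exc


def check_url(url: str) -> None:
    """Syntactic check only: an absolute http(s) URL with a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("URL must be an absolute http(s) address")


if __name__ == "__main__":
    check_page_range(1, 3)
    check_url("https://example.com/report.docx")
    print(len(check_base64("aGVsbG8=")), "bytes decoded")
```

Strict decoding (`validate=True`) is used deliberately: the lenient default silently discards characters outside the base64 alphabet, which would hide exactly the kind of malformed input described above.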
Error Messages & Resolutions:
- "Input binary property not found": Verify the binary field name matches the actual binary data property from the previous node.
- "Failed to fetch document from URL": Check URL accessibility and correctness.
- "Invalid base64 content": Ensure the base64 string is properly encoded without extra characters.
- "Page range out of bounds": Adjust start and end page numbers within the document's actual page count.
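In a workflow that handles failures automatically, the messages above can be mapped to their resolutions with a small lookup. This is a sketch only: the message strings are copied from the list above, while the helper `resolution_hint` and the case-insensitive substring matching are assumptions, since the exact wording and wrapping of errors at runtime may differ.

```python
from typing import Optional

# Message fragments and resolutions, taken from the list above.
RESOLUTIONS = {
    "Input binary property not found":
        "Verify the binary field name matches the binary property from the previous node.",
    "Failed to fetch document from URL":
        "Check URL accessibility and correctness.",
    "Invalid base64 content":
        "Ensure the base64 string is properly encoded without extra characters.",
    "Page range out of bounds":
        "Adjust start and end page numbers to fit the document's actual page count.",
}


def resolution_hint(error_message: str) -> Optional[str]:
    """Return the resolution for a known error message, or None if unknown."""
    lowered = error_message.lower()
    for fragment, hint in RESOLUTIONS.items():
        if fragment.lower() in lowered:
            return hint
    return None


if __name__ == "__main__":
    print(resolution_hint("Error: Page range out of bounds"))
    print(resolution_hint("Some unrelated failure"))
```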
Links and References
- PDF4me API Documentation: reference for advanced profile options and API capabilities related to document processing.