Text Splitter & Chunker

Splits or extracts text using length, paragraph, sentence, word, or regex.


Overview

This node processes the text in an input field in one of two ways. The Extract operation returns the parts of the text that match a given regular expression. The Split operation divides the text into chunks by length, paragraph, sentence, word, or a custom regex. Use Extract to isolate specific patterns such as email addresses or keywords. Use Split to break large text, such as an article, into paragraphs or sentences for further processing.
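The five split methods can be sketched in Python as follows. This is a minimal illustration, not the node's implementation. The function name `split_text`, its parameters, and the rule behind each method are assumptions: paragraphs are separated by blank lines, sentences end with `.`, `!`, or `?` followed by whitespace, words are separated by whitespace, and length chunks are fixed character counts.

```python
import re
from typing import List, Optional


def split_text(text: str, method: str, chunk_size: int = 100,
               pattern: Optional[str] = None) -> List[str]:
    """Hypothetical helper: split text by 'length', 'paragraph', 'sentence', 'word', or 'regex'."""
    if method == "length":
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        # Fixed-size character chunks; the last chunk may be shorter. Whitespace is preserved.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if method == "paragraph":
        # Assumed rule: paragraphs are separated by one or more blank lines.
        parts = re.split(r"\n\s*\n", text)
    elif method == "sentence":
        # Assumed rule: split after ., ! or ? when followed by whitespace (punctuation is kept).
        parts = re.split(r"(?<=[.!?])\s+", text)
    elif method == "word":
        # Any run of whitespace separates words.
        parts = text.split()
    elif method == "regex":
        if pattern is None:
            raise ValueError("pattern is required for the 'regex' method")
        parts = re.split(pattern, text)
    else:
        raise ValueError(f"unknown split method: {method!r}")
    # Drop None (from unmatched capture groups) and whitespace-only pieces; trim the rest.
    return [p.strip() for p in parts if p and p.strip()]
```

For example, `split_text("One. Two! Three?", "sentence")` gives `["One.", "Two!", "Three?"]`. Under these assumptions, text without sentence-ending punctuation comes back as a single chunk.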

Use Case Examples

  1. Extract all vowels from a text field using a regex pattern.
  2. Split a long text into chunks of 100 characters each.
  3. Divide text into sentences for individual analysis.
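The first two use cases reduce to a regex search and fixed-size slicing. The minimal Python sketch below uses the standard `re` module. The sample strings are invented for illustration. Case-insensitive matching (`re.IGNORECASE`) is an assumption, because this page does not state the node's default matching flags.

```python
import re

text = "Regular expressions are useful."

# Use case 1: extract every vowel with a character class (case-insensitive by assumption).
vowels = re.findall(r"[aeiou]", text, flags=re.IGNORECASE)

# Use case 2: split into 100-character chunks; the last chunk holds the remainder.
long_text = "x" * 250
chunks = [long_text[i:i + 100] for i in range(0, len(long_text), 100)]

print(len(vowels))               # 12
print([len(c) for c in chunks])  # [100, 100, 50]
```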

Properties

  • Text Field - The name of the field in the input item that contains the text to process.
  • Regex Pattern - The regular expression used to extract matching parts of the text when the operation is 'Extract'.

Output

JSON

  • match - A substring that matches the regex pattern, produced when the operation is 'Extract'.
  • chunk - A piece of the original text, produced when the operation is 'Split'.
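The following Python sketch shows how one input item might map to the `match` and `chunk` output fields. It assumes that each match or chunk becomes its own output item. It also follows the Troubleshooting note that a missing or empty text field yields no output. The function `process_item` and its parameters are hypothetical. Only length-based splitting is shown.

```python
import re
from typing import Any, Dict, List


def process_item(item: Dict[str, Any], operation: str, text_field: str,
                 regex_pattern: str = "", chunk_size: int = 100) -> List[Dict[str, str]]:
    """Hypothetical: turn one input item into items shaped like {'match': ...} or {'chunk': ...}."""
    text = item.get(text_field)
    # A missing, non-string, or empty field produces no output for this item.
    if not isinstance(text, str) or not text:
        return []
    if operation == "Extract":
        # finditer + group(0) yields the full match even when the pattern has capture groups.
        return [{"match": m.group(0)} for m in re.finditer(regex_pattern, text)]
    if operation == "Split":
        # Length-based split shown for brevity; other split methods would slot in here.
        return [{"chunk": text[i:i + chunk_size]}
                for i in range(0, len(text), chunk_size)]
    raise ValueError(f"unknown operation: {operation!r}")
```

This sketch uses `re.finditer` rather than `re.findall` deliberately. When a pattern contains capture groups, `findall` returns the groups instead of the whole matched substring.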

Troubleshooting

  • If the specified text field does not exist or is empty in the input item, no output will be generated for that item.
  • Invalid regular expression patterns may cause errors or no matches; ensure the regex syntax is correct.
  • When splitting by sentence, if the text does not contain typical sentence-ending punctuation, the entire text may be returned as one chunk.
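To catch invalid patterns before running the node, you can test-compile a pattern first. The hypothetical `check_pattern` helper below uses Python's `re` module as a stand-in. The node's own regex engine may accept slightly different syntax, so a pattern that compiles here is not guaranteed to behave identically in the node.

```python
import re
from typing import Optional


def check_pattern(pattern: str) -> Optional[str]:
    """Return None if the pattern compiles, otherwise the compiler's error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        # e.g. an unterminated character set or a missing closing parenthesis
        return str(exc)
    return None
```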
