Overview
This node provides various operations to work with GPT-style tokenization, specifically using Byte Pair Encoding (BPE) tokens compatible with OpenAI GPT models. It allows encoding strings into tokens, decoding tokens back into strings, counting tokens in a string, checking if a string fits within a specified token limit, and slicing a string into chunks that each fit within a token limit.
Common scenarios where this node is useful include:
- Preparing text inputs for GPT models by encoding them into tokens.
- Validating whether input text exceeds model token limits before sending requests.
- Splitting long texts into smaller parts that comply with token limits.
- Decoding token arrays back into readable text.
Practical examples:
- Before calling an OpenAI GPT API, use "Check Token Limit" to ensure the prompt does not exceed the model's max tokens.
- Use "Slice to Max Token Limit" to split a large document into manageable chunks for sequential processing.
- Encode user input into tokens for custom token-based processing or analysis.
- Decode tokens received from a GPT model back into human-readable text.
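The pre-flight check described above can be sketched as follows. Note that `countTokens` here is a crude stand-in (a ~4-characters-per-token heuristic, an assumption for illustration only), not the node's actual BPE tokenizer, and the function name mirrors but is not the node's internal API:

```javascript
// Stand-in token counter (assumption: ~4 characters per token on average).
// The real node uses a BPE tokenizer; this is only for illustration.
const countTokens = (text) => Math.ceil(text.length / 4);

// Mirrors the node's Check Token Limit behavior: return false when the
// limit is exceeded, or throw if the error flag is enabled.
function isWithinTokenLimit(text, maxTokens, throwOnExceed = false) {
  const n = countTokens(text);
  if (n > maxTokens) {
    if (throwOnExceed) throw new Error("String exceeds token limit");
    return false;
  }
  return true;
}

console.log(isWithinTokenLimit("a short prompt", 4096)); // true
console.log(isWithinTokenLimit("a".repeat(200), 30));    // false (~50 tokens)
```

In a workflow, a `false` result would be the signal to slice the text or reject the request before calling the API.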
Properties
| Name | Meaning |
|---|---|
| Operation | The action to perform: Encode, Decode, Count Tokens, Check Token Limit, or Slice to Max Token Limit. |
| Input String | The text to process (required for the encode, countTokens, isWithinTokenLimit, and sliceMatchingTokenLimit operations). |
| Input Tokens | An array of BPE tokens to decode (required for decode operation). |
| Max Tokens | The maximum number of tokens allowed (required for isWithinTokenLimit and sliceMatchingTokenLimit). |
| Error When Exceeding Token Limit | Whether to throw an error if the input string exceeds the max token limit (only for isWithinTokenLimit). |
| Destination Key | The key name under which to store the result in the output JSON. If empty, default keys are used. |
Output
The node outputs JSON data with different structures depending on the operation:
- Encode: an array of tokens under the key `tokens` (or the custom destination key).
- Decode: the decoded string under the key `data` (or the custom destination key).
- Count Tokens: an object under the key `stats` containing `resume` (the number of tokens counted) and `tokens` (the array of token IDs).
- Check Token Limit: a boolean (`true` or `false`) under the key `isWithinTokenLimit` (or the custom destination key), indicating whether the input string is within the token limit.
- Slice to Max Token Limit: an array of string slices under the key `slices` (or the custom destination key), where each slice fits within the max token limit.
The node does not output binary data.
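To make the Slice to Max Token Limit output shape concrete, here is a minimal greedy-chunking sketch. The word-based token counter is a stand-in assumption for the node's real BPE tokenizer, and `sliceToMaxTokenLimit` is an illustrative function, not the node's internal code:

```javascript
// Stand-in token counter (assumption): one token per whitespace-delimited word.
const countTokens = (text) => text.split(/\s+/).filter(Boolean).length;

// Greedily pack words into chunks that each stay within maxTokens.
function sliceToMaxTokenLimit(text, maxTokens) {
  const words = text.split(/\s+/).filter(Boolean);
  const slices = [];
  let current = "";
  for (const word of words) {
    const candidate = current ? current + " " + word : word;
    if (countTokens(candidate) > maxTokens && current) {
      slices.push(current); // current chunk is full; start a new one
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) slices.push(current);
  return slices;
}

// The node would emit this under the "slices" key (or a custom destination key):
const output = { slices: sliceToMaxTokenLimit("one two three four five", 2) };
console.log(output); // { slices: ["one two", "three four", "five"] }
```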
Dependencies
- Uses the `gpt-tokenizer` package for encoding, decoding, and token limit checks.
- Uses the `js-tiktoken/lite` package to count tokens accurately.
- Fetches a token encoding base JSON from a remote URL (`https://tiktoken.pages.dev/js/o200k_base.json`) at runtime to initialize the tokenizer.
- Requires internet access to fetch the encoding base JSON during execution.
- No internal credential or API key is required for the node itself.
Troubleshooting
Common Issues
- Input String is empty or not a string: The node requires a valid non-empty string for most operations except decode. Ensure the input is correctly provided.
- Input Tokens is not an array or empty: For decode operation, the input tokens must be a non-empty array of numbers.
- Max Tokens not provided or less than or equal to zero: For token limit related operations, max tokens must be a positive number.
- Exceeding token limit without error flag: If the input exceeds the token limit and the error flag is false, the node returns `false` instead of throwing an error.
- Network issues fetching encoding base JSON: The node fetches a remote JSON file to initialize the tokenizer, so network problems can cause failures.
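One common mitigation for the network issue above is to fetch the encoding JSON once and reuse it. This memoization sketch is an illustration, not the node's actual implementation; the injectable `fetchJson` parameter is a hypothetical choice that makes the pattern easy to stub:

```javascript
// Cache the (promise of the) encoding JSON after the first request, so
// transient network failures only matter on first use. Caches a single
// URL; fetchJson is an injectable stand-in for a real fetch call.
function makeEncodingLoader(fetchJson) {
  let cached = null;
  return function loadEncoding(url) {
    if (cached === null) cached = fetchJson(url);
    return cached;
  };
}

// Usage with a stubbed fetcher that counts how often it is invoked:
let calls = 0;
const load = makeEncodingLoader((url) => { calls++; return Promise.resolve({ url }); });
load("https://tiktoken.pages.dev/js/o200k_base.json")
  .then(() => load("https://tiktoken.pages.dev/js/o200k_base.json"))
  .then(() => console.log(calls)); // 1: the second call is served from the cache
```

Caching the promise (rather than the resolved value) also prevents duplicate concurrent fetches.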
Error Messages and Resolutions
"Input String is not a string": Provide a valid string input."Input String field is empty": Ensure the input string is not empty."Input Tokens is not an array": Provide a valid array of tokens for decoding."Input Tokens field is empty": Provide a non-empty array of tokens."Provide Max Tokens. (bigger then 0)": Set a positive number for max tokens."String exceeds token limit": Enable the error flag to throw an error or handle the false return value gracefully.