Overview
The node provides various operations to work with GPT-style tokenization, specifically using Byte Pair Encoding (BPE) tokens. It can encode strings into tokens, decode tokens back into strings, count tokens in a string, check if a string fits within a specified token limit, and slice a string into chunks that each fit within a maximum token limit.
This node is useful when preparing text for OpenAI GPT models or similar language models that have token limits per request. For example, it helps ensure that input text does not exceed the model's maximum token capacity by slicing long texts into manageable pieces. It also aids in token counting and encoding/decoding tasks needed for advanced prompt engineering or token management workflows.
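The slicing operation can be sketched as follows. This is an illustrative sketch, not the node's actual internals: the toy word-level tokenizer stands in for the real BPE tokenizer, and the function name is assumed.

```javascript
// Toy word-level tokenizer standing in for the real BPE tokenizer
// (illustration only; the node uses gpt-tokenizer under the hood).
const toyTokenizer = {
  encode: (text) => text.split(/\s+/).filter(Boolean),
  decode: (tokens) => tokens.join(" "),
};

// Split `text` into slices whose token counts never exceed `maxTokens`.
function sliceToMaxTokens(text, maxTokens, tokenizer = toyTokenizer) {
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error("Provide Max Tokens (bigger than 0)");
  }
  const tokens = tokenizer.encode(text);
  const slices = [];
  for (let i = 0; i < tokens.length; i += maxTokens) {
    slices.push(tokenizer.decode(tokens.slice(i, i + maxTokens)));
  }
  return slices;
}

// With maxTokens = 2, each slice holds at most two "tokens":
// sliceToMaxTokens("one two three four five", 2)
// → ["one two", "three four", "five"]
```

With a real BPE tokenizer the chunk boundaries fall on token boundaries rather than word boundaries, but the chunking loop is the same idea.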
Properties
| Name | Meaning |
|---|---|
| Input String | The string of text to process (encode, count tokens, check limit, or slice). |
| Max Tokens | The maximum number of tokens allowed (used in checking token limits and slicing operations). |
| Destination Key | The key name where the result will be stored in the output JSON. If empty, defaults are used. |
Output
The output JSON structure depends on the selected operation:
- Slice to Max Token Limit: outputs an array of string slices under the specified destination key (default key "slices"). Each slice contains a substring of the original input, sliced so that its token count does not exceed the max token limit.

Other operations produce outputs such as:

- Encoded tokens array ("tokens" by default).
- Decoded string ("data" by default).
- Token count statistics ("tokenCount" or "stats").
- Boolean indicating whether the input is within the token limit ("isWithinTokenLimit").
No binary data output is produced by this node.
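For example, a slice operation with the default destination key might produce output shaped like this (the values are illustrative, not taken from a real run):

```json
{
  "slices": [
    "First chunk of the input text, within the token limit",
    "second chunk, also within the limit"
  ]
}
```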
Dependencies
- Uses the gpt-tokenizer package for encoding, decoding, and token limit checks.
- Uses js-tiktoken/lite for token counting.
- Fetches a remote tokenizer configuration JSON from https://tiktoken.pages.dev/js/o200k_base.json during execution.
- Requires internet access to fetch the tokenizer config at runtime; no credential types are needed, but network connectivity is necessary.
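The remote fetch can be sketched as below. The function name is hypothetical; the resulting config object is what js-tiktoken/lite's `Tiktoken` constructor expects, per that package's documentation.

```javascript
// Hypothetical sketch of loading the remote tokenizer config used for
// token counting. Requires network access (and Node 18+ for global fetch).
async function loadTokenizerConfig() {
  const url = "https://tiktoken.pages.dev/js/o200k_base.json";
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Failed to fetch tokenizer config: HTTP ${res.status}`);
  }
  // Pass the parsed config to `new Tiktoken(config)` from "js-tiktoken/lite".
  return res.json();
}
```

Caching the fetched config between executions would avoid repeated network round trips, though whether the node does so is not documented here.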
Troubleshooting
- Input String is not a string: Ensure the "Input String" property is provided and is a valid string.
- Input String field is empty: Provide a non-empty string for processing.
- Input Tokens is not an array: When decoding, provide a valid array of BPE tokens.
- Provide Max Tokens (bigger than 0): The "Max Tokens" value must be a positive integer for relevant operations.
- String exceeds token limit: If configured to error on exceeding token limits, the node throws an error when the input string is too long. Either increase the max tokens or disable the error flag.
- Network errors fetching tokenizer config: Since the node fetches tokenizer data remotely, network issues may cause failures. Ensure stable internet connection.
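The input-related errors above can be avoided with pre-flight checks along these lines. This is a sketch mirroring the error messages listed, not the node's actual code; the operation names are assumptions.

```javascript
// Hypothetical validation mirroring the node's troubleshooting messages.
// Operation names ("decode", "checkLimit", "slice") are illustrative.
function validateParams({ inputString, inputTokens, maxTokens }, operation) {
  if (operation === "decode") {
    if (!Array.isArray(inputTokens)) {
      throw new Error("Input Tokens is not an array");
    }
    return;
  }
  if (typeof inputString !== "string") {
    throw new Error("Input String is not a string");
  }
  if (inputString.length === 0) {
    throw new Error("Input String field is empty");
  }
  if (operation === "checkLimit" || operation === "slice") {
    if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
      throw new Error("Provide Max Tokens (bigger than 0)");
    }
  }
}
```

Running such checks before calling the tokenizer gives clearer failures than letting the encode/decode step throw deep inside the tokenizer library.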