GPT-Tokenizer
Encodes and decodes BPE tokens, or checks token limits, before working with the OpenAI GPT models.
Overview
The GPT-Tokenizer node encodes strings into BPE (Byte Pair Encoding) tokens, which are used by OpenAI GPT models. Specifically, with the Encode operation, it converts a given input string into an array of token IDs. This is useful for preparing text data for use with GPT-based APIs or for analyzing how text will be split into tokens before it is sent to a model.
Common scenarios:
- Preprocessing text before sending it to OpenAI's GPT models.
- Understanding and visualizing how a string will be tokenized.
- Ensuring that text fits within model token limits by checking tokenization results.
Practical example:
You have a long user message and want to see how many tokens it will consume in GPT-3.5-turbo, or you need to encode it into tokens for advanced prompt engineering.
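The token-limit scenario can be sketched as follows. This is a minimal illustration that assumes the token array produced by the Encode operation is already available; `fitsWithinLimit`, `exampleTokens`, and the 4096 figure are hypothetical names and values chosen for the example, not part of the node.

```typescript
// Hypothetical helper: not part of the node or of the gpt-tokenizer library.
// Returns true when the number of token IDs does not exceed the given limit.
function fitsWithinLimit(tokens: number[], limit: number): boolean {
  return tokens.length <= limit;
}

// Token IDs as they would appear in the node's output array.
const exampleTokens: number[] = [5661, 318, 1337];

// Illustrative limit only; check the documentation for the model you use.
const exampleLimit = 4096;

console.log(
  `${exampleTokens.length} tokens, fits: ${fitsWithinLimit(exampleTokens, exampleLimit)}`
);
```

The check only counts array entries, so it works with any output of the Encode operation regardless of which Destination Key was used.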
Properties
| Name | Meaning |
|---|---|
| Input String | String to process. The text that will be encoded into BPE tokens. |
| Destination Key | The key to write the results to. Leave empty to use the default destination key ("tokens"). |
Output
- The output is a JSON object containing a property whose key is defined by Destination Key (or defaults to "tokens" if left empty).
- The value is an array of integers, each representing a BPE token ID corresponding to the input string.
Example output:
{
  "tokens": [5661, 318, 1337]
}
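The output shape described above can be sketched like this. The helper `buildOutput` and the constant `DEFAULT_DESTINATION_KEY` are hypothetical names used for illustration; the sketch assumes an empty or missing Destination Key falls back to "tokens", as the Properties table states.

```typescript
// Hypothetical helper illustrating the documented output shape.
// It is not the node's actual implementation.
const DEFAULT_DESTINATION_KEY = "tokens";

function buildOutput(
  tokenIds: number[],
  destinationKey?: string
): Record<string, number[]> {
  // An empty or missing key falls back to the documented default.
  const key = destinationKey || DEFAULT_DESTINATION_KEY;
  return { [key]: tokenIds };
}

// Default key
console.log(JSON.stringify(buildOutput([5661, 318, 1337])));
// prints {"tokens":[5661,318,1337]}

// Custom key
console.log(JSON.stringify(buildOutput([5661, 318, 1337], "promptTokens")));
// prints {"promptTokens":[5661,318,1337]}
```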
Dependencies
- External library: gpt-tokenizer (used for encoding)
- No API keys or external service configuration required for the Encode operation.
Troubleshooting
Common issues:
- "Input String is not a string": The provided input is not a valid string. Ensure you pass a proper text value.
- "Input String field is empty": The input string is missing. Make sure to provide a non-empty string.
- If you specify a custom Destination Key, ensure it does not conflict with existing keys in your data.
How to resolve:
- Always provide a valid, non-empty string for the Input String property.
- Leave Destination Key empty unless you need a custom output key.
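The checks behind these messages can be approximated as below. This is a sketch under assumptions: `validateInputString` and `hasKeyConflict` are hypothetical helpers, and the order in which the node actually performs its checks is not documented here. Only the two error messages are taken from the list above.

```typescript
// Hypothetical helpers; the node's real implementation may differ.

// Mirrors the two documented error messages.
function validateInputString(value: unknown): string {
  if (typeof value !== "string") {
    throw new Error("Input String is not a string");
  }
  if (value.length === 0) {
    throw new Error("Input String field is empty");
  }
  return value;
}

// Reports whether writing to destinationKey would overwrite an existing key.
function hasKeyConflict(
  item: Record<string, unknown>,
  destinationKey: string
): boolean {
  return Object.prototype.hasOwnProperty.call(item, destinationKey);
}

// Example incoming item that already has a "tokens" property.
const incomingItem: Record<string, unknown> = {
  text: "This is fine",
  tokens: "already used",
};

for (const candidate of ["This is fine", "", 42] as unknown[]) {
  try {
    validateInputString(candidate);
    console.log("ok");
  } catch (err) {
    console.log(err instanceof Error ? err.message : String(err));
  }
}

console.log(hasKeyConflict(incomingItem, "tokens")); // true
console.log(hasKeyConflict(incomingItem, "promptTokens")); // false
```

Running a check like `hasKeyConflict` before choosing a custom Destination Key helps avoid silently overwriting data that is already present on the item.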