GPT-Tokenizer

Encodes and decodes BPE tokens, or checks token limits, before you work with OpenAI GPT models.

Overview

The GPT-Tokenizer node encodes strings into BPE (Byte Pair Encoding) tokens, which are used by OpenAI GPT models. With the Encode operation, it converts a given input string into an array of token IDs. This is useful for preparing text data for GPT-based APIs, or for analyzing how text will be split into tokens before it is sent to a model.
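
To give a feel for what "encoding into BPE tokens" means, here is a deliberately simplified sketch. It is not the node's code and not the gpt-tokenizer library: the merge rules, the vocabulary, and the name `toyBpeEncode` are all hypothetical, and real GPT vocabularies work on bytes with tens of thousands of merges.

```typescript
// Toy merge rules in priority order (hypothetical, for illustration only).
const toyMerges: Array<[string, string]> = [
  ["l", "o"],
  ["lo", "w"],
  ["e", "r"],
];

// Toy vocabulary mapping each symbol to a made-up token ID.
const toyVocab = new Map<string, number>([
  ["l", 0], ["o", 1], ["w", 2], ["e", 3], ["r", 4],
  ["lo", 5], ["low", 6], ["er", 7],
]);

// Splits a word into characters, applies each merge rule once in order,
// then looks up the ID of every remaining symbol.
function toyBpeEncode(word: string): number[] {
  let symbols: string[] = word.split("");
  for (const [left, right] of toyMerges) {
    const merged: string[] = [];
    let i = 0;
    while (i < symbols.length) {
      if (symbols[i] === left && symbols[i + 1] === right) {
        merged.push(left + right); // fuse the adjacent pair into one symbol
        i += 2;
      } else {
        merged.push(symbols[i]!);
        i += 1;
      }
    }
    symbols = merged;
  }
  return symbols.map((symbol) => {
    const id = toyVocab.get(symbol);
    if (id === undefined) {
      throw new Error(`Symbol not in toy vocabulary: ${symbol}`);
    }
    return id;
  });
}

console.log(toyBpeEncode("lower")); // "low" + "er" gives IDs 6 and 7
```

The point of the sketch is only that frequent character sequences collapse into single IDs, which is why token counts differ from character or word counts.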

Common scenarios:

  • Preprocessing text before sending it to OpenAI's GPT models.
  • Understanding and visualizing how a string will be tokenized.
  • Ensuring that text fits within model token limits by checking tokenization results.

Practical example:
You have a long user message and want to see how many tokens it will consume in GPT-3.5-turbo, or you need to encode it into tokens for advanced prompt engineering.
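
The token-count check in this scenario could look roughly like the sketch below. It is an illustration under stated assumptions, not the node's implementation: `fitsWithinLimit` and the `Encoder` type are hypothetical names, and `toyEncode` is a whitespace-based stand-in for a real BPE encoder, so its counts will not match real GPT token counts.

```typescript
// Hypothetical type: any function that turns text into token IDs.
type Encoder = (text: string) => number[];

// Stand-in encoder for illustration only: one fake ID per whitespace-separated word.
// A real BPE encoder would be substituted here.
const toyEncode: Encoder = (text) =>
  text
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((_word, index) => index);

// Returns the token count when the text fits within the limit, otherwise false.
function fitsWithinLimit(text: string, limit: number, encode: Encoder): number | false {
  const count = encode(text).length;
  return count <= limit ? count : false;
}

const message = "You have a long user message";
console.log(fitsWithinLimit(message, 4096, toyEncode)); // 6
console.log(fitsWithinLimit(message, 3, toyEncode)); // false
```

Passing the encoder in as a parameter keeps the limit check independent of any particular tokenizer, so the same function works once a real encoder is plugged in.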


Properties

Name              Meaning
Input String      The text to encode into BPE tokens.
Destination Key   The key to write the results to. Leave empty to use the default destination key ("tokens").

Output

  • The output is a JSON object containing a property whose key is defined by Destination Key (or defaults to "tokens" if left empty).
  • The value is an array of integers, each representing a BPE token ID corresponding to the input string.

Example output:

{
  "tokens": [5661, 318, 1337]
}
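
How the Destination Key shapes the output can be sketched as follows. This is a minimal illustration, not the node's source: `writeTokens` is a hypothetical helper, and both the trimming of the key and the merging into an existing item are assumptions.

```typescript
// Default key used when Destination Key is left empty (as documented above).
const DEFAULT_DESTINATION_KEY = "tokens";

// Hypothetical helper: returns a copy of the item with the token array
// stored under the chosen key.
function writeTokens(
  item: Record<string, unknown>,
  tokens: number[],
  destinationKey = "",
): Record<string, unknown> {
  // Assumption: a blank or whitespace-only key falls back to the default.
  const key = destinationKey.trim() === "" ? DEFAULT_DESTINATION_KEY : destinationKey;
  // The spread copies existing fields; a field with the same name is overwritten.
  return { ...item, [key]: tokens };
}

console.log(JSON.stringify(writeTokens({}, [5661, 318, 1337])));
// {"tokens":[5661,318,1337]}
console.log(JSON.stringify(writeTokens({ id: 1 }, [5661, 318, 1337], "encoded")));
// {"id":1,"encoded":[5661,318,1337]}
```

In this sketch an existing field with the same name is silently replaced, which is why a custom key should not collide with data you want to keep.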

Dependencies

  • External library: gpt-tokenizer (used for encoding)
  • No API keys or external service configuration required for the Encode operation.

Troubleshooting

Common issues:

  • "Input String is not a string": The provided input is not a valid string. Ensure you pass a proper text value.
  • "Input String field is empty": The input string is missing. Make sure to provide a non-empty string.
  • If you specify a custom Destination Key, ensure it does not conflict with existing keys in your data.

How to resolve:

  • Always provide a valid, non-empty string for the Input String property.
  • Leave Destination Key empty unless you need a custom output key.
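
If you want to catch these problems before the node runs, a guard along the following lines could be used in an upstream step. It is a sketch only: `validateInputString` is a hypothetical function that reuses the two error messages listed above, and the order of the checks and the treatment of whitespace-only strings (accepted here) are assumptions.

```typescript
// Hypothetical validator mirroring the two error messages listed above.
function validateInputString(value: unknown): string {
  if (typeof value !== "string") {
    throw new Error("Input String is not a string");
  }
  if (value.length === 0) {
    throw new Error("Input String field is empty");
  }
  return value;
}

// Try a valid string, an empty string, and a non-string value.
for (const candidate of ["Hello world", "", 42]) {
  try {
    validateInputString(candidate);
    console.log("ok");
  } catch (error) {
    console.log((error as Error).message);
  }
}
```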
