OpenAI API: How do I count tokens before(!) I send an API request?

OpenAI's text models have a maximum context length; for example, Curie's context length is 2049 tokens.

They provide the max_tokens and stop parameters to control the length of the generated sequence, so generation stops either when a stop token is produced or when max_tokens is reached.

The issue is that I don't know how many tokens my prompt contains, so I cannot set max_tokens = 2049 - number_tokens_in_prompt.

This prevents me from dynamically generating text for prompts of widely varying lengths. What I need is to keep generating until the stop token is produced.

My questions are:

  • How can I count the number of tokens in the Python API so that I can set the max_tokens parameter accordingly?
  • Is there a way to set max_tokens to its maximum cap so that I don't need to count the prompt tokens at all?
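
In other words, I want to compute something like the following, where count_tokens is a hypothetical helper; implementing it is exactly what I'm asking about:

CONTEXT_LENGTH = 2049  # Curie

def remaining_budget(prompt: str) -> int:
    # count_tokens would have to tokenize exactly the way OpenAI does;
    # that is the part I don't know how to implement.
    return CONTEXT_LENGTH - count_tokens(prompt)
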
Equilibrium answered 21/3, 2023 at 17:35 Comment(0)

How do I count tokens before(!) I send an API request?

As stated in the official OpenAI article:

To further explore tokenization, you can use our interactive Tokenizer tool, which allows you to calculate the number of tokens and see how text is broken into tokens. Alternatively, if you'd like to tokenize text programmatically, use tiktoken as a fast BPE tokenizer specifically used for OpenAI models.

How does a tokenizer work?

A tokenizer can split the text string into a list of tokens, as stated in the official OpenAI example on counting tokens with tiktoken:

tiktoken is a fast open-source tokenizer by OpenAI.

Given a text string (e.g., "tiktoken is great!") and an encoding (e.g., "cl100k_base"), a tokenizer can split the text string into a list of tokens (e.g., ["t", "ik", "token", " is", " great", "!"]).

Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you:

  • whether the string is too long for a text model to process and
  • how much an OpenAI API call costs (as usage is priced by token).
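
To see that split for yourself, here is a minimal sketch using tiktoken (installation is covered below); the IDs and byte strings in the comments are what cl100k_base produces for the example string:

import tiktoken

# Load the encoding used in the example above
encoding = tiktoken.get_encoding("cl100k_base")

token_ids = encoding.encode("tiktoken is great!")
print(token_ids)
# [83, 1609, 5963, 374, 2294, 0]

# Decode each token individually to see how the string was split
print([encoding.decode_single_token_bytes(t) for t in token_ids])
# [b't', b'ik', b'token', b' is', b' great', b'!']
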

Which encodings does OpenAI use for its models?

As of April 2024, tiktoken supports 2 encodings used by OpenAI models (source 1, source 2):

o200k_base:
  • GPT-4o models (gpt-4o)

cl100k_base:
  • GPT-4 models (gpt-4)
  • GPT-3.5 Turbo models (gpt-3.5-turbo)
  • GPT Base models (davinci-002, babbage-002)
  • Embeddings models (text-embedding-ada-002, text-embedding-3-large, text-embedding-3-small)
  • Fine-tuned models (ft:gpt-4, ft:gpt-3.5-turbo, ft:davinci-002, ft:babbage-002)

Note: The p50k_base and r50k_base encodings were used for models that are deprecated as of April 2024.
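
You don't have to memorize this table: tiktoken can resolve the encoding from a model name for you, e.g.:

import tiktoken

# encoding_for_model() looks up the encoding behind a model name
print(tiktoken.encoding_for_model("gpt-4o").name)         # o200k_base
print(tiktoken.encoding_for_model("gpt-3.5-turbo").name)  # cl100k_base
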

What tokenizer libraries are out there?

Official OpenAI libraries:

  • tiktoken (Python)

3rd-party libraries (from the comments on this answer):

  • @dqbd/tiktoken (Node.js)
  • gpt4-tokenizer (Node.js)

How do I use tiktoken?

  1. Install or upgrade tiktoken: pip install --upgrade tiktoken
  2. Write the code to count tokens, where you have two options.

OPTION 1: Search in the table above for the correct encoding for a given OpenAI model

get_tokens_1.py

import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "cl100k_base"))

If you run get_tokens_1.py, you'll get the following output:

9

OPTION 2: Use tiktoken.encoding_for_model() to automatically load the correct encoding for a given OpenAI model

get_tokens_2.py

import tiktoken

def num_tokens_from_string(string: str, model_name: str) -> int:
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("Hello world, let's test tiktoken.", "gpt-3.5-turbo"))

If you run get_tokens_2.py, you'll get the following output:

9

Note: If you take a careful look at the usage field in the OpenAI API response, you'll see that it reports 10 tokens used for an identical message, which is 1 token more than tiktoken counted when I tested this in the past. I still haven't figured out why. As @Jota mentioned in the comment below, there still seems to be a mismatch between the token usage reported by the OpenAI API response and tiktoken.

Delossantos answered 21/3, 2023 at 17:39 Comment(10)
Is there Tiktoken for NodeJS? – Fruitage
@AnshumanKumar Yes: npmjs.com/package/@dqbd/tiktoken – Delossantos
The tokens calculated with the function indicated by ChatGPT do not match those returned by ChatGPT in the response... it is impossible to calculate the max_tokens var. – Honeydew
@Honeydew See this answer. – Delossantos
Why is this ANSWER better than tiktoken's DOCS??? – Justajustemilieu
@RamiAwar It's great to hear that. :) Probably because I combined the tiktoken docs, the OpenAI docs, and code I personally tested. – Delossantos
For NodeJS, this gpt4-tokenizer npm package seems promising: – Barolet
Anyone have ideas about how to handle functions? – Formerly
I added omni and a few C# libraries. Hope that was OK :-) – Formerly
@Xan-KunClark-Davis Thanks for the contribution. I appreciate it. :) Actually, I took some time to update the answer because it was pretty outdated. A lot of models have been deprecated since my last update, and consequently, encodings too. Of course, I left your contribution in my edit. – Delossantos

Here is how I do it with Python 3. You can pass either the model name or the encoding string, and get back the encoding, the tokens, or the token count.

token_helper.py:

import tiktoken

def encoding_getter(encoding_type: str):
    """
    Returns the appropriate encoding based on the given encoding type (either an encoding string or a model name).
    """
    if "k_base" in encoding_type:
        return tiktoken.get_encoding(encoding_type)
    else:
        return tiktoken.encoding_for_model(encoding_type)

def tokenizer(string: str, encoding_type: str) -> list:
    """
    Returns the tokens in a text string using the specified encoding.
    """
    encoding = encoding_getter(encoding_type)
    tokens = encoding.encode(string)
    return tokens

def token_counter(string: str, encoding_type: str) -> int:
    """
    Returns the number of tokens in a text string using the specified encoding.
    """
    num_tokens = len(tokenizer(string, encoding_type))
    return num_tokens

It works like this:

>>> import token_helper
>>> token_helper.token_counter("This string will be counted as tokens", "gpt-3.5-turbo")
7
Superimpose answered 28/6, 2023 at 6:33 Comment(0)

COUNTING INPUT TOKENS

If you want to count the tokens used by a Chat Completions API request, which includes metadata such as role and name in addition to the raw prompt (content), see the excerpts below from OpenAI's Cookbook.

Source Code

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens

Usage

# let's verify the function above matches the OpenAI API response

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]

for model in [
    "gpt-3.5-turbo-0301",
    "gpt-3.5-turbo-0613",
    "gpt-3.5-turbo",
    "gpt-4-0314",
    "gpt-4-0613",
    "gpt-4",
    ]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = client.chat.completions.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,
    )
    print(f'{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.')
    print()

Output

gpt-3.5-turbo-0301
127 prompt tokens counted by num_tokens_from_messages().
127 prompt tokens counted by the OpenAI API.

gpt-3.5-turbo-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4-0314
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.

Important note from the Cookbook:

Note that the exact way that tokens are counted from messages may change from model to model. Consider the counts from the function above an estimate, not a timeless guarantee.

SETTING MAX TOKENS

In a Chat Completions API request, max_tokens is the maximum number of tokens the model may generate. To simplify setting max_tokens, you can write a helper function:

def max_tokens(messages, model):
    input_tokens = num_tokens_from_messages(messages, model=model)
    context_length = get_context_length(model)
    return context_length - input_tokens

def get_context_length(model):
    if model == "gpt-3.5-turbo-0613":
        return 4096
    # Add additional model context windows here.
    else:
        raise ValueError(f"No context length known for model: {model}")
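
For example, a minimal usage sketch (it assumes num_tokens_from_messages() from above is in scope):

messages = [{"role": "user", "content": "Hello!"}]

# num_tokens_from_messages() counts 9 tokens for this prompt
# (3 per message + "user" + "Hello!" + 3 for the reply priming),
# so this should print 4096 - 9 = 4087.
print(max_tokens(messages, "gpt-3.5-turbo-0613"))
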

REFERENCES

  1. OpenAI Cookbook: https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken#6-counting-tokens-for-chat-completions-api-calls
  2. OpenAI API Reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens
  3. Community Forum: https://community.openai.com/t/how-the-max-tokens-are-considered/313514
  4. OpenAI Model Documentation for Context Windows: https://platform.openai.com/docs/models/gpt-3-5-turbo
Casein answered 6/3 at 19:54 Comment(0)

With the information contained in the comments, I made this: https://gist.github.com/buanzo/7cdd2c34fc0bb25c71b857a16853c6fa

It is a count_tokens implementation that tries tiktoken, then nltk, and falls back to .split().

It includes a simple TokenBuffer implementation as well.
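
The core idea of the fallback chain looks roughly like this (a sketch, not the gist's actual code):

def count_tokens(text: str, debug: bool = False) -> dict:
    # First choice: tiktoken, which matches OpenAI's actual tokenization
    try:
        import tiktoken
        encoding = tiktoken.get_encoding("cl100k_base")
        return {"n_tokens": len(encoding.encode(text)), "method": "tiktoken"}
    except ImportError:
        pass
    # Second choice: nltk word tokenization (approximate; needs the punkt data)
    try:
        import nltk
        return {"n_tokens": len(nltk.word_tokenize(text)), "method": "nltk"}
    except ImportError:
        pass
    # Last resort: a plain whitespace split (very rough)
    return {"n_tokens": len(text.split()), "method": "split"}
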

We can import the count_tokens function from the token_counter module and call it with our text string as follows:

from token_counter import count_tokens
text = "The quick brown fox jumps over the lazy dog."
result = count_tokens(text, debug=True)
print(result)

If all the required libraries are available, the result is more accurate, but even without tiktoken or nltk the function returns a dictionary with the number of tokens and the method used to count them. For example:

{'n_tokens': 9, 'method': 'tiktoken'}

Gisela answered 13/4, 2023 at 12:40 Comment(0)
