How do I know how many tokens a GPT-3 request used?

I am building an app around GPT-3, and I would like to know how many tokens every request I make uses. Is this possible, and how?

Amylolysis asked 18/5, 2022 at 19:11 Comment(1)
The past tense of the question makes it sound like you're asking for the tokens after a request is made. I'm guessing that's not what's being asked, but if anyone comes across this Q&A looking for the tokens after running a request, it's in the JSON response, in the usage object: beta.openai.com/docs/api-reference/completions - Pinkie
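
For anyone landing here for that case, reading that usage object might look like this (a sketch assuming the pre-1.0 openai Python package; the model name and prompt are arbitrary examples):

import openai

# Sketch only: requires an API key configured for the openai package.
response = openai.Completion.create(
    model="text-davinci-003",  # hypothetical model choice
    prompt="Hello world",
    max_tokens=5,
)
usage = response["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])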

Counting Tokens with an Actual Tokenizer

To do this in Python, first install the transformers package to enable the GPT-2 tokenizer, which is the same tokenizer used for GPT-3:

pip install transformers

Then, to tokenize the string "Hello world", you have a choice of using GPT2TokenizerFast or GPT2Tokenizer.

from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])

or

from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])

In either case, tokenizer("Hello world")['input_ids'] produces a Python list of token IDs representing the string, which can then be counted with len(). The documentation doesn't mention any differences in behavior between the two classes. I tested both on text and on code, and they gave the same counts. The from_pretrained methods are unpleasantly slow: 28 s for GPT2Tokenizer and 56 s for GPT2TokenizerFast. The load time dominates the experience, so I suggest NOT using the "fast" method. (Note: the first time you run either from_pretrained call, a ~3 MB model is downloaded and installed, which takes a couple of minutes.)
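
If you want to check the load time on your own machine, a quick measurement looks like this (a sketch):

import time
from transformers import GPT2Tokenizer

t0 = time.time()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the slow step discussed above
print(f"from_pretrained took {time.time() - t0:.1f}s")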

Approximating Token Counts

The tokenizers are slow and heavy, but you can approximate the conversion between character counts and token counts in either direction. I developed the following approximations by observing the behavior of the GPT-2 tokenizer; they hold well for English text and Python code. The third and fourth functions are perhaps the most useful, since they let us quickly fit a text within GPT-3's token limit.

import math

def nchars_to_ntokens_approx(nchars):
    # returns an estimate of the number of tokens corresponding to nchars characters
    return max(0, int((nchars - 2) * math.exp(-1)))

def ntokens_to_nchars_approx(ntokens):
    # returns an estimate of the number of characters corresponding to ntokens tokens
    return max(0, int(ntokens * math.exp(1)) + 2)

def nchars_leq_ntokens_approx(maxTokens):
    # returns a number of characters very likely to correspond to <= maxTokens tokens
    sqrt_margin = 0.5
    lin_margin = 1.010175047  # = e - 1.001 - sqrt(1 - sqrt_margin); ensures a return of 1 when maxTokens=1
    return max(0, int(maxTokens * math.exp(1) - lin_margin - math.sqrt(max(0, maxTokens - sqrt_margin))))

def truncate_text_to_maxTokens_approx(text, maxTokens):
    # returns a truncation of text to make it (likely) fit within a token limit,
    # so the output string is very likely to have <= maxTokens tokens; no guarantees though
    char_index = min(len(text), nchars_leq_ntokens_approx(maxTokens))
    return text[:char_index]
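
As a quick sanity check, you can compare the first approximation against the actual tokenizer (a sketch using the functions above; the sample sentence is arbitrary):

from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
sample = "The quick brown fox jumps over the lazy dog."
actual = len(tokenizer(sample)['input_ids'])
approx = nchars_to_ntokens_approx(len(sample))
print(actual, approx)  # expect rough agreement, not equality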
Inculcate answered 21/12, 2022 at 11:4 Comment(10)
It's pretty fast for me, almost instantaneous. I don't know why you got 56s. - Chalk
It's making some network calls, so it depends on your network speed. When I sit further from my wifi antenna it takes even longer. - Inculcate
Still, did you really mean 56s? Or do you mean 56ms? - Chalk
Yes, 56 seconds; as in almost a minute. It's interesting that it runs fast for you. I wonder what's going on. - Inculcate
I did time it and got 3.74 ms per call on a text with 2000 tokens using GPT2TokenizerFast. Specifically my text is "hello world" * 1000. This doesn't require internet access because the model is already downloaded. Maybe you don't have a GPU, so it's very slow. But I don't see GPU usage going up on my laptop when running the code either. Not sure what's going on. It doesn't make sense that a tokenizer would be that slow. - Chalk
I'm running on a machine with an Nvidia RTX A2000 GPU. The super slow part for me is the line tokenizer = GPT2TokenizerFast.from_pretrained("gpt2"), so it has nothing to do with the prompt. - Inculcate
It still only takes 2.5 seconds to load the tokenizer for me. The tokenizer is already downloaded. - Chalk
My best guess for the cause of this timing difference is that I'm running into an incompatibility between tensorflow (only able to handle CUDA <= v11.2) and my CUDA 12 installation, which prevents the use of AVX2 FMA instructions. Loading the tokenizer throws warnings about this. - Inculcate
Amazing code! What is init_offset? - Autry
Ah sorry, init_offset = 2. I'll edit that into the answer. - Inculcate

Here is an example from the openai-cookbook that worked perfectly for me:

import tiktoken


def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

>>> num_tokens_from_string("tiktoken is great!", "gpt2")
6
Advance answered 30/1, 2023 at 18:9 Comment(0)

OpenAI charges for GPT-3 usage in tokens, counting both the prompt and the completion. Per OpenAI's guidance, 750 words correspond to roughly 1000 tokens, a token-to-word ratio of about 1.33. The price per token depends on the plan you are on.
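
As a rough back-of-the-envelope check, that ratio turns a word count into a cost estimate directly (a sketch; the price per 1,000 tokens below is a placeholder, since pricing varies by model and plan):

def estimate_cost_usd(n_words, price_per_1k_tokens=0.02):
    # 0.02 is a placeholder price; check OpenAI's pricing page for your model
    n_tokens = n_words * (1000 / 750)  # ~1.33 tokens per word
    return (n_tokens / 1000) * price_per_1k_tokens

print(estimate_cost_usd(750))  # ~0.02: 750 words is roughly 1000 tokens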

I do not know of more accurate ways of estimating cost. Perhaps using the GPT-2 tokenizer from Hugging Face can help. I know the tokens from the GPT-2 tokenizer are accepted when passed to GPT-3 in the logit_bias array, so there is a degree of equivalence between GPT-2 tokens and GPT-3 tokens.
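
To illustrate that equivalence, a GPT-2 token ID can be dropped straight into the logit_bias parameter (a sketch assuming the pre-1.0 openai package; the banned token and model are arbitrary choices):

from transformers import GPT2Tokenizer
import openai

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
token_id = tokenizer(" world")['input_ids'][0]  # GPT-2 ID for the token " world"
response = openai.Completion.create(
    model="text-davinci-003",          # hypothetical model choice
    prompt="Hello",
    max_tokens=5,
    logit_bias={str(token_id): -100},  # -100 effectively bans that token
)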

However, GPT-2 and GPT-3 are different models, and GPT-3 famously has more parameters than GPT-2, so GPT-2 estimates are probably on the low side token-wise. I am sure you can write a simple program that estimates the price by comparing prompts and token usage, but that might take some time.

Practically answered 19/7, 2022 at 9:0 Comment(0)

Code to count how many tokens a GPT-3 request used:

from transformers import GPT2TokenizerFast

def count_tokens(text: str):
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    res = tokenizer(text)['input_ids']
    return len(res)


print(count_tokens("Hello world"))
Vaporetto answered 23/7, 2022 at 22:46 Comment(1)
Keep the tokenizer initialization outside the function (e.g. in __init__) to make this run much faster. - Icsh
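
Following that suggestion, a version that loads the tokenizer once at module scope might look like this (a sketch):

from transformers import GPT2TokenizerFast

# load once at import time, not on every call
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    return len(tokenizer(text)['input_ids'])

print(count_tokens("Hello world"))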

For C# users, you can refer to this Git repo: https://github.com/betalgo/openai. You can take the tokenizing elements (Tokenizer/GPT3) from the repo and create a helper in your codebase. (Note: I've used this tokenizer and it is pretty accurate, but it isn't recommended for production use.)

Pearlpearla answered 8/7, 2023 at 13:51 Comment(0)

Here is how I do it with Python 3. You can pass either a model name or an encoding string, and you can get back the encoding, the tokens, or the token count.

token_helper.py:

import tiktoken

def encoding_getter(encoding_type: str):
    """
    Returns the appropriate encoding based on the given encoding type (either an encoding string or a model name).
    """
    if "k_base" in encoding_type:
        return tiktoken.get_encoding(encoding_type)
    else:
        return tiktoken.encoding_for_model(encoding_type)

def tokenizer(string: str, encoding_type: str) -> list:
    """
    Returns the tokens in a text string using the specified encoding.
    """
    encoding = encoding_getter(encoding_type)
    tokens = encoding.encode(string)
    return tokens

def token_counter(string: str, encoding_type: str) -> int:
    """
    Returns the number of tokens in a text string using the specified encoding.
    """
    num_tokens = len(tokenizer(string, encoding_type))
    return num_tokens

It works like this:

>>> import token_helper
>>> token_helper.token_counter("This string will be counted as tokens", "gpt-3.5-turbo")
7
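
Because encoding_getter branches on the "k_base" substring, the same helper also accepts a raw encoding name. Since gpt-3.5-turbo uses the cl100k_base encoding, this call returns the same count:

>>> token_helper.token_counter("This string will be counted as tokens", "cl100k_base")
7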
Magyar answered 28/6, 2023 at 6:46 Comment(0)
