I am building an app around GPT-3, and I would like to know how many tokens each request I make uses. Is this possible, and if so, how?
Counting Tokens with Actual Tokenizer
To do this in Python, first install the transformers package to enable the GPT-2 tokenizer, which is the same tokenizer used for GPT-3:
pip install transformers
Then, to tokenize the string "Hello world", you have a choice of using GPT2TokenizerFast or GPT2Tokenizer.
from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])

from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])
In either case, tokenizer() produces a Python list of tokens representing the string, which can then be counted with len(). The documentation doesn't mention any differences in behavior between the two classes. I tested both on text and on code, and they gave the same numbers. In my tests the from_pretrained methods were unpleasantly slow: 28s for GPT2Tokenizer and 56s for GPT2TokenizerFast. The load time dominates the experience, so I suggest NOT using the "fast" class. (Note: the first time you run either of the from_pretrained methods, a 3 MB model will be downloaded and installed, which takes a couple of minutes.)
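Load times like these vary a lot between machines, so it may be worth measuring them yourself. Here is a minimal sketch, assuming transformers is installed; time_call and time_tokenizer_loads are hypothetical helper names, not part of any library.

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def time_tokenizer_loads():
    """Print how long each GPT-2 tokenizer class takes to load.

    Assumes the transformers package is installed; the import is kept
    inside the function so time_call can be used without it.
    """
    from transformers import GPT2Tokenizer, GPT2TokenizerFast
    for cls in (GPT2Tokenizer, GPT2TokenizerFast):
        _, seconds = time_call(cls.from_pretrained, "gpt2")
        print(f"{cls.__name__}: {seconds:.2f}s")
```

Calling time_tokenizer_loads() twice separates the one-time download from the steady-state load time, since the first run includes fetching the files.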
Approximating Token Counts
The tokenizers are slow and heavy, but you can approximate the conversion between character counts and token counts using nothing but the number of characters or tokens. I developed the following approximations by observing the behavior of the GPT-2 tokenizer. They hold well for English text and Python code. The third and fourth functions are perhaps the most useful, since they let us quickly fit a text within GPT-3's token limit.
import math

def nchars_to_ntokens_approx(nchars):
    # Returns an estimate of the number of tokens corresponding to nchars characters
    return max(0, int((nchars - 2) * math.exp(-1)))

def ntokens_to_nchars_approx(ntokens):
    # Returns an estimate of the number of characters corresponding to ntokens tokens
    return max(0, int(ntokens * math.exp(1)) + 2)

def nchars_leq_ntokens_approx(maxTokens):
    # Returns a number of characters very likely to correspond to <= maxTokens tokens
    sqrt_margin = 0.5
    lin_margin = 1.010175047  # = e - 1.001 - sqrt(1 - sqrt_margin); ensures a return value of 1 when maxTokens=1
    return max(0, int(maxTokens * math.exp(1) - lin_margin - math.sqrt(max(0, maxTokens - sqrt_margin))))

def truncate_text_to_maxTokens_approx(text, maxTokens):
    # Returns a truncation of text that (likely) fits within a token limit.
    # The output string is very likely to have <= maxTokens tokens, but there are no guarantees.
    char_index = min(len(text), nchars_leq_ntokens_approx(maxTokens))
    return text[:char_index]
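Since the approximate truncation comes with no guarantee, one option is to check the result with a real token counter and trim further only when needed. The sketch below is one way to do that, not the answer author's method: fit_to_token_limit is a hypothetical helper, and count_tokens is assumed to be any callable that returns a token count for a string (for example, one built on a tokenizer shown elsewhere in this thread).

```python
import math
from typing import Callable

def nchars_leq_ntokens_approx(max_tokens: int) -> int:
    # Same formula as the function of this name in the answer above.
    sqrt_margin = 0.5
    lin_margin = 1.010175047
    return max(0, int(max_tokens * math.exp(1) - lin_margin
                      - math.sqrt(max(0, max_tokens - sqrt_margin))))

def fit_to_token_limit(text: str, max_tokens: int,
                       count_tokens: Callable[[str], int]) -> str:
    """Cut text with the cheap approximation, then verify with a real counter.

    count_tokens is an assumption: any function mapping a string to its
    token count. It is only called on the already-shortened candidate.
    """
    candidate = text[:nchars_leq_ntokens_approx(max_tokens)]
    # Trim about 10% at a time until the real count fits (or nothing is left).
    while candidate and count_tokens(candidate) > max_tokens:
        candidate = candidate[:len(candidate) * 9 // 10]
    return candidate
```

The expensive counter runs on the shortened candidate rather than on the full text, and in the common case where the approximation already fits it runs only once.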
"hello world" * 1000
. This doesn't require internet access because the model is already downloaded. Maybe you don't have a GPU, so it's very slow. But I don't see GPU usage going up on my laptop when running the code either. Not sure what's going on. It doesn't make sense for a tokenizer to be that slow. – Chalk
init_offset ? – Autry
Here is an example from openai-cookbook that worked perfectly for me:
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string("tiktoken is great!", "gpt2")
OpenAI charges for GPT-3 usage by the token, and this counts both the prompt and the answer. According to OpenAI, 750 words are equivalent to around 1000 tokens, a token-to-word ratio of about 1.33. The price per token depends on the plan you are on.
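The arithmetic behind that rule of thumb can be sketched as follows. The function names and the price are hypothetical placeholders for illustration; check the real price for your plan before relying on the result.

```python
def words_to_tokens(n_words: int) -> int:
    """Rule of thumb from the text: about 1000 tokens per 750 words."""
    return round(n_words * 1000 / 750)

def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  price_per_1k_tokens: float) -> float:
    """Both the prompt and the answer are billed, so add them up first."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * price_per_1k_tokens

# Placeholder price, NOT a real quote.
HYPOTHETICAL_PRICE_PER_1K = 0.02

prompt_tokens = words_to_tokens(750)  # 1000 tokens by the rule of thumb
print(estimate_cost(prompt_tokens, 500, HYPOTHETICAL_PRICE_PER_1K))
```

This is only as accurate as the word-to-token ratio, which is an average for English text and can be noticeably off for code or other languages.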
I do not know of more accurate ways of estimating cost. Perhaps using the GPT-2 tokenizer from Hugging Face can help. I know the tokens from the GPT-2 tokenizer are accepted when passed to GPT-3 in the logit bias array, so there is a degree of equivalence between GPT-2 tokens and GPT-3 tokens.
However, GPT-2 and GPT-3 are different models, and GPT-3 famously has more parameters than GPT-2, so GPT-2 estimates may not match GPT-3 usage exactly. I am sure you can write a simple program that estimates the price by comparing prompts and token usage, but that might take some time.
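A simple program of that kind could calibrate a tokens-per-character ratio from prompts you have already sent and the token usage you observed for them. The sketch below is one possible approach under that assumption; the helper names are hypothetical and the sample numbers are made-up placeholders, not measurements.

```python
def fit_tokens_per_char(samples):
    """Least-squares slope through the origin for (n_chars, n_tokens) pairs."""
    numerator = sum(chars * tokens for chars, tokens in samples)
    denominator = sum(chars * chars for chars, _ in samples)
    if denominator == 0:
        raise ValueError("need at least one sample with a non-empty prompt")
    return numerator / denominator

def estimate_tokens(text: str, tokens_per_char: float) -> int:
    """Estimate the token count of text from the calibrated ratio."""
    return round(len(text) * tokens_per_char)

# Made-up placeholder observations: (prompt length in characters, tokens used).
samples = [(100, 25), (200, 50)]
ratio = fit_tokens_per_char(samples)
print(estimate_tokens("x" * 400, ratio))
```

The more real observations go into samples, the better the ratio reflects your own mix of text and code.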
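Code to count how many tokens a GPT-3 request used:
from transformers import GPT2TokenizerFast

def count_tokens(text: str) -> int:
    # Load the GPT-2 tokenizer (the slow step), then count the token ids
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    res = tokenizer(text)['input_ids']
    return len(res)

print(count_tokens("Hello world"))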
) to make this run much faster. – Icsh
For C# users, you can refer to this Git repo: https://github.com/betalgo/openai. You can take the tokenizing elements (Tokenizer/GPT3) from the repo and create a helper in your codebase. (Note: I've used this tokenizer and it is pretty accurate, but it isn't recommended for production use.)
Here is how I do it with Python 3. You can pass either the model name or the encoding string, and you can get the encoding, the tokens, or the token count.
import tiktoken

def encoding_getter(encoding_type: str):
    """Returns the appropriate encoding based on the given encoding type (either an encoding string or a model name)."""
    if "k_base" in encoding_type:
        return tiktoken.get_encoding(encoding_type)
    return tiktoken.encoding_for_model(encoding_type)

def tokenizer(string: str, encoding_type: str) -> list:
    """Returns the tokens in a text string using the specified encoding."""
    encoding = encoding_getter(encoding_type)
    tokens = encoding.encode(string)
    return tokens

def token_counter(string: str, encoding_type: str) -> int:
    """Returns the number of tokens in a text string using the specified encoding."""
    num_tokens = len(tokenizer(string, encoding_type))
    return num_tokens
It works like this:
>>> import token_helper
>>> token_helper.token_counter("This string will be counted as tokens", "gpt-3.5-turbo")
object: beta.openai.com/docs/api-reference/completions – Pinkie