How to work with OpenAI's maximum context length of 2049 tokens?

I'd like to send the text from various PDFs to OpenAI's API, specifically the "Summarize for a 2nd grader" or the "TL;DR summarization" APIs.

I can extract the text from PDFs using PyMuPDF and prepare the OpenAI prompt.

Question: How best to prepare the prompt when the token count is longer than the allowed 2049?

  • Do I just truncate the text, then send multiple requests?
  • Or is there a way to sample the text to "compress" it without losing key points?
Gluconeogenesis answered 22/11, 2021 at 4:19 Comment(0)

I faced the same problem. Here is the strategy I used to send text that is much, much longer than OpenAI's GPT-3 token limit.

Depending on the model (Davinci, Curie, etc.) used, requests can use up to 4097 tokens shared between prompt and completion.

  • Prompt being the input you send to OpenAI, i.e. your "command", e.g. "Summarize the following text" plus the text itself
  • Completion being the response, i.e. the entire summary of your text

If your prompt is 4000 tokens, your completion can be 97 tokens at most. For more information on OpenAI tokens and how to count them, see here.

To avoid exceeding the maximum length limit, we need to ensure that prompt (i.e. your text) and completion (i.e. the summary) together always fit within the 4097-token boundary.

For that reason we split the entire text into multiple text chunks, summarize each chunk independently and finally merge all summarized chunks using a simple " ".join() function.

Maximum Number of Words - Token-to-Word Conversion

OpenAI has a fixed limit on the number of tokens. However, a token is not the same as a word. Hence, we first need to calculate the maximum number of words we can send to OpenAI. The documentation says:

[Documentation excerpt (image): as a rule of thumb, 1 token corresponds to roughly 4 characters, or about 0.75 words, i.e. ~1.33 tokens per word.]

Given the token-to-word ratio, we can send approximately 2900 words to OpenAI's GPT-3, assuming a five-sentence summary per text chunk (the short calculation sketch after the list below makes this explicit).

  • Max tokens per request: 4000 tokens (leaving 97 tokens as a safety buffer) = 3000 words
  • Max prompt tokens: “Summarize the following text in five sentences” has 7 words = 10 tokens
  • Max tokens of returned summary (5 sentences): 20 words per sentence. 5 * 20 = 100 words = 133 tokens
  • Max tokens of text chunk: 4000 - 10 - 133 = 3857 tokens = 2900 words
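
To make the arithmetic explicit, here is a minimal sketch of the budget calculation, using only the estimates from the list above:

# Rough token budget per request, following the estimates above.
TOKENS_PER_WORD = 1.33     # rule-of-thumb conversion ratio

max_request_tokens = 4000  # 4097 limit minus a 97-token safety buffer
prompt_tokens = 10         # "Summarize the following text in five sentences"
summary_tokens = 133       # 5 sentences * 20 words * 1.33 tokens/word

chunk_tokens = max_request_tokens - prompt_tokens - summary_tokens  # 3857
chunk_words = round(chunk_tokens / TOKENS_PER_WORD)                 # ~2900
print(chunk_tokens, chunk_words)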

Text Chunking

We can choose from a plethora of strategies to split up the entire text into smaller chunks.

The simplest approach is creating a single list of all words by splitting the entire text on whitespaces, and then creating buckets of words with words evenly distributed across all buckets. The downside is that we are likely to split a sentence half-way through and lose the meaning of the sentence because GPT ends up summarizing the first half of the sentence independently from the second half — ignoring any relations between the two chunks.
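
As an illustration, a minimal sketch of this naive whitespace-splitting approach could look like this (the function name and the 2700-word cap are illustrative choices, not fixed by the API):

import math

def naive_chunks(text, max_words=2700):
  # Split on whitespace and distribute words evenly across buckets.
  # Caveat: sentence boundaries are ignored, so sentences may be cut in half.
  words = text.split()
  n_buckets = max(1, math.ceil(len(words) / max_words))
  bucket_size = math.ceil(len(words) / n_buckets)
  return [" ".join(words[i:i + bucket_size])
          for i in range(0, len(words), bucket_size)]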

Other options include tokenizers such as SentencePiece and spaCy's sentence splitter. Choosing the latter generates the most stable results.

Implementation of Text Chunking with spaCy

The following example splits the text “My first birthday was great. My 2. was even better.” into a list of two sentences.

# Run once in a shell to download the English model:
#   python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = "My first birthday was great. My 2. was even better."

for sentence in nlp(text).sents:
  print(sentence.text)

Output

My first birthday was great.
My 2. was even better.

spaCy correctly detected the second sentence instead of splitting it after the “2.”.

Now, let’s write a text_to_chunks helper function to generate chunks of sentences, where each chunk holds at most 2700 words. 2900 words was the initially calculated word limit, but we want enough buffer for words that are longer than 1.33 tokens.

def text_to_chunks(text):
  # Group sentences into chunks of at most ~2700 words each.
  chunks = [[]]
  chunk_total_words = 0

  for sentence in nlp(text).sents:
    chunk_total_words += len(sentence.text.split(" "))

    # Start a new chunk once the running word count exceeds the budget;
    # the sentence that overflowed becomes the first sentence of the new chunk.
    if chunk_total_words > 2700:
      chunks.append([])
      chunk_total_words = len(sentence.text.split(" "))

    chunks[-1].append(sentence.text)

  return chunks

An alternative approach to determining the number of tokens in a text was more recently introduced by OpenAI: the tiktoken library, which is tailored to OpenAI's models.

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
number_of_tokens = len(encoding.encode("tiktoken is great!"))
print(number_of_tokens)
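
If you want to bound chunks by exact token count rather than an estimated word count, a variant of the chunking helper could combine tiktoken with the spaCy sentence splitter. This is a sketch that reuses the nlp object from above; the 3857-token budget is the one calculated earlier:

import tiktoken

def text_to_token_chunks(text, model="gpt-3.5-turbo", max_tokens=3857):
  # Group sentences into chunks whose exact token count stays under budget.
  encoding = tiktoken.encoding_for_model(model)
  chunks = [[]]
  chunk_tokens = 0

  for sentence in nlp(text).sents:
    sentence_tokens = len(encoding.encode(sentence.text))

    if chunk_tokens + sentence_tokens > max_tokens:
      chunks.append([])
      chunk_tokens = 0

    chunks[-1].append(sentence.text)
    chunk_tokens += sentence_tokens

  return chunks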

Next, we wrap the text summarization logic into a summarize_text function.

import openai  # assumes openai.api_key has been set

def summarize_text(text):
  prompt = f"Summarize the following text in 5 sentences:\n{text}"

  response = openai.Completion.create(
      engine="text-davinci-003",
      prompt=prompt,
      temperature=0.3,
      max_tokens=150,  # = ~112 words for the summary
      top_p=1,
      frequency_penalty=0,
      presence_penalty=1
  )

  return response["choices"][0]["text"]

Our final piece of code looks like this:

chunks = text_to_chunks(one_large_text)

chunk_summaries = []

for chunk in chunks:
  chunk_summary = summarize_text(" ".join(chunk))
  chunk_summaries.append(chunk_summary)

summary = " ".join(chunk_summaries)


Sumptuous answered 29/12, 2022 at 19:54 Comment(1)
While this strategy might work for texts, I don't see that it could possibly work very well for code refactoring, where it is important to get the full picture before optimising/improving. I know this wasn't the question here, but it's related, and if anyone has any ideas or pointers, they are very welcome.Latia

You have to make sure that prompt plus completion stays within the 2049-token context length, so you need to reduce the size of the prompt.

OpenAI uses GPT-3, which has a context length of 2049 tokens, and the text needs to fit within it.

I am not sure what you mean by sampling the text to compress it. But if you mean how to summarize a longer text, then I would suggest chunking the text so that each chunk fits within the 2049 tokens and querying OpenAI chunk by chunk.

Sindhi answered 5/8, 2022 at 5:37 Comment(7)
Thanks for the suggestion. I'm trying to summarize, say, the text from a 10-page PDF. It's definitely longer than 2049 tokens. Maybe this API is not meant to do this?Gluconeogenesis
@Gluconeogenesis I've updated the answer with more details. OpenAI uses GPT-3, which has a context length, and text needs to fit within that context length. There is no model where you can just fit a 10-page PDF. Please accept the answer if it answers your query. If you need more details, do let me know as well.Sindhi
I'm also wondering how to feed a very long text to OpenAI Completion; so far I can't find the answer. I'm sure it's possible. Maybe as part of their fine-tuning program?Jonijonie
@Jonijonie did you find a solution that worked for you? Is it possible to make chunked requests such that the context is built up continually?Apus
@Apus Codex cannot understand context like this. The best you can do is to summarize the context prior to the text and feed that into the subsequent invocation.Sindhi
@maxcountryman, with their fine-tuning option, you can feed a sum of fragmented chunks of text as if it were one: beta.openai.com/docs/guides/fine-tuning Jonijonie
I'm currently struggling to make use of fine-tuning, but it doesn't work great. It doesn't answer based on the provided text even when explicitly told to. I read on the internet that this is also the case for other people.Ash

I guess I am kind of late to this, but I developed Python and JavaScript libraries to summarize large (above the token limit) text using GPT models. Of course, they can handle text below the token limit as well.

Assuming you are on Python, just use:

>>> from gptsummarizer import summarizer
>>> generator = summarizer.Summarizer(key="put_your_openai_key_here")
>>> summary = generator.getSummary(text="Hello! How are you?")
>>> summary
'Two people are exchanging greetings and inquiring about each others wellbeing.'
Roydd answered 24/5, 2023 at 21:10 Comment(0)

One can use the tiktoken library by OpenAI to count tokens (see also their Cookbook notebook). It's important to know that the max context window of a model (like 8192 tokens) is the amount of input and output tokens combined.

You can use the following function which truncates the input text based on a certain amount of max tokens:

import tiktoken

def truncate_tokens(string: str, model_name: str, max_length: int = 8192) -> str:
    """Truncates a text string based on a maximum number of tokens."""
    # tiktoken.encoding_for_model expects a model name like "gpt-3.5-turbo".
    encoding = tiktoken.encoding_for_model(model_name)
    encoded_string = encoding.encode(string)
    num_tokens = len(encoded_string)

    if num_tokens > max_length:
        string = encoding.decode(encoded_string[:max_length])

    return string

This then works as follows:

text = "hello world"
text = truncate_tokens(string=text, encoding_name="gpt-3.5-turbo", max_length=8192)

I also recommend this post by Microsoft which shows how to remove messages from a conversation based on the max length: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions.
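
In the spirit of that post, here is a minimal sketch of trimming a conversation to a token budget. The helper name and the simplified per-message token counting are my own assumptions, not code from the post:

import tiktoken

def trim_messages(messages, model_name="gpt-3.5-turbo", max_tokens=8192):
    """Drop the oldest non-system messages until the conversation fits the budget."""
    encoding = tiktoken.encoding_for_model(model_name)

    def total_tokens(msgs):
        # Rough count of content tokens only; the real chat format adds
        # a few tokens of overhead per message.
        return sum(len(encoding.encode(m["content"])) for m in msgs)

    messages = list(messages)
    while total_tokens(messages) > max_tokens and len(messages) > 1:
        # Keep a leading system message, drop the oldest turn after it.
        drop_index = 1 if messages[0]["role"] == "system" else 0
        messages.pop(drop_index)

    return messages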

Mutter answered 18/10, 2023 at 8:16 Comment(0)
