When using GPT-4 API, do I need to send the entire conversation back each time?

I'm new to the OpenAI API. I work with GPT-3.5-Turbo, using this code:

import openai

messages = [
    {"role": "system", "content": "You're a helpful assistant"}
]

while True:
    content = input("User: ")
    if content == 'end':
        save_log(messages)  # my own helper that saves the transcript
        break
    messages.append({"role": "user", "content": content})

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=messages
    )

    chat_response = completion.choices[0].message.content
    print(f'ChatGPT: {chat_response}')
    messages.append({"role": "assistant", "content": chat_response})

Result:

User: who was the first person on the moon?
GPT: The first person to step foot on the moon was Neil Armstrong, an American astronaut, on July 20, 1969, as part of NASA's Apollo 11 mission.
User: how tall is he?
GPT: Neil Armstrong was approximately 5 feet 11 inches (180 cm) tall.

But this uses a lot of tokens. And I've heard that GPT-4 differs from GPT-3 in that it can remember previous messages on its own. Is that correct?

But if I remove the line that appends the latest message to the messages list and send only a single message:

completion = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": content}]
)

it can't remember anything:

User: who was the first person on the moon?
GPT: The first person on the moon was Neil Armstrong on July 20, 1969.
User: how tall is he?
GPT: Without specific context or information about who "he" refers to, I'm unable to provide an accurate answer.

So I'm wondering: is there any workflow difference between GPT-3.5-Turbo and GPT-4?

Sweepstakes answered 30/8, 2023 at 10:35 Comment(0)

As mentioned by Mithsew, specifically for the Chat Completions API you would still need to pass the context up every time; however, the OpenAI API now has a new feature called Assistants, which OpenAI describes as follows:

A key change introduced by this API is persistent and infinitely long threads, which allow developers to hand off thread state management to OpenAI and work around context window constraints. With the Assistants API, you simply add each new message to an existing thread.

The process has more steps than the Completions API does, but since you don't have to pass the same data up repeatedly, it might be worth switching over.

Here is the general flow:

  1. Create an assistant (you can choose models like gpt-3.5 or gpt-4)
  2. Create a thread
  3. Add a user's message to a thread
  4. Run the assistant
  5. Check the status of the run
  6. Get the updated conversation thread

Repeat steps 3-6 as the conversation progresses
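
Here is a minimal sketch of that loop using the openai Python client (v1.x, where the Assistants endpoints live under client.beta). The assistant name, instructions, and example question are placeholders of mine; treat this as an illustration of the flow rather than production code:

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Create an assistant (one-time setup)
assistant = client.beta.assistants.create(
    name="Helper",  # placeholder name
    instructions="You are a helpful assistant.",
    model="gpt-4",
)

# 2. Create a thread; OpenAI stores its state server-side
thread = client.beta.threads.create()

# 3. Add a user's message to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Who was the first person on the moon?",
)

# 4. Run the assistant on the thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# 5. Poll until the run reaches a terminal status
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 6. Read the updated conversation thread (newest message first)
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

For the next turn, you repeat only steps 3-6 with the same thread.id; the earlier messages stay on OpenAI's side.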

Check out the OpenAI docs page about Assistants for more general info, or their Assistants API page. I anticipate they will keep evolving this going forward.

Also, depending on how much context you have, check out the Assistants file options as well, which let you upload much more context as files that the assistant can access and talk about.
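
For reference, here is a hedged sketch of attaching a file when creating the assistant, using the original beta's "retrieval" tool and file_ids parameter (later API versions renamed these, so check the current docs); the filename and instructions are placeholders:

from openai import OpenAI

client = OpenAI()

# Upload a file that the assistant will be able to search
file = client.files.create(
    file=open("knowledge.pdf", "rb"),  # placeholder filename
    purpose="assistants",
)

# Create an assistant with retrieval over the uploaded file
assistant = client.beta.assistants.create(
    name="Helper",  # placeholder name
    instructions="Answer using the attached document when relevant.",
    model="gpt-4",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],
)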

Ginseng answered 29/11, 2023 at 4:0 Comment(3)
Do you know if this is just a convenience/abstraction layer over what we'd otherwise be doing with the low-level APIs, or is the model fundamentally different insofar as maintaining state/context/memory etc.?Gnome
I am pretty sure they just added a layer to store/feed the additional data within their infrastructure, without messing at all with the models themselves.Ginseng
Thanks, yeah, I think that's right. I'm actually getting much better results (and using far fewer tokens) just managing the context myself and sending only what's needed than I was with the Assistants API.Gnome

Yes. It is always necessary to re-send all the context. GPT, whether 3.5 or 4, is still not able to store information between calls via the API. There is a thread about that on the OpenAI community forum.

My theory is that OpenAI is waiting for the enterprise ChatGPT (which is almost done) before releasing a model that we can train to memorize information; otherwise this would create millions of independent AIs overnight.

There are some libraries, like LangChain, that simulate this storage and work with files, but in practice they also need to send messages on every call. The difference is that you can work with more tokens, because before interacting with OpenAI the library first looks up only the necessary context.

(Note: if you choose to work with LangChain, keep in mind that it is quite expensive in terms of CPU and is not always as effective.)
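
As a minimal illustration of managing the context yourself (the approach Gnome describes in the comments above), you can keep the full history locally but send only a bounded window of recent messages. MAX_HISTORY and trimmed are illustrative names of mine, and a message-count budget is a crude stand-in for a real token budget (e.g. counted with tiktoken):

import openai

MAX_HISTORY = 10  # assumed budget: keep only the last N messages

messages = [{"role": "system", "content": "You're a helpful assistant"}]

def trimmed(history, max_messages=MAX_HISTORY):
    # Keep the system prompt plus only the most recent messages
    system, rest = history[:1], history[1:]
    return system + rest[-max_messages:]

while True:
    content = input("User: ")
    if content == "end":
        break
    messages.append({"role": "user", "content": content})

    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=trimmed(messages),  # send only a bounded window of context
    )

    reply = completion.choices[0].message.content
    print(f"GPT: {reply}")
    messages.append({"role": "assistant", "content": reply})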

Similar Threads: 1, 2, 3

Garrow answered 30/8, 2023 at 18:16 Comment(0)
