Azure OpenAI gpt-35-turbo nondeterministic with temperature 0

I have noticed that my deployment of gpt-35-turbo on "Azure AI Studio" is not giving consistent responses to my chat completion prompts even when I set the temperature to 0. The longer the prompt, the more inconsistency I see.

I thought the idea with setting temperature to 0 meant consistent (deterministic) responses (given the same model). Is that not the case?

Recourse answered 7/8, 2023 at 23:07 Comment(0)

The temperature parameter controls the randomness of the generated text. A value of 0 gives the least random, most deterministic responses, but they will not necessarily be exactly the same.

Ideally, in cases where you would like deterministic responses (say, for the same input across devices, users, etc.), a response cache would help.
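
For instance, a minimal sketch of such a cache in Python, assuming the openai SDK (v1+); cached_completion is a helper invented here for illustration, not an SDK feature:

    import hashlib
    import json

    # Hypothetical response cache: identical requests return the stored text
    # instead of a fresh, possibly different, model output.
    _cache = {}

    def cached_completion(client, **request):
        # Key on the entire request so any change to the prompt or to a
        # sampling parameter (temperature, top_p, ...) misses the cache.
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key not in _cache:
            response = client.chat.completions.create(**request)
            _cache[key] = response.choices[0].message.content
        return _cache[key]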

Also, although the docs recommend adjusting either temperature or top_p but not both, you could use the top_p parameter to further constrain the output. It limits the set of tokens considered for the next token to the smallest set whose cumulative probability exceeds top_p.
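
As a rough sketch (the endpoint, key, and API version below are placeholders, and again, the docs advise tuning temperature or top_p, not both):

    from openai import AzureOpenAI  # pip install "openai>=1.0"

    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
        api_key="YOUR-KEY",                                       # placeholder
        api_version="2024-02-01",                                 # placeholder API version
    )

    response = client.chat.completions.create(
        model="gpt-35-turbo",  # your Azure deployment name
        messages=[{"role": "user", "content": "Say hello."}],
        temperature=0,  # least random sampling
        top_p=0.1,      # consider only the top ~10% of probability mass
    )
    print(response.choices[0].message.content)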

This discussion on the OpenAI forums goes into how you can use the two together to better control the models' responses.

Almire answered 9/8, 2023 at 19:01 Comment(2)
I saw the docs specifically mention not modifying top_p if I'm already modifying temperature. You're suggesting I set temp to 0 and top_p to .1 to get the most consistent results? – Recourse
@Recourse Are you getting consistent responses? – Complimentary

I thought the idea with setting temperature to 0 meant consistent (deterministic) responses (given the same model). Is that not the case?

It's indeed not the case, for two reasons:

  1. GPU non-determinism: floating-point operations on GPUs are not guaranteed to be bitwise reproducible, so identical requests can yield slightly different logits and, occasionally, different tokens.
  2. This blog post by Sherman Chann argues that "Non-determinism in GPT-4 is caused by Sparse MoE [mixture of experts]".

Note that since 2023-11-06, it is possible to set a seed parameter. From platform.openai.com/docs (mirror):

Reproducible outputs (Beta)

Chat Completions are non-deterministic by default (which means model outputs may differ from request to request). That being said, we offer some control towards deterministic outputs by giving you access to the seed parameter and the system_fingerprint response field.

To receive (mostly) deterministic outputs across API calls, you can:

  • Set the seed parameter to any integer of your choice and use the same value across requests you'd like deterministic outputs for.

  • Ensure all other parameters (like prompt or temperature) are the exact same across requests.

Sometimes, determinism may be impacted due to necessary changes OpenAI makes to model configurations on our end. To help you keep track of these changes, we expose the system_fingerprint field. If this value is different, you may see different outputs due to changes we've made on our systems.
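
A minimal sketch of that workflow with the openai Python SDK (v1+); the model snapshot is an assumption, and (per the comments below) Azure OpenAI did not expose seed at the time:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    request = dict(
        model="gpt-3.5-turbo-1106",  # assumption: a snapshot that supports seed
        messages=[{"role": "user", "content": "Name a random city."}],
        temperature=0,
        seed=42,  # same seed + identical parameters -> (mostly) the same output
    )

    first = client.chat.completions.create(**request)
    second = client.chat.completions.create(**request)

    # A changed system_fingerprint signals that OpenAI altered the backend
    # configuration, so outputs may differ despite the fixed seed.
    print(first.system_fingerprint, second.system_fingerprint)
    print(first.choices[0].message.content == second.choices[0].message.content)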

Collard answered 20/8, 2023 at 14:23 Comment(2)
Looks like, as of today (2023-11-10), the seed parameter is not available on Azure OpenAI. – Macilroy
@GamlielCohen Azure tends to lag behind on OpenAI model deployments and updates. – Collard