OpenAI API: How do I specify the maximum number of words a completion should return?
How can I specify the number of words the OpenAI completion should return?

E.g., imagine I ask the AI the question:

Who is Elon Musk?

What parameter can I use to make sure the AI sends back results less than or equal to 300 words?

I was thinking the max_tokens parameter was for that, but it seems max_tokens is for breaking the input down, not the output.

Hewie answered 15/4, 2022 at 20:23 Comment(0)
The parameter you are looking for is called Maximum length in the Playground and max_tokens in the API. Keep in mind that the model's token limit is shared between the prompt and the completion.

The number of tokens processed in a given API request depends on the length of both your inputs and outputs. As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words for English text. One limitation to keep in mind is that your text prompt and generated completion combined must be no more than the model's maximum context length (for most models this is 2048 tokens, or about 1500 words). Check out our tokenizer tool to learn more about how text translates to tokens.

See docs

Please note that this parameter deals with tokens, not words. If you want to find the number of words equivalent to a number of tokens, use the Tokenizer.
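As a back-of-the-envelope check, the rule of thumb from the docs (1 token ≈ 4 characters ≈ 0.75 words for English text) can be sketched like this. These ratios are approximations, not exact counts; use the Tokenizer for real numbers:

```javascript
// Rough word/token estimates based on the "1 token ≈ 0.75 words"
// rule of thumb for English text. For exact counts, use OpenAI's
// Tokenizer tool or the tiktoken library instead.
function estimateTokensFromWords(words) {
  return Math.ceil(words / 0.75);
}

function estimateWordsFromTokens(tokens) {
  return Math.floor(tokens * 0.75);
}

console.log(estimateTokensFromWords(300));  // 400
console.log(estimateWordsFromTokens(2048)); // 1536 (~1500 words, matching the docs)
```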

Horsehair answered 2/5, 2022 at 14:25 Comment(1)
It just cuts, sometimes in the middle of a word. – Thun
First of all, you said:

/.../ but it seems max_tokens is for breaking the input down, not the output.

This is not quite right either. The max_tokens parameter caps only the output (i.e., the completion). What is shared is the model's context length: the tokens from the prompt and the completion together must not exceed the token limit of the particular OpenAI model, so a long prompt leaves less room for the completion.

As stated in the official OpenAI article:

Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most.

Solution 1: Using the max_tokens parameter (works with the GPT-3, GPT-3.5, and GPT-4 APIs)

STEP 1: Choose the maximum length restriction for the prompt

Let's say that you'll allow the user to enter a prompt of a maximum length of 20 words. That should be enough for simple questions like:

  • How does climate change impact oceans and marine life? (9 words, 10 tokens)
  • Explain the theory of relativity and its significance in modern physics. (11 words, 13 tokens)
  • What are the potential benefits and risks of artificial intelligence in healthcare? (12 words, 13 tokens)
  • Describe the process of cellular respiration and its role in producing energy for cells. (14 words, 17 tokens)
  • Discuss the effects of deforestation on local ecosystems, climate, and global biodiversity. (12 words, 15 tokens)

Using the Tokenizer, you can see that 1 word is not equal to 1 token. For example, questions 3 and 5 consist of 12 words but different numbers of tokens! Because the OpenAI API operates with tokens, not words, you need to transform your limit of 20 words per prompt into tokens. Let's say that you'll allow the user to enter a prompt of a maximum length of 22 tokens. This will be approximately 20 words, plus or minus a few words depending on the text (as I previously said, 1 word is not equal to 1 token).

STEP 2: Use tiktoken to calculate the number of tokens in a prompt the user enters before(!) sending an API request to the OpenAI API

After you have chosen the maximum length restriction for the prompt, you need to check every prompt the user enters to make sure it doesn't exceed your limit of 22 tokens. You need to do this before you send the API request to the OpenAI API.

You can do this with tiktoken. As stated in the official OpenAI example:

Tiktoken is a fast open-source tokenizer by OpenAI.

Given a text string (e.g., "tiktoken is great!") and an encoding (e.g., "cl100k_base"), a tokenizer can split the text string into a list of tokens (e.g., ["t", "ik", "token", " is", " great", "!"]).

Tiktoken is very simple to use. See my past answer for more detailed information.

The logic is the following:

  • If tiktoken returns 22 tokens or less, send the prompt to the OpenAI API.
  • If tiktoken returns more than 22 tokens, show the user a message along the lines of: The text you entered is too long. Please make it shorter.
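The gate above can be sketched as follows. Note that tiktoken itself is a Python library (JavaScript ports exist); the countTokensRough function below is only a stand-in using the ~4 characters per token rule of thumb, so in production you would swap in an exact tokenizer:

```javascript
// Prompt-length gate sketch. countTokensRough is a stand-in estimator;
// replace it with a real tokenizer (tiktoken or a JS port) in production.
const MAX_PROMPT_TOKENS = 22; // the limit chosen in STEP 1

function countTokensRough(text) {
  // ~4 characters per token for English text (rule of thumb only)
  return Math.ceil(text.length / 4);
}

function checkPrompt(prompt, countTokens = countTokensRough) {
  const tokens = countTokens(prompt);
  if (tokens <= MAX_PROMPT_TOKENS) {
    return { ok: true, tokens };
  }
  return {
    ok: false,
    tokens,
    message: 'The text you entered is too long. Please make it shorter.',
  };
}

console.log(checkPrompt('Who is Elon Musk?').ok); // true (17 chars ≈ 5 tokens)
```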

Note: This is not great UX because the user doesn't know what your limit is (i.e., 22 tokens), and even if you state it, they might not know how to calculate tokens with the Tokenizer. This can be solved easily on the frontend by implementing a counter, so the user can see how much text they can enter without exceeding the limit, similar to what Stack Overflow does in its comment section. It would be very bad UX if you could press the Add comment button only to get back an error message saying: Your comment is too long. Please make it shorter.

(Screenshot: Stack Overflow's comment-length counter.)

STEP 3: Set the max_tokens parameter

Again, the OpenAI API operates with tokens, not words, so you need to transform your limit of 300 words per response into tokens. With the ~0.75 words-per-token rule of thumb, 300 words comes out to roughly 400 tokens.

Set the max_tokens parameter to that completion budget (i.e., 400 tokens). Because max_tokens caps only the completion, the response will never exceed 400 tokens (about 300 words) no matter how long the prompt is. Just make sure the prompt tokens plus max_tokens (22 + 400 = 422 tokens here) fit within the model's context length.

Keep in mind that max_tokens is a hard cutoff, not an instruction: if the model reaches the limit mid-sentence, the completion is simply truncated there.
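The budget arithmetic can be sketched like this. The 0.75 words-per-token ratio is the documented rule of thumb, so the resulting number is an estimate; budgeting more tokens (say 700) simply adds headroom at the cost of possibly longer answers:

```javascript
// Token-budget arithmetic for the 4097-token example model.
const MODEL_CONTEXT = 4097; // tokens shared between prompt and completion
const PROMPT_LIMIT = 22;    // prompt limit chosen in STEP 1
const WORD_LIMIT = 300;     // desired response length in words

// ~0.75 words per token => ~400 tokens for 300 words.
const completionBudget = Math.ceil(WORD_LIMIT / 0.75);

console.log(completionBudget); // 400
// The prompt plus max_tokens must fit in the model's context length:
console.log(PROMPT_LIMIT + completionBudget <= MODEL_CONTEXT); // true
```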


Solution 2: Using the system message (works with the GPT-3.5 and GPT-4 APIs)

This solution works with the Chat Completions API (i.e., the GPT-3.5 and GPT-4 models), not the Completions API (i.e., GPT-3). The messages parameter in the Chat Completions API allows you to set the system message, which helps set the behavior of the assistant, as stated in the official OpenAI documentation:

The main input is the messages parameter. Messages must be an array of message objects, where each object has a role (either "system", "user", or "assistant") and content. Conversations can be as short as one message or many back and forth turns.

Typically, a conversation is formatted with a system message first, followed by alternating user and assistant messages.

The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. However note that the system message is optional and the model’s behavior without a system message is likely to be similar to using a generic message such as "You are a helpful assistant."

The user messages provide requests or comments for the assistant to respond to. Assistant messages store previous assistant responses, but can also be written by you to give examples of desired behavior.

Setting the system message is the recommended way to steer the model's behavior, as far as I know. You could try the following code:

// Uses the openai v3 Node.js SDK (Configuration/OpenAIApi);
// newer major versions of the SDK expose a different interface.
const { Configuration, OpenAIApi } = require('openai');

const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
});

const openai = new OpenAIApi(configuration);

async function getCompletionFromOpenAI() {
  const completion = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [
      // The system message carries the length instruction.
      { role: 'system', content: 'You are a helpful assistant. Your response should be less than or equal to 300 words.' },
      { role: 'user', content: 'Who is Elon Musk?' },
    ],
  });

  console.log(completion.data.choices[0].message.content);
}

getCompletionFromOpenAI();
Veedis answered 21/8, 2023 at 11:20 Comment(2)
For some reason max_tokens does not work for me once it's 200 or higher; there is a big chance the output goes way over again. What worked for me is "solution 2", but I appended it to the user message. Does it matter where I put it, whether I append it to "user" or "system"? – Thun
@Thun You can append it to the user message, but if both work, I would append it to the system message. As I said, the system message helps set the behavior of the assistant. If you append it to the system message, does it work? – Veedis

In practice, the model takes several things into account: max_tokens, any word-count restriction stated in the prompt (system or user message), and the relevance of the content.

When the limit is stated as an instruction in the prompt, the model tries to wrap up naturally rather than cutting the content off prematurely just to hit an exact word or character count.

In most cases, if you ask it to limit the content to 300 words, it goes a little beyond the limit and generates somewhere between 300 and 400 words.

So if you want to restrict the content to about 300 words, ask the prompt for roughly 200 words of content; the model will usually land somewhere closer to 300.
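That heuristic can be sketched as below. The two-thirds scaling factor is this answer's empirical observation, not a documented constant, so tune it for your own model and prompts:

```javascript
// If the model typically overshoots a stated word limit by ~50%,
// ask for about two-thirds of the words you actually want.
function promptWordTarget(desiredWords) {
  return Math.round((desiredWords * 2) / 3);
}

console.log(promptWordTarget(300)); // 200
```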

Tolu answered 26/8, 2023 at 2:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.