OpenAI API: How do I enable JSON mode using the gpt-4-vision-preview model?

Update: It seems they made a mistake in the API docs, and it has now been fixed.

Earlier, it said "when calling gpt-4-vision-preview or gpt-3.5-turbo," but now reads "when calling gpt-4-1106-preview or gpt-3.5-turbo-1106."


According to Text generation - OpenAI API, "when calling gpt-4-vision-preview or gpt-3.5-turbo, you can set response_format to { type: "json_object" } to enable JSON mode."

However, the following code throws an error:

 {'error': {'message': '1 validation error for Request\nbody -> response_format\n  extra fields not permitted (type=value_error.extra)', 'type': 'invalid_request_error', 'param': None, 'code': None}}

If I comment out `"response_format": {"type": "json_object"}`, it works fine.

    import requests

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    
    payload = {
        "model": "gpt-4-vision-preview",
        "response_format": {"type": "json_object"},
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant. Your response should be in JSON format."
          },
          {
            "role": "user",
            "content": [
              {
                "type": "text",
                "text": prompt
              },
              {
                "type": "image_url",
                "image_url": {
                  "url": f"data:image/jpeg;base64,{base64_image}"
                }
              }
            ]
          }
        ],
        "max_tokens": 1000,
    }
    
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    print(response.json())
Anagram answered 6/11, 2023 at 23:46 Comment(0)

You can get the JSON response back only if using gpt-4-1106-preview or gpt-3.5-turbo-1106, as stated in the official OpenAI documentation:

A common way to use Chat Completions is to instruct the model to always return JSON in some format that makes sense for your use case, by providing a system message. This works well, but occasionally the models may generate output that does not parse to valid JSON.

To prevent these errors and improve model performance, when calling gpt-4-1106-preview or gpt-3.5-turbo-1106, you can set response_format to { type: "json_object" } to enable JSON mode. When JSON mode is enabled, the model is constrained to only generate strings that parse into valid JSON.

Also, I've made a YouTube tutorial on how to get the response in JSON format and posted the code on my GitHub profile.


Working example in Python

If you run test.py, you'll get the following response:

{ "response": "Hello! How can I assist you today?" }

test.py

import os
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default;
# passing it explicitly also works:
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

completion = client.chat.completions.create(
  model="gpt-4-1106-preview",
  messages=[
    {"role": "system", "content": "You are a helpful assistant. Your response should be in JSON format."},
    {"role": "user", "content": "Hello!"}
  ],
  response_format={"type": "json_object"}
)

print(completion.choices[0].message.content)
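
JSON mode guarantees the content parses, but it still arrives as a string; to work with it as data, parse it with the standard library. A minimal sketch, using a sample string in place of the live `completion.choices[0].message.content`:

```python
import json

# Stand-in for completion.choices[0].message.content
content = '{ "response": "Hello! How can I assist you today?" }'

# Safe to parse when JSON mode is enabled
data = json.loads(content)
print(data["response"])  # -> Hello! How can I assist you today?
```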

Working example in Node.js

If you run test.js, you'll get the following response:

{ "response": "Hello! How can I assist you today?" }

test.js

const OpenAI = require("openai");
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-1106-preview",
    messages: [
      {
        role: "system",
        content:
          "You are a helpful assistant. Your response should be in JSON format.",
      },
      { role: "user", content: "Hello!" },
    ],
    response_format: { type: "json_object" },
  });

  console.log(completion.choices[0].message.content);
}

main();
Bernabernadene answered 7/11, 2023 at 9:22 Comment(3)
Yea, it seems like they fixed the docs. It definitely said "when calling gpt-4-vision-preview or gpt-3.5-turbo" yesterday.Anagram
When I enable JSON mode, it returns the JSON content from the example I provided along with the answer.Regnal
Update: To prevent these errors and improve model performance, when using gpt-4o, gpt-4-turbo, or gpt-3.5-turbo, you can set response_format to { "type": "json_object" } to enable JSON mode. When JSON mode is enabled, the model is constrained to only generate strings that parse into valid JSON.Chamfer

Unfortunately, at the moment the gpt-4-vision-preview and gpt-3.5-turbo models don't support the JSON output format. The official OpenAI documentation on JSON mode mentions only two models: gpt-4-1106-preview and gpt-3.5-turbo-1106.

Therefore, the solution is to choose one of these models.
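
Applied to the requests payload from the question, the fix is only the model name; everything else can stay as it is (a sketch, with the image part and prompt shortened for brevity):

```python
payload = {
    # gpt-4-1106-preview (or gpt-3.5-turbo-1106) supports response_format;
    # gpt-4-vision-preview does not, which is what triggers the error above
    "model": "gpt-4-1106-preview",
    "response_format": {"type": "json_object"},
    "messages": [
        {"role": "system",
         "content": "You are a helpful assistant. Your response should be in JSON format."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 1000,
}
```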


Monoclinous answered 7/11, 2023 at 8:55 Comment(1)
Yea, it seems like they fixed the docs. It definitely said "when calling gpt-4-vision-preview or gpt-3.5-turbo" yesterday. I just copied and pasted from the docs yesterday.Anagram

According to the release notes, those models don't support the new JSON output format option. Try gpt-3.5-turbo-1106 or gpt-4-1106-preview instead of gpt-3.5-turbo.

https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo

Lanceolate answered 7/11, 2023 at 6:15 Comment(1)
Yea, it seems like they fixed the docs. It definitely said "when calling gpt-4-vision-preview or gpt-3.5-turbo" yesterday. I just copied and pasted in my question.Anagram

I solved it easily with a workaround (ugly, but it works):

response.getBody().toString().split("```json")[1].split("```")[0];

It's in Java, but the concept is the same.
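
The same fence-stripping idea in Python, for when the model wraps its JSON in markdown code fences (a sketch; `body` stands in for the raw response text):

```python
import json

body = 'Here you go:\n```json\n{"name": "example"}\n```\nAnything else?'

# Take the text between the ```json fence and the closing ```
raw = body.split("```json")[1].split("```")[0]
data = json.loads(raw)
print(data["name"])  # -> example
```

Note this breaks if the model omits the fences or emits more than one fenced block, which is why JSON mode on a supported model is the more robust fix.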

Rabblerouser answered 18/11, 2023 at 13:52 Comment(0)

I've been working on a project called convo-lang, which is a mix between a procedural programming language and a prompt templating system. I just added support for JSON mode, and it works with vision and function calling too.

You can define images using the markdown image format, and those images can be queried by GPT-4 and GPT-3.5-turbo. Under the hood, Convo defines a function that GPT-4 and GPT-3.5-turbo can call, which is run as a separate prompt using GPT-4-vision.

The syntax of Convo-lang is super easy to use, and there is even a VS Code extension that gives you syntax highlighting and lets you run prompts directly in VS Code for testing and experimenting.

Here is an example of a prompt with an image using JSON mode

![Syntax highlighting example](https://raw.githubusercontent.com/iyioio/common/main/assets/convo/image-vision-example.png)

> define
Person = struct(
    name?:string
    description?:string
)

@json Person[]
@responseAssign dudes
> user
Describe each person in this picture.

![](https://raw.githubusercontent.com/iyioio/common/main/assets/convo/abbey-road.jpg)



@tokenUsage 213 / 49 / $0.0036000000000000003
@toolId call_dfpKG4bnDRA3UTBjspQr2O4s
> call queryImage(
    "query": "describe each person in this picture",
    "imageUrls": [
        "https://raw.githubusercontent.com/iyioio/common/main/assets/convo/abbey-road.jpg"
    ]
)
> result
__return={
    "result": "The image shows four individuals crossing the street in single file on a zebra crossing. They are all adult males, and they appear to be dressed in distinct styles:\n\n1. The first person, walking from right to left, has long hair and is wearing a white suit. He is barefoot and steps off with his right foot onto the crossing.\n\n2. The second individual has darker hair and is wearing a black suit with a white shirt and no tie. He is also stepping off with his right foot, wearing black shoes.\n\n3. The third man is wearing a suit as well, looking somewhat casually dressed compared to the second person. He has a beard and long hair and is wearing an open neck shirt with a black suit, and he is mid-step onto the zebra with his left foot forward, wearing black shoes.\n\n4. The fourth man brings up the rear and is dressed most casually. He wears a denim jacket and jeans with a white shirt underneath. He has shoulder-length hair and is taking a step with his right foot, wearing dark-colored shoes.\n\nThe setting appears to be a quiet, tree-lined street with cars parked on either side, and there is a white Volkswagen Beetle and a black car visible behind them, among other vehicles. The architecture in the background suggests this might be a residential area. The image has a calm, orderly feel, indicative of an everyday urban scene albeit with a touch of timeless style."
}


@tokenUsage 564 / 252 / $0.0132
@format json
@assign dudes
> assistant
[
    {
        "name": "first person",
        "description": "The first person, walking from right to left, has long hair and is wearing a white suit. He is barefoot and steps off with his right foot onto the crossing."
    },
    {
        "name": "second person",
        "description": "The second individual has darker hair and is wearing a black suit with a white shirt and no tie. He is also stepping off with his right foot, wearing black shoes."
    },
    {
        "name": "third person",
        "description": "The third man is wearing a suit as well, looking somewhat casually dressed compared to the second person. He has a beard and long hair and is wearing an open neck shirt with a black suit, and he is mid-step onto the zebra with his left foot forward, wearing black shoes."
    },
    {
        "name": "fourth person",
        "description": "The fourth man brings up the rear and is dressed most casually. He wears a denim jacket and jeans with a white shirt underneath. He has shoulder-length hair and is taking a step with his right foot, wearing dark-colored shoes."
    }
]


NPM - https://www.npmjs.com/package/@iyio/convo-lang

GitHub - https://github.com/iyioio/common/blob/main/packages/convo-lang/README.md

VSCode extension - https://marketplace.visualstudio.com/items?itemName=IYIO.convo-lang-tools

Attain answered 13/12, 2023 at 7:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.