How can I upload an image as context with a prompt to the GPT-4 API?

I see there are ways of doing various image generation here: https://platform.openai.com/docs/api-reference/images

But I'm just trying to send ChatGPT a PNG file, ask "what is this?" or something like that, and then get back a response.

Rickettsia answered 30/9, 2023 at 4:58 Comment(0)

I was able to get this to work using the gpt-4o-mini model (July 2024):

    import base64
    import os

    import openai

    # Create the client; the API key is read from the environment
    client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    THIS_MODEL = "gpt-4o-mini"
    # Function to encode the image
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')
    
    # Getting the base64 string
    image_path = "path/to/image.jpg"  # placeholder: path to your local image
    base64_image = encode_image(image_path)
    
    # Send the request to the API
    response = client.chat.completions.create(
            model=THIS_MODEL,
            messages=[
                {
                    "role": "system",
                    "content": [
                        {"type": "text",
                        "text": "You are a cool image analyst.  Your goal is to describe what is in the image provided as a file."
                        }
                    ],
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type":"text",
                            "text": "What is in this image?"
                        },
                        {
                            "type": "image_url",
                            "image_url": 
                                {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                        }
                    ]
                }
            ],
            max_tokens=300
        )
    print(f"response: {response}")
    # Extract the description
    description = response.choices[0].message.content
    print(f"Description: {description}")
In short:

  • Create the client
  • Convert the image to base64
  • Prepare the message:
    • include a content item of type "image_url"
    • build the URL inline: "url": f"data:image/jpeg;base64,{base64_image}"

With help from the API docs on vision found here: https://platform.openai.com/docs/guides/vision
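
The snippet above hard-codes image/jpeg in the data URL, while the question asks about a PNG. A minimal sketch of deriving the MIME type from the file name instead, using only the standard library. The helper names image_to_data_url and build_user_message are hypothetical and not part of the OpenAI SDK, and the JPEG fallback is an assumption of this sketch:

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Return a base64 data URL for a local image file."""
    # Guess the MIME type from the file extension; fall back to JPEG
    # (the fallback is an assumption of this sketch).
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_user_message(prompt, data_url):
    """Build a user message in the same shape as the request above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

The returned dictionary can take the place of the "user" entry in the messages list, for example build_user_message("What is this?", image_to_data_url("photo.png")).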

Lactiferous answered 19/7 at 22:18 Comment(0)

It is possible, but not with ChatGPT right now, based on this response in their forums:

What you want is called “image captioning” and is not a service OpenAI currently provides in their API.

You can check for other APIs, such as the Azure Describe Image API, or a service such as hive.ai, or host your own CLIP model.

source: https://community.openai.com/t/how-can-i-get-description-from-the-content-of-the-image/307090/2

Use Azure Computer Vision To Describe an Image

But I did find it possible to describe images with the Azure AI services | Computer vision API.

  1. Create a free Azure account: https://azure.microsoft.com/en-us/free
  2. Go to portal.azure.com and create your own instance of Computer Vision by searching for "Computer vision", selecting it, and clicking the + Create button
  3. Enter the required details (subscription, name, etc.) and finish the creation
  4. Click Manage keys ("Click here to manage keys") to view your keys and the endpoint
  5. Save your Key 1 and Endpoint values
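
One way to keep the saved values out of the source code is to read them from environment variables. A minimal sketch: the variable names AZURE_CV_KEY and AZURE_CV_ENDPOINT and the helper load_azure_settings are assumptions chosen for illustration, not names that Azure defines:

```python
import os


def load_azure_settings(env=None):
    """Return (key, endpoint) read from environment variables."""
    env = os.environ if env is None else env
    # Hypothetical variable names; use whatever names you exported.
    key = env.get("AZURE_CV_KEY")
    endpoint = env.get("AZURE_CV_ENDPOINT")
    if not key or not endpoint:
        raise RuntimeError("Set AZURE_CV_KEY and AZURE_CV_ENDPOINT first")
    return key, endpoint
```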

Now you can make the request with the Python SDK like so:

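    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
    from msrest.authentication import CognitiveServicesCredentials

    # ENDPOINT and KEY are placeholders for the values saved in step 5
    client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(KEY))
    url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"
    image_analysis = client.analyze_image(url, visual_features=[VisualFeatureTypes.tags])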

Full code example is in this replit: https://replit.com/@allenmcgehee/HonoredCarefulBackticks#main.py
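
With visual_features set to tags, the analysis result in the Azure Python SDK carries a tags list whose items have name and confidence attributes. A minimal sketch for turning those into readable strings; the helper summarize_tags and the 0.5 default threshold are hypothetical choices, not part of the SDK:

```python
def summarize_tags(tags, min_confidence=0.5):
    """Return 'name (NN%)' strings for tags at or above min_confidence."""
    return [
        f"{tag.name} ({tag.confidence:.0%})"
        for tag in tags
        if tag.confidence >= min_confidence
    ]
```

With the result from the request above, summarize_tags(image_analysis.tags) would give one string per sufficiently confident tag.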

Woodnote answered 26/10, 2023 at 17:24 Comment(2)
Thanks. I don't think the GPT4V image upload functionality is the same as image captioning though... at least the results seem very different. - Mickeymicki
Out of date. It is now possible. - Lactiferous
