I see there are ways of doing various image generation here: https://platform.openai.com/docs/api-reference/images
But I'm just trying to send ChatGPT a PNG file, ask "what is this?" or something like that, and then get back a response.
I was able to get this to work using the gpt-4o-mini model (released July 2024):
import base64
import os

import openai

client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
THIS_MODEL = "gpt-4o-mini"

# Placeholder path; replace with your own file
image_path = "image.png"

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Getting the base64 string
base64_image = encode_image(image_path)

# Send the request to the API
response = client.chat.completions.create(
    model=THIS_MODEL,
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a cool image analyst. Your goal is to describe what is in the image provided as a file."
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        # The MIME type should match the file (PNG here)
                        "url": f"data:image/png;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    max_tokens=300
)
print(f"response: {response}")

# Extract the description
description = response.choices[0].message.content
print(f"Description: {description}")
With help from the API docs on vision found here: https://platform.openai.com/docs/guides/vision
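The data URL in the request above hard-codes the image type. As a minimal sketch, assuming the file extension reflects the real format, the MIME type could instead be guessed with the standard library. The helper name `image_to_data_url` is my own and is not part of the OpenAI SDK.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Return a base64 data URL whose MIME type matches the file extension.

    Hypothetical helper for illustration; not part of the OpenAI SDK.
    """
    # guess_type returns (type, encoding); type is None for unknown extensions
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed as the `"url"` value of the `image_url` part in place of the f-string.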
This does not seem to be possible in ChatGPT right now, based on this response in the OpenAI forums:
What you want is called “image captioning” and is not a service OpenAI currently provides in their API.
You can check for other APIs, such as the Azure Describe Image API, or a service such as hive.ai, or host your own CLIP model.
source: https://community.openai.com/t/how-can-i-get-description-from-the-content-of-the-image/307090/2
But I did find it possible to describe images with the Azure AI services Computer Vision API.
Now you can make the request with the Python SDK like so:
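from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes

# `client` is assumed to be an already-authenticated ComputerVisionClient
url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/12/Broadway_and_Times_Square_by_night.jpg/450px-Broadway_and_Times_Square_by_night.jpg"
image_analysis = client.analyze_image(
    url, visual_features=[VisualFeatureTypes.tags])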
Full code example is in this replit: https://replit.com/@allenmcgehee/HonoredCarefulBackticks#main.py
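To turn the Azure result into something readable, here is a rough sketch that assumes `image_analysis.tags` is a list of objects with `name` and `confidence` attributes, as the SDK's tag objects have. The function name `summarize_tags` and the 0.5 threshold are my own choices for illustration, not anything from the linked replit.

```python
def summarize_tags(tags, min_confidence=0.5):
    """Return 'name (confidence)' strings, highest confidence first.

    Hypothetical helper: works on any objects exposing
    `.name` and `.confidence` attributes.
    """
    # Drop low-confidence tags, then sort the rest from most to least confident
    kept = [t for t in tags if t.confidence >= min_confidence]
    kept.sort(key=lambda t: t.confidence, reverse=True)
    return [f"{t.name} ({t.confidence:.2f})" for t in kept]


# Usage with the analysis result from the request above:
# for line in summarize_tags(image_analysis.tags):
#     print(line)
```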