Return coordinates for bounding boxes in Google's Object Detection API

Asked 4/11, 2017 at 12:3 Answered 3/12, 2017 at 14:36

Solved tensorflow object-detection object-detection-api

How can I get the coordinates of the bounding boxes produced by the inference script of Google's Object Detection API? I know that printing boxes[0][i] returns the prediction for the ith detection in an image, but what exactly do the returned numbers mean? Is there a way to get xmin, ymin, xmax, ymax? Thanks in advance.

Human answered 4/11, 2017 at 12:3 Comment(1)

if you are happy with my answer feel free to mark it as the accepted one. – Delinquency 20/11, 2019 at 8:52

The Google Object Detection API returns bounding boxes in the format [ymin, xmin, ymax, xmax], in normalised form, that is, relative to the size of the image (full explanation here). To get (x, y) pixel coordinates, multiply the x values by the image width and the y values by the image height. First get the width and height of your image:

width, height = image.size  # assumes image is a PIL Image; for a NumPy array use: height, width = image.shape[:2]

Then, extract ymin,xmin,ymax,xmax from the boxes object and multiply to get the (x,y) coordinates:

ymin = boxes[0][i][0]*height
xmin = boxes[0][i][1]*width
ymax = boxes[0][i][2]*height
xmax = boxes[0][i][3]*width

Finally, print the coordinates of the box corners:

print('Top left')
print((xmin, ymin))
print('Bottom right')
print((xmax, ymax))
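The comments below ask whether image is a PIL image or a NumPy array, which changes how the width and height are read. Here is a minimal sketch that handles both cases and converts one box to integer pixel coordinates. The helper names image_width_height and box_to_pixels are hypothetical, not part of the Object Detection API, and rounding to the nearest pixel is an assumption.

```python
def image_width_height(image):
    """Return (width, height) for a NumPy-style array or a PIL-style image."""
    if hasattr(image, "shape"):
        # NumPy arrays are laid out as [height, width, channels].
        height, width = image.shape[:2]
    else:
        # PIL images report size as (width, height).
        width, height = image.size
    return width, height


def box_to_pixels(box, width, height):
    """Convert one normalised [ymin, xmin, ymax, xmax] box
    to pixel coordinates (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = box
    return (
        int(round(xmin * width)),
        int(round(ymin * height)),
        int(round(xmax * width)),
        int(round(ymax * height)),
    )
```

With the variables from the answer, this would be called as box_to_pixels(boxes[0][i], *image_width_height(image)).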

Delinquency answered 3/12, 2017 at 14:36 Comment(7)

Any explanation for why this is done? Your link is dead. Is it because the input images get resized to a standard size? And that normalised coordinates are useful to work any sized input? – Koralie 8/3, 2018 at 6:4

is image a numpy array? If so image.size gives number of elements in the array, and image.shape gives dimensions of the image. But I thought it gives number of rows, then number of columns for a matrix i.e. height, width = image.shape. – Quittor 8/3, 2018 at 17:41

@CMCDragonkai, yes that would make sense. Lots of sizing and resizing in neural networks. – Delinquency 11/3, 2018 at 8:52

@Quittor Expect the docs to keep moving for some time to come. tensorflow.org/api_guides/python/… – Delinquency 11/3, 2018 at 8:56

@Delinquency Thanks for the updated link. My comment was about the line in your answer that says width, height = image.size. I think this should be height, width = image.shape[:2]. I still think so after reading the updated link. The very first section "Encoding and Decoding" says "Encoded images are represented by scalar string Tensors, decoded images by 3-D uint8 tensors of shape [height, width, channels]." It would be great if you can clarify why you use width, height = image.size. – Quittor 11/3, 2018 at 11:40

The link to the documentation is dead. – Fervid 28/5, 2018 at 18:13

Does the boxes object still work? I cannot find it. – Totter 6/9, 2018 at 12:19

The boxes array that you mention contains this information and the format is a [N, 4] array where each row is of the format: [ymin, xmin, ymax, xmax] in normalized coordinates relative to the size of the input image.
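Since every row of the [N, 4] array has the same layout, all boxes can be scaled in one step. This is a rough sketch using NumPy. The function name boxes_to_pixels is hypothetical, and it assumes you pass the per-image array (for example boxes[0] from the inference script) together with the image width and height in pixels.

```python
import numpy as np


def boxes_to_pixels(boxes, width, height):
    """Scale an [N, 4] array of normalised [ymin, xmin, ymax, xmax] rows
    to pixel units and reorder the columns to [xmin, ymin, xmax, ymax]."""
    boxes = np.asarray(boxes, dtype=np.float64)
    # y values scale with the height, x values with the width.
    scale = np.array([height, width, height, width], dtype=np.float64)
    pixels = boxes * scale
    # Reorder columns from [ymin, xmin, ymax, xmax] to [xmin, ymin, xmax, ymax].
    return pixels[:, [1, 0, 3, 2]]
```

The result stays in floating point. Cast with .astype(int) if integer pixel indices are needed.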

Brunel answered 5/11, 2017 at 20:39 Comment(0)
