How to get the coordinates of the bounding box in YOLO object detection?
[image: YOLO detection output with bounding boxes]

I need to get the bounding box coordinates generated in the above image using YOLO object detection.

Encrimson answered 14/6, 2017 at 12:12 Comment(3)
YOLO also has a --save-text flag you can set to save the coordinate information for each bounding box to disk.Exaltation
Relatedly, does anyone know how to get the confidence scores for each bounding box?Exaltation
@Exaltation You can check detect.py file and edit it. Look for a function to save prediction image, labels, xyxy, etc. Labels also contain confidence score for each label.Germen

A quick solution is to modify the image.c file to print out the bounding box information:

...
if(bot > im.h-1) bot = im.h-1;

// Print bounding box values 
printf("Bounding Box: Left=%d, Top=%d, Right=%d, Bottom=%d\n", left, top, right, bot); 
draw_box_width(im, left, top, right, bot, width, red, green, blue);
...
Heterozygous answered 16/6, 2017 at 14:58 Comment(5)
Seriously, thank you so much for suggesting image.c. It helped me solve a totally different problem: When running YOLO in Python (via OpenCV-DNN), the detections are given in a float format. And literally every article I've ever seen has the WRONG MATH for turning the YOLO floats (center X/Y, and width/height) into pixel coordinates. But the official image.c has the math! Right here! github.com/pjreddie/darknet/blob/… - I just had to port that to python. :-)Andromede
@Brian O'Donnell How can I modify the "image.c" to only get four numbers for the coordinates of bounding boxes (without any additional description)?Grey
Do you just want the numbers? If so you would want: printf("%d,%d,%d,%d\n", left, top, right, bot);Arboreous
@MitchMcMabers Do you know why there is a need to multiply by the width and height?Floridafloridia
@varungupta, the bounding box coordinates and dimensions are normalized by dividing by image width and height.Exaltation
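As the comments note, YOLO reports each box as normalized (center-x, center-y, width, height) floats. A minimal Python sketch of the conversion to pixel corner coordinates, mirroring the scale-then-clamp math `image.c` performs (the function and variable names here are my own, for illustration):

```python
def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert normalized YOLO (center, size) floats to pixel corner coords."""
    # scale the normalized values back to pixel space
    left = int((cx - w / 2) * img_w)
    top = int((cy - h / 2) * img_h)
    right = int((cx + w / 2) * img_w)
    bottom = int((cy + h / 2) * img_h)
    # clamp to the image bounds, as image.c does before drawing
    left = max(left, 0)
    top = max(top, 0)
    right = min(right, img_w - 1)
    bottom = min(bottom, img_h - 1)
    return left, top, right, bottom

print(yolo_to_corners(0.5, 0.5, 0.25, 0.5, 416, 416))  # prints (156, 104, 260, 312)
```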

For Python users on Windows:

First, complete a few setup steps:

  1. Set a PYTHONPATH environment variable pointing to your darknet folder:

    PYTHONPATH = 'YOUR DARKNET FOLDER'

  2. Add PYTHONPATH to the Path value by appending:

    %PYTHONPATH%

  3. Edit the coco.data file in the cfg folder, changing the names variable to point to your coco.names file; in my case:

    names = D:/core/darknetAB/data/coco.names

With this setup, you can import darknet.py (from the alexeyAB/darknet repository) as a Python module from any folder.
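To confirm the setup before scripting, a quick sanity check (a sketch; the folder path is an example from this answer, substitute your own darknet location):

```python
import sys

DARKNET_DIR = 'D:/core/darknetAB'  # example path; use your own darknet folder

try:
    import darknet  # succeeds if PYTHONPATH is set as described above
except ImportError:
    # fallback: add the folder at runtime instead of via the environment
    sys.path.append(DARKNET_DIR)
    # import darknet  # retry the import here
```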

start scripting:

from darknet import performDetect as scan  # calling 'performDetect' function from darknet.py

def detect(img_path):
    ''' use this if you only want to get the coordinates '''
    picpath = img_path
    cfg = 'D:/core/darknetAB/cfg/yolov3.cfg'  # change this if you want to use a different config
    coco = 'D:/core/darknetAB/cfg/coco.data'  # you can change this too
    data = 'D:/core/darknetAB/yolov3.weights'  # and this can be changed by you
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data, metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)  # default format; I prefer to get only the result, not produce an image, for better performance

    # up to here you get data in the default alexeyAB format, as explained in the module.
    # try help(scan); the result format is: [(item_name, confidence_rate, (x_center, y_center, box_width, box_height))]
    # to convert it to the form generally used by PIL/OpenCV, do the following (still inside the detect function we create):

    newdata = []
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size/2))
            y_start = round(y1 - (h_size/2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
            newdata.append(data)

    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size/2))
        y_start = round(y1 - (h_size/2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
        newdata.append(data)

    else:
        newdata = False

    return newdata

How to use it:

table = 'D:/test/image/test1.jpg'
checking = detect(table)

To get the coordinates:

if there is only 1 result:

x1, y1, x2, y2 = checking[0][2]

if there are many results:

for x in checking:
    item = x[0]
    x1, y1, x2, y2 = x[2]
    print(item)
    print(x1, y1, x2, y2)
Petropavlovsk answered 27/4, 2019 at 18:57 Comment(2)
The code is untested; there is a typo in weight_size and height_size. And you should use test[0] to extract item, confidence_rate, imagedata in the single-detection case. I have commented below with working code. Anyway, lots of thanks for your code that helped me kick-start.Krummhorn
yeahh..., sorry for the typo... just trying to help and inspire... btw, the typo is already fixed... it should work now... Note: the newest OpenCV (4.1.1 and above) can load Darknet models in its DNN module, so we can run darknet straight in OpenCV. OpenCV is like an all-in-one machine now...Petropavlovsk

If you are going to implement this in Python, there is a small Python wrapper that I have created here. Follow the ReadMe file to install it. It is very easy to install.

After that, follow this example code to see how to detect objects.
If your detection is det:

top_left_x = det.bbox.x
top_left_y = det.bbox.y
width = det.bbox.w
height = det.bbox.h

If you need, you can get the midpoint by:

mid_x, mid_y = det.bbox.get_point(pyyolo.BBox.Location.MID)

Hope this helps..

Lichee answered 8/2, 2019 at 16:24 Comment(0)

Inspired by @Wahyu's answer above, with a few changes, modifications, and bug fixes, tested with both single object detection and multiple object detection.

# calling 'performDetect' function from darknet.py
from darknet import performDetect as scan
import math


def detect(img_path):
    ''' use this if you only want to get the coordinates '''
    picpath = img_path
    # change this if you want to use a different config
    cfg = '/home/saggi/Documents/saggi/prabin/darknet/cfg/yolo-obj.cfg'
    coco = '/home/saggi/Documents/saggi/prabin/darknet/obj.data'  # you can change this too
    # and this can be changed by you
    data = '/home/saggi/Documents/saggi/prabin/darknet/backup/yolo-obj_last.weights'
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data, metaPath=coco, showImage=False, makeImageOnly=False,
                initOnly=False)  # default format; I prefer to get only the result, not produce an image, for better performance

    # up to here you get data in the default alexeyAB format, as explained in the module.
    # try help(scan); the result format is: [(item_name, confidence_rate, (x_center, y_center, box_width, box_height))]
    # to convert it to the form generally used by PIL/OpenCV, do the following (still inside the detect function we create):

    newdata = []

    # For multiple Detection
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size/2))
            y_start = round(y1 - (h_size/2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate,
                    (x_start, y_start, x_end, y_end), (w_size, h_size))
            newdata.append(data)

    # For Single Detection
    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size/2))
        y_start = round(y1 - (h_size/2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate,
                (x_start, y_start, x_end, y_end), (w_size, h_size))
        newdata.append(data)

    else:
        newdata = False

    return newdata


if __name__ == "__main__":
    # Multiple detection image test
    # table = '/home/saggi/Documents/saggi/prabin/darknet/data/26.jpg'
    # Single detection image test
    table = '/home/saggi/Documents/saggi/prabin/darknet/data/1.jpg'
    detections = detect(table)

    # Multiple detection
    if detections and len(detections) > 1:
        for detection in detections:
            print(' ')
            print('========================================================')
            print(' ')
            print('All Parameter of Detection: ', detection)

            print(' ')
            print('========================================================')
            print(' ')
            print('Detected label: ', detection[0])

            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object Confidence: ', detection[1])

            x1, y1, x2, y2 = detection[2]
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
            print('x1: ', x1)
            print('y1: ', y1)
            print('x2: ', x2)
            print('y2: ', y2)

            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object width and height: ', detection[3])
            b_width, b_height = detection[3]
            print('Width of bounding box: ', math.ceil(b_width))
            print('Height of bounding box: ', math.ceil(b_height))
            print(' ')
            print('========================================================')

    # Single detection
    elif detections:
        print(' ')
        print('========================================================')
        print(' ')
        print('All Parameter of Detection: ', detections)

        print(' ')
        print('========================================================')
        print(' ')
        print('Detected label: ', detections[0][0])

        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object Confidence: ', detections[0][1])

        x1, y1, x2, y2 = detections[0][2]
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
        print('x1: ', x1)
        print('y1: ', y1)
        print('x2: ', x2)
        print('y2: ', y2)

        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object width and height: ', detections[0][3])
        b_width, b_height = detections[0][3]
        print('Width of bounding box: ', math.ceil(b_width))
        print('Height of bounding box: ', math.ceil(b_height))
        print(' ')
        print('========================================================')

# Single detections output:
# test value  [('movie_name', 0.9223029017448425, (206.79859924316406, 245.4672393798828, 384.83673095703125, 72.8630142211914))]

# Multiple detections output:
# test value  [('movie_name', 0.9225175976753235, (92.47076416015625, 224.9121551513672, 147.2491912841797, 42.063255310058594)),
#  ('movie_name', 0.4900225102901459, (90.5261459350586, 12.4061279296875, 182.5990447998047, 21.261077880859375))]
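For reference, the center-format box in the single-detection sample above converts to corner coordinates like this (a small helper that mirrors the rounding math inside the detect function):

```python
def to_corners(box):
    """Convert (x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    x_c, y_c, w, h = box
    # same rounding order as the detect function above
    x_start = round(x_c - w / 2)
    y_start = round(y_c - h / 2)
    x_end = round(x_start + w)
    y_end = round(y_start + h)
    return x_start, y_start, x_end, y_end

# box from the single-detection sample output above
print(to_corners((206.79859924316406, 245.4672393798828,
                  384.83673095703125, 72.8630142211914)))
# prints (14, 209, 399, 282)
```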
Krummhorn answered 11/3, 2020 at 11:44 Comment(2)
Howcome you didn't need to use the anchors ?Formality
@Pe Dro, read the section in my answer above. There is an explanation of how it works; it still uses the anchors, with the binding method. To make it work, you need the configuration I already explained in my answer...Petropavlovsk

If the accepted answer does not work for you, this might be because you are using AlexeyAB's darknet model instead of pjreddie's darknet model.

You just need to go to image_opencv.cpp file in the src folder and uncomment the following section:

            ...

            //int b_x_center = (left + right) / 2;
            //int b_y_center = (top + bot) / 2;
            //int b_width = right - left;
            //int b_height = bot - top;
            //sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);

This will print the Bbox center coordinates as well as the width and height of the Bbox. After making the changes, make sure to rebuild darknet (run make again) before running YOLO.

Ventriculus answered 29/12, 2020 at 14:39 Comment(2)
Thanks a lot. This worked. But I want to print like: "Bounding box of <object>: Left, right,.." What else changes do I need to make?Jaredjarek
` printf("Bounding box of %s: %d, %d\n", labelstr, b_x_center, b_y_center); `Ventriculus

If you are using yolov4 in the darknet framework (by which I mean the version compiled directly from the GitHub repo https://github.com/AlexeyAB/darknet) to run object detection on static images, something like the following command can be run at the command line to get the bounding box as relative coordinates:

.\darknet.exe detector test .\cfg\coco.data .\cfg\yolov4.cfg .\yolov4.weights -ext_output .\data\people1.jpg -out result.json

Note the above is in Windows syntax, so you may have to change the backslashes to forward slashes for it to work on macOS or Linux. Also, please make sure the paths are accurate before running. In the command, the input is the people1.jpg file in the data directory under the repository root. The output will be stored in a file named result.json. Feel free to change the output name, but retain the .json extension.
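Once you have result.json, the relative coordinates can be read back in Python. A sketch of a parser is below; the field names (`objects`, `relative_coordinates`, `center_x`, etc.) match what recent AlexeyAB builds write, but verify them against your own result.json before relying on this:

```python
import json

def parse_detections(json_text):
    """Extract (name, confidence, center_x, center_y, width, height) tuples."""
    out = []
    for frame in json.loads(json_text):
        for obj in frame.get("objects", []):
            rc = obj["relative_coordinates"]
            out.append((obj["name"], obj["confidence"],
                        rc["center_x"], rc["center_y"],
                        rc["width"], rc["height"]))
    return out

# sample in the shape recent AlexeyAB builds emit; check against your own file
sample = '''[{"frame_id": 1, "filename": "data/people1.jpg",
  "objects": [{"class_id": 0, "name": "person", "confidence": 0.98,
    "relative_coordinates": {"center_x": 0.35, "center_y": 0.5,
                             "width": 0.2, "height": 0.6}}]}]'''

print(parse_detections(sample))
```

The coordinates are fractions of the image size, so multiply by the image width and height to get pixels.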

Pammie answered 23/9, 2021 at 6:58 Comment(2)
Is it possible to save the real-time streaming result at a certain time interval? For example: every 10 seconds.Jaredjarek
I think that should be possible by modifying a script similar to this: github.com/IdoGalil/People-counting-system/blob/master/yolov3/…Pammie

© 2022 - 2024 — McMap. All rights reserved.