How to convert bounding box (x1, y1, x2, y2) to YOLO Style (X, Y, W, H)
M

8

20

I'm training a YOLO model, and I have the bounding boxes in this format:

x1, y1, x2, y2 => e.g. (100, 100, 200, 200)

I need to convert them to YOLO format, something like:

X, Y, W, H => 0.436262 0.474010 0.383663 0.178218

I have already calculated the center point X, Y, the height H, and the width W, but I still need a way to convert them to floating-point numbers like those above.

Metalliferous answered 13/5, 2019 at 15:52 Comment(0)
E
3

YOLO normalises the image space to run from 0 to 1 in both the x and y directions. To convert between your (x, y) coordinates and YOLO (u, v) coordinates, transform your data as u = x / XMAX and v = y / YMAX, where XMAX and YMAX are the maximum coordinates of the image array you are using.

This all depends on the image arrays being oriented the same way.

Here is a C function that performs the conversion:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <math.h>

struct yolo {
    float   u;
    float   v;
    };

struct yolo
convert (unsigned int x, unsigned int y, unsigned int XMAX, unsigned int YMAX)
{
    struct yolo point;

    if (XMAX && YMAX && (x <= XMAX) && (y <= YMAX))
    {
        point.u = (float)x / (float)XMAX;
        point.v = (float)y / (float)YMAX;
    }
    else
    {
        point.u = INFINITY;
        point.v = INFINITY;
        errno = ERANGE;
    }

    return point;
}/* convert */


int main()
{
    struct yolo P;

    P = convert (99, 201, 255, 324);

    printf ("Yolo coordinate = <%f, %f>\n", P.u, P.v);

    exit (EXIT_SUCCESS);
}/* main */
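
The orientation caveat in this answer can be made concrete with a small Python sketch. It ports the same normalisation (u = x / XMAX, v = y / YMAX) and, as an assumption not stated in the answer, flips the vertical axis when the source coordinates use a bottom-left origin, since image coordinates are normally measured from the top-left. The function name and the `origin` parameter are hypothetical.

```python
def normalise_point(x, y, xmax, ymax, origin="top-left"):
    """Map pixel coordinates to the 0..1 range (hypothetical helper).

    origin is "top-left" (the usual image convention) or "bottom-left"
    (an assumed alternative, e.g. coordinates taken from a plotting tool).
    """
    if xmax <= 0 or ymax <= 0 or not (0 <= x <= xmax) or not (0 <= y <= ymax):
        raise ValueError("point lies outside the image")
    if origin not in ("top-left", "bottom-left"):
        raise ValueError("unknown origin: " + repr(origin))
    u = x / xmax
    v = y / ymax
    if origin == "bottom-left":
        v = 1.0 - v  # flip so that v is measured from the top edge
    return u, v


# Same input as the C example above
u, v = normalise_point(99, 201, 255, 324)
print(f"{u:.6f} {v:.6f}")  # 0.388235 0.620370
```

Raising an exception replaces the INFINITY/errno signalling of the C version; that is a stylistic choice for Python, not part of the original answer.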
Emmery answered 13/5, 2019 at 17:17 Comment(0)
I
33

For those looking for the reverse of the question (YOLO format to normal bbox format):

def yolobbox2bbox(x,y,w,h):
    x1, y1 = x-w/2, y-h/2
    x2, y2 = x+w/2, y+h/2
    return x1, y1, x2, y2
Intravenous answered 14/4, 2021 at 18:23 Comment(5)
Don't you need the total size for this? - Carboloy
No, you don't; you're only converting between formats. Converting meters to inches doesn't require knowing the full size of the house, you just run the equation. - Intravenous
Your equation, and the fact that you put it here, saved me 15 minutes yesterday, thanks a lot; I upvoted it for that. I did have to add the multiplication by the size, though, because converting back to pixel coordinates does need the size: 0.4 in a 500 px image is x=200, and 0.4 in a 1000 px image is x=400. If you're not converting back to a pixel-based format, it would probably be good to mention that in the post. - Carboloy
Actually, there's no need to multiply to convert to pixel coordinates, but you probably do need to round. In the example, yolobbox2bbox(5,5,2,2) outputs (4.0, 4.0, 6.0, 6.0), which is exactly in pixel dimensions. Check your input to this function: if the largest value is 1, that's why you needed to multiply. This function is generic; it takes pixel coordinates and returns pixel coordinates, or takes scaled coordinates (0, 1) and returns scaled coordinates. You can scale before or after. You shouldn't need to multiply if the input is in pixels. - Intravenous
But you're not converting meters to inches, you're converting percent to inches. If I said I ran halfway, you can't know how far I ran unless you know the full length. - Rosemarie
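
The disagreement in these comments comes down to units: the function is unit-agnostic, so normalised input gives normalised output. Below is a minimal sketch of the pixel-space variant, assuming the input holds normalised YOLO values and the image size is known. The name `yolo_to_pixel_bbox`, its parameters, and the rounding policy are illustrative and not taken from the answer.

```python
def yolo_to_pixel_bbox(x, y, w, h, img_w, img_h):
    """Convert normalised (cx, cy, w, h) to pixel corners (x1, y1, x2, y2)."""
    x1 = (x - w / 2) * img_w
    y1 = (y - h / 2) * img_h
    x2 = (x + w / 2) * img_w
    y2 = (y + h / 2) * img_h
    # Round to whole pixels; other policies (floor, ceil) are equally valid
    return tuple(round(v) for v in (x1, y1, x2, y2))


# The same normalised box lands on different pixels in different images
print(yolo_to_pixel_bbox(0.4, 0.4, 0.2, 0.2, 500, 500))    # (150, 150, 250, 250)
print(yolo_to_pixel_bbox(0.4, 0.4, 0.2, 0.2, 1000, 1000))  # (300, 300, 500, 500)
```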
S
22

Here's a code snippet in Python to convert x, y coordinates to YOLO format:

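from PIL import Image  # Pillow, used by Image.open below

def convert(size, box):
    # size is (image_width, image_height); box is (xmin, xmax, ymin, ymax)
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0   # box centre in pixels
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]         # box size in pixels
    h = box[3] - box[2]
    x = x*dw                    # normalise by the image size
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)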

im=Image.open(img_path)
w= int(im.size[0])
h= int(im.size[1])


print(xmin, xmax, ymin, ymax) #define your x,y coordinates
b = (xmin, xmax, ymin, ymax)
bb = convert((w,h), b)

Check my sample program to convert from LabelMe annotation tool format to Yolo format https://github.com/ivder/LabelMeYoloConverter

Serles answered 14/5, 2019 at 0:15 Comment(7)
Doesn't this convert to center-normalised coordinates? Is this the same as the YOLO bounding box encoding, which is relative to the grid cell? - Afrikah
@Afrikah It's relative to the grid cell when you perform detection. This format is for the training data. - Serles
I think this is wrong. convert returns center-normalised coordinates, which are relative to the entire image and not to the grid cells. To make them relative to the grid cells, you need to compute (7 * center_x) - floor(7 * center_x), assuming a grid size of 7. - Afrikah
@Afrikah I already told you that you don't have to make the coordinates relative to the grid cells when you prepare annotations for your dataset. Could you give me a link or source that says you have to calculate the coordinates relative to the grid cell when ANNOTATING training data, not when training or during inference? - Serles
What is size in the convert function? - Correspondence
@Correspondence It is (w, h). - Serles
Warning for others: the question asks for (x1, y1, x2, y2), while this answer takes (xmin, xmax, ymin, ymax), so please adapt accordingly. - Drawstring
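
Following the warning above, here is a sketch that takes the corners in the question's (x1, y1, x2, y2) order and formats the result as a text line with six decimals, matching the look of the question's example. The function name, the leading class index (Darknet-style label files usually start each line with one), and the choice of six decimal places are assumptions for illustration.

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Return 'class cx cy w h' with values normalised to the image size."""
    cx = (x1 + x2) / 2.0 / img_w   # normalised box centre
    cy = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w          # normalised box size
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Question's example box in an assumed 1000x1000 image
print(to_yolo_line(0, 100, 100, 200, 200, 1000, 1000))
# 0 0.150000 0.150000 0.100000 0.100000
```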
A
9

There is a more straightforward way to do this with pybboxes. Install it with:

pip install pybboxes

and use it as below:

import pybboxes as pbx

voc_bbox = (100, 100, 200, 200)
W, H = 1000, 1000  # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="yolo", image_size=(W,H))
>>> (0.15, 0.15, 0.1, 0.1)

Note that converting to YOLO format requires the image width and height for scaling.

Apeman answered 1/5, 2022 at 9:9 Comment(0)
A
2

There are two potential solutions. First, you have to determine whether your original bounding box is in COCO or Pascal VOC format; otherwise you can't do the right math.

Here are the formats:

Coco Format: [x_min, y_min, width, height]
Pascal_VOC Format: [x_min, y_min, x_max, y_max]

Here is some Python code showing how to do the conversion:

Converting Coco to Yolo

# Convert Coco bb to Yolo
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [((2*x1 + w)/(2*image_w)) , ((2*y1 + h)/(2*image_h)), w/image_w, h/image_h]

Converting Pascal_voc to Yolo

# Convert Pascal_Voc bb to Yolo
def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [((x2 + x1)/(2*image_w)), ((y2 + y1)/(2*image_h)), (x2 - x1)/image_w, (y2 - y1)/image_h]

If you need additional conversions, you can check my article on Medium: https://christianbernecker.medium.com/convert-bounding-boxes-from-coco-to-pascal-voc-to-yolo-and-back-660dc6178742
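
A quick way to see why the source format matters is to express one box in both formats and compare the results. The sketch below repeats the two functions from this answer so that it runs on its own; the box and image size are made-up values.

```python
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [(2 * x1 + w) / (2 * image_w), (2 * y1 + h) / (2 * image_h),
            w / image_w, h / image_h]


def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [(x2 + x1) / (2 * image_w), (y2 + y1) / (2 * image_h),
            (x2 - x1) / image_w, (y2 - y1) / image_h]


# One box: top-left corner (100, 100), 100 x 100 px, in a 1000 x 1000 image
from_coco = coco_to_yolo(100, 100, 100, 100, 1000, 1000)
from_voc = pascal_voc_to_yolo(100, 100, 200, 200, 1000, 1000)
print(from_coco == from_voc)  # True

# Feeding Pascal VOC corners to the COCO converter silently gives a wrong box
wrong = coco_to_yolo(100, 100, 200, 200, 1000, 1000)
print(wrong)  # [0.2, 0.2, 0.2, 0.2]
```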

Axinomancy answered 7/4, 2022 at 12:28 Comment(0)
P
0

For YOLO format to x1, y1, x2, y2 format:

def yolobbox2bbox(x, y, w, h, dw, dh):
    # dw, dh: image width and height in pixels (left undefined in the original snippet)
    x1 = int((x - w / 2) * dw)
    x2 = int((x + w / 2) * dw)
    y1 = int((y - h / 2) * dh)
    y2 = int((y + h / 2) * dh)

    # Clamp the corners to the image bounds
    if x1 < 0:
        x1 = 0
    if x2 > dw - 1:
        x2 = dw - 1
    if y1 < 0:
        y1 = 0
    if y2 > dh - 1:
        y2 = dh - 1

    return x1, y1, x2, y2
Periphery answered 24/8, 2021 at 8:26 Comment(0)
B
0

I was looking for this as well, and after reading the answers I found the following explanation more informative about what happens behind the scenes. From here: Source

Assuming x/ymin and x/ymax are your bounding corners, top left and bottom right respectively. Then:

x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin

You then need to normalize these, which means giving them as a proportion of the whole image, so simply divide each value by the corresponding image dimension:

x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height

This assumes a top-left origin, you will have to apply a shift factor if this is not the case.
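
Note that the formulas above produce the normalised top-left corner of the box, whereas the target format in the question (and Darknet-style YOLO labels in general) uses the box centre. Below is a minimal sketch of the extra step, assuming a top-left origin; the function name is hypothetical and the sample values are made up.

```python
def corners_to_yolo_center(xmin, ymin, xmax, ymax, width, height):
    """Normalise corners and shift from the top-left corner to the box centre."""
    w = (xmax - xmin) / width
    h = (ymax - ymin) / height
    x_center = xmin / width + w / 2   # top-left x plus half the box width
    y_center = ymin / height + h / 2  # top-left y plus half the box height
    return x_center, y_center, w, h


box = corners_to_yolo_center(100, 100, 200, 200, 1000, 1000)
print(" ".join(f"{v:.6f}" for v in box))  # 0.150000 0.150000 0.100000 0.100000
```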

So the answer

Bickel answered 13/9, 2021 at 11:22 Comment(0)
S
0

There are two things you need to do:

  1. Divide the coordinates by the image size to normalize them to [0..1] range.
  2. Convert (x1, y1, x2, y2) coordinates to (center_x, center_y, width, height).

If you're using PyTorch, Torchvision provides a function that you can use for the conversion:

from torch import tensor
from torchvision.ops import box_convert

image_size = tensor([608, 608])
boxes = tensor([[100, 100, 200, 200], [300, 300, 400, 400]], dtype=float)
boxes[:, :2] /= image_size
boxes[:, 2:] /= image_size
boxes = box_convert(boxes, "xyxy", "cxcywh")
Shriver answered 23/12, 2022 at 9:35 Comment(0)
