To calculate world coordinates from screen coordinates with OpenCV

I have calculated the intrinsic and extrinsic parameters of the camera with OpenCV. Now, I want to calculate world coordinates (x,y,z) from screen coordinates (u,v).

How do I do this?

N.B. Since I use the Kinect, I already know the z coordinate.

Any help is much appreciated. Thanks!

Headrick answered 17/8, 2012 at 14:32 Comment(1)
So you are saying that you have Xscreen, Yscreen, and Zworld? And you want Xworld, Yworld, Zworld?Cassaundracassava

First, to understand how this is calculated, it would help to read up on the pinhole camera model and simple perspective projection. For a quick glimpse, check this. I'll try to update with more.

So, let's start with the opposite, which describes how a camera works: projecting a 3d point in the world coordinate system to a 2d point in our image. According to the camera model:

P_screen = I * P_world

or, using homogeneous coordinates (the equality holds up to a scale factor s, which here equals z_world):

s * | x_screen | = I * | x_world |
    | y_screen |       | y_world |
    |    1     |       | z_world |
                       |    1    |

where

I = | f_x    0    c_x    0 | 
    |  0    f_y   c_y    0 |
    |  0     0     1     0 |

is the 3x4 intrinsics matrix, where f_x, f_y are the focal lengths (in pixels) and (c_x, c_y) is the principal point (the center of projection).

If you solve the system above (dividing out the scale factor s = z_world), you get:

x_screen = (x_world/z_world)*f_x + c_x
y_screen = (y_world/z_world)*f_y + c_y

But, you want to do the reverse, so your answer is:

x_world = (x_screen - c_x) * z_world / f_x
y_world = (y_screen - c_y) * z_world / f_y

z_world is the depth the Kinect returns to you and you know f and c from your intrinsics calibration, so for every pixel, you apply the above to get the actual world coordinates.
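
As a concrete illustration, here is a minimal C++ sketch of that back-projection (the intrinsic values below are hypothetical placeholders, not your actual calibration results):

#include <opencv2/core.hpp>
#include <iostream>

// Back-project a pixel (u, v) with known depth z into 3d coordinates.
// fx, fy, cx, cy come from the intrinsic calibration.
cv::Point3f backProject(float u, float v, float z,
                        float fx, float fy, float cx, float cy)
{
    float x = (u - cx) * z / fx;
    float y = (v - cy) * z / fy;
    return cv::Point3f(x, y, z);
}

int main()
{
    // Hypothetical Kinect-like intrinsics; substitute your own.
    const float fx = 525.0f, fy = 525.0f, cx = 319.5f, cy = 239.5f;

    // Pixel (400, 240) with a depth of 1500 (in the sensor's units, e.g. mm):
    // x = (400 - 319.5) * 1500 / 525 ≈ 230, y = (240 - 239.5) * 1500 / 525 ≈ 1.4
    std::cout << backProject(400.0f, 240.0f, 1500.0f, fx, fy, cx, cy) << std::endl;
    return 0;
}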

Edit 1 (why the above correspond to world coordinates and what are the extrinsics we get during calibration):

First, check this one, it explains the various coordinates systems very well.

Your 3d coordinate systems are: Object ---> World ---> Camera. There is a transformation that takes you from the object coordinate system to the world system, and another that takes you from world to camera (the extrinsics you refer to). Usually you assume that:

  • Either the Object system corresponds with the World system,
  • or, the Camera system corresponds with the World system

1. While capturing an object with the Kinect

When you use the Kinect to capture an object, what is returned to you from the sensor is the distance from the camera. That means that the z coordinate is already in camera coordinates. By converting x and y using the equations above, you get the point in camera coordinates.

Now, the world coordinate system is defined by you. One common approach is to assume that the camera is located at (0,0,0) of the world coordinate system. So, in that case, the extrinsics matrix actually corresponds to the identity matrix, and the camera coordinates you found correspond to world coordinates.

Sidenote: Because the Kinect returns the z in camera coordinates, there is also no need for a transformation from the object coordinate system to the world coordinate system. Say, for example, that you had a different camera that captured faces and, for each point, returned the distance from the nose (which you considered to be the center of the object coordinate system). In that case, since the returned values would be in the object coordinate system, we would indeed need a rotation and translation matrix to bring them to the camera coordinate system, as sketched below.
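
For illustration, a minimal sketch of applying such an object-to-camera transformation (the rvec/tvec values are hypothetical; cv::Rodrigues converts OpenCV's rotation-vector form to a rotation matrix):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

int main()
{
    // Hypothetical extrinsics in OpenCV's form: a 3x1 rotation vector
    // and a 3x1 translation vector.
    cv::Mat rvec = (cv::Mat_<double>(3, 1) << 0.17, 0.31, 0.01);
    cv::Mat tvec = (cv::Mat_<double>(3, 1) << -115.9, -224.1, 814.2);

    cv::Mat R;
    cv::Rodrigues(rvec, R); // rotation vector -> 3x3 rotation matrix

    // A point in the object coordinate system...
    cv::Mat pObj = (cv::Mat_<double>(3, 1) << 100.0, 200.0, 0.0);

    // ...expressed in the camera coordinate system.
    cv::Mat pCam = R * pObj + tvec;
    return 0;
}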

2. While calibrating the camera

I suppose you are calibrating the camera with OpenCV, using a calibration board in various poses. The usual way is to assume that the board is actually stationary and that the camera is moving, instead of the opposite (the transformation is the same in both cases). That means that now the world coordinate system corresponds to the object coordinate system. This way, for every frame, we find the checkerboard corners and assign them 3d coordinates, doing something like:

std::vector<cv::Point3f> objectCorners;

// Assign a 3d coordinate (in the object/world system) to every inner
// corner of the checkerboard; the board plane is taken as z = 0.
for (int i = 0; i < noOfCornersInHeight; i++)
{
    for (int j = 0; j < noOfCornersInWidth; j++)
    {
        objectCorners.push_back(cv::Point3f(float(i * squareSize),
                                            float(j * squareSize),
                                            0.0f));
    }
}

where noOfCornersInWidth, noOfCornersInHeight and squareSize depend on your calibration board. If for example noOfCornersInWidth = 4, noOfCornersInHeight = 3 and squareSize = 100, we get the 3d points

(0  ,0,0)  (0  ,100,0)  (0  ,200,0)    (0  ,300,0)
(100,0,0)  (100,100,0)  (100,200,0)    (100,300,0)
(200,0,0)  (200,100,0)  (200,200,0)    (200,300,0)

So, here our coordinates are actually in the object coordinate system. (We have arbitrarily assumed that the upper left corner of the board is (0,0,0), and the coordinates of the remaining corners follow from that one.) So here we indeed need the rotation and translation matrix to take us from the object (world) system to the camera system. These are the extrinsics that OpenCV returns for each frame, as sketched below.
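
A minimal sketch of the corresponding calibration call, assuming objectPoints and imagePoints have been filled with one vector of correspondences per captured frame (the variable names are placeholders):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

void calibrate(const std::vector<std::vector<cv::Point3f>>& objectPoints,
               const std::vector<std::vector<cv::Point2f>>& imagePoints,
               cv::Size imageSize)
{
    cv::Mat cameraMatrix, distCoeffs;  // intrinsics: valid across all frames
    std::vector<cv::Mat> rvecs, tvecs; // extrinsics: one rvec/tvec per frame
    cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);
    // rvecs[k] and tvecs[k] relate board pose k to the camera; they are
    // meaningful only for that specific frame.
}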

To sum up, in the Kinect case:

  • Camera and World coordinate systems are considered the same, so no need for extrinsics there.
  • No need for an Object-to-World (Camera) transformation, since the Kinect's return value is already in the camera system.

Edit 2 (On the coordinate system used):

This is a convention, and I think it also depends on which drivers you use and the kind of data you get back. Check for example that, that, and that one.

Sidenote: It would help you a lot if you visualized a point cloud and played around with it a little. You can save your points in a 3d object format (e.g. ply or obj) and then just import them into a program like Meshlab (very easy to use); see the sketch below.
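
For instance, a minimal sketch of dumping a point cloud to an ASCII .ply file that Meshlab can open (the file layout follows the standard PLY header; savePly is a hypothetical helper name):

#include <cstdio>
#include <vector>
#include <opencv2/core.hpp>

// Write a point cloud as an ASCII PLY file for viewing in Meshlab.
void savePly(const char* path, const std::vector<cv::Point3f>& points)
{
    std::FILE* f = std::fopen(path, "w");
    if (!f) return;
    std::fprintf(f, "ply\nformat ascii 1.0\n");
    std::fprintf(f, "element vertex %zu\n", points.size());
    std::fprintf(f, "property float x\nproperty float y\nproperty float z\n");
    std::fprintf(f, "end_header\n");
    for (const cv::Point3f& p : points)
        std::fprintf(f, "%f %f %f\n", p.x, p.y, p.z);
    std::fclose(f);
}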

Satyriasis answered 17/8, 2012 at 18:56 Comment(15)
Thank you very much. Now, using the extrinsic parameters below, can I go from camera coordinates to world coordinates?Headrick
From the calibration I have the following extrinsic parameters (for a single pose of the board): 1.7261576010447846e-01 3.1158880577193560e-01 1.2720406228471280e-02 -1.1592911113815259e+02 -2.2406582979927950e+02 8.1420941356557194e+02Headrick
You find those extrinsic parameters during the calibration, right? When you do the capture with the Kinect, do you capture the same board in the same position?Satyriasis
When I capture with the Kinect, I use the same board but in a different position; I then obtain as many rows (of 6 values) as the number of images I use. For example, I obtained these extrinsic parameters (3 for rotation and 3 for translation) for a single pose of the chessboard: 1.7261576010447846e-01 3.1158880577193560e-01 1.2720406228471280e-02 -1.1592911113815259e+02 -2.2406582979927950e+02 8.1420941356557194e+02.Headrick
What I am asking is, are those the extrinsics during the calibration or during the capture (if the position is different)? Sorry, I didn't understand that from your comment above.Satyriasis
These extrinsic parameters were obtained during the calibration. I saved 15 shots with a fixed camera and a moving board. Using OpenCV I obtained these extrinsic parameters (a 15x6 matrix): 1.7261576010447846e-01 3.1158880577193560e-01 1.2720406228471280e-02 -1.1592911113815259e+02 -2.2406582979927950e+02 8.1420941356557194e+02 ....Headrick
Please don't copy and paste the parameters again; they don't mean anything. They represent the relative position of the camera with respect to the calibration board IN EACH FRAME. They are not valid for anything else. For example, you cannot use the extrinsics calculated for the first frame in your second frame. OpenCV calculates them as part of a least-squares system, in order to find the intrinsic parameters of the camera.Satyriasis
Yes, you're right. Then, once I have obtained this matrix of 6 columns (3 values for the rotation and 3 for the translation) and n rows (n being the number of board images used for calibration), how do I use it to obtain the position of an object in the world coordinate system?Headrick
Again, you cannot use those parameters for any other purpose. Each extrinsics matrix refers ONLY to that specific board position. If the position changes or you capture anything else, you CANNOT use them. What exactly do you want to do at the end?Satyriasis
I want to derive the position of a marker placed on a moving object in the scene, while the Kinect is fixed. With the intrinsic parameters I can obtain the x and y coordinates with respect to the camera (z is the output of the Kinect). What can I do with the extrinsic parameters? Thanks for the helpHeadrick
You are the one that defines what your world coordinate system is. By convention, you assume that your camera is located at (0,0,0) of the world coordinate system, so the coordinates you find above are considered to be on the world coordinate system.Satyriasis
Therefore, what do the extrinsic parameters that I get from a particular pose of the board represent? The roto-translation matrix of the board with respect to the camera (world) coordinate system?Headrick
I'm confused, because by definition: R,T are the extrinsic parameters which denote the coordinate system transformations from 3D world coordinates to 3D camera coordinates.Headrick
@user1545210 See my edit and read the new link and let me knowSatyriasis
You have been very kind and accurate, thank you. One more thing: by convention, are the orientations of the camera axes like those shown in your link?Headrick

Regarding "Edit 2 (On the coordinate system used)" in the answer above:

If you use, for instance, the Microsoft SDK, then Z is not the Euclidean distance to the camera but the "planar" distance to the camera (the depth along the optical axis). This might change the appropriate formulas.
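
As a hedged sketch (standard ray geometry, not tied to any specific SDK): if a driver instead returned the Euclidean distance d along the viewing ray, you could convert it to the planar depth before applying the back-projection formulas above:

#include <cmath>

// Convert a per-pixel Euclidean ray distance d into the planar depth z
// (distance along the optical axis) used by the back-projection formulas.
float rayDistanceToPlanarDepth(float d, float u, float v,
                               float fx, float fy, float cx, float cy)
{
    float xn = (u - cx) / fx; // normalized image coordinates
    float yn = (v - cy) / fy;
    return d / std::sqrt(1.0f + xn * xn + yn * yn);
}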

Ician answered 18/2, 2013 at 16:8 Comment(0)
