Get 3D coordinates from 2D image pixel if extrinsic and intrinsic parameters are known
I am doing camera calibration with the Tsai algorithm. I have the intrinsic and extrinsic matrices, but how can I reconstruct the 3D coordinates from that information?

(image: the homogeneous projection equation relating the pixel to X, Y, Z, W)

1) I can use Gaussian elimination to find X, Y, Z, W; the 3D point is then (X/W, Y/W, Z/W), since the system is homogeneous.

2) I can use the OpenCV documentation approach:

(image: the OpenCV projection model, s·m′ = K·[R | t]·M′)

Since I know u, v, R and t, I can compute X, Y, Z.

However, both methods give different results, and neither is correct.

What am I doing wrong?
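For reference, a single pixel only pins down a viewing ray, not a 3D point: s·[u, v, 1]ᵀ = K·(R·X + t) has four unknowns (X, Y, Z, s) but only three equations. A minimal numpy sketch (with made-up K, R, t values, not from the question) showing that every point on the back-projected ray lands on the same pixel:

```python
import numpy as np

# Hypothetical intrinsics/extrinsics for illustration only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # camera aligned with the world axes
t = np.array([0.0, 0.0, 5.0])      # camera 5 units behind the origin

u, v = 400.0, 260.0                # the pixel we want to back-project

# Back-project the pixel to a ray in world coordinates.
C = -R.T @ t                                         # camera centre
d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])   # ray direction

# Every point X = C + lam*d (lam > 0) projects back to the same pixel,
# so without an extra constraint (e.g. a known plane) X is not unique.
for lam in (1.0, 2.5, 10.0):
    X = C + lam * d
    p = K @ (R @ X + t)
    p = p / p[2]
    print(p[:2])   # always ~(400, 260)
```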

Perfume answered 20/10, 2011 at 12:47 Comment(1)
Very good answer, please, if that answer help, tick it as correctPillion
I
36

If you have the extrinsic parameters then you have everything. From the extrinsics (also called the camera pose) you can build a homography. The pose is a 3x4 matrix, the homography a 3x3 matrix, with H defined as

                   H = K*[r1, r2, t],       //eqn 8.1, Hartley and Zisserman

with K being the camera intrinsic matrix, r1 and r2 being the first two columns of the rotation matrix R, and t the translation vector.

Then normalize by dividing every element of H by t3 (which, after multiplying by K, is H's bottom-right entry).

What happens to column r3, don't we use it? No, because it is redundant: it is the cross product of the first two columns of the pose (r3 = r1 × r2).

Now that you have the homography, project the points. Your 2D points are (x, y). Append z = 1 so they are 3D, then project as follows:

        p          = [x y 1];
        projection = H * p;                      //project
        projnorm   = projection / projection(3); //normalize by the 3rd component

Hope this helps.
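The recipe above can be sketched in a few lines of numpy (the K, R, t values here are made up for illustration; the scene is assumed planar at z = 0, as the answer requires). Going image → plane just means inverting H:

```python
import numpy as np

# Hypothetical intrinsics/extrinsics for illustration only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)           # slight rotation so the view is non-trivial
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.0, 0.0, 5.0])

H = K @ np.column_stack([R[:, 0], R[:, 1], t])   # H = K*[r1, r2, t]
H = H / H[2, 2]                                  # normalize (divides by t3 here)

# Forward: a point (X, Y) on the z = 0 plane maps to a pixel.
XY = np.array([0.3, -0.2, 1.0])
pix = H @ XY
pix = pix / pix[2]

# Backward: invert H to recover the plane point from the pixel.
back = np.linalg.inv(H) @ pix
back = back / back[2]
print(back[:2])   # ~ (0.3, -0.2)
```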

Imprimis answered 25/5, 2012 at 7:56 Comment(10)
could it be that you have written the columns wrong? did you maybe mean columns (r12 r22 r32) and (r13 r23 r33) instead? – Tardy
Aren't you assuming here that the pose is relative to z == 0? You may want to specify that. The third column of the pose is only redundant if the incoming coordinates always have z == 0. – Klump
Well, yes, z=0 is the camera origin, where the "offline marker" is supposed to be when you take a picture of it from above. That's the principle of this technique in augmented reality. Tomorrow I will post the link to the publication where all augmented-reality methods and principles are explained. – Imprimis
I wonder how it could be possible to obtain the homography from the pose when the plane is not at Z=0. I found the relation H = R + t/d*n.T, where d is the distance from the camera to the plane and n is the normal to the plane. I've tried it, but it's not working for me; in my case the plane is not at Z=0. – Tophet
I didn't get the normalize part. p/p(z) will give z of all points as 1, so how do we get the 3D points? – Gaylordgaylussac
This solution holds only if the object is planar. For a non-planar object you need at least two poses to recover the 3D points in the object frame. – Catlaina
This is a standard method that appears in publications; it is only intended for planar textures, and always z=0 because you use a JPEG of the texture. If you use a different trackable, you need to know its geometry, for example an AutoCAD model. – Imprimis
A homography works only for a planar scene, or for views related by pure rotation. The fundamental matrix/essential matrix is the way to go in the general case, but you need at least two views of the scene with the point visible in both, as user2311339 mentioned. Then you triangulate the 3D point from the pair of matched 2D points, one in each view. – Piezoelectricity
Please, someone help me: I don't understand the meaning of projection / p(z). Where does the value of p(z) come from? Which element is it? – Arthrospore
@Imprimis why, when the yaw, pitch and roll of the camera in the real world are 0,0,0 and the pixel I select is the image centre, do I get a direction of 0,0,1, as if the camera is pointing down? – Bradybradycardia
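For the non-planar case raised in the last comments, the standard tool is two-view triangulation. A minimal linear (DLT) triangulation sketch in numpy, with a made-up camera pair for illustration:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel coordinates."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A = homogeneous 3D point
    return X[:3] / X[3]        # dehomogenize

# Hypothetical stereo pair: identical intrinsics, baseline of 1 unit.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

def project(P, X):
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]

X_true = np.array([0.5, -0.2, 4.0])
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_hat)   # ~ (0.5, -0.2, 4.0)
```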

As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinate, as this information is lost entirely in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image-space points before projection, as in Jav_Rock's answer.

p          = [x y 1];
projection = H * p;                      //project
projnorm   = projection / projection(3); //normalize by the 3rd component

One interesting alternative to this dummy solution is to train a model to predict the depth of each point prior to reprojection into 3D camera space. I tried this method and had a high degree of success using a PyTorch CNN trained on 3D bounding boxes from the KITTI dataset. I'd be happy to provide code, but it would be a bit lengthy to post here.
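Once a depth d is available for a pixel (from any predictor; the value below is a stand-in, not a CNN output), unprojection is a one-liner: scale the normalized ray by d, then move from camera to world coordinates. A minimal sketch with hypothetical K, R, t:

```python
import numpy as np

# Hypothetical intrinsics/extrinsics for illustration only.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

u, v = 400.0, 260.0
depth = 3.0   # stand-in for a predicted depth (a trained model would supply this)

# Back-project into camera space at the given depth, then into world space.
X_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])
X_world = R.T @ (X_cam - t)
print(X_world)
```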

Botelho answered 18/9, 2019 at 16:53 Comment(2)
Hey @DerekG, could you share a link to somewhere I can read more about the PyTorch/CNN method, including the source code? Thanks :) – Tshombe
github.com/DerekGloudemans/KITTI-utils – the repository isn't really maintained anymore, but it's there. – Botelho
