image coordinate to world coordinate opencv
S

4

9

I calibrated my mono camera using OpenCV, so I now know its intrinsic matrix and distortion coefficients [K1, K2, P1, P2, K3, K4, K5, K6]. Assuming the camera is placed at [x, y, z] with [Roll, Pitch, Yaw] rotations, how can I get the world coordinates of each pixel when the camera is looking at the floor [z=0]?


Stouffer answered 28/5, 2015 at 9:35 Comment(0)
R
7

You say that you calibrated your camera which gives you:

  • Intrinsic parameters
  • Extrinsic parameters (rotation, translation)
  • Distortion coefficients

First, to compensate for the distortion, you can use the undistort function to get an undistorted image. What remains is the intrinsic/extrinsic parameters and the pinhole camera model. The equation below, taken from the OpenCV documentation, explains how to transform 3D world coordinates into 2D image coordinates using those parameters:

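$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$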

Basically, you multiply the 3D coordinates by a projection matrix, which in turn is a combination of the intrinsic parameters (the first matrix in the equation) and the extrinsic parameters (the second matrix in the equation). The extrinsic parameters matrix contains both rotation and translation components [R|T].
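
As a rough illustration of the projection described above, here is a minimal NumPy sketch. The function name `project_point` and the numeric values for K, R, and t are made up for the example; it assumes the image has already been undistorted.

```python
import numpy as np

def project_point(K, R, t, world_point):
    """Project a 3D world point to pixel coordinates with the pinhole model."""
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    # Projection matrix: intrinsics times the 3x4 extrinsic matrix [R|t]
    P = K @ np.hstack([R, t])
    # Homogeneous world point [X, Y, Z, 1]^T
    Xh = np.append(np.asarray(world_point, dtype=float), 1.0).reshape(4, 1)
    suv = P @ Xh  # equals s * [u, v, 1]^T
    # Divide by the scale s to get pixel coordinates
    return suv[:2, 0] / suv[2, 0]

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)     # camera axes aligned with the world axes (assumption)
t = np.zeros(3)   # camera at the world origin (assumption)

pixel = project_point(K, R, t, [0.1, -0.05, 2.0])
print(pixel)
```

Because the scale s is divided out, every point along the same viewing ray lands on the same pixel, which is why the mapping cannot be inverted without extra information such as Z=0.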

Ramos answered 29/5, 2015 at 20:53 Comment(3)
Thanks for your answer. I think it is the general case. It seems that the translation [X, Y, Z] of the camera position should go to t1, t2, t3, but I don't understand how I can get r11 ~ r33 from [Roll, Pitch, Yaw]. And what about my constraint that the camera is looking at the floor [Z=0]? - Stouffer
If you're looking at the floor, Z=0 in the equation, and the transformation becomes invertible, as described on slide 27 of the presentation I mentioned in my answer (gris.tu-darmstadt.de/teaching/courses/ss11/cv1/…). - Scratchy
Regarding the values r11~r33, they form the so-called rotation matrix. It can be written as R=ABC, where A, B, and C each represent a rotation about a different axis (the axes are chosen according to a convention). The rotation amounts are given by the roll, pitch, and yaw angles, but since there are different conventions, the values of A, B, and C can vary. As a reference for implementing the conversion, you can use mathworld.wolfram.com/EulerAngles.html. - Scratchy
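
To make the R=ABC idea concrete, here is a minimal sketch under one possible convention, which is an assumption and not necessarily the one your setup uses: R = Rz(yaw) Ry(pitch) Rx(roll), angles in radians, describing the camera's orientation in the world frame. The helper names `rotation_from_rpy` and `extrinsics_from_pose` are hypothetical. Note that with this reading the camera position C does not go directly into t1, t2, t3; the extrinsic translation is t = -R C.

```python
import numpy as np

def rotation_from_rpy(roll, pitch, yaw):
    """Return Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians (assumed convention)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def extrinsics_from_pose(camera_position, roll, pitch, yaw):
    """Convert a camera pose in the world frame to the extrinsics [R|t]."""
    # Assumption: the angles give the camera-to-world orientation
    R_cam_to_world = rotation_from_rpy(roll, pitch, yaw)
    # The extrinsic rotation maps world to camera, i.e. the inverse (transpose)
    R = R_cam_to_world.T
    C = np.asarray(camera_position, dtype=float).reshape(3, 1)
    # The camera centre must map to the camera origin: R @ C + t = 0
    t = -R @ C
    return R, t

C = [1.0, 2.0, 3.0]
R, t = extrinsics_from_pose(C, 0.0, 0.0, np.pi / 2)
print(R)
print(t.ravel())
```

If your roll, pitch, and yaw follow a different axis order or sign convention, only `rotation_from_rpy` needs to change.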
S
3

I suggest you start by studying the pinhole camera model, which describes how a point in the 3D world is mapped to the image plane using the camera intrinsic parameters. As you'll see, this mapping is not one-to-one, so it usually cannot be inverted (image to 3D) unless you have depth information (which you do, since you said the points are located at z=0). This particular case is covered on slide 27 of this presentation. The previous lectures explain the image formation process in detail and can serve as a first reference for deriving the transformation from image to world coordinates. Szeliski's book and this PDF are also great resources.
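
For the z=0 case mentioned above, the projection reduces to a 3x3 homography H = K [r1 r2 t], where r1 and r2 are the first two columns of R, and that matrix can be inverted. The following is a minimal sketch, not a definitive implementation: the function name `pixel_to_floor`, the intrinsics, and the camera pose (2 units above the floor, looking straight down) are all made up for the example, and the pixel is assumed to be undistorted already.

```python
import numpy as np

def pixel_to_floor(K, R, t, u, v):
    """Map an undistorted pixel (u, v) to (X, Y) on the world plane Z = 0."""
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    # With Z = 0 the third column of R drops out of the projection
    H = K @ np.column_stack([R[:, 0], R[:, 1], t])
    # Solve H @ [X, Y, 1]^T * (1/s) = [u, v, 1]^T up to scale
    w = np.linalg.solve(H, np.array([u, v, 1.0]))
    return w[:2] / w[2]

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
# Hypothetical pose: camera at (0, 0, 2), optical axis pointing at the floor
R = np.diag([1.0, -1.0, -1.0])
t = np.array([0.0, 0.0, 2.0])

floor_xy = pixel_to_floor(K, R, t, 520.0, 140.0)
print(floor_xy)
```

The solve fails only when H is singular, which happens when the camera centre lies in the floor plane itself.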

Scratchy answered 28/5, 2015 at 23:55 Comment(1)
Unfortunately the link to the presentation does not work any more. Do you know whether the presentation is gone or was simply moved? - Buckling
A
2

Suppose your camera has translation T=[x y z]' with respect to the world reference frame, rotation R=[roll, pitch, yaw] as you described, and intrinsic parameters in K. Any pixel [px py] on the image plane then has coordinates W=[X, Y] on the world plane, and W can be calculated with the following Matlab code:

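R = rotationVectorToMatrix(R)';   % 3x3 rotation matrix
H = K*[R T];                      % 3x4 projection matrix
% Solve X*H(:,1) + Y*H(:,2) + H(:,4) = s*[px;py;1] for Q = [X; Y; s]
Q = [H(:,1) H(:,2) -[px;py;1]] \ (-H(:,4));
W = Q(1:2)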

The end of the following document is a good example of what I mean: https://github.com/muhammetbalcilar/Stereo-Camera-Calibration-Orthogonal-Planes

Arv answered 15/7, 2018 at 20:22 Comment(0)
W
1

I wrote a Python function that gets the world point on the XZ plane (Y = 0) from image coordinates (I added the comments later with Codeium):

import cv2
import numpy as np

def image2worldY0Position(u: float, v: float, mtx: np.ndarray, dist: np.ndarray, rvec: np.ndarray, tvec: np.ndarray) -> np.ndarray:
    """
    Converts the pixel coordinates of a point in an image to its corresponding
    world coordinates on the XZ plane (Y = 0).

    Args:
        u (float): x-coordinate of the point in the image.
        v (float): y-coordinate of the point in the image.
        mtx (ndarray): Camera matrix.
        dist (ndarray): Distortion coefficients.
        rvec (ndarray): Rotation vector (world to camera).
        tvec (ndarray): Translation vector (world to camera).

    Returns:
        ndarray: The world coordinates of the point, shape (3, 1).
    """

    # Extract rotation matrix and its inverse (the transpose, for a rotation)
    R, _ = cv2.Rodrigues(rvec)
    R_inv = R.T
    tvec = np.asarray(tvec, dtype=np.float64).reshape(3, 1)

    # Undistort the pixel coordinates. Without a new camera matrix,
    # undistortPoints returns normalized coordinates, which is equivalent to
    # undistorting into an optimal camera matrix and multiplying by its
    # inverse, and it avoids needing the image size (ww, hh).
    uv = np.array([[[u, v]]], dtype=np.float64)
    x_n, y_n = cv2.undistortPoints(uv, mtx, dist)[0, 0]
    ray_cam = np.array([[x_n], [y_n], [1.0]])

    # Rotate the viewing ray and the translation into world axes
    ray_world = R_inv @ ray_cam
    offset_world = R_inv @ tvec

    # Compute the scaling factor that puts the point on the plane Y = 0
    s = offset_world[1, 0] / ray_world[1, 0]

    # Compute camera coordinates
    camera_point = s * ray_cam

    # Compute world coordinates
    world_point = R_inv @ (camera_point - tvec)

    return world_point

Withstand answered 2/7, 2024 at 17:26 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. - Hermon
