I calibrated my mono camera using OpenCV. Now I know the camera intrinsic matrix and the distortion coefficients [K1, K2, P1, P2, K3, K4, K5, K6] of my camera. Assume that the camera is placed at [x, y, z] with [Roll, Pitch, Yaw] rotations. How can I get the world coordinates of each pixel when the camera is looking at the floor [z=0]?
You say that you calibrated your camera, which gives you:
- Intrinsic parameters
- Extrinsic parameters (rotation, translation)
- Distortion coefficients
First, to compensate for the distortion, you can use OpenCV's undistort function to get an undistorted image. What you are left with is the intrinsic/extrinsic parameters and the pinhole camera model. The equation below, taken from the OpenCV documentation, explains how to transform 3D world coordinates into 2D image coordinates using those parameters:
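For reference, the pinhole projection has the following general form. This is a transcription in standard notation, where $f_x, f_y$ are the focal lengths in pixels, $(c_x, c_y)$ is the principal point, $(u, v)$ is the pixel, $(X, Y, Z)$ is the world point, and $s$ is an arbitrary scale factor:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```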
Basically, you multiply the 3D coordinates by a projection matrix, which in turn is a combination of the intrinsic parameters (the first matrix in the equation) and the extrinsic parameters (the second matrix in the equation). The extrinsic parameters matrix contains both rotation and translation components [R|T].
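Since the question gives the camera pose as a position and roll/pitch/yaw angles rather than as [R|T] directly, here is a minimal NumPy sketch of one way to build the extrinsics and project a world point. It rests on assumptions that are not stated in the question: the angles are in radians, the camera-to-world rotation is composed as Rz(yaw) @ Ry(pitch) @ Rx(roll), and the camera frame follows the usual OpenCV convention (z along the optical axis). The function names and the numeric values are hypothetical; adapt the angle convention to whatever your setup actually uses.

```python
import numpy as np


def rotation_from_rpy(roll, pitch, yaw):
    """Camera-to-world rotation, ASSUMING R = Rz(yaw) @ Ry(pitch) @ Rx(roll), radians."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def extrinsics_from_pose(position, roll, pitch, yaw):
    """Return (R, t) such that X_cam = R @ X_world + t."""
    C = np.asarray(position, dtype=float).reshape(3, 1)  # camera centre in world frame
    R_wc = rotation_from_rpy(roll, pitch, yaw)            # camera-to-world (assumed)
    R = R_wc.T                                            # world-to-camera
    t = -R @ C                                            # maps the camera centre to the origin
    return R, t


def project(K, R, t, world_point):
    """Pinhole projection of a 3D world point to pixel coordinates (no distortion)."""
    Xw = np.asarray(world_point, dtype=float).reshape(3, 1)
    p = K @ (R @ Xw + t)
    return (p[:2] / p[2]).ravel()


# Hypothetical intrinsics and pose: camera 2 units above the floor, looking straight down.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = extrinsics_from_pose([0.0, 0.0, 2.0], roll=np.pi, pitch=0.0, yaw=0.0)
print(project(K, R, t, [0.0, 0.0, 0.0]))  # floor point directly below the camera
```

With this pose the floor point directly under the camera lands on the principal point, which is a quick way to check that the chosen angle convention matches your rig.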
I suggest you start by studying the pinhole camera model, which describes how a point in the 3D world is mapped to the image plane using the camera intrinsic parameters. As you'll see, this process is not one-to-one, so it usually cannot be inverted (image to 3D) unless you have depth information (which you have, since you said the points are located at z=0). This particular case is mentioned on slide 27 of this presentation. Previous lectures explain the image formation process in detail, and can be used as a first reference to actually determine the transformation from image to world coordinates. Szeliski's book and this PDF are also great resources.
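The z=0 case can be worked out directly from the pinhole model. As a sketch, assume an undistorted pixel $(u, v)$, an intrinsic matrix $K$, and extrinsics $[R|t]$ where $r_1, r_2, r_3$ are the columns of $R$ (these symbols are notation chosen here, not taken from the slides). Setting $Z = 0$ removes the third column of $R$, and the projection collapses to a 3x3 homography that can be inverted:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= \underbrace{K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}}_{H}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\quad\Longrightarrow\quad
\begin{bmatrix} X' \\ Y' \\ W' \end{bmatrix}
= H^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix},
\qquad
X = \frac{X'}{W'}, \quad Y = \frac{Y'}{W'}
```

$H$ is invertible as long as the camera centre does not lie on the plane $Z = 0$.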
Suppose your camera has translation T=[x y z]' with respect to the world reference frame and, as you said, rotation R=[roll, pitch, yaw], and that the camera intrinsic parameters are in K. Any pixel [px py] on the image plane has coordinates W=[X, Y] on the world plane, and W can be calculated with the following Matlab code:
R = rotationVectorToMatrix(R)'
H=K*[R T];
Q=inv([H(:,1) H(:,2) -[px;py;1]])*-H(:,4);
W=Q(1:2)
The end of the document here is a good example of what I mean: https://github.com/muhammetbalcilar/Stereo-Camera-Calibration-Orthogonal-Planes
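A NumPy port of the Matlab snippet above might look like the following. It is a sketch under a few assumptions: R is already a 3x3 world-to-camera rotation matrix, t is the matching translation (so that X_cam = R @ X_world + t), and (px, py) has already been undistorted. The function name and the numeric values are hypothetical; the round trip at the end simply projects a known floor point and recovers it.

```python
import numpy as np


def pixel_to_floor(K, R, t, px, py):
    """Intersect the viewing ray of undistorted pixel (px, py) with the plane Z = 0."""
    K = np.asarray(K, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    H = K @ np.hstack([R, t])  # 3x4 projection matrix K [R|t]
    # Solve H1*X + H2*Y - s*[px, py, 1]' = -H4 for the unknowns (X, Y, s).
    A = np.column_stack([H[:, 0], H[:, 1], -np.array([px, py, 1.0])])
    X, Y, s = np.linalg.solve(A, -H[:, 3])
    return np.array([X, Y])


# Hypothetical setup: camera 2 units above the floor, looking straight down.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])   # world-to-camera rotation
t = np.array([0.0, 0.0, 2.0])    # t = -R @ C with C = (0, 0, 2)

# Forward-project a known floor point, then recover it from its pixel.
world = np.array([0.5, -0.25, 0.0])
p = K @ (R @ world + t)
px, py = p[:2] / p[2]
recovered = pixel_to_floor(K, R, t, px, py)
print(recovered)
```

Using `np.linalg.solve` instead of an explicit inverse is a small design choice: it gives the same result as the Matlab `inv(...)` expression while avoiding forming the inverse matrix.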
I have made a function in Python to get the world point on the XZ plane from image coordinates (I added the comments later with Codeium):
import cv2
import numpy as np

def image2worldY0Position(u: float, v: float, mtx: np.ndarray, dist: np.ndarray, rvec: np.ndarray, tvec: np.ndarray) -> np.ndarray:
    """
    Converts the pixel coordinates of a point in an image to its corresponding
    world coordinates on the XZ plane (Y = 0).
    Args:
        u (float): x-coordinate of the point in the image.
        v (float): y-coordinate of the point in the image.
        mtx (ndarray): Camera matrix.
        dist (ndarray): Distortion coefficients.
        rvec (ndarray): Rotation vector.
        tvec (ndarray): Translation vector.
    Returns:
        ndarray: The world coordinates of the point, shape (3, 1).
    """
    # Extract rotation matrix and its inverse (the transpose, since R is orthonormal)
    R, _ = cv2.Rodrigues(rvec)
    R_inv = R.T
    tvec = np.asarray(tvec, dtype=np.float64).reshape(3, 1)
    # Undistort the pixel first; without a new camera matrix the result is
    # already in normalized camera coordinates, so no inverse of mtx is needed
    uv_1 = np.array([[[u, v]]], dtype=np.float64)
    xn, yn = cv2.undistortPoints(uv_1, mtx, dist).reshape(2)
    ray_cam = np.array([[xn], [yn], [1.0]])
    # Compute intermediate vectors expressed in the world frame
    tempMat = R_inv @ ray_cam
    tempMat2 = R_inv @ tvec
    # Compute the scaling factor that places the point on the Y = 0 plane
    s = tempMat2[1, 0] / tempMat[1, 0]
    # Compute camera coordinates
    camera_point = s * ray_cam
    # Compute world coordinates
    world_point = R_inv @ (camera_point - tvec)
    return world_point