There are many posts about 3D reconstruction from stereo views of known internal calibration, some of which are excellent. I have read a lot of them, and based on what I have read I am trying to compute my own 3D scene reconstruction with the pipeline/algorithm below. I'll set out the method, then ask specific questions at the bottom.
0. Calibrate your cameras:
- This means retrieving the camera calibration matrices K1 and K2 for Camera 1 and Camera 2. These are 3x3 matrices encapsulating each camera's internal parameters: focal length and principal point offset / image centre. They don't change, so you should only need to do this once per camera, as long as you don't zoom or change the resolution you record in.
- Do this offline. Do not argue.
- I'm using OpenCV's calibrateCamera() and checkerboard routines, but this functionality is also included in the Matlab Camera Calibration Toolbox. The OpenCV routines seem to work nicely.
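For illustration, here is a minimal Python/OpenCV sketch of this step. The checkerboard size (9x6 interior corners) and the image directory are assumptions for the example, not part of the method:

```python
import cv2
import numpy as np
import glob

# Interior-corner count of the checkerboard (assumption: a 9x6 board).
pattern_size = (9, 6)

# 3D corner coordinates in the board's own frame (z = 0 plane).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_cam1/*.png"):   # hypothetical image directory
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(img, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K1 is the 3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
rms, K1, dist1, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img.shape[::-1], None, None)
print("RMS reprojection error (px):", rms)
```

Repeat per camera to get K2 (and the distortion coefficients, which you should use to undistort points before the later steps).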
1. Fundamental Matrix F:
- With your cameras now set up as a stereo rig, determine the fundamental matrix (3x3) of that configuration using point correspondences between the two images/views.
- How you obtain the correspondences is up to you and will depend a lot on the scene itself.
- I am using OpenCV's findFundamentalMat() to get F; it provides a number of options method-wise (8-point algorithm, RANSAC, LMEDS).
- You can test the resulting matrix by plugging it into the defining equation of the fundamental matrix: x'Fx = 0, where x' and x are the raw image point correspondences (x, y) in homogeneous coordinates (x, y, 1), and one of the three-vectors is transposed so that the multiplication makes sense. The nearer to zero for each correspondence, the better F is obeying its relation. This is equivalent to checking how well the derived F actually maps from one image plane to another. I get an average deflection of ~2px using the 8-point algorithm.
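A minimal sketch of this step, assuming pts1 and pts2 are Nx2 float arrays of matched pixel coordinates (how you obtain them is up to you, as above):

```python
import cv2
import numpy as np

# pts1, pts2: Nx2 float arrays of matched pixel coordinates (assumed given).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)

# Test the defining relation x'Fx = 0 in homogeneous coordinates.
x1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # (x, y, 1)
x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
lines = (F @ x1.T).T                               # epipolar lines in image 2
algebraic = np.abs(np.sum(x2 * lines, axis=1))     # |x'^T F x| per match
# Dividing by the line normal's length turns this into a point-to-line
# distance in pixels, which is easier to interpret than the raw residual.
pixel_dist = algebraic / np.linalg.norm(lines[:, :2], axis=1)
print("mean |x'Fx|:", algebraic.mean(), "mean px distance:", pixel_dist.mean())
```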
2. Essential Matrix E:
- Compute the Essential matrix directly from F and the calibration matrices.
- E = K2.t * F * K1
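Continuing the sketches above (K1 and K2 from step 0, F from step 1), this is a one-liner:

```python
# E directly from F and the two intrinsic matrices: E = K2.t * F * K1.
E = K2.T @ F @ K1
```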
3. Internal Constraint upon E:
- E should obey certain constraints. In particular, if decomposed by SVD into U S V.t, then its singular values should be (a, a, 0): the first two diagonal elements of S should be equal, and the third zero.
- I was surprised to read here that if this is not true when you test for it, you might choose to fabricate a new Essential matrix from the prior decomposition like so: E_new = U * diag(1, 1, 0) * V.t, which is of course guaranteed to obey the constraint. You have essentially set S = diag(1, 1, 0) artificially.
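A sketch of that check and enforcement using NumPy's SVD:

```python
# Inspect the singular values; ideally they look like (a, a, 0).
U, S, Vt = np.linalg.svd(E)
print("singular values of E:", S)

# Fabricate the nearest matrix that obeys the constraint exactly.
E_new = U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```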
4. Full Camera Projection Matrices:
- There are two camera projection matrices, P1 and P2. These are 3x4 and obey the x = PX relation. Also, P = K[R|t], and therefore K_inv * P = [R|t] (where the camera calibration has been removed).
- The first matrix P1 (excluding the calibration matrix K) can be set to [I|0]; then P2 (excluding K) is [R|t].
- Compute the rotation and translation between the two cameras, R and t, from the decomposition of E. There are two possible ways to calculate R (U*W*V.t and U*W.t*V.t) and two ways to calculate t (± the third column of U), which means that there are four combinations of [R|t], only one of which is valid.
- Compute all four combinations, and choose the one that geometrically corresponds to the situation where a reconstructed point is in front of both cameras. I actually do this by carrying through and calculating the resulting P2 = [R|t], then triangulating the 3D position of a few correspondences in normalised coordinates to ensure that they have a positive depth (z-coordinate); see the sketch after this list.
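A sketch of that disambiguation. W here is the matrix from the Hartley & Zisserman decomposition; x1n and x2n are assumed to be Nx2 arrays of normalised coordinates (pixel points premultiplied by the inverse intrinsics), and E_new comes from step 3:

```python
# W from Hartley & Zisserman; splits E's SVD into the two R candidates.
W = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
U, _, Vt = np.linalg.svd(E_new)
if np.linalg.det(U @ Vt) < 0:   # keep the rotations proper (det = +1)
    Vt = -Vt
R_candidates = [U @ W @ Vt, U @ W.T @ Vt]
t_candidates = [U[:, 2].reshape(3, 1), -U[:, 2].reshape(3, 1)]

def points_in_front(R, t, x1n, x2n):
    """Triangulate normalised correspondences; count positive depths."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, x1n.T, x2n.T)   # 4xN homogeneous
    X /= X[3]
    depth1 = X[2]                     # z in camera 1's frame
    depth2 = (R @ X[:3] + t)[2]       # z in camera 2's frame
    return np.sum((depth1 > 0) & (depth2 > 0))

# Keep the combination that puts the most points in front of both cameras.
R, t = max(((R, t) for R in R_candidates for t in t_candidates),
           key=lambda Rt: points_in_front(Rt[0], Rt[1], x1n, x2n))
```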
5. Triangulate in 3D
- Finally, combine the recovered 3x4 projection matrices with their respective calibration matrices: P'1 = K1P1 and P'2 = K2P2
- And triangulate the 3-space coordinates of each 2D point correspondence accordingly, for which I am using the LinearLS method from here.
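For reference, a sketch of the standard homogeneous linear least-squares (DLT) triangulation, which may differ in detail from the linked LinearLS routine:

```python
# Fold the intrinsics back in: P'1 = K1[I|0], P'2 = K2[R|t].
P1_full = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2_full = K2 @ np.hstack([R, t])

def triangulate_linear(P1, P2, u1, u2):
    """Homogeneous linear least-squares (DLT) triangulation of one match.

    P1, P2: 3x4 full projection matrices (K already folded in).
    u1, u2: (x, y) pixel coordinates in image 1 and image 2.
    """
    A = np.array([u1[0] * P1[2] - P1[0],
                  u1[1] * P1[2] - P1[1],
                  u2[0] * P2[2] - P2[0],
                  u2[1] * P2[2] - P2[1]])
    # X is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # back to inhomogeneous 3D coordinates
```

Projecting each recovered X back through P'1 and P'2 and comparing against the original pixel coordinates gives the reprojection error mentioned in the questions below.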
QUESTIONS:
- Are there any howling omissions and/or errors in this method?
- My F matrix is apparently accurate (0.22% deflection in the mapping compared to typical coordinate values), but when testing E against x'Ex = 0 using normalised image correspondences, the typical error in that mapping is >100% of the normalised coordinates themselves. Is testing E against x'Ex = 0 valid, and if so, where is that jump in error coming from?
- The error in my fundamental matrix estimation is significantly worse when using RANSAC than the 8-point algorithm, ±50px in the mapping between x and x'. This deeply concerns me.
- 'Enforcing the internal constraint' still sits very weirdly with me - how can it be valid to just manufacture a new Essential matrix from part of the decomposition of the original?
- Is there a more efficient way of determining which combo of R and t to use than calculating P and triangulating some of the normalised coordinates?
- My final re-projection error is hundreds of pixels in 720p images. Am I likely looking at problems in the calibration, determination of P-matrices or the triangulation?