I've been trying to implement a simple SfM pipeline in OpenCV for a project and I'm having a bit of trouble.
It's for uncalibrated cameras, so I don't have a camera matrix (yes, I know that makes things much more complicated and ambiguous).
I know I should be reading a lot more before attempting something like this, but I'm quite pressed for time and I'm trying to read up on things as I come across them.
Here's my current pipeline, gathered from a number of articles, code samples and books. I've posted questions about specific steps below, and I'd also like to know: is there anything I'm missing, or anything I'm doing wrong?
Extract SIFT/SURF Keypoints from the images.
Pairwise Matching of Images.
- During pairwise matching I run the "ratio test" to cut down the number of matches.
- (Not sure about this) I read that estimating the Fundamental Matrix with RANSAC and discarding the outlier matches helps further.
Q) Do I even need to do this? Is it overkill, or should I be doing something else, like a homography check, to avoid the degenerate cases of the 8-point algorithm?
Next, I need to choose 2 images to begin the reconstruction with.
- I count the homography inliers between image pairs and iterate through the pairs in descending order of inlier percentage.
- I calculate the Fundamental Matrix.
- I "guess" a K matrix and compute the Essential Matrix as E = K^T * F * K, the formula from Hartley & Zisserman.
- I decompose this Essential Matrix with SVD and then verify the 4 solutions.
- I used the logic from Wikipedia's entry and this python gist to implement my checks.
Q) Is this right? Or should I just triangulate the points and check whether they are in front of the cameras, or does that work out to the same thing?
If there is some problem finding the Essential Matrix, skip this pair and check the next image pair.
Set P = [I|0] and P1 = [R|t], perform triangulation, and store the 3D points in some data structure. Also store the P matrices.
Run a Bundle Adjustment Step with a large-ish number of iterations to minimize error.
It gets a little hazy from here and I'm pretty sure I'm messing something up.
- Choose the next image to add based on the number of already-reconstructed 3D points it observes.
- Estimate the pose of this new image from the known 3D points using something like solvePnPRansac, and use the resulting R and t as its projection matrix P1 = [R|t].
- Triangulate this new image against all (I know, I don't need to do it with ALL of them) the images triangulated so far, using their stored matrices as P = PMatrices[ImageAlreadyTriangulated] and the P1 obtained above.
Q) Is it really as simple as reusing the original values of P? Will that get everything into the same coordinate space? That is, will the newly triangulated points be in the same system as those obtained from the initial P and P1, or do I need to apply some kind of transformation here?
- From the points we obtain from triangulation, only add those 3D points that we don't already have stored.
- Run a Bundle Adjustment step after every couple of images.
- Go back to choosing the next image until all images are added.
General questions:
- Should I be running undistort on the points even though my camera matrix K is only a guess?
- For bundle adjustment, I'm writing the points out in the Bundle Adjustment in the Large (BAL) format. Should I be converting the poses to world coordinates with R = R^T and t = -R^T * t, or just leave them be?
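For reference, the inversion I'm referring to in the last question is the usual rigid-transform inverse; since [R|t] maps world to camera, inverting it gives the camera's orientation and centre in world coordinates:

```python
import numpy as np

def invert_pose(R, t):
    # [R|t] maps world -> camera; the inverse gives the camera's pose in
    # world coordinates: R_world = R.T, centre C = -R.T @ t
    return R.T, -R.T @ t
```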