Is there a minimal, complete, working example of structure from motion/3d reconstruction?
Like the title says, I am looking for a complete, minimal, working example of the structure-from-motion (aka 3D reconstruction) pipeline.

Let me say up front that I do not have the camera parameters: no focal length, no camera intrinsics. That immediately rules out 90% of the examples/tutorials out there.

There are many questions on this topic, but the code comes only in snippets that never cover the complete SfM process. Many instructions are contradictory or just guesswork, and the open-source external libraries are hard to follow.

So I am looking for a short, complete, minimal, working example. Most important is the working requirement, since so much code out there produces bad results.

I have made a stab at it with the code below. I use synthetic data of matching pairs, so there is no noise or bad-correspondence issue to work around. The goal is to reconstruct a cube (8 3D points) from 2 views of 8 2D points each. However, the final results are awful: there is no semblance of a cube shape. (I have tried normalizing and centering the data; that is not the issue.)
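For context, correspondences like these can be generated with a simple pinhole projection. This is only an illustrative sketch: the focal length and camera pose below are arbitrary assumptions, not the ones behind the `views` array in my code.

```python
import numpy as np

# Illustrative pinhole projection for generating noise-free synthetic
# correspondences.  Focal length and pose are arbitrary assumptions,
# NOT the ones used to produce the views array in the question.
cube = np.array([[x, y, z] for z in (1, 2) for y in (1, 2) for x in (1, 2)],
                dtype=float)  # the 8 corners of a unit cube at (1,1,1)

def project(points, R, t, f=1.0):
    # world -> camera coordinates, then perspective divide
    cam = points @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3]

R1 = np.eye(3)                     # camera 1: identity rotation
t1 = np.array([0.0, 0.0, 4.0])     # pushed back so the cube is in front
view1 = project(cube, R1, t1)

print(view1.shape)  # (8, 2): 8 points, 2 coordinates each
```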

Anyone who can provide a better minimal working example, or point out what is wrong with my attempt, would be appreciated.

import cv2
import numpy as np
import scipy.linalg


def combineTR(T, R):
    """Combine a translation vector and a rotation matrix into one 3x4 projection matrix."""
    T4 = np.eye(4)
    T4[:3, 3] = T  # embed the translation in a 4x4 matrix so we can matrix-multiply
    R4 = np.eye(4)
    R4[:3, :3] = R
    P = np.dot(T4, R4)  # combine rotation and translation into one matrix
    P = P[:3, :]  # cut off the bottom row to get a 3x4 matrix
    return P

####################################################################

# # ground truth
# Wpts = np.array([[1, 1, 1],  # A Cube in world points
#                  [1, 2, 1],
#                  [2, 1, 1],
#                  [2, 2, 1],
#                  [1, 1, 2],
#                  [1, 2, 2],
#                  [2, 1, 2],
#                  [2, 2, 2]])


views = np.array(
        [[[ 0.211,  0.392],
        [ 0.179,  0.429],
        [ 0.421,  0.392],
        [ 0.358,  0.429],
        [ 0.189,  0.193],
        [ 0.163,  0.254],
        [ 0.378,  0.193],
        [ 0.326,  0.254]],

       [[ 0.392,  0.211],
        [ 0.392,  0.421],
        [ 0.429,  0.179],
        [ 0.429,  0.358],
        [ 0.193,  0.189],
        [ 0.193,  0.378],
        [ 0.254,  0.163],
        [ 0.254,  0.326]]])

F = cv2.findFundamentalMat(views[0], views[1], cv2.FM_8POINT)[0]  # fundamental matrix via the 8-point algorithm

# Hartley and Zisserman's method for finding P: P' = [[e2]_x F | e2]
e2 = scipy.linalg.null_space(F.T)  # epipole of the second image (left null space of F)
C2R = np.cross(e2.T, F)  # intended as the [e2]_x F part of P'
C2T = e2.T[0]
P = combineTR(C2T, C2R)  # projection matrix for camera 2

R = np.eye(3)   # rotation for camera 1 (identity)
T = [0, 0, 0]   # translation for camera 1
P0 = combineTR(T, R)  # camera 1 is the canonical camera P0 = [I | 0]

tpts = cv2.triangulatePoints(P0, P, views[0].T, views[1].T)  # homogeneous triangulated points, 4xN
tpts /= tpts[-1]  # dehomogenize: divide by the last row

tpts *= -100  # ad-hoc scale/flip just to make the numbers easier to read
print(tpts)
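One detail worth checking in the construction above: Hartley and Zisserman's canonical second camera is P' = [[e']_x F | e'], where [e']_x is the skew-symmetric cross-product matrix of the epipole. A small numpy check (with made-up values for e2 and F, not the ones from my data) shows that `np.cross(e2, F)` is not the same quantity, because `np.cross` broadcasts over the rows of F:

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0, -z,  y],
                     [ z,  0, -x],
                     [-y,  x,  0]])

# Made-up values, only to illustrate the construction.
e2 = np.array([0.3, -0.2, 1.0])
F = np.arange(9.0).reshape(3, 3)  # stand-in for a real fundamental matrix

P2 = np.hstack([skew(e2) @ F, e2[:, None]])  # P' = [[e2]_x F | e2], shape 3x4

# np.cross(e2, F) broadcasts over the ROWS of F, computing cross(e2, F[i])
# per row, which equals F @ skew(e2).T rather than skew(e2) @ F.
assert np.allclose(np.cross(e2, F), F @ skew(e2).T)
assert not np.allclose(np.cross(e2, F), skew(e2) @ F)
```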

Ground truth: [image omitted]

My results: [image omitted]

Nanci answered 28/11, 2019 at 4:14 — Comments (5)
You explain that you lack camera parameters. Okay, fair enough. Is there maybe some use case, some restriction on the video input, which would let us recover those parameters as a first step? For example, you inject an object of known dimensions into the scene being filmed, and require that it move in a certain way, or that the camera move in a certain way, to resolve visual ambiguities and recover the parameters. Or perhaps require that someone's head be continuously visible from near and then far distances, or from two angles? — Kristakristal
@Kristakristal Nope. No constraints. The only information available is the points. — Nanci
"Constraints" means something preventing you from having intrinsics. Can you get intrinsics, or does your SfM also have to estimate those? That would be called "autocalibration"; it requires a ton of data, and it's less reliable than a proper calibration. -- SfM requires points to have some identity so they can be matched; otherwise you can't even begin to figure out what appearances (2D) belong to what 3D position in space. Your scene has eight identical balls. Let's assume that you have them in matching order... — Bridgework
@ChristophRackwitz Ah, but there are two scenes. It's theoretically possible; I'm just trying to find code to do it. — Nanci
Both your views show points that are between 0.0 and 0.5. Why this skew? This is neither centered around 0 nor centered in a 0..1 range. It probably doesn't matter, but I thought I'd ask. — Bridgework
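On the centering point raised above: the standard preprocessing before estimating F is Hartley normalization (translate the centroid to the origin, scale so the mean distance from the origin is sqrt(2)). A minimal sketch, assuming the 2D points sit in an (N, 2) array; the sample points are just the first few from the question:

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: move the centroid to the origin and scale so
    the mean distance from the origin is sqrt(2).  Returns the normalized
    points and the 3x3 similarity transform that was applied."""
    centroid = pts.mean(axis=0)
    d = np.linalg.norm(pts - centroid, axis=1).mean()  # mean distance to centroid
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    return (h @ T.T)[:, :2], T

pts = np.array([[0.211, 0.392], [0.179, 0.429], [0.421, 0.392]])
npts, T = normalize_points(pts)
```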

© 2022 - 2024 — McMap. All rights reserved.