Is there a minimal, complete, working example of structure from motion/3d reconstruction?
Like the title says, I am looking for a complete, minimal, working example of the structure-from-motion (aka 3D reconstruction) pipeline.

Let me say up front that I do not have the camera parameters: no focal length, no camera intrinsics. That immediately rules out 90% of the examples/tutorials out there.

There are many questions on this topic, but the code comes only in snippets that never cover the complete SfM process. Many instructions are contradictory or just guesswork, and the open-source external libraries are hard to follow.

So I am looking for a short, complete, minimal, working example. Most important is the working requirement, since so much code out there produces bad results.

I have made a stab at it with the code below. I use synthetic data of matching pairs, so there is no noise or bad-correspondence issue to work around. The goal is to reconstruct a cube (8 3D points) from 2 views of 8 2D points each. However, the final results are awful: there is no semblance of a cube shape. (I have tried normalizing and centering the data; that is not the issue.)
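For context, correspondences like these can be generated with a simple pinhole projection. This is only an illustrative sketch: the focal length and camera pose below are arbitrary assumptions, not the ones behind the `views` array in my code.

```python
import numpy as np

# Illustrative pinhole projection for generating noise-free synthetic
# correspondences.  Focal length and pose are arbitrary assumptions,
# NOT the ones used to produce the views array in the question.
cube = np.array([[x, y, z] for z in (1, 2) for y in (1, 2) for x in (1, 2)],
                dtype=float)  # the 8 corners of a unit cube at (1,1,1)

def project(points, R, t, f=1.0):
    # world -> camera coordinates, then perspective divide
    cam = points @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3]

R1 = np.eye(3)                     # camera 1: identity rotation
t1 = np.array([0.0, 0.0, 4.0])     # pushed back so the cube is in front
view1 = project(cube, R1, t1)

print(view1.shape)  # (8, 2): 8 points, 2 coordinates each
```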

Anyone who can provide a better minimal working example, or point out what is wrong with my attempt, would be appreciated.

import cv2
import numpy as np
import scipy.linalg


def combineTR(T, R):
    """Combine a translation vector and a rotation matrix into one 3x4 projection matrix."""
    T4 = np.eye(4)
    T4[:3, 3] = T  # embed the translation in a 4x4 matrix so we can matrix-multiply
    R4 = np.eye(4)
    R4[:3, :3] = R
    P = np.dot(T4, R4)  # combine rotation and translation into one matrix
    P = P[:3, :]  # cut off the bottom row to get a 3x4 matrix
    return P

####################################################################

# # ground truth
# Wpts = np.array([[1, 1, 1],  # A Cube in world points
#                  [1, 2, 1],
#                  [2, 1, 1],
#                  [2, 2, 1],
#                  [1, 1, 2],
#                  [1, 2, 2],
#                  [2, 1, 2],
#                  [2, 2, 2]])


views = np.array(
        [[[ 0.211,  0.392],
        [ 0.179,  0.429],
        [ 0.421,  0.392],
        [ 0.358,  0.429],
        [ 0.189,  0.193],
        [ 0.163,  0.254],
        [ 0.378,  0.193],
        [ 0.326,  0.254]],

       [[ 0.392,  0.211],
        [ 0.392,  0.421],
        [ 0.429,  0.179],
        [ 0.429,  0.358],
        [ 0.193,  0.189],
        [ 0.193,  0.378],
        [ 0.254,  0.163],
        [ 0.254,  0.326]]])

F = cv2.findFundamentalMat(views[0], views[1], cv2.FM_8POINT)[0]  # fundamental matrix via the 8-point algorithm

# Hartley and Zisserman's method for finding P: P' = [[e2]_x F | e2]
e2 = scipy.linalg.null_space(F.T)  # epipole of the second image (left null space of F)
C2R = np.cross(e2.T, F)  # intended as the [e2]_x F part of P'
C2T = e2.T[0]
P = combineTR(C2T, C2R)  # projection matrix for camera 2

R = np.eye(3)   # rotation for camera 1 (identity)
T = [0, 0, 0]   # translation for camera 1
P0 = combineTR(T, R)  # camera 1 is the canonical camera P0 = [I | 0]

tpts = cv2.triangulatePoints(P0, P, views[0].T, views[1].T)  # homogeneous triangulated points, 4xN
tpts /= tpts[-1]  # dehomogenize: divide by the last row

tpts *= -100  # ad-hoc scale/flip just to make the numbers easier to read
print(tpts)
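One detail worth checking in the construction above: Hartley and Zisserman's canonical second camera is P' = [[e']_x F | e'], where [e']_x is the skew-symmetric cross-product matrix of the epipole. A small numpy check (with made-up values for e2 and F, not the ones from my data) shows that `np.cross(e2, F)` is not the same quantity, because `np.cross` broadcasts over the rows of F:

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0, -z,  y],
                     [ z,  0, -x],
                     [-y,  x,  0]])

# Made-up values, only to illustrate the construction.
e2 = np.array([0.3, -0.2, 1.0])
F = np.arange(9.0).reshape(3, 3)  # stand-in for a real fundamental matrix

P2 = np.hstack([skew(e2) @ F, e2[:, None]])  # P' = [[e2]_x F | e2], shape 3x4

# np.cross(e2, F) broadcasts over the ROWS of F, computing cross(e2, F[i])
# per row, which equals F @ skew(e2).T rather than skew(e2) @ F.
assert np.allclose(np.cross(e2, F), F @ skew(e2).T)
assert not np.allclose(np.cross(e2, F), skew(e2) @ F)
```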

Ground truth: [image omitted]

My results: [image omitted]

Nanci answered 28/11, 2019 at 4:14 — Comments (5)
You explain that you lack camera parameters. Okay, fair enough. Is there maybe some use case, some restriction on the video input, which would let us recover those parameters as a first step? For example, you inject an object of known dimensions into the scene being filmed, and require that it move in a certain way, or that the camera move in a certain way, to resolve visual ambiguities and recover the parameters. Or perhaps require that someone's head be continuously visible from near and then far distances, or from two angles? — Kristakristal
@Kristakristal Nope. No constraints. The only information available is the points. — Nanci
"Constraints" means something preventing you from having intrinsics. Can you get intrinsics, or does your SfM also have to estimate those? That would be called "autocalibration"; it requires a ton of data, and it's less reliable than a proper calibration. -- SfM requires points to have some identity so they can be matched; otherwise you can't even begin to figure out what appearances (2D) belong to what 3D position in space. Your scene has eight identical balls. Let's assume that you have them in matching order... — Bridgework
@ChristophRackwitz Ah, but there are two scenes. It's theoretically possible; I'm just trying to find code to do it. — Nanci
Both your views show points that are between 0.0 and 0.5. Why this skew? This is neither centered around 0 nor centered in a 0..1 range. It probably doesn't matter, but I thought I'd ask. — Bridgework
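On the centering point raised above: the standard preprocessing before estimating F is Hartley normalization (translate the centroid to the origin, scale so the mean distance from the origin is sqrt(2)). A minimal sketch, assuming the 2D points sit in an (N, 2) array; the sample points are just the first few from the question:

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: move the centroid to the origin and scale so
    the mean distance from the origin is sqrt(2).  Returns the normalized
    points and the 3x3 similarity transform that was applied."""
    centroid = pts.mean(axis=0)
    d = np.linalg.norm(pts - centroid, axis=1).mean()  # mean distance to centroid
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    return (h @ T.T)[:, :2], T

pts = np.array([[0.211, 0.392], [0.179, 0.429], [0.421, 0.392]])
npts, T = normalize_points(pts)
```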

© 2022 - 2024 — McMap. All rights reserved.