How do I construct a 3D model of a room from 2 stereo cameras? What is the determining factor for an accurate reconstruction?

Asked 18/6, 2010 at 9:1 Answered 22/6, 2010 at 17:13

computer-vision 3d-reconstruction

Currently, I have extracted depth points from 2 stereo cameras to construct a 3D model. The methods I have used are the OpenCV graph-cut method and software from http://sourceforge.net/projects/reconststereo/. However, the generated 3D models are not very accurate, which leads me to ask:
1) What is the problem with the pixel-based method?
2) Should I change my pixel-based method to a feature-based or object-recognition-based method? Is there a best method?
3) Are there any other ways to do such reconstruction?

Additionally, the extracted depth comes from only 2 images. What if I am turning the camera 360 degrees to obtain a video? I would welcome suggestions on how to combine this depth information.

Thank you very much :)

Nika answered 18/6, 2010 at 9:1 Comment(2)

For general information about making 3D images and videos from footage from 2 cameras, see here: forum.videohelp.com/threads/… – Schneider 18/6, 2010 at 9:9

:) I was thinking of getting 3D models in the computer – Nika 20/6, 2010 at 6:58

The key problem that determines the accuracy of stereo reconstruction is disparity estimation. This area has been investigated extensively, and state-of-the-art results are collected on this page: http://vision.middlebury.edu/stereo/eval/ . I recommend picking one of the top methods. You will probably need to implement it yourself (references to the papers are at the bottom of the page) or look for an implementation on the authors' homepages. Also look at http://vision.middlebury.edu/MRF/code/ .
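To illustrate why disparity estimation dominates accuracy, here is a minimal sketch assuming a rectified stereo pair and the standard pinhole relation Z = f * B / d. The function names and the example numbers (700 px focal length, 10 cm baseline) are illustrative assumptions, not values from the question.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~ Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Assumed example rig: f = 700 px, B = 0.1 m, measured disparity = 20 px.
z = depth_from_disparity(20.0, 700.0, 0.1)
dz = depth_error(z, 700.0, 0.1, 1.0)
print(z, dz)
```

With these assumed numbers the point lies at 3.5 m, and a one-pixel disparity error shifts the depth estimate by roughly 17.5 cm. Because the error grows with the square of the depth, far walls of a room suffer much more than nearby objects.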

You should also try to figure out the reason for the low accuracy. It may be the algorithm's inability to capture the structure of the scene, or just the low resolution of the output. In the latter case you need sub-pixel accuracy, and a number of methods address this problem. On the evaluation page, use the Error Threshold combo-box to rank the algorithms according to the desired precision.
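One common way to get sub-pixel disparity is to fit a parabola through the matching cost at the best integer disparity and its two neighbours. The sketch below assumes you already have a list of matching costs for one pixel; `subpixel_disparity` is a hypothetical helper, not part of any particular library, and the top-ranked methods may refine disparity differently.

```python
def subpixel_disparity(costs, d_best):
    """Refine the integer disparity d_best with a parabola fit.

    costs[d] is the matching cost at integer disparity d (lower is better).
    Returns d_best unchanged when no refinement is possible.
    """
    if d_best <= 0 or d_best >= len(costs) - 1:
        return float(d_best)  # no neighbour on one side
    c_prev, c_min, c_next = costs[d_best - 1], costs[d_best], costs[d_best + 1]
    curvature = c_prev - 2.0 * c_min + c_next
    if curvature <= 0:
        return float(d_best)  # flat or non-convex cost: nothing to fit
    # Vertex of the parabola through the three samples.
    return d_best + 0.5 * (c_prev - c_next) / curvature


# Synthetic costs whose true minimum lies at disparity 3.3 (assumed data).
costs = [(d - 3.3) ** 2 for d in range(7)]
d_int = min(range(len(costs)), key=costs.__getitem__)
print(d_int, subpixel_disparity(costs, d_int))
```

The integer search picks disparity 3, and the fit moves the estimate toward the true minimum at 3.3. Real cost curves are only approximately parabolic, so the refinement is an estimate rather than an exact recovery.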

Multiple cameras could help as well. Keywords are "multi-view stereo".

Wilda answered 22/6, 2010 at 17:13 Comment(3)

After looking at them, do you have any idea why the depth estimation fails on a featureless surface (e.g. the lamp in Tsukuba)? – Nika 24/6, 2010 at 4:6

Which method do you mean? A simple window-based method cannot estimate the disparity in a textureless region because it cannot match two windows from the different images: the windows have no features, so any shift is equally plausible. Modern methods don't have this problem since they use context, e.g. via MRFs. They know the disparity at the border of the lamp and propagate it to the center. – Wilda 28/6, 2010 at 14:31
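The window-matching ambiguity described in the comment above can be reproduced on a single scanline. This is a minimal sketch using a sum-of-absolute-differences (SAD) cost, which is one typical window cost and an assumption here; the rows are synthetic data, and `sad_costs` is a hypothetical helper.

```python
def sad_costs(left_row, right_row, x, half_window, max_disparity):
    """SAD cost of matching the window centred at x in left_row
    against the window shifted by each disparity d in right_row."""
    costs = []
    for d in range(max_disparity + 1):
        cost = 0
        for k in range(-half_window, half_window + 1):
            cost += abs(left_row[x + k] - right_row[x + k - d])
        costs.append(cost)
    return costs


# Textured scanline: the left row is the right row shifted by 2 pixels.
right_textured = [(i * 37) % 101 for i in range(20)]
left_textured = [0, 0] + right_textured[:-2]
print(sad_costs(left_textured, right_textured, 10, 2, 4))

# Textureless scanline: every pixel has the same intensity.
flat = [100] * 20
print(sad_costs(flat, flat, 10, 2, 4))
```

On the textured row the cost is zero only at the true disparity of 2. On the flat row every disparity has zero cost, so the window alone gives no way to choose between them.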

Seems like the website hasn't been updated since 2015 or there has been no progress on this problem since then – Lepidolite 17/6, 2021 at 15:2

There's a project for this on SourceForge: 3D Reconstruction

Tiffanytiffi answered 19/6, 2010 at 13:51 Comment(3)

:) I used this, but on closer examination there were spikes coming out of the pictures due to noise... so I'm looking for a better way to handle it – Nika 20/6, 2010 at 7:6

That's pretty common when creating a 3D image from 2D images. I think you are pushing the limits of what can currently be done. – Tiffanytiffi 21/6, 2010 at 10:57

I think that, at present, the thing to focus on may be featureless surfaces. – Nika 24/6, 2010 at 4:1

What if I am turning the camera 360 degrees to obtain a video?

I think you meant 180 degrees. If you turn both cameras (i.e. the stereo rig) through 180 degrees, then it's fine.

     V        V
    [.]      [.] 

Turn the rig 180 degrees

    [.]      [.] 
     ^        ^

But if the two cameras face in opposite directions (180 degrees to each other), their views don't overlap, and there's nothing you can do.

     V 
    [.]

    [.]
     ^

Also, for your question regarding pixel-based vs. feature-based vs. object recognition-based --- what's your final objective?

Cottonwood answered 18/6, 2010 at 13:39 Comment(4)

I think he means "what if I rotated the cameras and took multiple images of the same scene from different angles" – Rudman 18/6, 2010 at 13:59

That would be the first scenario, which is OK. – Cottonwood 18/6, 2010 at 14:19

:) Yup, multiple images. My main aim is to obtain a 3D model without human help, e.g. the computer will be clever enough to identify that it's a table and capable of perceiving its depth. It's more about the surrounding environment than about a single object. – Nika 20/6, 2010 at 7:3

That's probably asking too much of a computer; knowing "this is a table, this is a chair" in an uncontrolled environment is hard. – Tiffanytiffi 21/6, 2010 at 10:58

Is there a best method?

The best method is to make the model yourself. It requires a few weeks of training with Blender. With several high-resolution cameras you can get a fairly decent result very quickly. You'll do a better job than a computer.

Are there any other ways to do such reconstruction?

Laser scanning. Google for "homemade laser scanner" or "homemade 3d scanner". Several people have tried to develop such systems, with varying success. You'll need a line laser (you can make one from a laser pointer). But you won't get color information this way, only relief.

What if I am turning the camera 360 degrees to obtain a video?

You cannot obtain depth information from only one camera even if you rotate it. You need 2 or more overlapping shots taken from different points. Or you could try putting the object on a turntable (although that isn't possible here, because you're modeling a room).
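A small sketch of why rotation alone gives no depth, assuming an ideal pinhole camera that rotates exactly about its optical centre (a hand-held camera usually translates a little as well). The `project` function, the 700 px focal length, and the two test points are illustrative assumptions.

```python
import math


def project(point, yaw_rad=0.0, cam_pos=(0.0, 0.0, 0.0), focal_px=700.0):
    """Project a 3D point into a pinhole camera located at cam_pos and
    rotated by yaw_rad about the vertical (y) axis. Returns (u, v) in pixels."""
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    # Rotate world coordinates into the camera frame.
    xc = c * x - s * z
    zc = s * x + c * z
    return (focal_px * xc / zc, focal_px * y / zc)


# Two points on the same viewing ray, at 2 m and 4 m depth.
near, far = (0.5, 0.2, 2.0), (1.0, 0.4, 4.0)

# Pure rotation: both points still land on the same pixel.
print(project(near, yaw_rad=0.3), project(far, yaw_rad=0.3))

# Translation by 0.1 m: the points separate, which reveals their depth.
print(project(near, cam_pos=(0.1, 0.0, 0.0)), project(far, cam_pos=(0.1, 0.0, 0.0)))
```

Under rotation the near and far points are indistinguishable in every frame, so no amount of turning separates them. After the 0.1 m sideways shift the near point moves by 35 px and the far point by 17.5 px, and that difference is the parallax that stereo relies on.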

Oleograph answered 18/6, 2010 at 13:55 Comment(4)

:) Hmm... but why can't I get depth from one camera, since the images obtained will overlap each other? – Nika 20/6, 2010 at 7:0

@yasumi: Because they are all taken from a single point. To find the distance to an object, you need at least two points of reference. This is geometry: to find the sides of a triangle (the distance to the object) you need to know at least the length of one side (the distance between the two cameras) and two angles (for each camera, the angle between its line of sight and the line towards the object). – Oleograph 20/6, 2010 at 8:16

Actually, it is possible to get a scene reconstruction from one moving camera; google monocular reconstruction/SLAM (there is a paper at CVPR 2010 by Newcombe & Davison). It is a current research topic, though, and not yet practical. Use a laser scanner :) – Unpaidfor 23/7, 2010 at 19:21

@Cfr: It is obvious that you can reconstruct if the camera is moving. The OP was talking about rotating it 360 degrees, which isn't the same thing.... – Oleograph 23/7, 2010 at 19:36
