3D reconstruction -- How to create 3D model from 2D image?

If I take a picture of an object, such as a scale model of a house, with a camera, so that I know the distance from the camera to the object, I would like to turn this into a 3D model that I can maneuver around so I can comment on different parts of the house.

If I sit down and think about taking more than one picture and labeling the direction and distance of each, I should be able to figure out how to do this, but I thought I would ask whether someone knows of a paper that explains this in more detail.

It doesn't matter which programming language you use in your explanation, as I am looking for the best approach.

Right now I am considering the following: show the house, then let the user provide some assistance with heights, such as the distance from the camera to the top of a given part of the model. Given enough of this, it should be possible to start calculating heights for the rest, especially if there is a top-down image plus pictures taken at angles from the four sides to calculate relative heights.
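The height calculation described above rests on the pinhole camera model: by similar triangles, an object's real size divided by its distance from the camera equals its size in pixels divided by the focal length in pixels. A minimal sketch, assuming the measured part faces the camera roughly square-on and lens distortion is negligible; the function names and the example camera numbers are hypothetical:

```python
def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Lens focal length converted to pixels (pinhole model)."""
    return focal_mm * image_width_px / sensor_width_mm


def real_height(pixel_height, distance, f_px):
    """Similar triangles: real size / distance == pixel size / focal length in pixels.

    The result is in the same unit as `distance`.
    """
    return pixel_height * distance / f_px


def distance_for_known_height(pixel_height, known_height, f_px):
    """The same relation solved for distance, e.g. once the user enters one known height."""
    return known_height * f_px / pixel_height


if __name__ == "__main__":
    # Hypothetical camera: 4 mm lens, 8 mm wide sensor, 4000 px wide image -> 2000 px focal length.
    f_px = focal_length_px(4.0, 8.0, 4000)
    # A wall that spans 500 px in a photo taken from 60 cm away is about 15 cm tall.
    print(real_height(500, 60.0, f_px))  # 15.0
```

Each extra photo with a known distance or known height adds one more such equation, which is roughly how the user-supplied hints could propagate to other parts of the model.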

Then I expect that parts will also need to differ in color to help separate out the various parts of the model.

Audet answered 9/10, 2011 at 17:55 Comment(4)
supervised + unsupervised learning demo: youtube.com/watch?v=UzxYlbK2c7E#t=54m50s - Flan
Some other fun links: make3d.cs.cornell.edu, insight3d.sourceforge.net - Melda
The question sounds like the goal is 3D reconstruction from a single image, which I don't think is very feasible. Stereo reconstruction (using multiple viewpoints) and structure-from-motion both require at least two images. - Exuberate
@AdrianMcCarthy Although exact reconstruction is not feasible from a single image, there are programs like MiDaS that do "monocular depth estimation" from only one photo. - Penelope

Research has made significant progress, and these days it is possible to obtain fairly good-looking 3D shapes from 2D images. For instance, our recent research work titled "Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks" took a big step towards solving the problem of obtaining 3D shapes from 2D images. In our work, we show that you can not only go from 2D to 3D directly and get a good, approximate 3D reconstruction, but you can also learn a distribution of 3D shapes efficiently and generate (synthesize) new 3D shapes. Below is an image from our work showing that we are able to do 3D reconstruction even from a single silhouette or depth map (on the left). The ground-truth 3D shapes are shown on the right.

[Figure: 3D reconstructions produced from a single silhouette or depth map (left) next to the ground-truth 3D shapes (right)]

The approach we took has some contributions related to cognitive science, i.e. the way the brain works: the model we built shares parameters across all shape categories instead of being specific to a single category. It also obtains consistent representations and takes the uncertainty of the input view into account when producing a 3D shape as output, so it naturally gives meaningful results even for very ambiguous inputs. If you look at the papers that cite ours, you can see even more progress in going from 2D images to 3D shapes.

Ideology answered 4/5, 2018 at 13:38 Comment(2)
Does this require images of objects with a white background or no background? - Ratan
@Goldname Yes. It requires no background because the data we used had no background. - Ideology

As mentioned, the problem is very hard and is often also referred to as multi-view object reconstruction. It is usually approached by solving the stereo-view reconstruction problem for each pair of consecutive images.

Performing stereo reconstruction requires pairs of images with a good amount of overlap, i.e. many physical points visible in both images. You need to find corresponding points so that you can then use triangulation to find the 3D co-ordinates of the points.

Epipolar geometry

Stereo reconstruction is usually done by first calibrating your camera setup so you can rectify your images using the theory of epipolar geometry. This simplifies finding corresponding points as well as the final triangulation calculations.

If you have the camera parameters (the intrinsic parameters of each camera, and the relative position and orientation of the cameras), you can calculate the fundamental and essential matrices using only matrix theory and use these to rectify your images. This requires some theory about co-ordinate projections with homogeneous co-ordinates and also knowledge of the pinhole camera model and camera matrix.
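As a rough illustration of the matrix theory involved: the essential matrix is E = [t]_x R and the fundamental matrix is F = K2^-T E K1^-1, where K1, K2 are the camera matrices and R, t the rotation and translation taking points from the first camera's frame into the second's. The sketch below is plain Python with hypothetical helper names and made-up intrinsics; it checks the epipolar constraint x2^T F x1 = 0 for one projected point:

```python
# Plain-Python 3x3 helpers so the sketch has no dependencies (NumPy would be the usual choice).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def inverse(a):
    """Inverse of a 3x3 matrix via its adjugate (transposed cofactors) and determinant."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    cof = [
        [a22 * a33 - a23 * a32, -(a21 * a33 - a23 * a31), a21 * a32 - a22 * a31],
        [-(a12 * a33 - a13 * a32), a11 * a33 - a13 * a31, -(a11 * a32 - a12 * a31)],
        [a12 * a23 - a13 * a22, -(a11 * a23 - a13 * a21), a11 * a22 - a12 * a21],
    ]
    det = a11 * cof[0][0] + a12 * cof[0][1] + a13 * cof[0][2]
    return [[cof[j][i] / det for j in range(3)] for i in range(3)]


def skew(t):
    """Cross-product matrix: skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def essential_matrix(R, t):
    """E = [t]_x R, assuming points map between camera frames as X2 = R X1 + t."""
    return matmul(skew(t), R)


def fundamental_matrix(K1, K2, R, t):
    """F = K2^-T E K1^-1, so that x2^T F x1 = 0 for matching homogeneous pixels."""
    return matmul(matmul(transpose(inverse(K2)), essential_matrix(R, t)), inverse(K1))


def epipolar_residual(F, x1, x2):
    """x2^T F x1; close to zero for a true correspondence."""
    Fx1 = [sum(F[i][k] * x1[k] for k in range(3)) for i in range(3)]
    return sum(x2[i] * Fx1[i] for i in range(3))


if __name__ == "__main__":
    K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no rotation between views
    t = [-0.1, 0.0, 0.0]  # second camera 10 cm to the right of the first
    F = fundamental_matrix(K, K, R, t)
    # The 3D point (0.2, -0.1, 2.0) projects to these pixels in the two views:
    print(epipolar_residual(F, [400.0, 200.0, 1.0], [360.0, 200.0, 1.0]))  # ~0
```

For this side-by-side setup (identical cameras, pure sideways translation), F applied to x1 works out to a horizontal line through the same image row, which is the situation rectification aims to produce in general.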

If you want a method that doesn't need the camera parameters and works for unknown camera set-ups you should probably look into methods for uncalibrated stereo reconstruction.

Correspondence problem

Finding corresponding points is the tricky part that requires you to look for points of the same brightness or colour, or to use texture patterns or some other features to identify the same points in pairs of images. Techniques for this either work locally by looking for a best match in a small region around each point, or globally by considering the image as a whole.

If you already have the fundamental matrix, it will allow you to rectify the images so that corresponding points in the two images are constrained to the same horizontal line (in theory). This lets you use faster local techniques that only search along that line.
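To see what the line constraint buys you: for a pixel x1 in the first image, F x1 is its epipolar line in the second image, and a true match must lie on (or, with noise, near) that line. A minimal sketch, assuming the ideal rectified form of F shown in the code (defined only up to scale) and a hypothetical 1.5-pixel tolerance; the function names are illustrative:

```python
import math


def epipolar_line(F, u, v):
    """Line F @ [u, v, 1] in the second image, as (a, b, c) with a*x + b*y + c = 0."""
    x = (u, v, 1.0)
    return tuple(sum(F[i][k] * x[k] for k in range(3)) for i in range(3))


def distance_to_line(line, x, y):
    """Perpendicular distance in pixels from (x, y) to the line (a, b, c)."""
    a, b, c = line
    return abs(a * x + b * y + c) / math.hypot(a, b)


def plausible_matches(F, u, v, candidates, max_dist=1.5):
    """Keep only candidate points in image 2 that lie (almost) on the epipolar line of (u, v)."""
    line = epipolar_line(F, u, v)
    return [(x, y) for (x, y) in candidates if distance_to_line(line, x, y) <= max_dist]


# For an ideally rectified pair F has this form (up to scale):
# every epipolar line is the horizontal row with the same y coordinate.
F_RECTIFIED = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]

if __name__ == "__main__":
    print(epipolar_line(F_RECTIFIED, 120.0, 45.0))  # (0.0, -1.0, 45.0), i.e. the line y = 45
    candidates = [(90.0, 45.0), (100.0, 46.0), (80.0, 60.0)]
    print(plausible_matches(F_RECTIFIED, 120.0, 45.0, candidates))  # [(90.0, 45.0), (100.0, 46.0)]
```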

There is currently still no ideal technique to solve the correspondence problem, but possible approaches could fall in these categories:

  • Manual selection: have a person hand-select matching points.
  • Custom markers: place markers or use specific patterns/colours that you can easily identify.
  • Sum of squared differences: take a small window around a point and find the window in the other image that minimises the sum of squared intensity differences.
  • Graph cuts: a global technique that formulates matching as an energy minimisation and solves it with minimum cuts on a graph.
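The sum-of-squared-differences approach can be sketched as a brute-force search along a single row, assuming an already rectified pair stored as lists of grey values. The window size, search range and function names are illustrative choices, not taken from any particular paper:

```python
def ssd(left, right, row, col_left, col_right, half):
    """Sum of squared differences between two (2*half+1) x (2*half+1) windows."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            diff = left[row + dr][col_left + dc] - right[row + dr][col_right + dc]
            total += diff * diff
    return total


def best_disparity(left, right, row, col, max_disparity, half=1):
    """Disparity d minimising the SSD between the window around (row, col) in the left
    image and the window around (row, col - d) in the right image (same row: rectified)."""
    best_d, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        c = col - d
        if c - half < 0:  # window would fall off the left edge of the right image
            break
        cost = ssd(left, right, row, col, c, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


if __name__ == "__main__":
    # Synthetic textured left image; the right view sees everything shifted 3 px to the left.
    left = [[(c * c * 7 + r * 13) % 101 for c in range(20)] for r in range(5)]
    right = [[left[r][c + 3] if c + 3 < 20 else 0 for c in range(20)] for r in range(5)]
    print(best_disparity(left, right, row=2, col=10, max_disparity=6))  # 3
```

Real implementations usually add normalisation against brightness differences between the cameras and a left-right consistency check; this only shows the core local search.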

For specific implementations you can use Google Scholar to search through the current literature. Here is one highly cited paper comparing various techniques: A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms.

Multi-view reconstruction

Once you have the corresponding points, you can then use epipolar geometry theory for the triangulation calculations to find the 3D co-ordinates of the points.
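For the special case of a rectified pair of identical cameras, triangulation reduces to a closed-form formula: with focal length f (in pixels), principal point (cx, cy), baseline B and disparity d = u_left - u_right, the depth is Z = f * B / d, and X, Y follow from the pinhole model. A minimal sketch under those assumptions (general, unrectified views need something like linear or midpoint triangulation instead); the function name and rig values are hypothetical:

```python
def triangulate_rectified(u_left, v, u_right, f, cx, cy, baseline):
    """3D point (in the left camera's frame) from a match in a rectified pair.

    Assumes both cameras share focal length f and principal point (cx, cy) in pixels
    and are separated horizontally by `baseline`; the result uses the baseline's unit.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    z = f * baseline / disparity
    x = (u_left - cx) * z / f
    y = (v - cy) * z / f
    return x, y, z


if __name__ == "__main__":
    # Hypothetical rig: f = 800 px, principal point (320, 240), 10 cm baseline.
    print(triangulate_rectified(400.0, 200.0, 360.0, 800.0, 320.0, 240.0, 0.1))
    # roughly (0.2, -0.1, 2.0): a point 2 m in front of the left camera
```

Note that depth is inversely proportional to disparity, so a one-pixel matching error matters much more for distant points than for close ones.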

This whole stereo reconstruction would then be repeated for each pair of consecutive images (implying that you need an order to the images or at least knowledge of which images have many overlapping points). For each pair you would calculate a different fundamental matrix.

Of course, due to noise or inaccuracies at each of these steps you might want to consider how to solve the problem in a more global manner. For instance, if you have a series of images that are taken around an object and form a loop, this provides extra constraints that can be used to improve the accuracy of earlier steps using something like bundle adjustment.
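Bundle adjustment jointly refines all camera poses and 3D points by minimising the total reprojection error: the squared pixel distance between where each point is observed and where the current estimate projects it. The sketch below only computes that cost, assuming pinhole cameras that share one camera matrix K; in practice you would hand the individual residuals to a nonlinear least-squares solver such as SciPy's `scipy.optimize.least_squares` or a dedicated library. Names and data layout are hypothetical:

```python
def project(K, R, t, X):
    """Pinhole projection of world point X into a camera with pose (R, t): x ~ K (R X + t)."""
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    x = [sum(K[i][k] * Xc[k] for k in range(3)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]


def total_reprojection_error(K, cameras, points, observations):
    """Sum of squared pixel errors over all observations.

    cameras: list of (R, t) poses; points: list of [X, Y, Z];
    observations: list of (camera_index, point_index, u, v) measured pixels.
    """
    error = 0.0
    for cam_idx, pt_idx, u, v in observations:
        R, t = cameras[cam_idx]
        pu, pv = project(K, R, t, points[pt_idx])
        error += (pu - u) ** 2 + (pv - v) ** 2
    return error


if __name__ == "__main__":
    K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cameras = [(I, [0.0, 0.0, 0.0]), (I, [-0.1, 0.0, 0.0])]
    points = [[0.2, -0.1, 2.0]]
    obs = [(0, 0, 400.0, 200.0), (1, 0, 360.0, 200.0)]
    print(total_reprojection_error(K, cameras, points, obs))  # ~0: estimate agrees with data
    obs[0] = (0, 0, 401.0, 200.0)  # one measurement off by a pixel
    print(total_reprojection_error(K, cameras, points, obs))  # ~1.0
```

A closed loop of images helps precisely because the same 3D points then appear in many observations, so errors in individual pairwise estimates show up in this cost.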

As you can see, both stereo and multi-view reconstruction are far from solved problems and are still actively researched. The less you want to do in an automated manner the more well-defined the problem becomes, but even in these cases quite a bit of theory is required to get started.

Alternatives

If it's within the constraints of what you want to do, I would recommend considering dedicated hardware sensors (such as the Xbox Kinect) instead of only using normal cameras. These sensors use structured light, time-of-flight or some other range imaging technique to generate a depth image, which they can also combine with colour data from their own cameras. They practically solve the single-view reconstruction problem for you and often include libraries and tools for stitching/combining multiple views.
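Once such a sensor gives you a depth image, turning it into 3D points is a direct back-projection through the pinhole model: each pixel (u, v) with depth z maps to ((u - cx) z / fx, (v - cy) z / fy, z). A minimal sketch, assuming raw depth in millimetres with 0 meaning "no reading"; the intrinsics fx, fy, cx, cy must come from your sensor's calibration, and the values in the usage example are made up:

```python
def depth_to_points(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image into 3D points in the sensor's camera frame.

    depth: rows of raw depth values; depth_scale converts them to metres
    (0.001 assumes millimetres). Zero is treated as "no reading" and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, raw in enumerate(row):
            if raw == 0:
                continue
            z = raw * depth_scale
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth image with made-up intrinsics; two pixels have no reading.
    print(depth_to_points([[0, 1000], [2000, 0]], fx=500.0, fy=500.0, cx=0.5, cy=0.5))
```

Stitching several views then amounts to transforming each view's points by that view's pose into one common frame, which is what the bundled tools typically automate.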

Epipolar geometry references

My knowledge is actually quite thin on most of the theory, so the best I can do is to further provide you with some references that are hopefully useful (in order of relevance):

I'm not sure how helpful all of this is, but hopefully it includes enough useful terminology and references to find further resources.

Unfinished answered 3/6, 2012 at 13:13 Comment(3)
I think the references will be a huge help. My son likes to build with Legos, and it would be nice if he could take pictures and then rotate his design to explain what the different parts are for. - Audet
For Legos I believe it should be possible to develop a solution that does everything and creates a complete 3D model of the outermost visible layer of your assembly. Legos have the benefit of being a small set of blocks with known shapes. It'd probably make for a nice Ph.D. dissertation :) - Caravel
The previous comment has the seeds of a solution if your goal is generating 3D models from 2D Lego pictures. One thing that would vastly simplify your algorithmic misery is to restrict the Lego construction so that each particular block shape has a unique colour; then your system can easily recover the data lost in the projection that formed the image. Otherwise, it is still a PhD-level effort and a lot of heuristics. - Inflammation

This problem is known as Photogrammetry.

Google will supply you with endless references, just be aware that if you want to roll your own, it's a very hard problem.

Achlorhydria answered 29/5, 2012 at 15:18 Comment(0)

Check out The Daedalus Project. Although the website does not contain a gallery illustrating the results, it does post several papers and information about the method.

I watched a lecture by one of the main researchers on the project (Roger Hubbold), and the results are quite amazing, although it is a complex and lengthy problem with a lot of tricky details to take into account to get an approximation of the 3D data. Take, for example, the 3D information for wall surfaces, for which the heuristic works as follows: take a photo of the scene under normal illumination, then retake the picture from the same position with the flash fully on. Subtract the two images, divide the result by a pre-taken flash calibration image, apply a box filter to this new result, and then post-process it to estimate depth values. The whole process is explained in detail in this paper (which is also posted/referenced on the project website).
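The first steps of that wall heuristic (subtract, divide by the calibration image, box filter) can be sketched as below, assuming three aligned greyscale images of the same size stored as lists of rows. The final depth-estimation post-processing is not described here, so it is left out. Function names, the filter radius and the small `eps` guard against division by zero are hypothetical choices:

```python
def box_filter(image, radius):
    """Mean over a (2*radius+1) x (2*radius+1) window, clipped at the image border."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def flash_ratio(ambient, flash, calibration, radius=1, eps=1e-6):
    """(flash - ambient) / calibration per pixel, then box-filtered.

    Subtracting the ambient shot leaves (ideally) only the light added by the flash;
    the division normalises by the pre-taken flash calibration image.
    """
    h, w = len(ambient), len(ambient[0])
    ratio = [
        [(flash[r][c] - ambient[r][c]) / max(calibration[r][c], eps) for c in range(w)]
        for r in range(h)
    ]
    return box_filter(ratio, radius)


if __name__ == "__main__":
    ambient = [[10.0] * 3 for _ in range(3)]
    flash = [[30.0] * 3 for _ in range(3)]
    calibration = [[40.0] * 3 for _ in range(3)]
    print(flash_ratio(ambient, flash, calibration))  # every pixel 0.5
```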

Therrien answered 4/6, 2012 at 0:41 Comment(0)

Google SketchUp (free) has a photo-matching tool that allows you to take a photograph and match its perspective for easy modeling.

EDIT: It appears that you're interested in developing your own solution. I thought you were trying to obtain a 3D model of an image in a single instance. If this answer isn't helpful, I apologize.

Immunogenetics answered 21/11, 2011 at 22:09 Comment(0)

If you are trying to construct a 3D volume from a stack of 2D images, this may help: you can use an open-source tool such as ImageJ/Fiji, which comes with a 3D Viewer plugin.

https://quppler.com/creating-a-classifier-using-image-j-fiji-for-3d-volume-data-preparation-from-stack-of-images/
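Conceptually, such a viewer treats the stack as one volume: the slice index becomes the third axis, scaled by the slice spacing. A minimal sketch, assuming equally sized greyscale slices, a simple intensity threshold to decide which voxels belong to the object, and hypothetical function names and voxel sizes:

```python
def stack_to_voxels(slices, threshold, pixel_size=1.0, slice_spacing=1.0):
    """Treat equally sized 2D slices (rows of grey values) as one 3D volume and return
    the physical (x, y, z) position of every voxel at or above `threshold`."""
    voxels = []
    for z, image in enumerate(slices):
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                if value >= threshold:
                    voxels.append((x * pixel_size, y * pixel_size, z * slice_spacing))
    return voxels


def object_volume(voxels, pixel_size=1.0, slice_spacing=1.0):
    """Volume covered by the selected voxels (each one is pixel_size^2 * slice_spacing)."""
    return len(voxels) * pixel_size * pixel_size * slice_spacing


if __name__ == "__main__":
    slices = [[[0, 255], [0, 0]], [[255, 255], [0, 0]]]  # two 2x2 slices
    voxels = stack_to_voxels(slices, threshold=128, pixel_size=0.5, slice_spacing=2.0)
    print(voxels)  # [(0.5, 0.0, 0.0), (0.0, 0.0, 2.0), (0.5, 0.0, 2.0)]
    print(object_volume(voxels, pixel_size=0.5, slice_spacing=2.0))  # 1.5
```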

Lattermost answered 28/12, 2018 at 5:30 Comment(1)
The link says the page doesn't exist. - Landreth
