Making a trained model (machine learning) from 3D models
I have a database with almost 20k 3D files; they are drawings of machine parts designed in CAD software (SolidWorks). I'm trying to build a trained model from all of these 3D models, so I can build a 3D object recognition app: someone takes a picture of one of these parts in the real world, and the app provides useful information about material, size, treatment, and so on.

If anyone has already done something similar, any information you can provide would be greatly appreciated!

Sidoon answered 10/7, 2017 at 15:22 Comment(7)
What you're asking for is not trivial, to say the least. You might be able to simplify this by treating it as more of a 2-D image classification problem. Perhaps you could script the generation of a bank of 2-D training images from these parts, at various angles. You'd have to make sure these training images are close to what someone would see looking at a part in the real world, under various lighting conditions, so you'd need pretty realistic rendering with a variety of backgrounds. Combinations of materials, treatments, etc. would lead to a large matrix of possible classes.Chit
I can export multiple PNG files with a script I already have, but I hadn't thought I would need to add realistic backgrounds and lighting to these images to make it more successful. You make a pretty good point. Thanks a lot.Sidoon
Why couldn't the learning algorithm generate various ambient-lighting renders by itself and train on them? It seems so obvious that I wonder why more people haven't thought of it.Florist
@PabloL did you find a solution to this? Or at least some direction?Loire
Have you researched any possible ways this could be done? There are plenty of recognition algorithms, but without knowing what you have already looked at your question is incredibly broad.Psycholinguistics
What you are trying to do is quite a complex endeavor. Since you are basically trying to match a single picture to a 3D object, my first attempt would be to generate several images of each model from several different views (say, 20 views per object), and then try to match the picture against any of the views. A bag-of-words model might be useful for this.Induration
@PabloL did you get any success in this?Oryx

Some ideas:

1) Several pictures: instead of only one. As Rodrigo commented and Brad Larson tried to circumvent with his method, the problem with the user taking only one picture as input is that you necessarily lack the information to triangulate and form a 3D point cloud. With four pictures taken from slightly different angles, you can already reconstruct parts of the object. Comparing point clouds would make the task much easier for any ML algorithm, be it a neural network (NN), a support vector machine (SVM), or something else. A common standard for creating point clouds is ASTM E2807, which uses the E57 file format.

On the downside, a 3D vision algorithm can be computationally heavy on the user's device, and it is not the easiest to implement.
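Once the user's pictures and the CAD models are both reduced to point clouds, you still need a way to compare them. One common similarity measure is the Chamfer distance; a minimal NumPy sketch is below (brute-force pairwise distances for clarity — for large clouds a KD-tree such as `scipy.spatial.cKDTree` scales much better):

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between two point clouds.
    a: (N, 3) array of XYZ points, b: (M, 3) array of XYZ points."""
    # Brute-force pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return float(np.sqrt(d2.min(axis=1)).mean() +
                 np.sqrt(d2.min(axis=0)).mean())
```

The lowest-distance model in the database is then the best match, or the distance can feed a downstream classifier as a feature.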

2) Artificial picture training: By training on pre-computed artificial pictures, as Brad Larson suggested, you take over much of the computation, to the user's benefit. Be aware that you should probably use features extracted from the pictures, not the complete pictures, both for training and for classification. The problem with this method is that it can be very sensitive to lighting and background context. Take care to produce CAD renders with the same lighting conditions for all objects, so that the classifier doesn't overfit aspects of the "pictures" that do not belong to the object.

This is where solution 1) is much more stable: it is less sensitive to the visual context.
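Generating the view set can be scripted. A toy NumPy sketch of the geometry only (orthographic projection from evenly spaced azimuth angles; a real pipeline would render from the CAD package with lighting and backgrounds, as discussed above):

```python
import numpy as np

def rotation_y(theta: float) -> np.ndarray:
    """Rotation matrix about the vertical (Y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def orthographic_views(vertices: np.ndarray, n_views: int = 20) -> list:
    """Project (N, 3) mesh vertices to 2D from n_views azimuth angles."""
    views = []
    for theta in np.linspace(0.0, 2 * np.pi, n_views, endpoint=False):
        rotated = vertices @ rotation_y(theta).T
        views.append(rotated[:, :2])  # drop the depth axis
    return views
```

Each projected view would then go through the same feature extractor used on the user's photo at query time.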

3) Scale: The size of your object is an important descriptor. You should thus add scale information to your object descriptor before training. You could ask the user to take pictures with a reference object. Alternatively you can ask the user to make a rule-of-thumb estimate of the object size ("What are the approximate dimensions of the object, in [cm]?"). Providing size could make your algorithm significantly faster and more accurate.
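Folding the size estimate into the descriptor is straightforward; a hedged sketch (the log compression is an assumption, used here so that a few centimetres vs. a metre doesn't dominate the other features):

```python
import numpy as np

def with_scale(descriptor: np.ndarray, size_cm: float) -> np.ndarray:
    """Append a log-compressed size estimate to a visual feature
    vector, so that geometrically similar parts of different sizes
    remain separable for the classifier."""
    return np.concatenate([np.asarray(descriptor, dtype=float),
                           [np.log1p(size_cm)]])
```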

Cumber answered 7/11, 2017 at 13:39 Comment(0)

If your test data in production is mainly images of the 3D objects, then the method in the comment section by Brad Larson is the better approach: it is easier to implement and takes far less effort and fewer resources to get up and running.

However, if you want to classify between 3D models, there are existing networks that classify 3D point clouds. You will have to convert your models to point clouds and use them as training samples. One that I have used is VoxNet. I also suggest adding more variation to the training data, such as different rotations of the 3D model.
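VoxNet-style 3D CNNs take a binary occupancy grid rather than raw points, so each point cloud needs voxelizing first. A minimal sketch of that preprocessing step (32³ grid as in VoxNet; the normalization choices here are assumptions):

```python
import numpy as np

def voxelize(points: np.ndarray, grid: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a (grid, grid, grid)
    binary occupancy volume suitable as 3D CNN input."""
    p = points - points.min(axis=0)   # shift into the positive octant
    extent = p.max()
    if extent > 0:
        p = p / extent                # normalize longest side to [0, 1]
    idx = np.clip((p * (grid - 1)).astype(int), 0, grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox
```

The rotation augmentation mentioned above can then be done cheaply by rotating the points before voxelizing, rather than regenerating the point cloud from the CAD file.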

Stoic answered 1/11, 2017 at 4:40 Comment(2)
my test data is a 3D model of type STEP/IGES/Parasolid; what I would like to achieve is to train my model with such a file and then recognize real-world objects (these are "fixed" objects like bottles etc., NOT dogs/cats/...)Loire
@Loire You will need to convert those files to the XYZ format, which is basically a simple CSV-type representation of the vertices. Once this is done, you can train the models using VoxNet (see answer).Stoic
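The export step described in this comment can be sketched as follows, assuming the vertices have already been extracted from the STEP/IGES file with a CAD kernel (that extraction step is not shown):

```python
import numpy as np

def write_xyz(vertices, path: str) -> None:
    """Write an (N, 3) vertex array to the simple XYZ point format:
    one whitespace-separated 'x y z' line per point."""
    np.savetxt(path, np.asarray(vertices, dtype=float), fmt="%.6f")
```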

You can use pre-trained 3D deep neural networks; there are many networks that could help with your work and produce high accuracy.

Alula answered 3/11, 2017 at 10:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.