Partitioning a large amount of 3D point data

I need to partition a large set of 3D points (using C++). The points are stored on the HDD as a binary float array, and the files are usually larger than 10 GB. I need to divide the set into smaller subsets of less than 1 GB each. The points in each subset should keep their spatial neighborhood, because I need to run certain algorithms on the data (e.g., object detection).

I thought I could use a k-d tree. But how can I construct the k-d tree efficiently if I can't load all the points into RAM? Maybe I could memory-map the file and then store, in each node of the k-d tree, a pointer (or index) to every 3D point that belongs to that segment. Would that work? Any other ideas?
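
For illustration, a rough (untested) sketch of what I mean by memory-mapping the file, assuming POSIX mmap and a placeholder file name:

```cpp
// Rough sketch (untested): map the whole file read-only so the OS pages the
// points in on demand; a k-d tree node could then store indices into this
// mapping instead of copies of the points. "points.bin" is just a placeholder.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

struct Point { float x, y, z; };

int main() {
    const char* path = "points.bin";              // placeholder file name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    std::size_t bytes = static_cast<std::size_t>(st.st_size);
    std::size_t count = bytes / sizeof(Point);

    void* mem = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

    const Point* points = static_cast<const Point*>(mem);
    // ... build the k-d tree over indices 0..count-1; points[i] is only
    //     paged into RAM when it is actually touched ...
    if (count > 0)
        std::printf("mapped %zu points, first x = %f\n", count, points[0].x);

    munmap(mem, bytes);
    close(fd);
    return 0;
}
```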

Thank you for your help. I hope you can understand the problem :D

Ploughboy asked 22/6, 2015 at 17:26 Comment(8)
Break your task into chunks (say a million points per chunk), partition each chunk and write it to files, then repeat and append with the rest of the chunks. Assuming your data is randomly distributed, you should get a good initial guess of the splitting plane; otherwise you'll need to sample a representative amount up front to get the same behaviour. – Taite
Static structures like grids or octrees are simpler to implement. Furthermore, you could pursue a streaming-based approach where you sort the points along a certain direction and process them in this order. Neighboring points will then be very close in the stream. – Dotdotage
You may partition your points into overlapping (!) neighborhoods small enough for your computation. – Coff
Do you need all the points? What if you preprocess the object first by reducing the number of points, as suggested above: build a grid and average all the points that fall into the same cell? – Priming
To get a better idea of what I am doing: I have large 3D data taken from an airplane. I am planning on detecting objects in this point cloud. But it takes a long time to compute everything, so I would like to do that in parallel. For that I need to partition the point cloud into segments of a similar size. @AlexandruBarbarosie the datasets I work with are already reduced. If I reduce them even more, the level of detail won't suffice to perform the tasks. – Ploughboy
Are there any assumptions you can make about the range and distribution of the data? Or do you need to discover that? In other words, can you say, "I know it fits in this size box, and I'm going to save things into the following overlapping boxes", and then proceed from there? – Creosol
Can you sort the boxes along one axis (say the centerline of the airplane) in a preprocessing step? So for a k-d tree, you could choose the first 4 levels of splitting planes along this axis, and then descend on each of the 16 subsets separately. You can do an out-of-core sort by first sorting 1 GB chunks by their X coordinates (or Y, or Z) and then using a heap to merge the chunks (see the sketch below these comments). – Aude
"I would like to do that in parallel" – take a look at OpenMP then; it allows shared-memory multiprocessing. – Priming
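
A minimal sketch of the out-of-core sort suggested two comments above, assuming the points are stored as plain x/y/z float triples; the chunk size and temp-file names are illustrative:

```cpp
// Sketch of the two-phase out-of-core sort: (1) read chunks that fit in RAM,
// sort each by X and write it out as a "run", (2) k-way merge the runs with a
// min-heap. Chunk size, file names, and the sort axis are illustrative.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

struct Point { float x, y, z; };

// Phase 1: split `input` into sorted runs of at most maxPoints points each.
std::vector<std::string> makeSortedRuns(const std::string& input,
                                        std::size_t maxPoints) {
    std::vector<std::string> runs;
    std::FILE* in = std::fopen(input.c_str(), "rb");
    std::vector<Point> buf(maxPoints);
    std::size_t n;
    while ((n = std::fread(buf.data(), sizeof(Point), maxPoints, in)) > 0) {
        std::sort(buf.begin(), buf.begin() + n,
                  [](const Point& a, const Point& b) { return a.x < b.x; });
        std::string name = "run_" + std::to_string(runs.size()) + ".bin";
        std::FILE* out = std::fopen(name.c_str(), "wb");
        std::fwrite(buf.data(), sizeof(Point), n, out);
        std::fclose(out);
        runs.push_back(name);
    }
    std::fclose(in);
    return runs;
}

// Phase 2: merge the sorted runs into one globally X-sorted file.
void mergeRuns(const std::vector<std::string>& runs, const std::string& output) {
    struct Head { Point p; std::size_t run; };
    auto cmp = [](const Head& a, const Head& b) { return a.p.x > b.p.x; };
    std::priority_queue<Head, std::vector<Head>, decltype(cmp)> heap(cmp);

    std::vector<std::FILE*> files;
    for (std::size_t i = 0; i < runs.size(); ++i) {
        files.push_back(std::fopen(runs[i].c_str(), "rb"));
        Point p;
        if (std::fread(&p, sizeof(Point), 1, files[i]) == 1) heap.push({p, i});
    }
    std::FILE* out = std::fopen(output.c_str(), "wb");
    while (!heap.empty()) {
        Head h = heap.top();
        heap.pop();
        std::fwrite(&h.p, sizeof(Point), 1, out);
        Point next;          // refill the heap from the run we just popped
        if (std::fread(&next, sizeof(Point), 1, files[h.run]) == 1)
            heap.push({next, h.run});
    }
    std::fclose(out);
    for (std::FILE* f : files) std::fclose(f);
}
```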

You basically need an out-of-core algorithm for computing (approximate) medians. Given a large file, find its median and then partition it into two smaller files. A k-d tree is the result of applying this process recursively along varying dimensions (and when the smaller files start to fit in memory, you don't have to bother with the out-of-core algorithm any more).
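
A minimal sketch of that recursive split over files, assuming x/y/z float triples, illustrative file names, and a 1 GB leaf threshold; estimateMedian stands in for the median approximation discussed next:

```cpp
// Sketch of the recursive out-of-core split: estimate the median along one
// axis, stream the file through once to write a "left" and a "right" child
// file, then recurse with the next axis until a file is small enough to be a
// segment. The 1 GB threshold and ".l"/".r" file names are illustrative;
// estimateMedian is a placeholder (see the reservoir-sampling sketch below).
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Point { float x, y, z; };

constexpr std::size_t kMaxBytes = 1ull << 30;  // target segment size (~1 GB)
constexpr std::size_t kChunk    = 1 << 20;     // points read per fread call

float estimateMedian(const std::string& file, int axis);  // defined below

float coord(const Point& p, int axis) {
    return axis == 0 ? p.x : (axis == 1 ? p.y : p.z);
}

std::size_t fileBytes(const std::string& file) {
    // Note: use fstat/ftello instead if long is 32-bit on your platform.
    std::FILE* f = std::fopen(file.c_str(), "rb");
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fclose(f);
    return static_cast<std::size_t>(size);
}

// Split `file` at the (approximate) median along `axis` into two child files,
// then recurse on the children with the next axis.
void kdSplit(const std::string& file, int axis) {
    if (fileBytes(file) <= kMaxBytes) return;  // small enough: leaf segment

    float median = estimateMedian(file, axis);
    std::string left = file + ".l", right = file + ".r";

    std::FILE* in = std::fopen(file.c_str(), "rb");
    std::FILE* lo = std::fopen(left.c_str(), "wb");
    std::FILE* hi = std::fopen(right.c_str(), "wb");

    std::vector<Point> buf(kChunk);
    std::size_t n;
    while ((n = std::fread(buf.data(), sizeof(Point), kChunk, in)) > 0)
        for (std::size_t i = 0; i < n; ++i)
            std::fwrite(&buf[i], sizeof(Point), 1,
                        coord(buf[i], axis) < median ? lo : hi);

    std::fclose(in); std::fclose(lo); std::fclose(hi);

    kdSplit(left, (axis + 1) % 3);
    kdSplit(right, (axis + 1) % 3);
}
```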

To approximate the median of a large file, use reservoir sampling to grab a large but in-memory sample, then run an in-core median finding algorithm. Alternatively, for an exact median, compute the (e.g.) approximate 45th and 55th percentiles, then make another pass to extract the data points between them and compute the median exactly (unless the sample was unusually non-random, in which case retry). Details are in the Motwani–Raghavan book on randomized algorithms.
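
And a sketch of the reservoir-sampling estimate itself, matching the placeholder above; the sample size is an arbitrary illustrative choice:

```cpp
// Sketch of the reservoir-sampling median estimate: one pass over the file
// keeps a uniform random sample of kSampleSize points, then std::nth_element
// returns the sample median along the chosen axis. Reading one point per
// fread call keeps the sketch short; chunked reads would be faster.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <random>
#include <string>
#include <vector>

struct Point { float x, y, z; };

float coord(const Point& p, int axis) {
    return axis == 0 ? p.x : (axis == 1 ? p.y : p.z);
}

float estimateMedian(const std::string& file, int axis) {
    const std::size_t kSampleSize = 1000000;   // illustrative sample size
    std::FILE* in = std::fopen(file.c_str(), "rb");
    if (!in) return 0.0f;

    std::vector<Point> reservoir;
    reservoir.reserve(kSampleSize);
    std::mt19937_64 rng(std::random_device{}());

    Point p;
    std::size_t seen = 0;
    while (std::fread(&p, sizeof(Point), 1, in) == 1) {
        ++seen;
        if (reservoir.size() < kSampleSize) {
            reservoir.push_back(p);
        } else {
            // Classic reservoir sampling: keep the new point with
            // probability kSampleSize / seen, replacing a random slot.
            std::size_t j =
                std::uniform_int_distribution<std::size_t>(0, seen - 1)(rng);
            if (j < kSampleSize) reservoir[j] = p;
        }
    }
    std::fclose(in);
    if (reservoir.empty()) return 0.0f;

    std::vector<float> coords(reservoir.size());
    for (std::size_t i = 0; i < reservoir.size(); ++i)
        coords[i] = coord(reservoir[i], axis);
    std::nth_element(coords.begin(), coords.begin() + coords.size() / 2,
                     coords.end());
    return coords[coords.size() / 2];
}
```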

Yolandoyolane answered 27/6, 2015 at 15:11 Comment(0)
