I need to partition a large set of 3D points (using C++). The points are stored on the HDD as a binary float array, and the files are usually larger than 10 GB. I need to divide the set into smaller subsets of less than 1 GB each. Each subset should preserve spatial neighborhoods, i.e., points that are close together in 3D should end up in the same subset, because I need to run certain algorithms (e.g., object detection) on the data.
I thought I could use a KD-Tree. But how can I construct the KD-Tree efficiently if I can't load all the points into RAM? Maybe I could memory-map the file into virtual memory. Then each node of the KD-Tree could store pointers (or offsets) into the mapped file for the points that belong to its segment. Would that work? Any other ideas?
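To make the idea concrete, here is a minimal sketch of the memory-mapping part, assuming POSIX mmap and points stored as consecutive x, y, z floats; the file name and the Point3 struct are placeholders:

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

struct Point3 { float x, y, z; };  // hypothetical layout: 3 consecutive floats

int main() {
    const char* path = "points.bin";  // placeholder file name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Map the whole file read-only; the OS pages data in on demand,
    // so the 10 GB never has to fit in RAM at once.
    void* base = mmap(nullptr, static_cast<size_t>(st.st_size),
                      PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    const Point3* points = static_cast<const Point3*>(base);
    size_t count = static_cast<size_t>(st.st_size) / sizeof(Point3);

    // A KD-Tree node could now store indices (or pointers) into this
    // mapped array instead of copying the point data itself.
    std::printf("mapped %zu points; first = (%f, %f, %f)\n",
                count, points[0].x, points[0].y, points[0].z);

    munmap(base, static_cast<size_t>(st.st_size));
    close(fd);
    return 0;
}
```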
Thank you for your help. I hope you can understand the problem :D
I would like to do that in parallel.
Take a look at OpenMP then; it allows shared-memory multiprocessing. – Priming
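For illustration, a minimal OpenMP sketch of the shared-memory parallelism Priming mentions (not code from the thread; the chunk loop is a stand-in for whatever per-subset work is needed; compile with -fopenmp):

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const int num_chunks = 8;  // hypothetical number of subsets to process
    // Each iteration runs on one thread of the team; all threads share
    // the same address space, so they can all read the mapped file.
    #pragma omp parallel for
    for (int i = 0; i < num_chunks; ++i) {
        std::printf("chunk %d handled by thread %d\n",
                    i, omp_get_thread_num());
    }
    return 0;
}
```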