PCL kd-tree implementation extremely slow
Asked Answered
L

1

7

I am using Point Cloud Library (PCL) based C++ implementation of kd-tree nearest neighbour(NN) searching. The data set contains about 2.2 million points. I am searching NN points for every other point. The search radius is set at 2.0. To fully compute that, it takes about 12 hours! I am using windows 64 bit machine with 4GB RAM. Is it very common for kd-tree searching? I wonder if there is any other c++ library for 3D point cloud data, which is faster. I heard about ANN C++ library & CGAL, but not sure how fast these are.

Litalitany answered 22/8, 2014 at 18:24 Comment(4)
I'll give a +1 since the problem is difficult. However, consider testing yourself next time and ask about the time results.Angevin
You mentioned that your search radius is set at 2.0. How is your data distributed (e.g. what range does it cover in the XYZ directions)? If a lot of points fall into the search radius, the search can become really slow.Napoleon
Using a search radius of 2.0, approximate number of points is about 7-80. But the data is not uniformly distributed. The NN number varies from 10-150. I also think that the problem is not with the dimension, but with the distribution of the data.Litalitany
Are you going to say something about the answer?Angevin
A
7

In short:

You can only be sure if you run yourself the time measurements.

You should ensure if the NN library is faster over brute force, which probably is the case for your data.

As anderas mentioned in the comment, the search radius plays also a significant role. If a lot of points fall into the search radius, the search can become really slow.

Full answer:

3 dimensions are not much. The problem occurs due to the number of points you have.

ANN stands for approximate nearest neighbour searching. It is very common to accept the trade-off between accuracy and speed when it comes to NNS (Nearest Neighbour search).

This means that you perform the search faster, but you may not find the exact NN, but a close one (for example not the closest point, but the second closest one and so on). More speed means less accuracy and vice versa.

CGAL has also a parameter epsilon, which stands for accuracy (ε = 0 means full accuracy). CGAL is meant to go till 3 dimensions, so you could give it a shot.

I could just test myself and post the answers. However, this would not be 100% safe, since the points you have may have some relation. It is very important for the performance of the library, if it is going to exploit the relation the points (may) have to each other.

Another factor to take into account, easiness of installation.

CGAL is hard to install. When I did it, I followed these steps. ANN is easy to install. I would also suggest you to take a look at BOOST Geometry, which may come in handy.

FLANN is also a strong player in the field, but I would not suggest it, since it's meant to handle datasets of much bigger dimensions (like 128 for example).

....

OK I admit it, I am now curious myself and I am going to run some tests!

....

ANN

(I am not posting the code, so that the answer is not getting too big. There are examples in the documentation and you can ask of course if you want!). Output:

// for a Klein bottle

samaras@samaras-A15:~/Inria/nn_libs/ANN/ann_1.1.2/code$ g++ main.cpp -O3 -I ../include/ -L../lib -lANN -std=c++0x -o myExe
samaras@samaras-A15:~/Inria/nn_libs/ANN/ann_1.1.2/code$ ./myExe
N = 1000000
D = 3
ε = 0 // full accuracy
Misses: 0
creation of the tree took 1.06638 seconds.
Brute force for min: 0.009985598 seconds.
NN for 0.0           0.000078551 seconds.
Speedup of ANN for NN = 8.721533780

For ε = 0.1 I got:

Misses: 1000
creation of the tree took 0.727613 seconds.
Brute force for min: 0.006351318 seconds.
NN for 0.0           0.000004260 seconds.
Speedup of ANN for NN = 8.678169573

// for a shpere

ε = 0
Misses: 0
creation of the tree took 1.28098 seconds.
Brute force for min: 0.006382912 seconds.
NN for 0.0           0.000022341 seconds.
Speedup of ANN for NN = 4.897436311

ε = 0.1
Misses: 1000
creation of the tree took 1.25572 seconds.
Brute force for min: 0.006482465 seconds.
NN for 0.0           0.000004379 seconds.
Speedup of ANN for NN = 5.144413371

Notice the difference in the speedup! This has to do with the nature of the datasets, as explained above (the relation the points have).

CGAL:

// Klein bottle

samaras@samaras-A15:~/code/NN$ ./myExe
ε = 0
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.02478 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 471492
 Tree depth: 34
0x80591e4
Brute force for min: 0.007979495 seconds.
NN for 0.0           0.008085704 seconds.
Speedup of cgal for NN = 0.983849423

ε = 0.1
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.02628 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 471492
 Tree depth: 34
0x80591e4
Brute force for min: 0.007852951 seconds.
NN for 0.0           0.007856228 seconds.
Speedup of cgal for NN = 0.996250305

// Sphere

samaras@samaras-A15:~/code/NN$ ./myExe
ε = 0
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.025502 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 465002
 Tree depth: 32
0x80591e4
Brute force for min: 0.007946504 seconds.
NN for 0.0           0.007981456 seconds.
Speedup of cgal for NN = 0.992449817


samaras@samaras-A15:~/code/NN$ ./myExe
ε = 0.1
SplitingRule = Sliding_midpoint, N = 1000000, D = 3
creation of the tree took 0.025106 seconds.
Tree statistics:
Number of items stored: 1000000
Number of nodes: 465002
 Tree depth: 32
0x80591e4
Brute force for min: 0.008115519 seconds.
NN for 0.0           0.008117261 seconds.
Speedup of cgal for NN = 0.996702679

ANN is clearer faster than CGAL for my tests, probably it is for yours too!


A side-note:

You actually asking for the NN of every point. However, the library doesn't know that and doesn't take into account the previous work done for every point, which is a pity. However, I am not aware of any library that does this.

Angevin answered 23/8, 2014 at 22:23 Comment(6)
PCL is using FLANN internally. Have you also done a benchmark with FLANN? I'm pretty familiar with the FLANN wrapper of PCL, and in my experience it is usually not slower than ANN (or even faster) in the case of low-dimensional data.Napoleon
No I didn't run FLANN @anderas, because the OP doesn't mention it and mostly, because FLANN is intended for much higher dimensions, while OP's dataset lies in 3 dimensions.Angevin
I dont understand your result, did they actually showed that CGAL's tree is slower than just brute force search? Thats impossible something has to be wrong.Invitation
@AleksanderFular when the dimension is high, CGAL is known to perform poor with Nearest Neighbors.Angevin
@Angevin I was not aware of that, anyway Werent your tests performed on D=3 ? Thats what I got from your responseInvitation
@AleksanderFular sorry, you are right! Probably this happens because of the nature of the input.Angevin

© 2022 - 2024 — McMap. All rights reserved.