I have a code where I use the pcl/gpu namespace:
pcl::gpu::Octree::PointCloud clusterCloud;
clusterCloud.upload(cloud_filtered->points);
pcl::gpu::Octree::Ptr octree_device (new pcl::gpu::Octree);
octree_device->setCloud(clusterCloud);
octree_device->build();
/*tree->setCloud (clusterCloud);*/
// Create the cluster extractor object for the planar model and set all the parameters
std::vector<pcl::PointIndices> cluster_indices;
pcl::gpu::EuclideanClusterExtraction ec;
ec.setClusterTolerance (0.1);
ec.setMinClusterSize (2000);
ec.setMaxClusterSize (250000);
ec.setSearchMethod (octree_device);
ec.setHostCloud (cloud_filtered);
ec.extract (cluster_indices);
I have installed CUDA and included the needed pcl/gpu ".hpp"s to do this. It compiles (I have a catkin workspace with ROS) and when I do run it works really slow. I used nvidia-smi and my code is only running in the CPU, and I don't know why and how to solve it.
This code is an implementation of the gpu/segmentation example here: pcl/seg.cpp