Why is my PCL CUDA code running on the CPU instead of the GPU?

I have code that uses the pcl::gpu namespace:

```cpp
// Copy the points of the filtered host-side cloud (cloud_filtered, defined earlier, not shown) into GPU memory
pcl::gpu::Octree::PointCloud clusterCloud;
clusterCloud.upload(cloud_filtered->points);

// Build the search octree over the uploaded points
pcl::gpu::Octree::Ptr octree_device (new pcl::gpu::Octree);
octree_device->setCloud(clusterCloud);
octree_device->build();

/*tree->setCloud (clusterCloud);*/

// Create the cluster extractor object for the planar model and set all the parameters
std::vector<pcl::PointIndices> cluster_indices;  // output: one PointIndices entry per cluster
pcl::gpu::EuclideanClusterExtraction ec;
ec.setClusterTolerance (0.1);
ec.setMinClusterSize (2000);
ec.setMaxClusterSize (250000);
ec.setSearchMethod (octree_device);   // GPU octree built above
ec.setHostCloud (cloud_filtered);     // the original host-side cloud

ec.extract (cluster_indices);
```

I have installed CUDA and included the required pcl/gpu ".hpp" headers. The code compiles (in a catkin workspace with ROS), but when I run it, it is very slow. According to nvidia-smi, my code runs only on the CPU, and I don't know why or how to fix it.

This code is adapted from the gpu/segmentation example in PCL: pcl/seg.cpp

Asked by Cultus on 15/2/2019 at 8:14. Comments (1):
Cultus: Sorry, I'm a beginner, so I don't know anything about host side or device side. Do you mean that std is not a GPU variable? Anyway, the example does exactly the same thing, so I suppose it should work that way. Maybe my configuration is wrong, or I may have forgotten something in my CMakeLists.txt, but I can't find the problem.

(Making this an answer since it's too long for a comment.)

I don't know pcl, but maybe it's because you pass a host-side std::vector rather than data that's on the device side.

... what are "host side" and "device side", you ask? And what's std?

Well, std is just a namespace used by the C++ standard library. std::vector is a (templated) class in the C++ standard library, which dynamically allocates memory for the elements you put in it.

The thing is, the memory std::vector uses is your main system memory (RAM), which has nothing to do with the GPU. But your pcl library probably requires you to pass data that is in GPU memory, and the data in an std::vector is not in GPU memory. You would need to allocate device-side memory and copy your data there from host-side memory.

See also:

Why we do not have access to device memory on host side?

and consult the CUDA programming guide on how to perform this allocation and copying (at least at the lowest possible level; your "pcl" may have its own facilities for this).

Answered by Miltonmilty on 15/2/2019 at 10:36. Comments (2):
Cultus: I'm not sure that is the problem, because the pcl::gpu function takes the std::vector as its parameter. This is the declaration: void pcl::gpu::EuclideanClusterExtraction::extract (std::vector<pcl::PointIndices> &clusters). Also, can using std::vector force my code to run only on the CPU? I would expect that to break compilation or execution, but instead my program simply runs only on the CPU.
Cultus: I have seen that EuclideanClusterExtraction creates "DeviceArrays" to copy the data to the device, as you said, so I don't think that's the problem. It then processes this data, which should happen on the GPU, but for some reason it never uses the GPU, only the CPU. I have two graphics cards, so I tried switching between them, but nothing changed. I think it is simply not using any GPU at all. Why could this happen?
