CUDA virtual class
I would like to execute some virtual methods in a CUDA kernel, but instead of creating the object in the same kernel I would like to create it on the host and copy it to GPU memory.

I can successfully create objects in a kernel and call a virtual method on them. The problem arises when copying the object, which makes sense because the virtual function pointer in the host-created object is obviously bogus. What happens is simply "Cuda grid launch failed", at least that is what Nsight says, and looking at the SASS it crashes on the dereference of the virtual function pointer, which again makes sense.

I am, of course, using CUDA 4.2 and compiling with "compute_30" on a matching card.

So what is the recommended way to go? Or is this feature simply not supported?

I had the idea of first running a different kernel that creates dummy objects, and extracting the virtual function pointer from those to "patch" my objects before copying them. Sadly this is not really working yet (I haven't figured it out), and it would be an ugly solution anyway.

P.S. This is actually a rerun of this question, which sadly was never fully answered.

Edit:

So I found a way to do what I wanted. But just to be clear: this is not an answer or a solution; the answer was already provided below. This is only a hack, just for fun.

So first, let's see what CUDA does when calling a virtual method. Below is the debug SASS:

//R0 is the address of our object
LD.CG R0, [R0];  
IADD R0, R0, 0x4;  
NOP;  
MOV R0, R0;  
LD.CG R0, [R0];
...
IADD R0, RZ, R9;  
MOV R0, R0;  
LDC R0, c[0x2][R0];
...
BRX R0 - 0x5478

So, assuming that "c[0x2][INDEX]" is constant across all kernels, we can get the index for a class simply by running a kernel and doing this, where obj is a newly created object of the class we are looking at:

unsigned int index = *(unsigned int*)(*(unsigned int*)obj + 4);
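
To make this concrete, here is a minimal sketch of such a kernel, assuming a hypothetical polymorphic class Shape (all names here are invented; only the index expression itself is from above, and it assumes 32-bit pointer values, just like the rest of this hack):

struct Shape {                                           // hypothetical example class
    __device__ virtual float area() { return 0.0f; }
};

__global__ void extract_index(unsigned int* out)
{
    Shape obj;                                           // device-side construction => valid vtable
    *out = *(unsigned int*)(*(unsigned int*)&obj + 4);   // same expression as above
}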

Then use something like this:

struct entry
{
    unsigned int vfptr;   // := &vfref, that's the value to store in an object
    int dummy;            // := 1234, great for debugging
    unsigned int vfref;   // := &dummy
    unsigned int index;
    char ClassName[256];  // used as the key for a dictionary
};

Store this in host as well as device memory (the addresses it contains are device ones), and on the host you can use ClassName to look up the entry for an object you want to "patch".
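
A minimal sketch of that host-side patch, assuming the entry for the class (the hypothetical Shape from above) has already been filled on the device and copied back; error checking is omitted and, as before, 32-bit pointer values are assumed:

#include <cstring>

void copy_patched(Shape* d_obj, const Shape& h_obj, const entry& e)
{
    Shape tmp = h_obj;                                   // host copy, with a bogus vfptr
    std::memcpy(&tmp, &e.vfptr, sizeof(unsigned int));   // overwrite the vfptr word with e.vfptr
    cudaMemcpy(d_obj, &tmp, sizeof(Shape), cudaMemcpyHostToDevice);
}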

But again: I would not use this in anything serious, because performance-wise virtual functions are not great at all.

Disfigure answered 3/10, 2012 at 2:32 Comment(1)
This is an exact duplicate, but I voted to close the other question because the asker of the other one has not been on SO since he asked the question. If you are reading this and can vote to close, please vote to close the other question.Gandhi

What you are trying to do is not currently supported by the CUDA compiler and runtime (as of CUDA 5.0). Section D.2.6.3 of the CUDA C Programming Guide v5.0 reads:

D.2.6.3 Virtual Functions

When a function in a derived class overrides a virtual function in a base class, the execution space qualifiers (i.e., __host__, __device__) on the overridden and overriding functions must match.

It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions.

The virtual function table is placed in global or constant memory by the compiler.

What I recommend is that you encapsulate the data of your class separately from its functionality. For example, store the data in a struct; if you plan to operate on arrays of these objects, store it in a structure of arrays (for performance -- outside the scope of this question). From the host, allocate the data structures in device memory with cudaMalloc, and pass the data pointers to the kernel as arguments, rather than passing the class with virtual methods.

Then construct your objects with virtual methods on the device. The constructor of the class with virtual methods would take the device pointers passed as kernel parameters as arguments, and its virtual device methods could then operate on the device data.
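
To make this concrete, here is a minimal sketch of the pattern with invented names (ParticleData, Mover, move_kernel); it is not the poster's code, just one way to follow the advice above, with error checking omitted:

#include <cstdio>

struct ParticleData {                  // plain data, no virtual functions: safe to pass to a kernel
    float x, v;
};

struct Mover {                         // functionality lives in a class constructed on the device
    ParticleData* d;
    __device__ Mover(ParticleData* data) : d(data) {}
    __device__ virtual void step(float dt) { d->x += d->v * dt; }
};

__global__ void move_kernel(ParticleData* data, float dt)
{
    Mover m(data);                     // device-side construction, so the vtable pointer is valid
    m.step(dt);                        // virtual call on a device-built object
}

int main()
{
    ParticleData h = {0.0f, 2.0f};
    ParticleData* d_data;
    cudaMalloc((void**)&d_data, sizeof(ParticleData));
    cudaMemcpy(d_data, &h, sizeof(ParticleData), cudaMemcpyHostToDevice);
    move_kernel<<<1, 1>>>(d_data, 0.5f);
    cudaMemcpy(&h, d_data, sizeof(ParticleData), cudaMemcpyDeviceToHost);
    printf("x = %f\n", h.x);           // 1.000000
    cudaFree(d_data);
    return 0;
}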

The same approach also works if you want to allocate the data in one kernel on the device and access it in another kernel (since, again, classes with virtual functions cannot be passed as kernel parameters).

Gandhi answered 3/10, 2012 at 7:46 Comment(3)
I have accepted this answer because the Programming Guide clearly states it. I see one problem with your workaround: creating the device objects from the struct would require storing type information in the struct as well as a big switch statement, which I think is not that nice. But I will nonetheless try it out, thank you!Disfigure
Yes, I was assuming that the data would be the same between different subclasses, but that's obviously not the general case. Can you accomplish what you need with templates rather than virtual functions, e.g. policy classes (see the sketch after these comments)? Virtual functions are probably not the best in performance-sensitive code anyway (and usually CUDA is used for performance-sensitive code).Gandhi
Well, I had never even heard of policy-based design; it sounds pretty smart. But I am not able to use it because I don't know what I am dealing with at compile time. You are absolutely right about the performance aspect, though; I am only doing this for fun, trying to figure out what kind of impact it will have.Disfigure
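
For reference, a minimal sketch of what the policy-class idea could look like (all names invented); it only helps when the concrete type is known at compile time, which is exactly the limitation noted above:

struct AddOne   { __device__ static float apply(float x) { return x + 1.0f; } };
struct TimesTwo { __device__ static float apply(float x) { return x * 2.0f; } };

template <typename Policy>
__global__ void apply_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = Policy::apply(data[i]);        // resolved at compile time, no vtable
}

// Usage (hypothetical): apply_kernel<AddOne><<<blocks, threads>>>(d_data, n);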
