I would like to call virtual methods in a CUDA kernel, but instead of creating the object in that kernel I would like to create it on the host and copy it to GPU memory.
Creating objects inside a kernel and calling a virtual method on them works fine. The problem arises when the object is copied over from the host instead. This makes sense, because the virtual function pointer of an object built on the host is bogus on the device. All I get is "Cuda grid launch failed", at least that is what Nsight reports. Looking at the SASS, the crash happens when the virtual function pointer is dereferenced, which fits.
I am using CUDA 4.2 and compiling for "compute_30" on a card that supports it.
So what is the recommended way to go? Or is this feature simply not supported?
My idea was to first run a different kernel that creates dummy objects, extract the virtual function pointer from them, and use it to "patch" my objects before copying them. Sadly I have not got this working yet, and it would be an ugly solution anyway.
P.S. This is actually a rerun of this question, which sadly was never fully answered.
Edit :
So I found a way to do what I wanted. But just to be clear: this is not an answer or a solution (the answer was already provided). It is only a hack, just for fun.
So first, let's see what CUDA does when calling a virtual method. Below is the debug SASS:
//R0 is the address of our object
LD.CG R0, [R0];
IADD R0, R0, 0x4;
NOP;
MOV R0, R0;
LD.CG R0, [R0];
...
IADD R0, RZ, R9;
MOV R0, R0;
LDC R0, c[0x2][R0];
...
BRX R0 - 0x5478
Assuming that "c[0x2][INDEX]" is the same for all kernels, we can get the index for a class by running a kernel that evaluates the following, where obj is a newly created object of the class in question:
unsigned int index = *(unsigned int*)(*(unsigned int*)obj + 4);
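To make the two loads in that expression concrete, here is a minimal host-side C++ model. It is not device code: it assumes 32-bit device addresses (as the `unsigned int` casts imply) and uses a byte vector as a stand-in for device memory. The names `Memory`, `load32`, `store32` and `vtable_index_of` are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for device memory: an "address" is just an offset into this buffer.
using Memory = std::vector<unsigned char>;

// Read one 32-bit word at a byte address.
inline std::uint32_t load32(const Memory& mem, std::uint32_t addr) {
    assert(static_cast<std::size_t>(addr) + sizeof(std::uint32_t) <= mem.size());
    std::uint32_t value;
    std::memcpy(&value, mem.data() + addr, sizeof(value));
    return value;
}

// Write one 32-bit word at a byte address.
inline void store32(Memory& mem, std::uint32_t addr, std::uint32_t value) {
    assert(static_cast<std::size_t>(addr) + sizeof(std::uint32_t) <= mem.size());
    std::memcpy(mem.data() + addr, &value, sizeof(value));
}

// Mirrors *(unsigned int*)(*(unsigned int*)obj + 4).
inline std::uint32_t vtable_index_of(const Memory& mem, std::uint32_t obj) {
    // First load: word 0 of the object (assumed to be the virtual function pointer).
    std::uint32_t vfptr = load32(mem, obj);
    // Second load: the word 4 bytes past that address, i.e. the index.
    return load32(mem, vfptr + 4);
}
```

This corresponds to the first `LD.CG`, the `IADD ... 0x4` and the second `LD.CG` in the SASS above. The later `LDC` from `c[0x2]` is not modelled.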
Then use something like this:
struct entry
{
    unsigned int vfptr;   // := &vfref, this is the value to store in an object
    int dummy;            // := 1234, great for debugging
    unsigned int vfref;   // := &dummy
    unsigned int index;   // index read on the device for this class
    char ClassName[256];  // use it as a key for a dict
};
Store this in host as well as device memory (the addresses it contains are device addresses). On the host you can then use the ClassName as a lookup key to find the entry for an object you want to "patch".
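The host side of that idea could look roughly like the sketch below. It is only an illustration under loud assumptions: device addresses are 32 bits wide, the virtual function pointer is the first word of the object, and the device address of each entry is already known. `Entry` mirrors the struct above with fixed-width fields, while `make_entry`, `PatchTable` and `patch_image` are hypothetical helpers. Nothing here calls the CUDA runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Same layout as the struct above, with fixed-width fields.
struct Entry {
    std::uint32_t vfptr;   // device address of vfref; stored into patched objects
    std::int32_t  dummy;   // 1234, debugging marker
    std::uint32_t vfref;   // device address of dummy
    std::uint32_t index;   // index read on the device for this class
    char ClassName[256];   // lookup key
};

// Fill an entry that will live at device address deviceBase (assumed known).
inline Entry make_entry(std::uint32_t deviceBase, std::uint32_t index,
                        const std::string& name) {
    Entry e{};  // zero-initialised, so ClassName stays null-terminated
    e.vfptr = deviceBase + static_cast<std::uint32_t>(offsetof(Entry, vfref));
    e.dummy = 1234;
    e.vfref = deviceBase + static_cast<std::uint32_t>(offsetof(Entry, dummy));
    e.index = index;
    std::strncpy(e.ClassName, name.c_str(), sizeof(e.ClassName) - 1);
    return e;
}

using PatchTable = std::map<std::string, Entry>;

// Overwrite the first 32-bit word of a host-side object image with the
// entry's vfptr. Assumes the vfptr sits at offset 0 of the object.
// Returns false if the class is unknown or the image is too small.
inline bool patch_image(std::vector<unsigned char>& image,
                        const PatchTable& table,
                        const std::string& className) {
    auto it = table.find(className);
    if (it == table.end() || image.size() < sizeof(std::uint32_t)) {
        return false;
    }
    std::memcpy(image.data(), &it->second.vfptr, sizeof(std::uint32_t));
    return true;
}
```

With this layout a device-side read of the word at `vfptr + 4` lands on `index`, which is what the SASS sequence above expects. The patched image would then be copied to the device as usual.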
But again: I would not use this in anything serious, because performance-wise virtual functions are not great at all.