I want to allocate an object with virtual functions on the device, then launch a kernel that calls some of those virtual functions. I have tried two ways to do this, but neither works:
1) Allocate the object on the host and copy it to the device using cudaMalloc and cudaMemcpy. This copies over the object's virtual function table pointer, which refers to host memory, so the kernel crashes when it makes a virtual call on the device.
2) Allocate the object from a second kernel, save the device memory pointer to the object, and pass that pointer to the original kernel. However, since the kernels are different, the functions do not appear to be in the same places in device memory when the original kernel runs, so the virtual function table is incorrect and the kernel crashes when it is used.
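For concreteness, approach 2 could be sketched roughly as follows. This is only an illustration, not the actual code behind the question: the names Base, Derived, create, use, destroy, check and run_demo are placeholders, and the sketch assumes all kernels are compiled into the same module and that a CUDA-capable device is present. Approach 1 is deliberately left out, since copying a host-constructed object brings a host vtable pointer with it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical class hierarchy, for illustration only.
struct Base {
    __device__ virtual int value() const { return 1; }
    __device__ virtual ~Base() {}
};

struct Derived : Base {
    __device__ int value() const override { return 42; }
};

// Kernel 1: construct the object in device code, so that its vtable
// pointer is set up by device code. The object pointer is stored in
// device memory at *slot.
__global__ void create(Base **slot) {
    *slot = new Derived();
}

// Kernel 2: make a virtual call through the stored pointer.
__global__ void use(Base **slot, int *out) {
    *out = (*slot)->value();
}

// Kernel 3: destroy the object on the device, where it was created.
__global__ void destroy(Base **slot) {
    delete *slot;
}

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Runs the three kernels in order and returns what value() produced.
int run_demo() {
    Base **slot = nullptr;  // device memory holding the object pointer
    int *d_out = nullptr;   // device memory holding the result
    int h_out = -1;

    check(cudaMalloc(&slot, sizeof(Base *)), "cudaMalloc slot");
    check(cudaMalloc(&d_out, sizeof(int)), "cudaMalloc d_out");

    create<<<1, 1>>>(slot);
    use<<<1, 1>>>(slot, d_out);
    destroy<<<1, 1>>>(slot);
    check(cudaDeviceSynchronize(), "kernel execution");

    check(cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost),
          "cudaMemcpy result");
    check(cudaFree(d_out), "cudaFree d_out");
    check(cudaFree(slot), "cudaFree slot");
    return h_out;
}

int main() {
    printf("value() returned %d\n", run_demo());
    return 0;
}
```

Assuming the file is saved as demo.cu (a hypothetical name), it can be built with `nvcc demo.cu -o demo`. If a sketch like this runs cleanly while the real code crashes, the difference between the two (for example, kernels built into separate modules) may be a useful place to look.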
Can I only use virtual functions with objects created in the kernel the functions are called from?
Can I somehow reference the original kernel when I allocate my objects to get the virtual function table right?
Do I even understand what the actual problem is here?