I am trying to allocate shared memory in a CUDA kernel within a templated class:
template<typename T, int Size>
struct SharedArray {
    __device__ T* operator()(){
        __shared__ T x[Size];
        return x;
    }
};
This works as long as no shared memory with the same type and size is requested twice. But when I request shared memory with the same type and size a second time, the second pointer refers to the same array as the first:
__global__
void test() {
    // Shared array
    SharedArray<int, 5> sharedArray;
    int* x0 = sharedArray();
    int* y0 = sharedArray();
    x0[0] = 1;
    y0[0] = 0;
    printf("%i %i\n\n", x0[0], y0[0]);
    // Prints:
    // 0 0
}
One solution is to add an ID template parameter to the shared memory class, so that each request uses a distinct instantiation:
template<int ID, typename T, int Size>
struct StaticSharedArrayWithID {
    __device__ static T* shared(){
        __shared__ T x[Size];
        return x;
    }
};
But then I have to supply a counter at every call, which makes for a very ugly user interface:
__global__
void test() {
    // shared() returns T*, so the results are stored as pointers
    int* x1 = StaticSharedArrayWithID<__COUNTER__, int, 5>::shared();
    int* y1 = StaticSharedArrayWithID<__COUNTER__, int, 5>::shared();
    x1[0] = 1;
    y1[0] = 0;
    printf("%i %i\n\n", x1[0], y1[0]);
    // Prints:
    // 1 0
}
Does anyone have an idea how to get rid of the __COUNTER__ macro in the user interface? It is fine if it is hidden.
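To illustrate what "hidden" could look like, here is a minimal sketch that wraps the counter in a macro. The names SHARED_ARRAY and arraysAreDistinct are hypothetical and only serve the illustration. The #ifndef __CUDACC__ shims are an assumption added so that the aliasing logic can also be checked with a plain host compiler, where a function-local static array stands in for the __shared__ one. Note that __COUNTER__ is a non-standard preprocessor extension, and that this approach yields one array per call site rather than per call, so a request placed inside a loop would return the same array on every iteration.

```cpp
#include <cstdio>

// Host-side shims (assumption, illustration only): without nvcc, map the CUDA
// qualifiers onto plain C++ so the snippet also compiles with a host compiler.
#ifndef __CUDACC__
#define __device__
#define __global__
#define __shared__ static
#endif

template <int ID, typename T, int Size>
struct StaticSharedArrayWithID {
    __device__ static T* shared() {
        __shared__ T x[Size];  // one array per distinct (ID, T, Size) instantiation
        return x;
    }
};

// Hypothetical convenience macro: every expansion consumes a fresh __COUNTER__
// value, so each call site instantiates its own template and gets its own array.
#define SHARED_ARRAY(T, Size) \
    (StaticSharedArrayWithID<__COUNTER__, T, Size>::shared())

// Hypothetical helper: requests two arrays of the same type and size and
// reports whether they are really separate.
__device__ bool arraysAreDistinct() {
    int* x1 = SHARED_ARRAY(int, 5);
    int* y1 = SHARED_ARRAY(int, 5);
    x1[0] = 1;
    y1[0] = 0;
    return x1 != y1 && x1[0] == 1 && y1[0] == 0;
}

__global__ void test() {
    // Launched as test<<<1, 1>>>() under nvcc; callable directly on the host
    // when the shims above are active.
    printf("distinct: %d\n", arraysAreDistinct() ? 1 : 0);
}
```

With this, user code would only write SHARED_ARRAY(int, 5), and the counter no longer appears in the interface.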
[...] bulk::malloc. It is described in this presentation. – Justinajustine
[...] __COUNTER__ internally, is this something you are looking for? – Justinajustine
[...] __shared__ int x[5];. – Harvell
[...] __shared__ member variable, which is forbidden in CUDA? (see #12707978) – Scoot