Why can't member variables be shared?

I would like to instantiate a class in CUDA code that shares some of its members with the other threads in the same block.

However, when trying to compile the following code, I get the error:

attribute "shared" does not apply here

(nvcc version 4.2).

class SharedSomething {

public:
    __shared__ int i; // this is not allowed
};

__global__ void run() {

    SharedSomething something;
}

What is the rationale behind that? Is there a work-around to achieve the desired behavior (shared members of a class across one block)?

Nyctophobia asked 3/10, 2012 at 11:23

Rost explained the rationale behind the limitation. To answer the second part of the question, a simple workaround is to have the kernel declare the shared memory and have the class own a pointer to it, initialized, for example, in the class constructor:

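#include <cstdio>

#define BLOCKDIM 256  // threads per block; must match the launch configuration

class Foo
{
public:
  // sPtr: the kernel's __shared__ buffer; gPtr: input data in global memory
  __device__
  Foo(int *sPtr, int *gPtr) : sharedPointer(sPtr) {
    sharedPointer[threadIdx.x] = gPtr[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();  // every thread of the block must construct a Foo
  }

  __device__
  void useSharedData() { printf("my data: %d\n", sharedPointer[threadIdx.x]); }

private:
  int *sharedPointer;  // points into block-shared memory owned by the kernel
};

__global__ void example(int *gData)
{
  __shared__ int sData[BLOCKDIM];

  Foo f(sData, gData);

  f.useSharedData();
}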

Caveat: this code was written in the browser and is unverified and untested. It is a trivial example, but the concept extends to real code; I have used this technique myself.
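A fuller sketch of the same pattern, with a host driver added so it can be run and checked. The block size of 128 and the names `neighborKernel` and `checkNeighborSharing` are hypothetical. Each thread reads the value loaded by the *next* thread in its block; that only gives the right answer because the buffer behind `sharedPointer` is visible to the whole block, not just to one `Foo` instance.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCKDIM 128  // hypothetical block size for this sketch

// The class only holds a pointer; the kernel owns the __shared__ array.
class Foo {
public:
  __device__ Foo(int *sPtr, const int *gPtr) : sharedPointer(sPtr) {
    sharedPointer[threadIdx.x] = gPtr[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();  // all writes to the shared buffer are visible after this
  }

  // Value loaded by the next thread in the block (wrapping around).
  __device__ int neighborValue() const {
    return sharedPointer[(threadIdx.x + 1) % blockDim.x];
  }

private:
  int *sharedPointer;
};

__global__ void neighborKernel(const int *in, int *out) {
  __shared__ int sData[BLOCKDIM];
  Foo f(sData, in);
  out[blockIdx.x * blockDim.x + threadIdx.x] = f.neighborValue();
}

// Hypothetical host helper: runs the kernel on in[i] = i and returns the number of wrong outputs.
int checkNeighborSharing(int numBlocks) {
  const int n = numBlocks * BLOCKDIM;
  std::vector<int> h_in(n), h_out(n, -1);
  for (int i = 0; i < n; ++i) h_in[i] = i;

  int *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  neighborKernel<<<numBlocks, BLOCKDIM>>>(d_in, d_out);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);

  int errors = 0;
  for (int i = 0; i < n; ++i) {
    int block = i / BLOCKDIM, t = i % BLOCKDIM;
    int expected = block * BLOCKDIM + (t + 1) % BLOCKDIM;
    if (h_out[i] != expected) ++errors;
  }
  return errors;
}

int main() {
  int errors = checkNeighborSharing(4);
  printf("errors: %d\n", errors);
  return errors == 0 ? 0 : 1;
}
```

Because the constructor calls `__syncthreads()`, every thread of the block has to construct its `Foo`; creating one conditionally in only some threads would be unsafe.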

Phox answered 3/10, 2012 at 11:54
Thanks for the workaround. This can be made even more generic by declaring an inner class Shared in Foo that holds all of the shared data. The calling code instantiates a __shared__ Foo::Shared and passes it to the constructor of Foo. This way, the calling code does not have to change when Foo::Shared changes. – Nyctophobia
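The inner-class idea from this comment might look like the sketch below. `Foo::Shared`, `sumKernel`, `checkBlockSums` and the block size of 64 are hypothetical. One assumption worth stating: `Shared` should stay a plain struct with no constructor, because a `__shared__` variable cannot be initialized in its declaration, so any setup is done in `Foo`'s constructor instead.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCKDIM 64  // hypothetical block size; must match the launch below

class Foo {
public:
  // Everything the block shares lives in one plain struct. It has no
  // constructor, so the kernel can declare it __shared__ directly.
  struct Shared {
    int values[BLOCKDIM];
    int blockSum;
  };

  __device__ Foo(Shared &s, const int *gPtr) : shared(s) {
    shared.values[threadIdx.x] = gPtr[blockIdx.x * blockDim.x + threadIdx.x];
    if (threadIdx.x == 0) shared.blockSum = 0;
    __syncthreads();  // values[] filled and blockSum reset
    atomicAdd(&shared.blockSum, shared.values[threadIdx.x]);
    __syncthreads();  // every thread's contribution is in blockSum
  }

  __device__ int blockSum() const { return shared.blockSum; }

private:
  Shared &shared;  // refers to the kernel's __shared__ Foo::Shared
};

__global__ void sumKernel(const int *in, int *out) {
  // The kernel names Foo::Shared but never touches its members, so adding
  // fields to Shared does not require changing this code.
  __shared__ Foo::Shared s;
  Foo f(s, in);
  if (threadIdx.x == 0) out[blockIdx.x] = f.blockSum();
}

// Hypothetical host helper: sums in[i] = i per block and returns the number of wrong block sums.
int checkBlockSums(int numBlocks) {
  const int n = numBlocks * BLOCKDIM;
  std::vector<int> h_in(n), h_out(numBlocks, -1);
  for (int i = 0; i < n; ++i) h_in[i] = i;

  int *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(int));
  cudaMalloc(&d_out, numBlocks * sizeof(int));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
  sumKernel<<<numBlocks, BLOCKDIM>>>(d_in, d_out);
  cudaMemcpy(h_out.data(), d_out, numBlocks * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);

  int errors = 0;
  for (int b = 0; b < numBlocks; ++b) {
    int expected = 0;
    for (int t = 0; t < BLOCKDIM; ++t) expected += b * BLOCKDIM + t;
    if (h_out[b] != expected) ++errors;
  }
  return errors;
}

int main() {
  int errors = checkBlockSums(4);
  printf("errors: %d\n", errors);
  return errors == 0 ? 0 : 1;
}
```

Holding a reference rather than a pointer is just one choice here; a `Shared *` member would work the same way.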

Objects marked as __shared__ reside in shared memory, which is allocated per thread block. It has a limited size and the same lifetime as the thread block.

This is why you cannot declare class members as __shared__: their lifetime would be managed not by the class instance but by the thread block. Static class members could possibly be shared, but I haven't checked.

See the CUDA Programming Guide for details.
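To make the lifetime point concrete, here is a small sketch (the names `perBlockKernel` and `countMismatches` are hypothetical). A single `__shared__ int` declared at kernel scope gets a separate instance in every block: thread 0 of each block writes its block index into it, and every thread then reads back its own block's value. No object owns `blockTag`; it exists only while its block runs.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One __shared__ variable at kernel scope: each thread block gets its own copy,
// which lives only as long as that block.
__global__ void perBlockKernel(int *out) {
  __shared__ int blockTag;
  if (threadIdx.x == 0) blockTag = blockIdx.x;  // one writer per block
  __syncthreads();                              // make the write visible to the block
  out[blockIdx.x * blockDim.x + threadIdx.x] = blockTag;
}

// Hypothetical host helper: returns the number of threads that did not see their own block's value.
int countMismatches(int numBlocks, int threadsPerBlock) {
  const int n = numBlocks * threadsPerBlock;
  std::vector<int> h_out(n, -1);
  int *d_out = nullptr;
  cudaMalloc(&d_out, n * sizeof(int));
  perBlockKernel<<<numBlocks, threadsPerBlock>>>(d_out);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);

  int mismatches = 0;
  for (int i = 0; i < n; ++i)
    if (h_out[i] != i / threadsPerBlock) ++mismatches;
  return mismatches;
}

int main() {
  int mismatches = countMismatches(8, 32);
  printf("mismatches: %d\n", mismatches);
  return mismatches == 0 ? 0 : 1;
}
```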

Jakoba answered 3/10, 2012 at 11:41