How to define a CUDA shared memory with a size known at run time?
Shared memory declared with __shared__ in CUDA seems to require a size known at compile time. In my problem, however, the size is only known at run time, i.e.

int size = get_size();
__shared__ int mem[size];

This fails to compile with "error: constant value is not known", and I'm not sure how to get around the problem.

Nude answered 30/3, 2012 at 2:51 Comment(1)
possible duplicate of allocating shared memory – Armoured

The purpose of shared memory is to allow the threads in a block to collaborate. When you declare an array as __shared__, each thread in the block sees the same memory, so it would not make sense for a given thread to be able to set its own size for an array in shared memory.

However, the special case of dynamically specifying the size of a single __shared__ array that is the same size for all threads IS supported. See allocating shared memory.
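As a minimal sketch of that supported case (the kernel and variable names here are illustrative, not from the question): the array is declared extern __shared__ with no size, and the size in bytes is supplied as the third argument of the launch configuration.

```cuda
// Kernel that reverses an array using dynamically sized shared memory.
// Note: no size appears in the declaration; it is set at launch time.
__global__ void reverse(int *d, int n)
{
    extern __shared__ int s[];   // size fixed per launch, shared by the block
    int t = threadIdx.x;
    if (t < n) {
        s[t] = d[t];
        __syncthreads();         // all loads complete before any store
        d[t] = s[n - t - 1];
    }
}

// Host side: the third <<<>>> argument requests n * sizeof(int) bytes
// of shared memory per block:
//   reverse<<<1, n, n * sizeof(int)>>>(d_data, n);
```

Because the size is a per-launch byte count, it can be computed at run time (e.g. from blockDim.x), which is exactly what the question asks for.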

If you do need to dynamically allocate memory for each thread, you can use new or malloc inside a kernel (on Fermi and later GPUs), but they allocate global memory, which is likely to be slow.
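A sketch of that per-thread path, in case it is what you need (the kernel name is made up for illustration): each thread allocates its own buffer from the device heap, which lives in global memory.

```cuda
#include <cstdlib>  // for NULL

// Per-thread dynamic allocation inside a kernel (compute capability 2.0+).
// The buffer comes from the device heap in *global* memory, so access is
// far slower than shared memory.
__global__ void per_thread_alloc(int n)
{
    int *buf = (int *)malloc(n * sizeof(int));
    if (buf != NULL) {               // malloc can fail if the heap is exhausted
        for (int i = 0; i < n; ++i)
            buf[i] = threadIdx.x;    // each thread owns its private buffer
        free(buf);                   // must be freed in device code
    }
}
```

If allocations are large, the device heap size may need to be raised before the launch with cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes).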

Platonism answered 30/3, 2012 at 3:31 Comment(4)
Actually I was trying to make blockDim.x the shared memory size, which would be the same for all the threads in the same block, but it still failed (with a different error though) – Nude
@HailiangZhang: You may not want to plan on solving your problem by dynamically varying the block dimensions. Typically, you would get the best performance by carefully considering the resource use of your kernel and setting a fixed, optimal block dimension based on that, for instance by using the CUDA Occupancy Calculator spreadsheet. Also, the dimensions should multiply up to a multiple of the warp size for best performance. You would only adjust the grid dimensions dynamically, to fit your data. – Platonism
@RogerDahl: This answer is incorrect. You can determine kernel shared memory dynamically at run time - this has been a feature of CUDA since 1.0. See this answer for how. – Armoured
@talonmies: Ah, so that's what that mysterious 3rd argument is for! Live and learn... Thank you. I have fixed the answer. – Platonism

You should use extern __shared__ int mem[]; (dynamic shared memory) instead of __shared__ int mem[size]; (static shared memory), and pass the size in bytes as the third argument of the kernel launch configuration. See https://devblogs.nvidia.com/parallelforall/using-shared-memory-cuda-cc/

Borstal answered 18/10, 2016 at 18:20 Comment(0)
