CUDA shared memory occupancy

Yes, that is correct.

Shared memory must support the "footprint" of all "resident" threadblocks. In order for a threadblock to be launched on a SM, there must be enough shared memory to support it. If not, it will wait until the presently executing threadblock has finished.

There is some nuance to this arriving with Maxwell GPUs (cc 5.0, 5.2). These GPUs support either 64KB (cc 5.0) or 96KB (cc 5.2) of shared memory. In this case, the maximum shared memory available to a single threadblock is still limited to 48KB, but multiple threadblocks may use more than 48KB in aggregate, on a single SM. This means a cc 5.2 SM could support 2 threadblocks, even if both were using 32KB shared memory.

Recommended topics

Hot tags