Why does my kernel's shared memory seem to be initialized to zero?
As mentioned in the question "Shared Memory Array Default Value", shared memory is uninitialized, i.e. it can contain any value.

#include <stdio.h>

#define BLOCK_SIZE 512

__global__ void scan(float *input, float *output, int len) {
    __shared__ int data[BLOCK_SIZE];

    // DEBUG
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        printf("Block Number: %d\n", blockIdx.x);
        for (int i = 0; i < BLOCK_SIZE; ++i)
        {
            printf("DATA[%d] = %d\n", i, data[i]);
        }
    }
    
}

int main(int argc, char ** argv) {
    dim3 block(BLOCK_SIZE, 1, 1);
    dim3 grid(10, 1, 1);
    scan<<<grid,block>>>(NULL, NULL, 0);  // len is an int, so pass 0 rather than NULL
    cudaDeviceSynchronize();
    return 0;
}

Why, then, does this not seem to hold for this code? I consistently get zeroed shared memory:

DATA[0] = 0
DATA[1] = 0
DATA[2] = 0
DATA[3] = 0
DATA[4] = 0
DATA[5] = 0
DATA[6] = 0
...

I tested in both Release and Debug mode, with -O3 -arch=sm_20, -O3 -arch=sm_30 and -arch=sm_30. The result is always the same.

Implied answered 4/3, 2014 at 13:10 Comment(6)
Did you test it in both release and debug mode? In some projects I have observed that shared memory was initialized to 0 in debug mode but not in release mode, and not consistently across projects. This isn't defined behaviour, as @CygnusX1 answered in your linked question. You have to initialize shared memory on your own! - Nutrilite
If it can contain any value, then it can contain zeros, no? The system may still need to reinitialize memory sometimes to prevent information leaking between processes (security). - Loraleeloralie
Yes, I tested. With the "-arch=sm_30" and "-O3 -arch=sm_30" options, and also with "-arch=sm_20". The result is the same: zeroed shared memory. - Implied
Yes, it can contain zeros too, but the strange thing is that no other values ever show up in shared memory, which suggests that it is being deliberately zeroed. - Implied
Zero is within the subset of "any value". - Malliemallin
If you launch more than one wave of blocks, you will likely see non-zero values in the second wave. On a context switch, the shared memory is reset to zero. - Bernitabernj

tl;dr: shared memory is not initialized to 0

I think your conjecture that shared memory is initialized to 0 is questionable. Try the following code, which is a slight modification of yours. Here, I call the kernel twice, and each launch overwrites the data array after printing it. On the first launch, the "uninitialized" values of data will all be 0. On the second launch, the "uninitialized" values of data will no longer be all 0.

I think this depends on the fact that shared memory is SRAM, which exhibits data remanence.

#include <stdio.h>

#define BLOCK_SIZE 32

__global__ void scan(float *input, float *output, int len) {

    __shared__ int data[BLOCK_SIZE];

    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        for (int i = 0; i < BLOCK_SIZE; ++i)
        {
            printf("DATA[%d] = %d\n", i, data[i]);
            data[i] = i;
        }

    }
}

int main(int argc, char ** argv) {
    dim3 block(BLOCK_SIZE, 1, 1);
    dim3 grid(10, 1, 1);
    scan<<<grid,block>>>(NULL, NULL, 0);  // first launch: prints the initial contents
    scan<<<grid,block>>>(NULL, NULL, 0);  // second launch: prints what the first left behind
    cudaDeviceSynchronize();
    getchar();
    return 0;
}
Larios answered 4/3, 2014 at 14:20 Comment(1)
Yes, you are absolutely right! Thanks for the good explanation; now it is clear why we should initialize shared memory manually! - Implied
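
For completeness, here is a minimal sketch of what initializing shared memory manually could look like. It is not taken from the question or the answer: the kernel name scan_zeroed and the out buffer are hypothetical, and it assumes the block has exactly BLOCK_SIZE threads so that each thread can clear one element. The barrier matters because a thread may otherwise read an element that another thread has not cleared yet.

```cuda
#include <stdio.h>

#define BLOCK_SIZE 512
#define NUM_BLOCKS 10

// Hypothetical kernel: clears shared memory before any thread reads it.
// Assumes blockDim.x == BLOCK_SIZE, so every element has an owning thread.
__global__ void scan_zeroed(int *out) {
    __shared__ int data[BLOCK_SIZE];

    // Each thread clears its own slot: one store per thread.
    data[threadIdx.x] = 0;

    // Wait until every thread in the block has finished its store.
    __syncthreads();

    // Reading a slot owned by another thread is only safe after the barrier.
    int neighbor = data[(threadIdx.x + 1) % BLOCK_SIZE];
    if (threadIdx.x == 0) {
        out[blockIdx.x] = neighbor;
    }
}

int main(void) {
    int h_out[NUM_BLOCKS];
    int *d_out = NULL;

    cudaMalloc((void **)&d_out, sizeof(h_out));
    scan_zeroed<<<NUM_BLOCKS, BLOCK_SIZE>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int b = 0; b < NUM_BLOCKS; ++b) {
        printf("block %d read %d\n", b, h_out[b]);
    }

    cudaFree(d_out);
    return 0;
}
```

With the explicit clear and the barrier in place, the value each block reads no longer depends on whatever a previous launch left in shared memory.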
