gpu-shared-memory Questions
1
Solved
From the CUDA Programming Guide:
[Warp shuffle functions] exchange a variable between threads within a warp.
I understand that this is an alternative to shared memory, thus it's being used for th...
Conti asked 27/4, 2023 at 18:54
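A minimal sketch of such an exchange: a single-warp sum that uses __shfl_down_sync instead of staging partial results in shared memory (the kernel name and one-warp launch are illustrative):

__global__ void warpReduceSum(const int *in, int *out) {
    int v = in[threadIdx.x];
    // Each step pulls a value from the lane `offset` positions higher;
    // the mask 0xffffffff means all 32 lanes of the warp participate.
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffff, v, offset);
    if (threadIdx.x == 0)
        *out = v;   // lane 0 now holds the sum of all 32 inputs
}

Launched as warpReduceSum<<<1, 32>>>(d_in, d_out), this needs no shared memory and no __syncthreads.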
3
Solved
I am currently studying CUDA and learned that there are global memory and shared memory.
I have checked the CUDA document and found that GPUs can access shared memory and global memory using ld.sha...
Carnay asked 15/11, 2022 at 5:5
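A small sketch that exercises both kinds of access; the PTX mnemonics in the comments are what nvcc typically emits (inspect with nvcc -ptx to confirm):

__global__ void twoLoads(const float *g, float *out) {
    __shared__ float s[32];
    s[threadIdx.x] = g[threadIdx.x];   // ld.global.f32 then st.shared.f32
    __syncthreads();
    // Reading a neighbour's slot forces a genuine shared-memory load:
    out[threadIdx.x] = s[(threadIdx.x + 1) % 32];   // ld.shared.f32
}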
2
Solved
Under what circumstances should you use the volatile keyword with a CUDA kernel's shared memory? I understand that volatile tells the compiler never to cache any values, but my question is about th...
Marthena asked 11/3, 2013 at 4:2
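The canonical case is warp-synchronous code on pre-Volta GPUs, e.g. the last warp of a reduction; a sketch adapted from that well-known pattern, not from this question's code:

__device__ void warpReduce(volatile int *sdata, int tid) {
    // Without volatile the compiler may keep sdata[tid] in a register
    // across these lines, so other lanes would read stale values.
    sdata[tid] += sdata[tid + 32];
    sdata[tid] += sdata[tid + 16];
    sdata[tid] += sdata[tid + 8];
    sdata[tid] += sdata[tid + 4];
    sdata[tid] += sdata[tid + 2];
    sdata[tid] += sdata[tid + 1];
}

On Volta and later, independent thread scheduling means this pattern additionally needs __syncwarp() between steps; volatile alone is no longer sufficient.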
3
Solved
After Compute Capability 2.0 (Fermi) was released, I've wondered if there are any use cases left for shared memory. That is, when is it better to use shared memory than just let L1 perform its magi...
Plasmolysis asked 30/6, 2012 at 16:31
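One case where L1 cannot substitute for shared memory is a layout change such as matrix transpose: the tile is written row-wise and read column-wise, turning strided global accesses into coalesced ones. A sketch, assuming n is a multiple of 32 and 32x32 thread blocks:

#define TILE 32
__global__ void transpose(const float *in, float *out, int n) {
    __shared__ float tile[TILE][TILE + 1];   // +1 column pads away bank conflicts
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * n + x];    // coalesced read
    __syncthreads();
    x = blockIdx.y * TILE + threadIdx.x;   // swap block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * n + x] = tile[threadIdx.x][threadIdx.y];   // coalesced write
}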
1
Solved
I am unable to use more than 48K of shared memory (on a V100, CUDA 10.2)
I call
cudaFuncSetAttribute(my_kernel,
                    cudaFuncAttributePreferredSharedMemoryCarveout,
                    cudaSharedmemCarveoutMaxShared);
bef...
Beria asked 5/9, 2020 at 18:29
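For the record, the preferred-carveout attribute is only a hint; opting a kernel into more than 48 KiB requires cudaFuncAttributeMaxDynamicSharedMemorySize, and the memory must be dynamic (extern). A minimal sketch:

#include <cuda_runtime.h>

__global__ void my_kernel() {
    extern __shared__ float buf[];   // dynamic shared memory only
    buf[threadIdx.x] = threadIdx.x;
}

int main() {
    int bytes = 64 * 1024;   // 64 KiB, above the default 48 KiB cap
    cudaFuncSetAttribute(my_kernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         bytes);
    my_kernel<<<1, 256, bytes>>>();
    return cudaDeviceSynchronize();
}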
1
Solved
I am following up on this original post: PyCuda code to invert a high number of 3x3 matrixes.
The code suggested as an answer is:
$ cat t14.py
import numpy as np
import pycuda.driver as cuda
from pyc...
Ovalle asked 26/11, 2019 at 17:17
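That post's answer is PyCuda; for orientation, here is a hedged CUDA C++ sketch of the simplest batch approach, one thread per matrix via the adjugate (names and layout are illustrative, not the post's code):

__global__ void inv3x3(const double *in, double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const double *m = in + 9 * i;    // row-major 3x3 matrix i
    double *r = out + 9 * i;
    double det = m[0]*(m[4]*m[8]-m[5]*m[7])
               - m[1]*(m[3]*m[8]-m[5]*m[6])
               + m[2]*(m[3]*m[7]-m[4]*m[6]);
    double d = 1.0 / det;            // assumes the matrix is invertible
    r[0] =  (m[4]*m[8]-m[5]*m[7])*d;  r[1] = -(m[1]*m[8]-m[2]*m[7])*d;
    r[2] =  (m[1]*m[5]-m[2]*m[4])*d;  r[3] = -(m[3]*m[8]-m[5]*m[6])*d;
    r[4] =  (m[0]*m[8]-m[2]*m[6])*d;  r[5] = -(m[0]*m[5]-m[2]*m[3])*d;
    r[6] =  (m[3]*m[7]-m[4]*m[6])*d;  r[7] = -(m[0]*m[7]-m[1]*m[6])*d;
    r[8] =  (m[0]*m[4]-m[1]*m[3])*d;
}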
4
Solved
I'm currently working with CUDA programming and I'm trying to learn from the slides of a workshop I found online, available here. The problem I am having is on slide 48. The following code...
Neurologist asked 29/10, 2014 at 7:42
2
Solved
I want to call different instantiations of a templated CUDA kernel with dynamically allocated shared memory in one program. My first naive approach was to write:
template<typename T>
__global...
Chari asked 19/12, 2014 at 16:58
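The naive extern __shared__ T arr[]; inside a template fails because every instantiation redeclares the same symbol with a different type. The usual workaround routes all types through one untyped buffer; a sketch:

template <typename T>
__device__ T *shared_mem_proxy() {
    extern __shared__ unsigned char smem[];   // one symbol, one type
    return reinterpret_cast<T *>(smem);
}

template <typename T>
__global__ void kern(int n) {
    T *buf = shared_mem_proxy<T>();
    if (threadIdx.x < n) buf[threadIdx.x] = T();
}

Each instantiation is then launched as usual, e.g. kern<float><<<g, b, n * sizeof(float)>>>(n).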
5
Solved
I am trying to allocate shared memory using a constant parameter but am getting an error. My kernel looks like this:
__global__ void Kernel(const int count)
{
    __shared__ int a[count];
}
and I a...
Otology asked 3/4, 2011 at 17:34
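A __shared__ array needs a compile-time constant size; a const parameter is still a runtime value. The standard fix is dynamic shared memory, sized at launch; a sketch:

__global__ void Kernel(const int count) {
    extern __shared__ int a[];   // size supplied by the launch
    if (threadIdx.x < count)
        a[threadIdx.x] = threadIdx.x;
}

// launch: Kernel<<<grid, block, count * sizeof(int)>>>(count);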
2
Solved
I am trying to understand how bank conflicts take place.
I have an array of size 256 in global memory and I have 256 threads in a single block, and I want to copy the array to shared memory. Theref...
Weisshorn asked 9/12, 2010 at 8:22
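For orientation, the one-word-per-thread copy is the conflict-free case; a sketch with the usual contrast in the comments (on modern GPUs word i lives in bank i % 32; older compute 1.x hardware used 16 banks per half-warp):

__global__ void copy256(const int *g, int *out) {
    __shared__ int s[256];
    // Consecutive threads of a warp hit 32 distinct banks: no conflict.
    s[threadIdx.x] = g[threadIdx.x];
    __syncthreads();
    // By contrast, an access like s[threadIdx.x * 2] would map pairs
    // of lanes to the same bank: a 2-way conflict.
    out[threadIdx.x] = s[threadIdx.x];
}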
2
Solved
The __shared__ memory in CUDA seems to require a known size at compile time. However, in my problem, the __shared__ memory size is only known at run time, i.e.
int size=get_size();
__shared__ mem[s...
Nude asked 30/3, 2012 at 2:51
1
Solved
I am trying to allocate shared memory in a CUDA kernel within a templated class:
template<typename T, int Size>
struct SharedArray {
    __device__ T* operator()(){
        __shared__ T x[Size];
        ret...
Trondheim asked 26/10, 2015 at 12:13
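The subtlety here is that a __shared__ variable inside a templated device function is a per-instantiation static, so two SharedArray<int, 64> objects hand back the same storage. One common workaround (a sketch, not the accepted answer's code) adds a discriminator parameter that forces distinct instantiations:

template <typename T, int Size, int Id>   // Id distinguishes allocations
struct SharedArray {
    __device__ T *operator()() {
        __shared__ T x[Size];   // one allocation per (T, Size, Id) triple
        return x;
    }
};

__global__ void demo() {
    int *a = SharedArray<int, 64, 0>()();
    int *b = SharedArray<int, 64, 1>()();   // distinct storage from a
    if (threadIdx.x == 0) { a[0] = 1; b[0] = 2; }
}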
1
Solved
I'm not an experienced CUDA programmer, and I've run into the following problem.
I'm trying to load a 32x32 tile of a large 10Kx10K matrix from global memory into shared memory and I'm timing it while it happens...
Mcmillan asked 13/8, 2015 at 10:38
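For reference, the standard way to load such a tile so that each warp issues fully coalesced reads; a sketch assuming a 32x32 thread block and a row pitch given in elements:

__global__ void loadTile(const float *m, float *sink, int pitch) {
    __shared__ float tile[32][32];
    // Consecutive threadIdx.x values read consecutive addresses, so
    // each tile row is fetched in a single coalesced transaction.
    tile[threadIdx.y][threadIdx.x] =
        m[(blockIdx.y * 32 + threadIdx.y) * pitch
          + blockIdx.x * 32 + threadIdx.x];
    __syncthreads();
    // Write something back so the compiler cannot optimize the load away.
    sink[threadIdx.y * 32 + threadIdx.x] = tile[threadIdx.y][threadIdx.x];
}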
3
Solved
Consider the following code:
__global__ void kernel(int *something) {
    extern __shared__ int shared_array[];
    // Some operations on shared_array here.
}
Is it possible to initialize the whole sh...
Conglomeration asked 25/6, 2011 at 13:42
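There is no bulk initializer for shared memory; the usual answer is to have each thread clear its own slice and then synchronize. A sketch, with n_shared standing in for the element count passed at launch:

__global__ void kernel(int *something, int n_shared) {
    extern __shared__ int shared_array[];
    for (int i = threadIdx.x; i < n_shared; i += blockDim.x)
        shared_array[i] = 0;     // each thread zeroes a strided slice
    __syncthreads();             // all zeroes visible block-wide
    // Some operations on shared_array here.
}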
1
Solved
In CUDA C++ it's straightforward to define a shared memory of size specified at runtime. How can I do this with Numba/NumbaPro CUDA?
What I've done so far has only resulted in errors with the messa...
Mathers asked 28/5, 2015 at 15:14
1
Solved
I'm trying to do a parallel reduction to sum an array in CUDA. Currently I pass an array in which to store the sum of the elements in each block. This is my code:
#include <cstdlib>
#include...
Elmerelmina asked 1/12, 2014 at 14:33
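For orientation, the usual shape of such a kernel: a tree reduction in shared memory that leaves one partial sum per block (a sketch, assuming blockDim.x is a power of two):

__global__ void blockSum(const float *in, float *blockResults, int n) {
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) blockResults[blockIdx.x] = sdata[0];
}

Launched as blockSum<<<blocks, threads, threads * sizeof(float)>>>(...); the per-block results are then summed on the host or in a second pass.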
1
Solved
If I have 48 kB of shared memory per SM and I write a kernel that allocates 32 kB of shared memory, does that mean that only 1 block can be running on one SM at the same time?
Prosaic asked 24/9, 2014 at 21:18
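The arithmetic is floor(48 kB / 32 kB) = 1 resident block, provided shared memory is the binding limit; the occupancy API can confirm it. A sketch:

#include <cstdio>

__global__ void my_kernel() {
    __shared__ char buf[32 * 1024];   // 32 kB of static shared memory
    buf[threadIdx.x] = 0;
}

int main() {
    int numBlocks = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, my_kernel, 256, 0);
    printf("resident blocks per SM: %d\n", numBlocks);   // 1 on a 48 kB SM
    return 0;
}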
1
I'm trying to implement a median filter with an x*y window, where x and y are odd and are parameters of the program.
My idea is to first see how many threads I can execute in a single block and how much shared me...
Banas asked 1/6, 2013 at 19:34
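Those per-device limits can be read at run time rather than guessed; a sketch of the query, with the tile-size arithmetic for an x*y window in the comment:

#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    // A blockDim.x x blockDim.y tile filtered with an x*y window needs
    // (blockDim.x + x - 1) * (blockDim.y + y - 1) * sizeof(pixel) bytes.
    return 0;
}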
1
Solved
There are similar questions to what I'm about to ask, but I feel like none of them get at the heart of what I'm really looking for. What I have now is a CUDA method that requires defining two array...
Bainbrudge asked 24/7, 2014 at 19:13
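The usual answer is a single dynamic allocation carved into pieces; a sketch with two arrays (put the most strictly aligned element type first if the types differ in size):

__global__ void kernel(int nFloats, int nInts) {
    extern __shared__ char smem[];
    float *a = reinterpret_cast<float *>(smem);
    int *b = reinterpret_cast<int *>(smem + nFloats * sizeof(float));
    if (threadIdx.x < nFloats) a[threadIdx.x] = 0.0f;
    if (threadIdx.x < nInts)   b[threadIdx.x] = 0;
}

// launch: kernel<<<grid, block,
//                  nFloats * sizeof(float) + nInts * sizeof(int)>>>(nFloats, nInts);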
2
Solved
I have a piece of CUDA code in which threads are performing atomic operations on shared memory. I was thinking that since the result of an atomic operation will be visible to other threads of the block ins...
Juanajuanita asked 13/4, 2014 at 16:55
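Atomics guarantee atomicity, not that all updates have completed by a given point in the program, so the barrier is still needed before reading the totals. A sketch in the shape of a per-block histogram:

__global__ void histo(const unsigned char *in, int n, unsigned int *out) {
    __shared__ unsigned int bins[256];
    for (int i = threadIdx.x; i < 256; i += blockDim.x) bins[i] = 0;
    __syncthreads();
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&bins[in[i]], 1u);
    __syncthreads();   // without this, some bins may still be mid-update
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        atomicAdd(&out[i], bins[i]);
}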
3
The size of the shared memory ("local memory" in OpenCL terms) is only 16 KiB on most NVIDIA GPUs today.
I have an application in which I need to create an array of 10,000 integers. So the...
Pongee asked 13/2, 2011 at 11:4
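10,000 ints is about 39 KiB, well over a 16 KiB budget, so the array has to be streamed through shared memory in chunks. A sketch of the tiling idea, with TILE chosen so one chunk fits:

#define TILE 2048   // 2048 ints = 8 KiB, comfortably under 16 KiB

__global__ void processLarge(const int *g, int n) {
    __shared__ int tile[TILE];
    for (int base = 0; base < n; base += TILE) {
        for (int j = threadIdx.x; j < TILE && base + j < n; j += blockDim.x)
            tile[j] = g[base + j];
        __syncthreads();
        // ... operate on tile[] here ...
        __syncthreads();   // finish before the next chunk overwrites it
    }
}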
1
Solved
I'm currently trying to adapt the 2D convolution code from THIS question to 3D and having trouble trying to understand where my error is.
My 2D Code looks like this:
#include <iostream>
#d...
Morpheus asked 22/3, 2014 at 12:53
1
Solved
As was mentioned in this Shared Memory Array Default Value question, shared memory is uninitialized, i.e. it can contain any value.
#include <stdio.h>
#define BLOCK_SIZE 512
__global__ void ...
Implied asked 4/3, 2014 at 13:10
1
I just learned (from Why only one of the warps is executed by a SM in cuda?) that Kepler GPUs can actually execute instructions from several (apparently 4) warps at once.
Can a shared memory bank...
Sanhedrin asked 15/2, 2014 at 19:22
1
Solved
I am having some difficulty understanding the batch loading that the comments refer to. In order to compute the convolution at a pixel, the mask, whose size is 5, must be centered on this sp...
Archibald asked 27/1, 2014 at 12:10
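"Batch loading" here means each thread stages its own input element into the shared tile, with a few threads also fetching the halo cells the 5-wide mask needs at the tile edges. A 1D sketch, assuming blockDim.x == TILE and the mask in constant memory:

#define MASK_WIDTH 5
#define TILE 256
__constant__ float mask[MASK_WIDTH];

__global__ void conv1d(const float *in, float *out, int n) {
    __shared__ float tile[TILE + MASK_WIDTH - 1];
    const int halo = MASK_WIDTH / 2;
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x + halo] = (g < n) ? in[g] : 0.0f;   // interior element
    if (threadIdx.x < halo) {
        int left = g - halo;                             // left halo cell
        tile[threadIdx.x] = (left >= 0) ? in[left] : 0.0f;
        int right = g + blockDim.x;                      // right halo cell
        tile[threadIdx.x + halo + blockDim.x] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();
    if (g < n) {
        float acc = 0.0f;
        for (int k = 0; k < MASK_WIDTH; ++k)
            acc += tile[threadIdx.x + k] * mask[k];
        out[g] = acc;
    }
}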