CUDA: 'memory bound' vs 'latency bound' vs 'bandwidth bound' vs 'compute bound'

In the many resources online, the terms 'memory bound', 'bandwidth bound' and 'latency bound' are applied to kernels in different ways. It seems to me that authors sometimes use their own definitions of these terms, and I think it would be very beneficial for someone to make a clear distinction.

To my understanding: bandwidth bound kernels approach the physical limits of the device in terms of access to global memory. E.g. an application achieves 170 GB/s out of the 177 GB/s peak on an M2090 device.
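To make that figure concrete, here is a rough sketch of how I would compute achieved bandwidth and compare it with the peak. The function name, byte counts and timing are made up for illustration; only the 177 GB/s peak and the 170 GB/s result come from the example above.

```python
def achieved_bandwidth_gbs(bytes_read, bytes_written, seconds):
    """Effective global-memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return (bytes_read + bytes_written) / seconds / 1e9

PEAK_GBS = 177.0  # M2090 peak bandwidth quoted above

# Hypothetical kernel: reads 0.85 GB and writes 0.85 GB in 10 ms
achieved = achieved_bandwidth_gbs(0.85e9, 0.85e9, 0.010)
utilization = achieved / PEAK_GBS  # fraction of peak actually used

print(f"{achieved:.0f} GB/s, {utilization:.0%} of peak")
```

By my definition, a kernel whose utilization is this close to 1.0 would be bandwidth bound.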

A latency bound kernel is one whose predominant stall reason is waiting on memory fetches. We are not saturating the global memory bus, but the kernel still has to wait for its data to arrive.

A compute bound kernel is one in which computation dominates the kernel time, under the assumption that there is no problem feeding the kernel with data and that arithmetic overlaps well with memory latency.
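Put as a decision rule, my reading of the three definitions looks roughly like the sketch below. This is only my interpretation, not an established classification: the function name is hypothetical, the 0.8 cutoff is an arbitrary assumption, and the two inputs are fractions of peak that would have to be measured separately.

```python
def classify_kernel(bandwidth_fraction, compute_fraction, threshold=0.8):
    """Label a kernel from its fractions of peak bandwidth and peak arithmetic throughput.

    threshold is an arbitrary cutoff chosen for illustration only.
    """
    if bandwidth_fraction >= threshold:
        # Close to saturating the global memory bus
        return "bandwidth bound"
    if compute_fraction >= threshold:
        # Arithmetic units are the limiting resource
        return "compute bound"
    # Neither resource is saturated, so the kernel is mostly waiting
    return "latency bound"

print(classify_kernel(170 / 177, 0.20))  # the M2090 example above
print(classify_kernel(0.30, 0.90))
print(classify_kernel(0.30, 0.25))
```

Note that the sketch checks bandwidth first, so a kernel that saturates both resources is labelled bandwidth bound; that ordering is also just a choice on my part.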

If I have these right, what would a 'memory bound' kernel be? Is the term ambiguous, and if so, should we limit the conversation to the three terms above?

Thanks!
