Is there any performance difference between Buffer, StructuredBuffer and ByteAddressBuffer (and their RW variants)?

Asked 1/8, 2022 at 15:46 Answered 17/11, 2022 at 20:57

Tags: performance, gpu, directx-11, compute-shader

I tried looking this up on various websites, including the MS Docs pages on DirectX 11 compute shader types, but I haven't found anything that mentions performance differences between these buffer types.
Are they exactly the same performance-wise?
If not, what is the optimal way to use each of them in different scenarios?

Redoubtable asked 1/8, 2022 at 15:46

Performance will ultimately vary with the GPU/driver combination.

There is a project that benchmarks access to these buffer types (the linear and random access cases are the most useful): https://github.com/sebbbi/perftest

The constant (uniform-address) access case is also useful if you want to compare cbuffer access against other buffer access. On NVIDIA, for example, it is common to do a GPU copy from a buffer into a cbuffer before running an expensive shader.

Note also that the different buffer types (in D3D11) have different limitations, which can offset any performance benefit.

Structured buffers cannot be bound as vertex or index buffers, so using them that way requires an extra copy. (For vertex data you can instead fetch from the buffer by vertex ID with no real penalty; index buffers can be read the same way, but that is more problematic.)
Byte address buffers let you store anything in an unstructured way (essentially a basic pointer). Reads are still aligned to 4 bytes (the size of an int). Reading a float requires asfloat and writing one requires asuint, but drivers generally turn these into no-ops, so there is no performance impact.
Byte address buffers (and typed buffers) can be bound as vertex or index buffers, so no copy is necessary.
Typed buffers have limited support for Interlocked operations (D3D11 only allows atomics on typed UAVs with the R32_UINT or R32_SINT format), so for atomics you usually need a structured or byte address buffer. (You can also run the interlocked operations on a small separate buffer and do the bulk reads/writes on a typed buffer.)
Byte address buffers can be more cumbersome if you have an array of elements of the same type: even a float4x4 takes a fair amount of code to fetch, compared with a StructuredBuffer<float4x4>.
Structured buffers let you bind "partial views": even if your buffer holds, say, 2048 floats, you can bind just elements 4-456 (and you can bind elements 500-600 as writable at the same time, since the ranges do not overlap).
For all buffer types, if you only read from a buffer, don't bind it as RW; that generally carries a noticeable penalty.
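To make the byte address points concrete, here is a small CPU-side model in C++ of what fetching a float4x4 from a ByteAddressBuffer involves. It is a sketch, not shader code: RawBuffer, load, load4, loadMatrix and Float4x4 are hypothetical names standing in for HLSL's ByteAddressBuffer::Load/Load4, and asfloat/asuint are modeled with memcpy bit copies (a reinterpretation, not a numeric conversion). It also assumes each matrix is stored as 16 consecutive floats, row by row; the real layout is whatever your code writes into the buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical CPU-side model of an HLSL ByteAddressBuffer:
// an array of 32-bit words addressed by *byte* offset.
class RawBuffer {
public:
    explicit RawBuffer(std::vector<std::uint32_t> words) : words_(std::move(words)) {}

    // Models ByteAddressBuffer::Load. HLSL expects a multiple of 4;
    // this model asserts it. at() throws std::out_of_range past the end.
    std::uint32_t load(std::uint32_t byteOffset) const {
        assert(byteOffset % 4 == 0 && "offset must be 4-byte aligned");
        return words_.at(byteOffset / 4);
    }

    // Models ByteAddressBuffer::Load4: four consecutive 32-bit words.
    std::array<std::uint32_t, 4> load4(std::uint32_t byteOffset) const {
        return {load(byteOffset), load(byteOffset + 4),
                load(byteOffset + 8), load(byteOffset + 12)};
    }

private:
    std::vector<std::uint32_t> words_;
};

// Models HLSL asfloat: copies the bits, no numeric conversion.
inline float asfloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Models HLSL asuint (the reverse), used here to fill the buffer.
inline std::uint32_t asuint(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

using Float4x4 = std::array<std::array<float, 4>, 4>;

// Fetch matrix `index` from a raw buffer of row-by-row float4x4 values:
// 64 bytes per matrix, 16 bytes per row, so one Load4 plus four asfloat per row.
// With StructuredBuffer<float4x4> the shader would simply write buf[index].
Float4x4 loadMatrix(const RawBuffer& buf, std::uint32_t index) {
    Float4x4 m{};
    const std::uint32_t base = index * 64;
    for (std::uint32_t row = 0; row < 4; ++row) {
        const std::array<std::uint32_t, 4> bits = buf.load4(base + row * 16);
        for (std::size_t col = 0; col < 4; ++col) {
            m[row][col] = asfloat(bits[col]);
        }
    }
    return m;
}
```

All the offset arithmetic and bit reinterpretation lives in loadMatrix; with a StructuredBuffer the stride and element type are part of the view, which is the convenience trade-off described above.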

Criner answered 3/8, 2022 at 16:32

To add to the accepted answer,

There is also a performance penalty when StructuredBuffer elements are not aligned to a 128-bit (sizeof(float4)) stride. If the stride is not a multiple of 16 bytes, a single float4, for example, can span two cache lines, causing up to a 5% performance penalty.

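One way to solve this is to add padding so that the elements stay aligned:

struct Foo
{
    float4 Position;  // bytes 0-15
    float  Radius;    // bytes 16-19
    float  pad0;      // padding, bytes 20-23
    float  pad1;      // padding, bytes 24-27
    float  pad2;      // padding, bytes 28-31
    float4 Rotation;  // bytes 32-47, starts on a 16-byte boundary
};                    // stride: 48 bytes (3 x 16)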

The NVIDIA post has more detail.
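One way to see the effect is to mirror Foo on the CPU and compute where each float4 lands. This C++ sketch assumes structured buffer elements are tightly packed with 4-byte alignment (no cbuffer-style 16-byte packing) and that a mainstream ABI lays out the mirror structs without extra padding. Float4, FooPacked, FooPadded and countLineCrossings are illustrative names, and the cache-line size is a free parameter, not a documented hardware value.

```cpp
#include <cstddef>

// Hypothetical CPU-side stand-in for HLSL float4 (16 bytes, 4-byte aligned).
struct Float4 { float x, y, z, w; };

// Foo without padding. Structured buffer elements are assumed tightly
// packed, so Rotation starts at byte 20 and the stride is 36 bytes.
struct FooPacked {
    Float4 Position;  // bytes 0-15
    float  Radius;    // bytes 16-19
    Float4 Rotation;  // bytes 20-35
};

// Foo with the three padding floats from the HLSL struct above.
struct FooPadded {
    Float4 Position;          // bytes 0-15
    float  Radius;            // bytes 16-19
    float  pad0, pad1, pad2;  // bytes 20-31
    Float4 Rotation;          // bytes 32-47
};

static_assert(sizeof(FooPacked) == 36, "unpadded stride is 36 bytes");
static_assert(offsetof(FooPacked, Rotation) == 20, "Rotation is off a 16-byte boundary");
static_assert(sizeof(FooPadded) == 48, "padded stride is a multiple of 16 bytes");
static_assert(offsetof(FooPadded, Rotation) == 32, "Rotation is on a 16-byte boundary");

// Counts how many of the first `count` elements have a 16-byte field (at
// `fieldOffset` inside each element) that crosses a `lineSize`-byte boundary.
// `lineSize` is an assumed parameter, not a documented hardware value.
constexpr int countLineCrossings(std::size_t stride, std::size_t fieldOffset,
                                 std::size_t count, std::size_t lineSize) {
    int crossings = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t first = i * stride + fieldOffset;  // first byte of the field
        const std::size_t last = first + 15;                 // last byte of the field
        if (first / lineSize != last / lineSize) {
            ++crossings;
        }
    }
    return crossings;
}
```

With an assumed 128-byte line, the unpadded Rotation crosses a line boundary in 3 of every 32 elements. The padded layout never crosses, because every float4 starts on a 16-byte boundary and 16 divides the line size.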

Fieldwork answered 17/11, 2022 at 20:57

While the NVIDIA post raises a good point, adding padding may be undesirable in some cases because it increases the size of the buffer. This matters more if you are updating the structured buffer from the CPU. – Germinative 30/6 at 14:43
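To put a number on that trade-off for the Foo struct above (assuming tightly packed structured buffer elements):

```latex
\begin{aligned}
\text{stride}_{\text{packed}} &= 16 + 4 + 16 = 36\ \text{bytes} \\
\text{stride}_{\text{padded}} &= 16 + 4 + 3 \cdot 4 + 16 = 48\ \text{bytes} \\
\frac{\text{stride}_{\text{padded}} - \text{stride}_{\text{packed}}}{\text{stride}_{\text{packed}}} &= \frac{12}{36} = \frac{1}{3} \approx 33\%
\end{aligned}
```

So every full CPU upload moves a third more data. For example, 10,000 elements take 480,000 bytes instead of 360,000.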
