Unexpected sizes of arrays in an HLSL Constant Buffer

I haven't used constant buffers (CBs) this complicated before but, from what I understand, the alignment and packing of my C++ struct have to match what HLSL expects. So I'm trying to work out the rules so that I can lay out the C++ struct predictably.

I ran some tests with a vertex shader (v5) to see the packing the compiler produces, using this structure in vs.hlsl:

cbuffer conbuf {
    float m0;
    float m1;
    float4 m2;
    bool m3[1];
    bool m4[4];
    float4 m5;
    float m6;
    float4 m7;
    matrix m8;
    float m9;
    float m10;
    float4 m11[2];
    float m12[8];
    float m13;
};

which produced the following output (in the file set under Header File Name in the VC++ project's HLSL settings):

cbuffer conbuf {
    float m0; // Offset: 0 Size: 4
    float m1; // Offset: 4 Size: 4
    float4 m2; // Offset: 16 Size: 16
    bool m3; // Offset: 32 Size: 4
    bool m4[4]; // Offset: 48 Size: 52
    float4 m5; // Offset: 112 Size: 16
    float m6; // Offset: 128 Size: 4
    float4 m7; // Offset: 144 Size: 16
    float4x4 m8; // Offset: 160 Size: 64
    float m9; // Offset: 224 Size: 4
    float m10; // Offset: 228 Size: 4
    float4 m11[2]; // Offset: 240 Size: 32
    float m12[8]; // Offset: 272 Size: 116
    float m13; // Offset: 388 Size: 4
};

I have pretty much figured out how the offsets work (based on sizes), but I cannot understand the array sizes.

Some of the array sizes seem random. I can't figure out how the bool m4[4] array gets Size: 52, or how float m12[8] gets Size: 116. How does the HLSL compiler arrive at these sizes?

Any help? I've already looked at the MSDN packing rules page, but it doesn't say much about arrays.

I'll simplify your example a little, since you already understand padding.

One important point about arrays, per the packing rules (the page you mentioned), is:

Arrays are not packed in HLSL by default. To avoid forcing the shader to take on ALU overhead for offset computations, every element in an array is stored in a four-component vector.

So let's take this simple cbuffer:

cbuffer cbPerObj : register( b0 )
{
     float Alpha[4];
};

Per the rule above (each float is stored in its own four-component vector), this is (almost) equivalent to:

cbuffer cbPerObj : register( b0 )
{
     float4 Alpha[4];
};

Or, expanded:

cbuffer cbPerObj : register( b0 )
{
     float Alpha1;
     float3 Dummy1;
     float Alpha2;
     float3 Dummy2;
     float Alpha3;
     float3 Dummy3;
     float Alpha4;
};
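The expanded layout can be mirrored on the C++ side. The following is only a minimal sketch, assuming a 4-byte float and no compiler-inserted padding; the names PaddedFloat and CbPerObjMirror are made up for illustration and do not come from any API.

```cpp
#include <cstddef>

// One array slot as HLSL lays it out: 4 bytes of data followed by
// 12 bytes of padding, filling one four-component (16-byte) vector.
struct PaddedFloat {
    float value;
    float pad[3];
};

// Hypothetical CPU-side mirror of: cbuffer cbPerObj { float Alpha[4]; };
// The first three elements each occupy a full 16-byte slot; the last
// element carries no trailing padding, as in the expanded cbuffer.
struct CbPerObjMirror {
    PaddedFloat alphaHead[3];  // Alpha[0..2]
    float       alphaLast;     // Alpha[3]
};

static_assert(sizeof(PaddedFloat) == 16, "one slot must be 16 bytes");
static_assert(offsetof(CbPerObjMirror, alphaLast) == 48, "Alpha[3] starts at byte 48");
static_assert(sizeof(CbPerObjMirror) == 52, "3 * 16 + 4 bytes in total");
```

If nothing needs to be packed into the 12 bytes after the last element, declaring all four elements as PaddedFloat (64 bytes) is a simpler alternative that is easier to index.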

Notice that in the expanded cbuffer the last element (Alpha4) is not padded. That is why, in your case, you see:

bool m4[4]; // Offset: 48 Size: 52
float4 m5; // Offset: 112 Size: 16

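m4 takes 4 * 16 = 64 bytes, minus the 3 padding floats (12 bytes) that are not added after the last element: 64 - 12 = 52. The same arithmetic explains float m12[8]: 8 * 16 - 12 = 116.

Note also that 48 + 52 = 100. Since m5 is a float4 and must not cross a 16-byte boundary, it is pushed to the next boundary at offset 112, which is where the 12 "lost" bytes reappear.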
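The size arithmetic above can be written down as a small compile-time helper. This is just a sketch: hlslArraySize and kRegisterBytes are hypothetical names, and the formula is only meant for arrays of scalars or vectors up to float4. The checked values are the ones from the compiler output in the question.

```cpp
#include <cstddef>

// Every array element starts in its own four-component (16-byte) vector.
constexpr std::size_t kRegisterBytes = 16;

// Reported size of a cbuffer array: every element except the last takes a
// full 16 bytes; the last one only counts its own data.
// Assumes count >= 1 and elemBytes <= 16.
constexpr std::size_t hlslArraySize(std::size_t count, std::size_t elemBytes) {
    return (count - 1) * kRegisterBytes + elemBytes;
}

static_assert(hlslArraySize(1, 4) == 4,   "bool m3[1]");
static_assert(hlslArraySize(4, 4) == 52,  "bool m4[4]");
static_assert(hlslArraySize(2, 16) == 32, "float4 m11[2]");
static_assert(hlslArraySize(8, 4) == 116, "float m12[8]");
```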

If you had this instead:

bool m4[4]; // Offset: 48 Size: 52
float m5;

the offset for m5 would be 100, since a single float still fits before the next 16-byte boundary.
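The boundary rule used in both cases can be sketched as follows. The helper names (placeMember, roundUpToRegister) are invented for this example, and it only covers non-array scalars and vectors; arrays and matrices always start on a new 16-byte boundary, so they need roundUpToRegister directly.

```cpp
#include <cstddef>

constexpr std::size_t kRegisterBytes = 16;

// Round an offset up to the next multiple of 16 (unchanged if already aligned).
constexpr std::size_t roundUpToRegister(std::size_t offset) {
    return (offset + kRegisterBytes - 1) / kRegisterBytes * kRegisterBytes;
}

// Offset of a scalar/vector member of `size` bytes placed after `cursor`
// (the first free byte). It stays at `cursor` if it fits in what is left of
// the current 16-byte vector; otherwise it moves to the next boundary.
constexpr std::size_t placeMember(std::size_t cursor, std::size_t size) {
    return (cursor % kRegisterBytes + size <= kRegisterBytes)
               ? cursor
               : roundUpToRegister(cursor);
}

// First free byte after bool m4[4] (Offset: 48, Size: 52).
constexpr std::size_t kAfterM4 = 48 + 52;

static_assert(placeMember(kAfterM4, 16) == 112, "float4 m5 moves to the next boundary");
static_assert(placeMember(kAfterM4, 4) == 100,  "float m5 fits right after m4");
```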

Hope that makes sense.
