My 9600GT hates me.
Fragment shader:
#version 130
uint aa[33] = uint[33](
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0
);
void main() {
int i=0;
int a=26;
for (i=0; i<a; i++) aa[i]=aa[i+1];
gl_FragColor=vec4(1.0,0.0,0.0,1.0);
}
If a=25
program runs at 3000 fps.
If a=26
program runs at 20 fps.
If size of aa
<=32 issue doesn't appear.
Viewport size is 1000x1000.
Problem occurs only when the size of aa
is >32.
Value of a
as the threshold varies with the calls to the array inside the loop (aa[i]=aa[i+1]+aa[i-1]
gives a different deadline).
I know gl_FragColor
is deprecated. But that's not the issue.
My guess is that GLSL doesn't unroll automatically the loop if a>25 and size(aa)>32. Why. The reason why it depends on the size of the array is unknown to mankind.
A quite similar behavior explained here:
http://www.gamedev.net/topic/519511-glsl-for-loops/
Unwinding the loop manually does solve the issue (3000 fps), even if aa
size is >32:
aa[0]=aa[1];
aa[1]=aa[2];
aa[2]=aa[3];
aa[3]=aa[4];
aa[4]=aa[5];
aa[5]=aa[6];
aa[6]=aa[7];
aa[7]=aa[8];
aa[8]=aa[9];
aa[9]=aa[10];
aa[10]=aa[11];
aa[11]=aa[12];
aa[12]=aa[13];
aa[13]=aa[14];
aa[14]=aa[15];
aa[15]=aa[16];
aa[16]=aa[17];
aa[17]=aa[18];
aa[18]=aa[19];
aa[19]=aa[20];
aa[20]=aa[21];
aa[21]=aa[22];
aa[22]=aa[23];
aa[23]=aa[24];
aa[24]=aa[25];
aa[25]=aa[26];
aa[26]=aa[27];
aa[27]=aa[28];
aa[28]=aa[29];
aa[29]=aa[30];
aa[30]=aa[31];
aa[31]=aa[32];
aa[32]=aa[33];
#pragma
directives. And unrolling loops is one of those directives. The particular non-portable pragma you want is#pragma optionNV (unroll all)
. It's usually better to unroll the stuff yourself, since AMD/Intel/... don't know what this#pragma
is. The joys of each vendor implementing their own compiler - one thing I sort of like about HLSL, Microsoft implements the one and only compiler so everything is pretty consistent there. – Cogency