I have a couple of questions about non-uniform flow control in GLSL and its performance cost on modern desktop GPUs. I have read the manual but still didn't find an answer. Let's get started.
Alpha check and zero-multiplication optimization. Which fragment shader will run faster? (The declarations before main are the same for both.)
    in vec2 textureCoordIn;          // interpolated texture coords from the vertex shader
    out vec4 outputColor;            // the resulting color is written here
    uniform sampler2D alphaMask;     // splat alpha mask for mainTexture1-4
    uniform sampler2D mainTexture1;
    uniform sampler2D mainTexture2;
    uniform sampler2D mainTexture3;
    uniform sampler2D mainTexture4;

    void main() {
        vec4 maskValues = texture(alphaMask, textureCoordIn);
        outputColor = vec4(0.0);     // give the output a defined value before accumulating
        if (maskValues.r > 0.0) {
            outputColor += maskValues.r * texture(mainTexture1, textureCoordIn);
        }
        if (maskValues.g > 0.0) {
            outputColor += maskValues.g * texture(mainTexture2, textureCoordIn);
        }
        if (maskValues.b > 0.0) {
            outputColor += maskValues.b * texture(mainTexture3, textureCoordIn);
        }
        if (maskValues.a > 0.0) {
            outputColor += maskValues.a * texture(mainTexture4, textureCoordIn);
        }
    }
OR
    void main() {
        vec4 maskValues = texture(alphaMask, textureCoordIn);
        outputColor = vec4(0.0);     // give the output a defined value before accumulating
        outputColor += maskValues.r * texture(mainTexture1, textureCoordIn);
        outputColor += maskValues.g * texture(mainTexture2, textureCoordIn);
        outputColor += maskValues.b * texture(mainTexture3, textureCoordIn);
        outputColor += maskValues.a * texture(mainTexture4, textureCoordIn);
    }
Let's assume that maskValues contains zeroes in 50% of cases. Which shader will perform faster? I am also curious whether GLSL compilers have a built-in optimization for multiplication by zero. Does anybody know?
Texture array and possibly invalid index optimization. Avoiding undefined behaviour? Let's assume we have a texture array (sampler2DArray). Every vertex has an ivec4 attribute that contains 4 texture indexes into this texture array. In the fragment shader we need to return the sum of the texture colors for these indexes. Fairly simple. But what should we do if we want to handle the case where an index can point to a "null" texture? At the init step we can set such indexes (vertex attributes) to -1, which stands for the color vec4(0,0,0,0). What is the best (and correct!) way to handle it?
    in vec2 textureCoordIn;          // interpolated texture coords from the vertex shader
    out vec4 outputColor;            // the resulting color is written here
    uniform sampler2DArray globalTextureArray;
    flat in ivec4 textureIndexes;    // -1 marks a "null" texture

    void main() {
        outputColor = vec4(0.0);     // give the output a defined value before accumulating
        if (textureIndexes.x > -1) {
            outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.x));
        }
        if (textureIndexes.y > -1) {
            outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.y));
        }
        if (textureIndexes.z > -1) {
            outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.z));
        }
        if (textureIndexes.w > -1) {
            outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.w));
        }
    }
OR
we could put a "fake" (transparent black) texture into globalTextureArray and use its index to handle this case. So which is faster here: the if branches or 4 unconditional texture lookups?
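To make the second variant concrete, here is a minimal sketch of what I have in mind. It assumes that the transparent black texture is stored in some reserved layer of globalTextureArray (say, layer 0) and that the "null" vertex attributes are set to that layer index at init time instead of -1. The `#version` line and the choice of layer are assumptions for illustration only.

```glsl
#version 330 core
// Assumed GLSL version; the snippets above do not state one.

in vec2 textureCoordIn;          // interpolated texture coords from the vertex shader
flat in ivec4 textureIndexes;    // assumption: "null" entries hold the index of the
                                 // transparent-black layer (e.g. 0), never -1
out vec4 outputColor;            // the resulting color is written here

uniform sampler2DArray globalTextureArray;

void main() {
    // No branches: a "null" index samples the transparent-black layer,
    // which contributes vec4(0.0) to the sum.
    outputColor  = texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.x));
    outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.y));
    outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.z));
    outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.w));
}
```

With this layout every fragment always performs all 4 lookups, so the comparison is between 4 unconditional fetches and up to 4 fetches guarded by branches on a flat (per-primitive) value.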