Should I use conditional branching in GLSL to avoid possible texture lookup?

I have a set of questions about non-uniform control flow in GLSL and its performance cost on modern desktop GPUs. I have read the manual but still didn't find an answer. Let's get started.

  1. Alpha check and zero multiplication optimization. Which fragment shader will run faster? (The header is the same for both.)

    in vec2 textureCoordIn; // interpolated texture coords from vertex shader
    out vec4 outputColor; // resulting color should be here
    uniform sampler2D alphaMask; // splat alpha mask for textures 1-4
    uniform sampler2D mainTexture1;
    uniform sampler2D mainTexture2;
    uniform sampler2D mainTexture3;
    uniform sampler2D mainTexture4;
    
    void main(){
        vec4 maskValues = texture(alphaMask,textureCoordIn);
        outputColor = vec4(0.0); // an out variable is undefined until written, so start from zero
        if (maskValues.r > 0.0){
            outputColor += maskValues.r * texture(mainTexture1,textureCoordIn);
        }
        if (maskValues.g > 0.0){
            outputColor += maskValues.g * texture(mainTexture2,textureCoordIn);
        }
        if (maskValues.b > 0.0){
            outputColor += maskValues.b * texture(mainTexture3,textureCoordIn);
        }
        if (maskValues.w > 0.0){
            outputColor += maskValues.w * texture(mainTexture4,textureCoordIn);
        }
    }
    

    OR

    void main(){
        vec4 maskValues = texture(alphaMask,textureCoordIn);
        outputColor += maskValues.r * texture(mainTexture1,textureCoordIn);
        outputColor += maskValues.g * texture(mainTexture2,textureCoordIn);
        outputColor += maskValues.b * texture(mainTexture3,textureCoordIn);
        outputColor += maskValues.w * texture(mainTexture4,textureCoordIn);
    }
    

    Let's assume that the components of maskValues are zero in 50% of cases. Which shader will perform faster? I am also curious whether GLSL has a built-in optimization for multiplication by zero. Does anybody know?

  2. Texture array with possibly invalid indices: avoiding undefined behaviour. Let's assume we have a texture array (sampler2DArray). Every vertex has an ivec4 attribute that contains 4 texture indices into this texture array. In the fragment shader we need to return the sum of the texture colors for these indices. Fairly simple. But what should we do if we want to handle the case where an index can point to a "null" texture? At init time we can set such indices (vertex attributes) to -1, which stands for the color vec4(0,0,0,0). What is the best (and correct!) way to handle it?

    in vec2 textureCoordIn; // interpolated texture coords from vertex shader
    out vec4 outputColor; // resulting color should be here
    uniform sampler2DArray globalTextureArray;
    flat in ivec4 textureIndexes; // -1 marks a "null" texture
    
    void main(){
        outputColor = vec4(0.0); // an out variable is undefined until written, so start from zero
        if (textureIndexes.x > -1){
            outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.x));
        }
        if (textureIndexes.y > -1){
            outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.y));
        }
        if (textureIndexes.z > -1){
            outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.z));
        }
        if (textureIndexes.w > -1){
            outputColor += texture(globalTextureArray, vec3(textureCoordIn,textureIndexes.w));
        }
    }
    

    OR

    we could put a "fake" (transparent-black) texture into globalTextureArray and use its index to handle such cases. So which is faster here: the if branches, or 4 unconditional texture lookups?
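
    For reference, a minimal sketch of this second variant. It assumes the application has already replaced every "null" index with the layer index of a transparent-black texture stored in the array, so the shader needs no branches. That remapping step is an assumption and is not shown here.

    ```glsl
    #version 330 core

    in vec2 textureCoordIn;       // interpolated texture coords from vertex shader
    flat in ivec4 textureIndexes; // assumed: "null" entries already point at the transparent-black layer
    out vec4 outputColor;
    uniform sampler2DArray globalTextureArray;

    void main(){
        // Four unconditional lookups; a "null" entry samples the
        // transparent-black layer and therefore contributes vec4(0.0).
        outputColor  = texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.x));
        outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.y));
        outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.z));
        outputColor += texture(globalTextureArray, vec3(textureCoordIn, textureIndexes.w));
    }
    ```

    Here all texture accesses stay in uniform control flow, at the price of always paying for four lookups.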

Enter answered 6/9, 2017 at 14:16 Comment(3)
If you want to use mipmaps (or any other algorithm that depends on derivatives), the texture access has to happen in uniform control flow. If you don't need mipmaps, you should clarify that in the question. - Nunley
For most other parts of your question the only answer is "Measure it on your target hardware". The answer may largely differ between different hardware vendors and/or driver versions. An optimization could be there in one driver and not in another one. Also the costs of a texture lookup might be different. - Nunley
If I were in your shoes, I'd implement the version without the conditionals, and investigate the alternative only as a late stage optimization. My gut feeling is that the conditionals will only make things slower. BTW - there's a typo in your non-conditional shader, the first assignment uses += instead of =. - Oram
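
A minimal sketch of the derivative issue raised in the first comment, applied to the first shader from the question. The derivatives are taken outside the branches, in uniform control flow, and passed explicitly to textureGrad, so mipmap selection stays well defined inside the conditionals. The names dx and dy are illustrative, and whether this beats the unconditional version still has to be measured on the target hardware.

```glsl
#version 330 core

in vec2 textureCoordIn;
out vec4 outputColor;
uniform sampler2D alphaMask;
uniform sampler2D mainTexture1;
uniform sampler2D mainTexture2;
uniform sampler2D mainTexture3;
uniform sampler2D mainTexture4;

void main(){
    // Screen-space derivatives, computed before any divergent branch.
    vec2 dx = dFdx(textureCoordIn);
    vec2 dy = dFdy(textureCoordIn);

    vec4 maskValues = texture(alphaMask, textureCoordIn);
    outputColor = vec4(0.0);

    // textureGrad takes explicit gradients, so it does not rely on
    // implicit derivatives inside non-uniform control flow.
    if (maskValues.r > 0.0){
        outputColor += maskValues.r * textureGrad(mainTexture1, textureCoordIn, dx, dy);
    }
    if (maskValues.g > 0.0){
        outputColor += maskValues.g * textureGrad(mainTexture2, textureCoordIn, dx, dy);
    }
    if (maskValues.b > 0.0){
        outputColor += maskValues.b * textureGrad(mainTexture3, textureCoordIn, dx, dy);
    }
    if (maskValues.a > 0.0){
        outputColor += maskValues.a * textureGrad(mainTexture4, textureCoordIn, dx, dy);
    }
}
```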
