Shader function parameter performance

I'm trying to understand how passing parameters is implemented in shader languages.

I've read several articles and documentation, but I still have some doubts. In particular, I'm trying to understand the differences from a C++ function call, with particular emphasis on performance.

There are slight differences between HLSL, Cg and GLSL, but I guess the underlying implementation is quite similar.

What I've understood so far:

  • Unless otherwise specified, a function parameter is always passed by value (is this true even for matrices?)
  • Passing by value in this context doesn't have the same implications as in C++. Recursion is not supported, so the stack isn't used; most functions are inlined and their arguments are put directly into registers.
  • Functions are often inlined by default (HLSL), or at least the inline keyword is always respected by the compiler (Cg)

Are the considerations above right?

Now some specific questions:

  1. Passing a matrix as function parameter

    inline float4 DoSomething(in Mat4x4 mat, in float3 vec) { ... }

Consider the function above: in C++ that would be nasty, and it would definitely be better to use a reference: const Mat4x4&.

What about shaders? Is this a bad approach? I read that, for example, the inout qualifier can be used to pass a matrix by reference, but that implies that the matrix is modified by the called function.

  2. Does the number (and type) of arguments have any implications? For example, is it better to use functions with a limited set of arguments? Or to avoid passing matrices? Is the inout modifier a valid way to improve performance here? If so, does anyone know how a typical compiler implements this?

  3. Are there any differences between HLSL and GLSL on this? Does anyone have hints on this?

Beadruby answered 15/2, 2015 at 15:22 Comment(0)
M
13

According to the spec, values are always copied. For in parameters, they are copied at call time, for out parameters at return time, and for inout parameters at both call and return time.

In the language of the spec (GLSL 4.50, section 6.1.1 "Function Calling Conventions"):

All arguments are evaluated at call time, exactly once, in order, from left to right. Evaluation of an in parameter results in a value that is copied to the formal parameter. Evaluation of an out parameter results in an l-value that is used to copy out a value when the function returns. Evaluation of an inout parameter results in both a value and an l-value; the value is copied to the formal parameter at call time and the lvalue is used to copy out a value when the function returns.
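
As a rough illustration of the copy-in/copy-out wording quoted above, here is a CPU-side C++ sketch (not shader code) of an inout parameter. All names (g_value, bumpBody, bumpCopyInCopyOut, bumpByReference, checkInoutSemantics) are made up for this example, and the C++ reference only stands in for a hypothetical "pass by reference" implementation; it says nothing about what a real GLSL compiler generates.

```cpp
#include <cassert>

// Plays the role of a global variable in a shader (hypothetical name).
float g_value = 1.0f;

// Emulates the body of a GLSL function such as
//   float bump(inout float p) { p += 1.0; return g_value; }
// operating on whatever storage it is handed.
float bumpBody(float& p) {
    p += 1.0f;
    return g_value;  // reads the global, not the parameter
}

// Copy-in/copy-out, following the spec's description of inout.
float bumpCopyInCopyOut(float& arg) {
    float p = arg;               // value copied to the formal parameter at call time
    float result = bumpBody(p);  // the body only touches the local copy
    arg = p;                     // value copied back out when the function returns
    return result;
}

// What a naive pass-by-reference implementation would do instead.
float bumpByReference(float& arg) {
    return bumpBody(arg);
}

// Self-check: passes the global itself as the inout argument to both variants.
void checkInoutSemantics() {
    g_value = 1.0f;
    float copied = bumpCopyInCopyOut(g_value);
    assert(copied == 1.0f);   // the global was still untouched while the body ran
    assert(g_value == 2.0f);  // it is updated only at return

    g_value = 1.0f;
    float referenced = bumpByReference(g_value);
    assert(referenced == 2.0f);  // the write to p was visible through the global
    assert(g_value == 2.0f);
}
```

Calling checkInoutSemantics() from a main function exercises both variants. The two only differ when the argument is also reachable by another name inside the function, which is the situation a compiler has to rule out before it can skip the copies.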

An implementation is of course free to optimize anything it wants as long as the result is the same as it would be with the documented behavior. But I don't think you can expect it to work in any specific way.

For example, it wouldn't be safe to pass all inout parameters by reference. Say you had this code:

vec4 Foo(inout mat4 mat1, inout mat4 mat2) {
    mat1 = mat4(0.0);
    mat2 = mat4(1.0);
    return mat1 * vec4(1.0);
}

mat4 myMat;
vec4 res = Foo(myMat, myMat);

The correct result for this is a vector containing all 0.0 components. If the arguments were passed by reference, mat1 and mat2 inside Foo() would alias the same matrix. This means that the assignment to mat2 also changes the value of mat1, and the result is a vector with all 1.0 components. Which would be wrong.

This is of course a very artificial example, but the optimization has to be selective to work correctly in all cases.
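
The difference can be reproduced on the CPU. The following C++ sketch emulates Foo() under both strategies; it is only an illustration, not how any GLSL compiler is implemented. Vec4, Mat4, diagonal, mul, fooBody, fooByReference, fooCopyInCopyOut, allEqual and checkAliasing are names invented for this example, and diagonal(s) mimics the GLSL constructor mat4(s), which puts s on the diagonal.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // four columns of four floats

// Mimics GLSL mat4(s): s on the diagonal, zero everywhere else.
Mat4 diagonal(float s) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = s;
    return m;
}

// Matrix * column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            r[row] += m[col][row] * v[col];
    return r;
}

// The body of Foo(), operating on whatever storage it is handed.
Vec4 fooBody(Mat4& mat1, Mat4& mat2) {
    mat1 = diagonal(0.0f);
    mat2 = diagonal(1.0f);
    return mul(mat1, Vec4{1.0f, 1.0f, 1.0f, 1.0f});
}

// Hypothetical naive implementation: inout as plain references.
Vec4 fooByReference(Mat4& a, Mat4& b) {
    return fooBody(a, b);
}

// Copy-in/copy-out, as the spec describes inout.
Vec4 fooCopyInCopyOut(Mat4& a, Mat4& b) {
    Mat4 local1 = a;  // copy in at call time
    Mat4 local2 = b;
    Vec4 result = fooBody(local1, local2);
    a = local1;       // copy out at return; this order is an arbitrary choice of the sketch
    b = local2;
    return result;
}

bool allEqual(const Vec4& v, float x) {
    for (float c : v) if (c != x) return false;
    return true;
}

// Self-check: passes the same matrix for both parameters, as in Foo(myMat, myMat).
void checkAliasing() {
    Mat4 myMat = diagonal(5.0f);
    assert(allEqual(fooCopyInCopyOut(myMat, myMat), 0.0f));  // spec-conforming result

    myMat = diagonal(5.0f);
    assert(allEqual(fooByReference(myMat, myMat), 1.0f));    // aliasing changes the result
}
```

Calling checkAliasing() from a main function runs both variants on the aliased argument. A compiler that wants to skip the copies has to prove first that the two arguments cannot refer to the same variable.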

Merril answered 18/2, 2015 at 7:3 Comment(3)
Thanks for the answer. So inout definitely doesn't correspond to pass by reference. I read somewhere on SO that it would work as a reference, but your example shows that this is clearly wrong and that it doesn't optimize anything. – Beadruby
@Heisenbug: You will see it discussed that way in countless places on Stack Overflow and elsewhere. It's not so much that inout is a performance optimization as that it is the closest thing GLSL has to references. The proper behavior is actually that the data is copied in on call and out on return, but an optimizing compiler isn't going to waste time making unnecessary copies unless you do something like what is shown here. It goes back to what I was discussing in my answer: the qualifier itself is not what matters, it's really what you do with the variable you pass. – Hiding
A clever compiler does it by reference, or inline, or in whatever way it believes is clever. A clever compiler catches the point of Foo(myMat, myMat) too. What a clever compiler does is a black box for us. But don't take it too seriously that a compiler "copies" instances of mat4 just because the specification says so. – Erenow
H
4

Your first bullet point does not hold when you consider arguments qualified with inout.

The real issue is what you do with the parameter inside the function: if you modify a parameter qualified with in, then it cannot be "passed by reference" and a copy will have to be made. On modern hardware this probably is not a big deal, but Shader Model 2.0 was pretty limited in terms of the number of temp registers, and I ran into these kinds of issues more than once when GLSL and Cg first came out.

For reference, consider the following GLSL code:

vec4 DoSomething (mat4 mat, vec3 vec)
{
  // Pretty straightforward, no temporary registers are required to pass arguments.
  return vec4 (mat [0] + vec4 (vec, 0.0));
}

vec4 DoSomethingCopy (mat4 mat, vec3 vec)
{
  mat [0][0] = 0.0; // This requires the compiler to make a local copy of mat
  return vec4 (mat [0] + vec4 (vec, 0.0));
}

vec4 DoSomethingInOut (inout mat4 mat, in vec3 vec)
{
  mat [0][0] = 0.0; // No copy required, but the original mat is modified
  return vec4 (mat [0] + vec4 (vec, 0.0));
}

I cannot really comment on performance; my only bad experiences had to do with hitting actual hardware limits on older GPUs. Of course, you should assume that any time something has to be copied, it is going to negatively impact performance.

Hiding answered 16/2, 2015 at 2:28 Comment(8)
So as long as I don't modify the matrix passed as an argument, a copy isn't required and the compiler will optimize that? – Beadruby
@Heisenbug: I can say conclusively that in HLSL this is the case (Microsoft implements that compiler, and you can take a look at the generated bytecode if you want to see how it optimizes something). GLSL is a different story, since every vendor implements their own compiler. It is almost inconceivable that a vendor's compiler would not recognize this situation, but stranger things have happened. – Hiding
Thanks for the information. I'm mostly using profile 2.0 targeting mobile (through Unity, so Cg is cross-compiled into GLSL, making things even harder to predict). I'll leave the question open for a while in order to collect as much info as possible. – Beadruby
@Heisenbug: Please do keep it open. I was able to fit the gist of that answer into a comment originally (minus the code sample), which is usually an indication (to me anyway) that the answer is incomplete ;) Your question might get more attention if it included a graphics API as one of its tags though. – Hiding
@Beadruby Just my two cents: I've run into cases with exactly this in which the compiler hasn't optimized things I assumed it could (specifically passing dynamically indexed arrays). The specs literally say "copied in" and "copied out" depending on the in/out qualifiers. An implementation may optimize but doesn't have to. I guess optimizations are tricky with aliasing, for example when you write to a global variable from within a function you've passed it to as an out. – Zeph
@jozxyqk: Thanks for the tip. I'm refactoring tons of shader code (at the moment efficient but not maintainable). Now I'm a bit worried about refactoring into functions if I could incur a penalty for copying matrices. What is not completely clear to me is this: from the specs, the inout qualifier should prevent matrix copying, so despite the wrong semantics (because I won't actually modify the matrix), could it be a way to force a matrix not to be copied? – Beadruby
@Beadruby The compiler does a decent job in most cases; I just wanted to point out it's not infallible. The only way I know of to check what's actually happening is to sift through the text part of what glGetProgramBinary returns (some intermediate semi-compiled instructions), though that can get pretty complex with large amounts of code. I wish GLSL had tools and support similar to what CUDA gets. One option (not a pretty one) is to define macros instead of functions. – Zeph
@AndonM.Coleman Your comment is no longer correct for either modern OpenGL or Vulkan; both now use the SPIR-V intermediate language (or at least you now have the option to use SPIR-V in OpenGL). This is similar to HLSL bytecode (though it's more of a "4 byte" code than a byte code, since everything is 4 bytes in SPIR-V). A vendor only needs to JIT the SPIR-V code, and Khronos has an optimizer for SPIR-V itself. – Paraph
C
2

All shader functions are inlined (recursive functions are forbidden). The concept of a reference or pointer doesn't apply here either. The only case in which some code will be generated is when you write to an input parameter. However, if the original registers aren't used anymore, the compiler will probably reuse the same registers, and the copy (mov operation) won't be needed.

Bottom line: function invocation is free.

Corridor answered 17/2, 2015 at 16:8 Comment(2)
Do you have a source for this claim? I don't see anything in the specs saying that implementations have to inline all functions. – Merril
I guess the specs don't say this. However, since the specs say that recursion is not allowed, compilers are free to do this. Moreover, I read the assembly code of some GLSL shaders on Nvidia hardware and all functions were inlined: no function calls, no function definitions. I guess AMD hardware will do the same. I don't know of a reason not to do this if recursive functions are forbidden. – Corridor
