I have a shader where I want to move half of the vertices in the vertex shader. I'm trying to decide the best way to do this from a performance standpoint, because we're dealing with well over 100,000 vertices, so speed is critical. I've looked at three different methods. (The code is pseudo-code, but enough to give you the idea. I can't give out the <complex formula>, but I can say that it involves a sin() function and a function call (it just returns a number, but it is still a function call), as well as a bunch of basic arithmetic on floating-point numbers.)
if (y < 0.5)
{
x += <complex formula>;
}
This has the advantage that the <complex formula> is only executed half the time, but the downside is that it definitely causes a branch, which may actually be slower than the formula. It is the most readable, but we care more about speed than readability in this context.
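For reference, HLSL lets an if statement carry a [branch] or [flatten] attribute as a hint to the compiler, so the branching variant can be written to request one behaviour or the other explicitly. A minimal sketch, assuming a hypothetical ComplexFormula() stand-in (the real formula isn't shown here) and a hypothetical helper name MoveX:

```hlsl
// Hypothetical stand-in for the real <complex formula>; NOT the actual math.
float ComplexFormula(float x)
{
    return sin(x) * 0.25 + x * 0.5;
}

// Hypothetical helper: offsets x only for vertices with y < 0.5.
float MoveX(float x, float y)
{
    // [branch] asks the compiler to emit a real dynamic branch.
    // Swapping it for [flatten] asks it to evaluate both sides and select.
    [branch]
    if (y < 0.5)
    {
        x += ComplexFormula(x);
    }
    return x;
}
```

Whether the hint is honoured, and which form is faster, presumably depends on the compiler and hardware, so this would still need to be measured.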
x += step(y, 0.5) * <complex formula>;
Using HLSL's step() function (step(y, 0.5) returns 1 when y is less than or equal to 0.5 and 0 otherwise), you can eliminate the branch, but now the <complex formula> is evaluated every time, and its result is multiplied by 0 (thus wasted effort) half of the time.
x += (y < 0.5) ? <complex formula> : 0;
This I don't know about. Does the ?: cause a branch? And if not, are both sides of the expression evaluated, or only the one that is relevant?
The final possibility is that the <complex formula> could be offloaded back to the CPU instead of the GPU, but I worry that it will be slower in calculating sin() and other operations, which might result in a net loss. Also, it means one more number has to be passed to the shader, and that could cause overhead as well. Anyone have any insight as to which would be the best course of action?
Addendum:
According to http://msdn.microsoft.com/en-us/library/windows/desktop/bb509665%28v=vs.85%29.aspx the step() function uses a ?: internally, so it's probably no better than my 3rd solution, and potentially worse, since <complex formula> is definitely evaluated every time, whereas it may be evaluated only half the time with a straight ?:. (Nobody's answered that part of the question yet.) Though avoiding both and using:
x += (1.0 - y) * <complex formula>;
may well be better than any of them, since there's no comparison being made anywhere. (And y is always either 0 or 1.) It still executes the <complex formula> needlessly half the time, but that might be worth it to avoid branches altogether.
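To make the comparison concrete, here are the three branch-free candidates written side by side as small HLSL functions. This is only a sketch: ComplexFormula() is a hypothetical placeholder, not the real formula, and the function names are made up for illustration. Since y is exactly 0 or 1, all three should add the formula when y is 0 and add nothing when y is 1.

```hlsl
// Hypothetical stand-in for the real <complex formula>; NOT the actual math.
float ComplexFormula(float x)
{
    return sin(x) * 0.25 + x * 0.5;
}

// Method 2: step(y, 0.5) is 1 when y <= 0.5, otherwise 0.
float MoveStep(float x, float y)
{
    return x + step(y, 0.5) * ComplexFormula(x);
}

// Method 3: conditional operator selecting between the formula and 0.
float MoveTernary(float x, float y)
{
    return x + ((y < 0.5) ? ComplexFormula(x) : 0.0);
}

// Addendum: plain multiply; only valid because y is exactly 0 or 1.
float MoveMul(float x, float y)
{
    return x + (1.0 - y) * ComplexFormula(x);
}
```

Compiling each variant and reading the generated assembly would probably be the most direct way to see whether a branch is actually emitted for any of them.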
RenderMonkey can analyze performance for Radeon cards). In addition, is the bottleneck in the vertex shader? Maybe all variants will give the same results :) – Rabbin
complex formula gets evaluated no matter what in any case. – Brookite
y can be computed as a function of the mesh, maybe you just split the mesh or the scene geometry and run two different shaders. – Brookite
y will always be either exactly 0 or 1, nothing in between. (It's the texture coords of a quad.) I suppose the step() function could be replaced with simply (1.0 - y) and have the same effect. Still causes the formula to calculate twice as much as strictly necessary... – Silky