A triangle with 3 varyings of the same value: does the GPU interpolate / waste performance?

Asked 10/4, 2014 at 22:13 Answered 25/1, 2016 at 20:31

I have a simple question, but I was unable to find solid facts about how GPUs behave when all 3 vertices of a triangle output the same varying value from the vertex shader. Does the GPU notice that case, or does it interpolate even when it's not needed?

This might be interesting, as there are quite a few cases where you want a constant-ish varying available in the fragment shader per triangle. Please don't just guess; try to bring up references, or at least reasons why you think it's one way or the other.

Farnese answered 10/4, 2014 at 22:13 Comment(6)

As the number of varyings is fixed and basically all hardware will have been implemented to calculate the next value in one cycle, surely there'd be no advantage to engineering an additional special case pathway? I've no references, this is just a guess. So it's not an answer. – Muff 10/4, 2014 at 23:1

It would be interesting to know if interpolation is still a fixed function, or even a hardware-implemented function at all. I could imagine they would implement a check BEFORE generating a few thousand fragment interpolation jobs. In that place it might be kinda cheap, with huge gains for that case. However, if it's dedicated hardware doing the interpolation, they might just say "screw it, there's no harm in using an otherwise unused resource". It's a kinda interesting question to me. – Farnese 10/4, 2014 at 23:7

@ManuelArwedSchmidt: There is dedicated attribute interpolator hardware. AMD calls them SPIs (Shader Processor Interpolators), and in the DX11 shader model the pixel (fragment) shader is capable of requesting interpolation work on-demand rather than having it done before the shader even starts working. Really smart shader compilers could, therefore, avoid interpolating some parameters except for during the run-time execution of infrequent branches of code on hardware that supports the "pull-model". – Sapindaceous 11/4, 2014 at 2:27

Since you wanted some supporting documentation, and AMD has the most open hardware architecture by lightyears, you should skim through this document for references to SPI. – Sapindaceous 11/4, 2014 at 2:32

I can't imagine there is any perf gain to be had by implementing that. You'd need more complex hardware and more complex software to do the check, vs just doing the interpolation. Interpolation is not complex at all. You're basically just adding a constant at each pixel. So, while interesting, what's the point? You're arguably not going to get any perf gains. Even if there was a positive perf difference (I honestly believe there'd be a negative one), it would be so tiny in comparison to the rest of your shader that it would be nearly unmeasurable. – Kelso 11/4, 2014 at 6:55

Here's some code that implements varyings in software. You'll see they're a single add. Optimizing out that add would make the code slower, not faster, as you'd have 2 options: (1) generate new code that doesn't have the adds, in which case you'd end up having to generate new code for every combination of varyings; or (2) use a kind of function pointer and set it to a no-op function. But just adding the function indirection would arguably make it slower, especially because in real hardware there'd be no function. – Kelso 11/4, 2014 at 6:59
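
The code linked in the comment above is not reproduced in this thread. As a rough stand-in for the "single add" idea, here is a minimal Python sketch of stepping a varying across one span of pixels. It is not Kelso's code: the function names `scanline_step` and `interpolate_scanline` are hypothetical, and perspective correction is ignored for simplicity.

```python
def scanline_step(left, right, count):
    """Per-pixel increment between the values at both ends of a span."""
    return (right - left) / (count - 1) if count > 1 else 0.0


def interpolate_scanline(start, step, count):
    """Return `count` values, beginning at `start` and adding `step` per pixel."""
    values = []
    value = start
    for _ in range(count):
        values.append(value)
        value += step  # the single add per pixel
    return values


# Different end values: the varying changes across the span.
ramp = interpolate_scanline(0.0, scanline_step(0.0, 1.0, 5), 5)

# Identical end values: the step is 0.0, the loop does exactly the same
# work, and every pixel simply receives the same constant.
flat = interpolate_scanline(0.25, scanline_step(0.25, 0.25, 5), 5)
```

In this sketch the equal-value case needs no special path: the step works out to zero and the same loop runs. Skipping the add would require either a second loop variant or a test inside the loop, which is the trade-off the comment describes.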

The GPU does the interpolation, no matter if it's needed or not.

The reason is quite simple: checking if the varying variable has already been changed is very expensive.

Shaders are small programs that are executed concurrently on different GPU cores. So if you wanted to prevent two different cores from computing the same value, you would have to "reserve" the output variable. That requires an additional data structure (like a flag or mutex) that every core can read. In your case this would mean that three different cores have to read the same flag, and the first of them has to reserve it if it's not already reserved.

This has to happen atomically, meaning that the reserving core has to be the only one setting the flag at a time. To do this, all other cores would, for example, have to be stopped for a tick. As you don't know which cores are computing the vertex shader, you would have to stop ALL other cores (on a GTX Titan this would be 2687 others).

Additionally, when the variable is set and a new frame is rendered, all the flags would have to be reset, so the race for the flag can begin again.

To conclude: you would need additional hardware in your GPU that is expensive and slows down the rendering pipeline.

It is the programmer's job to avoid having multiple shaders produce the same output. So if you are doing your job right, either this does not happen, or you know that avoiding it (on the CPU) would cost more than ignoring it.

An example would be the stitching between different levels of detail (like on a height map), where most methods create some fragments twice. This has a very small impact on rendering performance, but would require a lot of CPU time to avoid.

Sergius answered 25/1, 2016 at 15:6 Comment(0)

If the behavior isn't mandated in the OpenGL specification, then the answer is that it's up to the implementation.

The comments and other answers are almost certainly spot on that there is no optimization path for identical values because there would be little to no benefit from the added complexity to make such a path.
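
To make concrete what "interpolating identical values" amounts to, here is a minimal Python sketch of the barycentric weighted sum commonly used to describe varying interpolation. It is an illustration under assumptions, not a description of any particular GPU: the function name `barycentric_interpolate` is hypothetical, the weights are assumed to sum to 1, and perspective correction is left out.

```python
def barycentric_interpolate(weights, values):
    """Weighted sum of three per-vertex values; weights are assumed to sum to 1."""
    w0, w1, w2 = weights
    v0, v1, v2 = values
    return w0 * v0 + w1 * v1 + w2 * v2


# Barycentric weights of some fragment inside the triangle.
inside = (0.5, 0.25, 0.25)

# Three different vertex outputs: the fragment gets a blended value.
varying = barycentric_interpolate(inside, (1.0, 2.0, 4.0))

# Three identical vertex outputs: the same multiply-adds run, and the
# result is just that constant (up to floating-point rounding in general).
constant = barycentric_interpolate(inside, (3.0, 3.0, 3.0))
```

Both calls perform the same three multiplies and two adds. Detecting the equal-value case would mean adding comparisons on top of that, which fits the point that a dedicated path would bring little to no benefit.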

Lilylivered answered 25/1, 2016 at 20:31 Comment(0)
