Which OpenGL functions are not GPU-accelerated?

Asked 26/4, 2010 at 12:32 Answered 26/4, 2010 at 20:52

Solved opengl gpu hardware-acceleration opengl-3

I was shocked when I read this (from the OpenGL wiki):

glTranslate, glRotate, glScale

Are these hardware accelerated?

No, there are no known GPUs that execute this. The driver computes the matrix on the CPU and uploads it to the GPU.

All the other matrix operations are done on the CPU as well: glPushMatrix, glPopMatrix, glLoadIdentity, glFrustum, glOrtho.

This is the reason why these functions are considered deprecated in GL 3.0. You should have your own math library, build your own matrix, upload your matrix to the shader.

For a very, very long time I thought most of the OpenGL functions use the GPU to do computation. I'm not sure if this is a common misconception, but after a while of thinking, this makes sense. Old OpenGL functions (2.x and older) are really not suitable for real-world applications, due to too many state switches.

This makes me realise that, possibly, many OpenGL functions do not use the GPU at all.

So, the question is:

Which OpenGL function calls don't use the GPU?

I believe knowing the answer to the above question would help me become a better programmer with OpenGL. Please do share some of your insights.

Edit:

I know this question easily leads to optimisation-level discussion. That's good, but it's not the intention of this question.

If anyone knows a set of GL functions on a certain popular implementation (as AshleysBrain suggested, nVidia/ATI, and possibly OS-dependent) that don't use the GPU, that's what I'm after!

Plausible optimisation guides come later. Let's focus on the functions, for this topic.

Edit2:

This topic isn't about how matrix transformations work. There are other topics for that.

Kizer answered 26/4, 2010 at 12:32 Comment(8)

There's nothing to be shocked about. You don't call glTranslate for each vertex or fragment, and it's just one matrix multiplication anyway, so there is normally no performance hit by running it on the CPU. – Heterogenetic 26/4, 2010 at 12:35

@Thomas: You're entirely correct. But in the beginning, it's hard for people to think about the graphics rendering pipeline, and where the vertex shader sits with glTranslate(). After a while it becomes more clear, but in the beginning I thought "Oh, this must be GPU-accelerated", hence the question. – Kizer 26/4, 2010 at 13:9

I thought T&L stood for Transformation & Lighting which is (was?) accelerated by the graphics card. – Canned 26/4, 2010 at 13:29

@graham.reeds: T&L is outdated. No professional-looking apps use it as a primary rendering function. See en.wikipedia.org/wiki/Transform,_clipping,_and_lighting. Mainly, people have moved on to vertex and fragment shaders, over fixed-function. – Kizer 26/4, 2010 at 13:39

Vertex shaders are a superset of what T&L provided. It's not outdated, it just evolved. – Ockham 26/4, 2010 at 16:24

@Axel: Fair enough! :] – Kizer 26/4, 2010 at 16:48

Shows how long it has been since I've done OpenGL in anger. I would have thought that those fixed functions would become essentially vertex shaders behind the scenes. – Canned 27/4, 2010 at 7:52

OpenGL never angered me. It's C++ that gets me. – Kizer 27/4, 2010 at 7:57

Boy, is this a big subject.

First, I'll start with the obvious: Since you're calling the function (any function) from the CPU, it has to run at least partly on the CPU. So the question really is, how much of the work is done on the CPU and how much on the GPU.

Second, in order for the GPU to execute some command, the CPU has to prepare a command description to pass down. The minimal set here is a command token describing what to do, as well as the data for the operation to be executed. How the CPU triggers the GPU to do the command is also somewhat important. Since this is expensive most of the time, the CPU does not do it often; instead it batches commands in command buffers and simply sends a whole buffer for the GPU to handle.

All this to say that passing work down to the GPU is not a free exercise. That cost has to be pitted against just running the function on the CPU (no matter what we're talking about).
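The batching idea can be sketched on the CPU alone. This is only a toy model, not how any real driver is written: `Command`, `CommandBuffer` and `submissionsFor` are made-up names, and "submitting" is just a counter standing in for the expensive hand-off to the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command: a token saying what to do plus the data to do it with.
struct Command {
    std::uint32_t token;    // made-up encoding, e.g. 1 = "draw"
    std::uint32_t payload;  // operand for the command
};

// Toy stand-in for a driver-side command buffer (illustrative names only).
class CommandBuffer {
public:
    explicit CommandBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
        pending_.reserve(capacity);
    }

    // Cheap CPU-side work: append the command, submit only when the buffer is full.
    void record(const Command& cmd) {
        pending_.push_back(cmd);
        if (pending_.size() == capacity_) flush();
    }

    // The expensive step: hand the whole batch over in one go.
    void flush() {
        if (pending_.empty()) return;
        ++submissions_;
        pending_.clear();
    }

    std::size_t submissions() const { return submissions_; }

private:
    std::size_t capacity_;
    std::vector<Command> pending_;
    std::size_t submissions_ = 0;
};

// Usage sketch: record `count` commands, flush the remainder, report submissions.
std::size_t submissionsFor(std::size_t count, std::size_t capacity) {
    CommandBuffer buffer(capacity);
    for (std::size_t i = 0; i < count; ++i) {
        buffer.record(Command{1u, static_cast<std::uint32_t>(i)});
    }
    buffer.flush();
    return buffer.submissions();
}
```

With a capacity of 4, ten recorded commands cost three submissions instead of ten, which is the whole point of batching.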

Taking a step back, you have to ask yourself why you need a GPU at all. The fact is, a pure CPU implementation does the job (as AshleysBrain mentions). The power of the GPU comes from its design to handle:

specialized tasks (rasterization, blending, texture filtering, blitting, ...)
heavily parallel workloads (DeadMG is pointing to that in his answer), whereas a CPU is designed more for single-threaded work.

And those are the guiding principles to follow in order to decide what goes in the chip. Anything that can benefit from those ought to run on the GPU. Anything else ought to be on the CPU.

It's interesting, by the way. Some functionality of the GL (prior to deprecation, mostly) is really not clearly delineated. Display lists are probably the best example of such a feature. Each driver is free to push as much as it wants from the display list stream to the GPU (typically in some command buffer form) for later execution, as long as the semantics of the GL display lists are kept (and that is somewhat hard in general). So some implementations choose to push only a limited subset of the calls in a display list to a computed format, and simply replay the rest of the command stream on the CPU.

Selection is another one where it's unclear whether there is value to executing on the GPU.

Lastly, I have to say that in general, there is little correlation between the API calls and the amount of work on either the CPU or the GPU. A state setting API tends to only modify a structure somewhere in the driver data. Its effect is only visible when a Draw, or some such, is called.

A lot of the GL API works like that. At that point, asking whether glEnable(GL_BLEND) is executed on the CPU or GPU is rather meaningless. What matters is whether the blending will happen on the GPU when Draw is called. So, in that sense, most GL entry points are not accelerated at all.
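To make that concrete, here is a rough sketch of a state-setting call that only touches driver-side data, with the real work deferred until a draw. Everything in it is hypothetical (`ToyContext`, `toyEnableBlend`, `toyDraw`, `runToy`); it mimics the shape of the behaviour described above and does not reflect any particular driver.

```cpp
#include <cassert>

// Toy model of a driver context. All names here are hypothetical.
struct ToyContext {
    bool blendEnabled = false;  // CPU-side shadow of the GL state
    bool dirty = false;         // has state changed since the last draw?
    int stateUploads = 0;       // times state was actually sent to the "GPU"
    int blendedDraws = 0;       // draws executed with blending switched on
};

// Analogue of glEnable(GL_BLEND): only modifies driver-side data.
void toyEnableBlend(ToyContext& ctx) {
    if (!ctx.blendEnabled) {
        ctx.blendEnabled = true;
        ctx.dirty = true;
    }
}

// Analogue of a draw call: this is where pending state reaches the hardware.
void toyDraw(ToyContext& ctx) {
    if (ctx.dirty) {
        ++ctx.stateUploads;
        ctx.dirty = false;
    }
    if (ctx.blendEnabled) ++ctx.blendedDraws;
}

// Usage sketch: enable blending `enables` times, then draw `draws` times.
ToyContext runToy(int enables, int draws) {
    assert(enables >= 0 && draws >= 0);
    ToyContext ctx;
    for (int i = 0; i < enables; ++i) toyEnableBlend(ctx);
    for (int i = 0; i < draws; ++i) toyDraw(ctx);
    return ctx;
}
```

In this model, calling the enable function any number of times without drawing sends nothing anywhere; the first draw afterwards pays for the state change once.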

I could also expand a bit on data transfer but Danvil touched on it.

I'll finish with a little on the "s/w path". Historically, GL had to work to spec no matter what the hardware special cases were, which meant that if the h/w was not handling a specific GL feature, then it had to emulate it, or implement it fully in software. There are numerous cases of this, but one that struck a lot of people is when GLSL started to show up.

Since there was no practical way to estimate the code size of a GLSL shader, it was decided that the GL was supposed to take any shader length as valid. The implication was fairly clear: either implement h/w that could take arbitrary length shaders (not realistic at the time), or implement a s/w shader emulation (or, as some vendors chose to, simply fail to be compliant). So, if you triggered this condition on a fragment shader, chances were the whole of your GL ended up being executed on the CPU, even when you had a GPU sitting idle, at least for that draw.

Afferent answered 26/4, 2010 at 20:52 Comment(1)

I've gotta say, that is a long post. It has a lot of history as well, very nice. I thoroughly enjoyed reading it. So far, I've concluded that there isn't a definite list, but more of a general guideline and understanding, yes? Thanks everyone. I'm going to accept this answer and ponder the next impossible. – Kizer 26/4, 2010 at 21:5

The question should perhaps be "What functions eat an unexpectedly high amount of CPU time?"

Keeping a matrix stack for projection and view is not a thing the GPU can handle better than a CPU would (on the contrary ...). Another example would be shader compilation. Why should this run on the GPU? There is a parser, a compiler, ..., which are just normal CPU programs like the C++ compiler.

Potentially "dangerous" function calls are for example glReadPixels, because data has to be copied between host (=CPU) memory and device (=GPU) memory over the limited bus. In this category are also functions like glTexImage_D (that is, glTexImage1D/2D/3D) or glBufferData.

So generally speaking, if you want to know how much CPU time an OpenGL call eats, try to understand its functionality. And beware of all functions that copy data from host to device and back!

Bibliofilm answered 26/4, 2010 at 13:6 Comment(1)

Thanks, Danvil. You brought up a good candidate key to pick out which functions eat CPU time. And memory bus as well. +1. – Kizer 26/4, 2010 at 13:10

Typically, if an operation is per-something, it will occur on the GPU. An example is the actual transformation - this is done once per vertex. On the other hand, if it occurs only once per large operation, it'll be on the CPU - such as creating the transformation matrix, which is only done once for each time the object's state changes, or once per frame.

That's just a general answer, and some functionality will occur the other way around, as well as being implementation dependent. However, typically, it shouldn't matter to you, the programmer. As long as you allow the GPU plenty of time to do its work while you're off doing the game sim or whatever, or have a solid threading model, you shouldn't need to worry about it that much.

@sending data to GPU: As far as I know (only used Direct3D) it's all done in-shader, that's what shaders are for.

Bala answered 26/4, 2010 at 17:53 Comment(1)

@DeadMG: Your answer was my other choice, just FYI. Concise and short. – Kizer 26/4, 2010 at 21:6

glTranslate, glRotate and glScale change the currently active transformation matrix. This is of course a CPU operation. The model view and projection matrices just describe how the GPU should transform vertices when a rendering command is issued.

So, for example, calling glTranslate does not translate anything yet. Before rendering, the current projection and model view matrices are multiplied (MVP = projection * modelview); this single matrix is then copied to the GPU, and the GPU does the matrix * vertex multiplications ("T&L") for each vertex. So the translation/scaling/projection of the vertices is done by the GPU.
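The split between "once per draw" and "once per vertex" can be sketched in plain C++ without any GL calls. The helper names (`Mat4`, `multiply`, `transform`, `exampleVertex`) are my own, the matrices are stored column-major as OpenGL expects, and a uniform scale stands in for a real projection matrix purely to keep the numbers easy to follow.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // column-major: m[col * 4 + row]
using Vec4 = std::array<float, 4>;

Mat4 translation(float x, float y, float z) {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

Mat4 scaling(float x, float y, float z) {
    Mat4 m{};
    m[0] = x; m[5] = y; m[10] = z; m[15] = 1.0f;
    return m;
}

// Once per draw, on the CPU: c = a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                c[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return c;
}

// Once per vertex: the part a vertex shader (or fixed-function T&L) would do.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r[row] += m[k * 4 + row] * v[k];
    return r;
}

// Usage sketch: MVP = "projection" * modelview, then one vertex through it.
Vec4 exampleVertex() {
    const Mat4 mvp = multiply(scaling(2, 2, 2), translation(1, 2, 3));
    const Vec4 out = transform(mvp, Vec4{{1, 1, 1, 1}});
    assert(out[3] == 1.0f);  // affine transforms leave w at 1
    return out;
}
```

Here the vertex (1, 1, 1) is first translated to (2, 3, 4) and then scaled to (4, 6, 8). In a real program the `mvp` array is what you would hand to the shader as a uniform, and `transform` is what runs on the GPU for every vertex.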

Also you really should not be worried about the performance if you don't use these functions in an inner loop somewhere. glTranslate results in three additions. glScale and glRotate are a bit more complex.

My advice is that you should learn a bit more about linear algebra. This is essential for working with 3D APIs.

Ockham answered 26/4, 2010 at 14:56 Comment(6)

Translation, Rotation, and Scaling can all be done using matrix operations and could be done by the hardware very quickly. They're not part of the pipeline but just change a matrix in the pipeline, so it makes sense to keep it on the CPU. But then WHY put these functions into the OpenGL API? I can see the source of confusion. – Hygrometer 26/4, 2010 at 15:26

History. I think this has something to do with the client/server model and the matrix stack the driver provides. D3D is cleaner in that sense, because it puts these functions in D3DX. – Ockham 26/4, 2010 at 16:23

Thanks, Axel, for clarifying where the matrices are uploaded to the GPU. I am aware of how these matrices work, and how they perform in mathematical terms. As I edited in the topic, I'm not worried about speed in this topic. || By the way, do you know how to "copy" this MVP matrix onto the GPU for rendering with OpenGL 3.x? I know one way - vertex shader - but is there a more preferred way? || (My upvote is capped, will +1 when I can.) – Kizer 26/4, 2010 at 16:54

In older GLSL versions the normal GL matrices were available through the special uniform variables gl_ModelViewProjectionMatrix etc. But I think this was deprecated for GL 3.0 or later and you can use any uniform you want via glUniformMatrix. This is also how it's done in D3D. In that sense the MVP matrix is just another shader input and is not treated specially anymore. If you use the fixed function pipeline ("T&L") the driver itself will create an appropriate vertex shader for the set lights etc. and will assign a uniform with the MVP matrix itself. – Ockham 26/4, 2010 at 18:9

Thanks for commenting and adding a lot of insight, Axel. I learned the most from your posts. If my topic was about learning the pipelines of OpenGL, your answer would have been accepted. – Kizer 26/4, 2010 at 21:7

Well, to be precise glTranslate doesn't result in just 3 additions, but 12 multiplications and 12 additions at a minimum. Remember that the current matrix is post-multiplied by the translation matrix. – Discontented 18/2, 2012 at 14:33
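The operation count in the comment above can be checked with a short sketch. It assumes the column-major layout OpenGL uses and the documented behaviour that the current matrix is post-multiplied by the translation matrix; the function names are made up for illustration, and a real driver may well do something smarter.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<float, 16>;  // column-major: m[col * 4 + row]

// Post-multiply m by a translation matrix. Only the last column changes;
// each of its 4 rows costs 3 multiplies and 3 adds, so 12 of each in total.
void translateInPlace(Mat4& m, float x, float y, float z) {
    for (int row = 0; row < 4; ++row) {
        m[12 + row] = m[0 + row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row];
    }
}

// Reference: the full 4x4 product c = a * b (64 multiplies).
Mat4 fullMultiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                c[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return c;
}

// Check that the shortcut agrees with the full product for one translation.
bool shortcutMatchesFullMultiply(float x, float y, float z) {
    Mat4 m{};  // an arbitrary non-trivial "current matrix"
    for (int i = 0; i < 16; ++i) m[i] = static_cast<float>(i + 1);

    Mat4 t{};  // translation matrix for (x, y, z)
    t[0] = t[5] = t[10] = t[15] = 1.0f;
    t[12] = x; t[13] = y; t[14] = z;

    const Mat4 expected = fullMultiply(m, t);
    translateInPlace(m, x, y, z);
    return m == expected;
}
```

The "three additions" figure only holds when the upper-left 3x3 part of the current matrix is the identity; for a general current matrix the last column has to be recomputed as shown.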

There are software-rendered implementations of OpenGL, so it's possible that no OpenGL functions run on the GPU. There's also hardware that doesn't support certain render states, so if you set such a state, the driver switches to software rendering and, again, nothing will run on the GPU (even though there's one there). So I don't think there's any clear distinction between 'GPU-accelerated functions' and 'non-GPU accelerated functions'.

To be on the safe side, keep things as simple as possible. The straightforward rendering-with-vertices and basic features like Z buffering are most likely to be hardware accelerated, so if you can stick to that with the minimum state changing, you'll be most likely to keep things hardware accelerated. This is also the way to maximize performance of hardware-accelerated rendering - graphics cards like to stay in one state and just crunch a bunch of vertices.

Durarte answered 26/4, 2010 at 13:39 Comment(3)

@Ashleys, I did consider the vast different implementations of OpenGL, and where to draw that line for this question. I guess I'm after a majority-vote, here. Thanks for the tip for keeping things simple! – Kizer 26/4, 2010 at 13:43

Perhaps you could phrase the question around ATI/nVidia/Intel (shudder) cards or some other leading manufacturers, since I'd guess that's the real-world case where acceleration matters. – Durarte 26/4, 2010 at 13:49

I could. But I think these brands are popular enough to not warrant such clarification, I hope? – Kizer 26/4, 2010 at 13:50
