Instanced drawing of dynamic models in OpenGL
I am currently developing a framework that allows me to conveniently render a larger number of animated models.

A model is organized as a simple hierarchy of bones, with the root being the torso/pelvis, generally:

[Image: my simple modeling hierarchy]

So, as pseudo code, I am currently rendering a model like this:

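RenderBone(Bone b, Mat4x4 currentTransform){
    // Combine the parent's accumulated transform with this bone's own transform.
    Mat4x4 pos = currentTransform * b.boneTransform;
    SetUniform("transformation", pos);
    Draw(b.mesh);
    // Children inherit the accumulated transform.
    for each Bone bc in b.children do{
         RenderBone(bc, pos);
    }
}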

So for a single actor that uses a model with n bones I need n SetUniform calls (not counting stuff like setting textures) and n draw calls.

Trying to reduce that overhead, and render all actors using the same model at once, I thought about switching to instanced rendering.

However, all the information and tutorials I could find are about drawing cubes, spheres or similar simple objects. Nowhere could I find simple, comprehensible information about how to use instanced drawing to render models where each part (bone) requires a different transformation matrix to be given to the shader.

So, the problem: Using glVertexAttribDivisor or gl_InstanceID I can only specify an instance-related matrix, not a bone-related matrix. How do I apply my bone transformations then?

The only feasible solution I could think of is - instead of instancing the entire model - I can instance each bone. Thus drawing all instances of one bone type, then another one, etc. But then I would still have to update the buffer with the transformation matrices relatively often, and it's more housekeeping code.
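To make the per-bone idea concrete, here is a rough sketch of the housekeeping it implies: regrouping the transforms so that each bone type gets one contiguous run of matrices, one per actor. The names `Mat4` and `groupByBone` are made up for illustration, and the sketch assumes the world transforms were already accumulated down the hierarchy on the CPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major 4x4 (hypothetical type)

// Input:  perActor[actor][bone], the accumulated transform of each bone.
// Output: perBone[bone][actor], one contiguous run of matrices per bone type.
// Each run could then be uploaded as per-instance data and drawn with a
// single instanced call (for example glDrawElementsInstanced), so the frame
// needs one draw call per bone type rather than one per bone per actor.
std::vector<std::vector<Mat4>> groupByBone(
        const std::vector<std::vector<Mat4>>& perActor, std::size_t boneCount) {
    std::vector<std::vector<Mat4>> perBone(boneCount);
    for (auto& run : perBone) {
        run.reserve(perActor.size());
    }
    for (const auto& actor : perActor) {
        assert(actor.size() == boneCount);  // every actor uses the same model
        for (std::size_t b = 0; b < boneCount; ++b) {
            perBone[b].push_back(actor[b]);
        }
    }
    return perBone;
}
```

The regrouping has to be redone whenever any actor's pose changes, which is the relatively frequent buffer update mentioned above.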

So is this the best option? Or, more generally, are there better not-too-complicated ways of rendering? Or does instanced rendering only really shine when using it with static geometry?

Apocrine answered 15/8, 2012 at 16:3 Comment(8)
"Trying to reduce that overhead" Stop right there. Why are you trying to reduce the overhead? Do you have actual profiling data from platforms of interest that show that this overhead specifically is a problem? If not, then you're optimizing prematurely. - Ignatius
@NicolBolas I don't. I know that you can't simply put a price tag on an OGL call, but people seem to agree that reducing the number of function calls is a good idea, aye? Currently I have no need to optimize anything, considering I can reach 1000 FPS with my sticks-and-stones rendering implementation. But if there is a way of rendering that's generally well performing then why not go with it to begin with? When I arrive at a point where my problem set is sufficiently complex that I actually can profile it, I can still see where the problem lies and change the implementation if need be. - Apocrine
"But if there is a way of rendering that's generally well performing then why not go with it to begin with?" Because it takes longer to implement and may not be of any value whatsoever in the end. It makes the code harder to debug and makes development take longer, for possibly no actual measurable gain. Optimization should never be something you do just because you can. Oh, and no, reducing the number of function calls is not a priori faster; that's another reason not to optimize until you have performance data. - Ignatius
Thanks for writing up your answer, it sheds some more light for me. I'm aware that I'm basically optimising for optimisation's sake. But I'm not trying to build an efficient engine for real use, rather to have a playground where I can experiment and learn things even if they aren't always the best ideas. I think that is a better way of learning OpenGL and having some fun than trying to make a game that - in the rare case that you actually finish it - leaves you disappointed because it's so much worse than the awesome idea you had in your head when you first started :) - Apocrine
How is 1000 FPS possible with OpenGL? Isn't the FPS limited by the refresh rate of the display you are using? - Garth
@Garth Practically yes, but that doesn't mean that you can't let your program churn out frames at maximum rate. Your monitor just isn't fast enough to display all frames, so they're wasted. - Apocrine
@Apocrine I was thinking of this: glprogramming.com/red/chapter01.html See "The Refresh that Pauses" section. - Garth
@Garth The important part from that link is "For some OpenGL implementations", which means some - maybe most - implementations don't wait for the screen refresh. Which means you can swap more than once per displayed frame. - Apocrine

Instancing is something you use when you need to draw thousands of copies of the same model. In general, meshes with bones are not the kinds of things you need to draw thousands of.

Instancing is an optimization, and one that doesn't always pay off. You shouldn't bother trying to employ it unless you know that you need it (by profiling and seeing if you're hitting performance targets). And even then, it can be very touchy as to when it is an actual performance improvement.

Sometimes, it just doesn't help. But here are some general rules of thumb:

  1. Instancing is not worthwhile unless you're rendering thousands of instances.
  2. Instancing shouldn't be used with meshes that have too many vertices or too few. Roughly 100-1,000 vertices per mesh is the range where it tends to help.

Remember that these are general rules, not absolute laws. They're also hardware-dependent.

So, the problem: Using glVertexAttribDivisor or gl_InstanceID I can only specify an instance-related matrix, not a bone-related matrix. How do I apply my bone transformations then?

You're thinking far too much in terms of what examples you've seen or what you've seen other people doing. Think like a programmer.

gl_InstanceID is not "an instance-related matrix"; it is an index. What you do with that index is entirely up to you. Most examples you've seen use this index to look up a matrix in an array, likely stored in a uniform block or a buffer texture. This matrix is the transform you use for rendering. Each index represents the transform for a single instance.

Each of your instances has multiple matrices, multiple transforms. But each instance has the same number of bones (otherwise it wouldn't be instanced rendering). Let's say you have 5 bones.

Again, each index is the transform for a single instance. The difference between your case and the standard is how much information is needed per-instance. The regular case needs one matrix; you need five. But the idea is the same either way.

If you need bone index 3 for your current instance, you simply access your matrix array with this expression: (gl_InstanceID * 5) + 3, where 5 is the number of bones per instance.
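As a rough CPU-side sketch of that layout, assuming the five-bone example and that each bone's matrix has already been multiplied down the hierarchy, the matrices could be packed like this before uploading them to a buffer. `Mat4`, `boneMatrixIndex` and `packBoneMatrices` are illustrative names, not part of any API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major 4x4 (hypothetical type)

constexpr std::size_t kBonesPerInstance = 5;  // the example bone count above

// Mirrors the shader-side expression (gl_InstanceID * 5) + bone.
inline std::size_t boneMatrixIndex(std::size_t instance, std::size_t bone) {
    assert(bone < kBonesPerInstance);
    return instance * kBonesPerInstance + bone;
}

// Lays out all bones of instance 0, then all bones of instance 1, and so on,
// in one contiguous array that can be uploaded in a single buffer update.
std::vector<Mat4> packBoneMatrices(
        const std::vector<std::array<Mat4, kBonesPerInstance>>& actors) {
    std::vector<Mat4> flat(actors.size() * kBonesPerInstance);
    for (std::size_t i = 0; i < actors.size(); ++i) {
        for (std::size_t b = 0; b < kBonesPerInstance; ++b) {
            flat[boneMatrixIndex(i, b)] = actors[i][b];
        }
    }
    return flat;
}
```

With this layout, bone 3 of instance 2 sits at index 2 * 5 + 3 = 13.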

The rest is a simple matter of using a per-vertex attribute to pass the bone index to be used to transform each vertex.
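One way this could look on the shader side is sketched below, as GLSL source embedded in a C++ string. It assumes the matrices live in a buffer texture with an RGBA32F format (four texels per matrix) and that the bone index is an integer attribute supplied through glVertexAttribIPointer. The names `position`, `boneIndex`, `boneMatrices`, `viewProjection` and `makeVertexShaderSource` are invented for the example; a uniform block would work along the same lines.

```cpp
#include <cassert>
#include <string>

// GLSL body, kept separate so the bone count can be injected as a constant.
// All attribute and uniform names here are illustrative.
static const char* const kVertexShaderBody = R"glsl(
layout(location = 0) in vec3 position;
layout(location = 1) in int boneIndex;   // per-vertex: which bone moves this vertex

uniform samplerBuffer boneMatrices;      // buffer texture, 4 RGBA32F texels per matrix
uniform mat4 viewProjection;

mat4 fetchBoneMatrix(int index) {
    int base = index * 4;                // one texel per matrix column
    return mat4(texelFetch(boneMatrices, base + 0),
                texelFetch(boneMatrices, base + 1),
                texelFetch(boneMatrices, base + 2),
                texelFetch(boneMatrices, base + 3));
}

void main() {
    int matrixIndex = gl_InstanceID * BONES_PER_INSTANCE + boneIndex;
    gl_Position = viewProjection * fetchBoneMatrix(matrixIndex) * vec4(position, 1.0);
}
)glsl";

// Builds the full shader text; #version has to be the first line.
std::string makeVertexShaderSource(int bonesPerInstance) {
    assert(bonesPerInstance > 0);
    return "#version 330 core\n"
           "const int BONES_PER_INSTANCE = " + std::to_string(bonesPerInstance) + ";\n"
           + kVertexShaderBody;
}
```

Calling `makeVertexShaderSource(5)` yields source for the five-bone example, which would then go through the usual glShaderSource and glCompileShader steps.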

Ignatius answered 15/8, 2012 at 16:45 Comment(2)
Yet are not particle systems one of the most common uses for instancing? They typically use 1-4 vertices, not 100-1000. - Preschool
@Jackalope: "Yet are not particle systems one of the most common uses for instancing?" No, they're not. - Ignatius
