How to convert large arrays of quad primitives to triangle primitives?

I have an existing system, which provides 3D meshes. The provided data are an array of vertex coordinates with 3 components (x, y, z) and an index list. The issue is that the index list is a consecutive array of quad primitives.
The system has to be made runnable with a core profile OpenGL context first, and later with OpenGL ES 3.x, too.

I know that all the quads have the same winding order (counterclockwise), but I have no further information about them. I don't know anything about their relations or adjacencies.

Since I want to use a core profile context for rendering, I cannot use the GL_QUADS primitive type. I have to convert the quads to triangles.

Of course the array of quad indices can easily be converted to an array of triangle indices:

std::vector<unsigned int> triangles;
triangles.reserve( no_of_indices * 6 / 4 );
for ( int i = 0; i < no_of_indices; i += 4 )
{
    // split quad (0, 1, 2, 3) into the triangles (0, 1, 2) and (0, 2, 3)
    unsigned int tri[] = { quad[i], quad[i+1], quad[i+2], quad[i], quad[i+2], quad[i+3] };
    triangles.insert(triangles.end(), tri, tri+6 );
}

If that has to be done only once, then that would be the solution. But the mesh data are not static. The data can change dynamically: not continuously, but at unpredictable times.

Another simple solution would be to create a vertex array object which directly refers to an element array buffer with the quads, and to draw them in a loop with the GL_TRIANGLE_FAN primitive type:

for ( int i = 0; i < no_of_indices; i += 4 )
    glDrawElements( GL_TRIANGLE_FAN, 4, GL_UNSIGNED_INT, (void*)(sizeof(unsigned int) * i) );

But I hope there is a better solution. I'm searching for a possibility to draw the quads with one single draw call, or to transform the quads to triangles on the GPU.

Boiardo answered 6/3, 2018 at 18:17 Comment(1)
"Since I want to use vertex buffers and vertex array objects for rendering, I cannot use the GL_QUAD primitive type." It's the core profile that ditched GL_QUADS. If you're willing to use the compatibility profile, then you can use them all you want.Bacchanalia

"If that has to be done only once, then that would be the solution. But the mesh data are not static."

The mesh data may be dynamic, but the topology of that list is the same. Every 4 vertices is a quad, so every 4 vertices represents the triangles (0, 1, 2) and (0, 2, 3).

So you can build an arbitrarily large static index buffer containing an ever increasing series of these numbers (0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7, etc). You can even use baseVertex rendering to offset them to render different series of quads using the same index buffer.
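
A minimal sketch of how such a static pattern could be generated on the CPU before uploading it once. The helper name makeQuadTriangleIndices is hypothetical, and the sketch assumes the vertices are stored quad by quad (4 consecutive vertices per quad) and that 16-bit indices are used:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the repeating (0, 1, 2, 0, 2, 3) pattern for
// `quadCount` quads. The result does not depend on the mesh data, so it can
// be uploaded to an element array buffer once and reused.
std::vector<std::uint16_t> makeQuadTriangleIndices(std::size_t quadCount)
{
    // 16-bit indices address vertices 0..65535, i.e. at most 16384 quads.
    assert(quadCount <= 16384);

    static const std::uint16_t pattern[6] = {0, 1, 2, 0, 2, 3};

    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q)
    {
        // First vertex of quad q; every quad owns 4 consecutive vertices.
        const std::size_t firstVertex = q * 4;
        for (std::uint16_t offset : pattern)
            indices.push_back(static_cast<std::uint16_t>(firstVertex + offset));
    }
    return indices;
}
```

For two quads this yields 0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7, which is the series described above.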

My suggestion would be to make the index buffer use GLushort as the index type. This way, your index data only takes up 12 bytes per quad. Using shorts gives you a limit of 16384 quads (65536 addressable vertices / 4) in a single drawing command, but you can reuse the same index buffer to draw multiple series of quads with baseVertex rendering:

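constexpr GLushort batchSize = 16384;     //quads per draw call (65536 vertices / 4)
constexpr unsigned int vertsPerQuad = 6;  //indices consumed per quad (two triangles)
void drawQuads(GLuint quadCount)
{
  //Assume VAO is set up.
  int baseVertex = 0;                     //number of quads already drawn
  while(quadCount > batchSize)
  {
    //Every quad owns 4 vertices, so the vertex offset is the quad offset * 4.
    glDrawElementsBaseVertex(GL_TRIANGLES, batchSize * vertsPerQuad, GL_UNSIGNED_SHORT, 0, baseVertex * 4);
    baseVertex += batchSize;
    quadCount -= batchSize;
  }
  //Draw the remaining (possibly partial) batch.
  glDrawElementsBaseVertex(GL_TRIANGLES, quadCount * vertsPerQuad, GL_UNSIGNED_SHORT, 0, baseVertex * 4);
}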
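
To make the batching arithmetic easier to check without a GL context, here is a rough sketch that only computes the arguments each draw call would receive. The QuadBatch struct and planQuadBatches function are hypothetical names introduced for illustration; the defaults of 16384 quads per batch and 6 indices per quad are taken from the code above:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of the arguments one glDrawElementsBaseVertex call gets.
struct QuadBatch
{
    std::uint32_t indexCount; // "count" argument: indices consumed by the call
    std::int32_t  baseVertex; // "basevertex" argument: added to every index
};

// Splits `quadCount` quads into batches of at most `batchSize` quads.
std::vector<QuadBatch> planQuadBatches(std::uint32_t quadCount,
                                       std::uint32_t batchSize = 16384,
                                       std::uint32_t indicesPerQuad = 6)
{
    assert(batchSize > 0);

    std::vector<QuadBatch> batches;
    std::uint32_t firstQuad = 0; // quads covered by earlier batches
    while (quadCount > 0)
    {
        const std::uint32_t quadsInBatch =
            quadCount < batchSize ? quadCount : batchSize;
        // The base vertex advances by 4 per quad, no matter whether a quad
        // is described by 6 indices (triangles) or 5 (strip plus restart).
        batches.push_back({quadsInBatch * indicesPerQuad,
                           static_cast<std::int32_t>(firstQuad * 4)});
        firstQuad += quadsInBatch;
        quadCount -= quadsInBatch;
    }
    return batches;
}
```

For 40000 quads this plan contains two full batches of 98304 indices at base vertices 0 and 65536, followed by a tail batch of 43392 indices at base vertex 131072. Each entry would map to one call of the form glDrawElementsBaseVertex(GL_TRIANGLES, batch.indexCount, GL_UNSIGNED_SHORT, nullptr, batch.baseVertex).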

If you want slightly less index data, you can use primitive restart indices. This allows you to designate an index to mean "restart the primitive". This allows you to use a GL_TRIANGLE_STRIP primitive and break the primitive up into pieces while still only having a single draw call. So instead of 6 indices per quad, you have 5, with the 5th being the restart index. So now your GLushort indices only take up 10 bytes per quad. However, the batchSize now must be 16383, since the index 0xFFFF is reserved for restarting. And vertsPerQuad must be 5.

Of course, baseVertex rendering works just fine with primitive restarting, so the above code works too.
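
As a sketch of the strip variant, the index pattern could be generated like this. The helper name makeQuadStripIndices is hypothetical. The strip order 0, 1, 3, 2 is an assumption chosen so that both triangles of a counterclockwise quad keep their winding; it matches the index map used in the shaders of the other answer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 5 indices per quad for GL_TRIANGLE_STRIP, the 5th
// being the restart index. Assumes 16-bit indices and 4 consecutive
// vertices per quad.
std::vector<std::uint16_t> makeQuadStripIndices(std::size_t quadCount)
{
    // 0xFFFF is reserved for restarting, so only 16383 quads are addressable.
    assert(quadCount <= 16383);

    constexpr std::uint16_t restartIndex = 0xFFFF;
    // Strip order (0, 1, 3, 2) yields the triangles (0, 1, 3) and (3, 1, 2),
    // which have the same winding as the quad (0, 1, 2, 3).
    static const std::uint16_t pattern[4] = {0, 1, 3, 2};

    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 5);
    for (std::size_t q = 0; q < quadCount; ++q)
    {
        const std::size_t firstVertex = q * 4;
        for (std::uint16_t offset : pattern)
            indices.push_back(static_cast<std::uint16_t>(firstVertex + offset));
        indices.push_back(restartIndex); // ends the strip after every quad
    }
    return indices;
}
```

Primitive restart has to be enabled for this to work, for example with glEnable(GL_PRIMITIVE_RESTART_FIXED_INDEX) on OpenGL 4.3+ and OpenGL ES 3.0, or with glEnable(GL_PRIMITIVE_RESTART) plus glPrimitiveRestartIndex(0xFFFF) on older desktop core profiles.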

Bacchanalia answered 12/1, 2020 at 20:20 Comment(13)
6 shorts per-quad gives you a drawing limit of 65536/6 quads? wrong. there's only 4 unique indices per-quad. there's 65536 unique indices (65535 PR) that can be used for indexing. so 65536/4 quads. (65535/4 PR) you're not limited to a count of 65536 elements just cause you're using shorts. lol which for some odd reason, you're using your quad counts over element counts. also the base vertex always shifts over 4 vertices per-quad. so even if you were shifting by your element count rather than your quad count, you'd still be shifting too much. whether 5 or 6 elements per-quad.Soredium
@Puddle: "there's only 4 unique indices per-quad. there's 65536 unique indices (65535 PR) that can be used for indexing." Fair enough. "seems when you closed my question you rushed to try copy my own solutions." No, it's still a duplicate of this question.Bacchanalia
i'll agree it's a duplicate of the same issue. but you still did pretty much rush it. either that or you never understood/tested it before. (to explain all the flaws) i mentioned the primitive restart because i'd assume they'd have the foresight for an every N indices version. without needing an index buffer. it'd be very handy. reduce even more memory. hence why i said it was pointless. they went half way with it.Soredium
also glMultiDrawElementsIndirect would be much better. reduce all the overhead of api calls, and have all the arguments stored on the gpu.Soredium
@Puddle: "also glMultiDrawElementsIndirect would be much better" Would it be "much better"? OpenGL's API overhead is primarily around state changes between draw calls; the execution time of two back-to-back draw calls is extremely small. Multi-draw indirect could cover the quads rendered by the loop, but the tail batch needs to have a variable number of vertices. And you don't want to be uploading to the indirect buffer for every quad batch. So unless the number of quads being rendered is in the hundreds of thousands, the above mechanism will perform just fine.Bacchanalia
if you're doing the same call multiple times, you might as well use multi draw. if you're building static data, you might as well use indirect rendering. why do you think these methods exist? i knew i should've mentioned static vs dynamic. obviously if you're just doing it once, then you'd just use glMultiDrawElementsBaseVertex since you won't need those drawing commands on the gpu anymore.Soredium
@Puddle: Using non-indirect multidraw requires allocating storage for an array. Allocating memory is not something you should do in the middle of rendering operations if you like CPU performance, which is what this whole conversation is about. Also, the entire draw call is not static; that's why I talked about the tail buffer (the one that isn't 16384 quads in size).Bacchanalia
@Puddle: Indirect rendering primarily exists to allow GPU operations to generate rendering commands. Also, if you have a better answer, you are free to add one. You don't need to post these comments.Bacchanalia
Let us continue this discussion in chat.Bacchanalia
or no multi, no base, no indirect. just glDrawElements. just extend the uint ibo enough? but again, seems overkill when unnecessary.Soredium
p.s. you're still using quad count instead of element count. (i do need to post these comments)Soredium
@Puddle: Fixed.Bacchanalia
you did base vertex wrong. my very first reply explains it. "the base vertex always shifts over 4 vertices per-quad" have you not used base vertex before? there's 4 verts per quad. you should make a elements per quad. since that'd also be 5 with PR.Soredium

First I want to mention that this is not a question which I want to answer myself, but I want to provide my current solution to this issue. This means that I'm still looking for "the" solution, the perfectly acceptable solution.

In my solution, I decided to use Tessellation. I draw patches with a size of 4:

glPatchParameteri( GL_PATCH_VERTICES, 4 )
glDrawElements( GL_PATCHES, no_of_indices, GL_UNSIGNED_INT, 0 )

The Tessellation Control Shader has a default behavior. The patch data is passed directly from the Vertex Shader invocations to the tessellation primitive generation. Because of that it can be omitted completely.

The Tessellation Evaluation Shader uses a quadrilateral patch (quads) to create 2 triangles:

#version 450

layout(quads, ccw) in;

in TInOut
{
    vec3 pos;
} inData[];

out TInOut
{
    vec3 pos;
} outData;

uniform mat4 u_projectionMat44;

void main()
{
    const int inx_map[4] = int[4](0, 1, 3, 2);

    float i_quad = dot( vec2(1.0, 2.0), gl_TessCoord.xy );
    int   inx    = inx_map[int(round(i_quad))];

    outData.pos = inData[inx].pos;
    gl_Position = u_projectionMat44 * vec4( outData.pos, 1.0 );
}
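
To illustrate the index mapping in the shader above, here is a small CPU-side sketch of the same arithmetic. The function name patchVertexForTessCoord is hypothetical, and the sketch assumes all tessellation levels are 1, so that the only tessellation coordinates generated are the four corners of the unit square:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical CPU mirror of the shader logic: maps a corner of the unit
// square (u, v) to the index of the patch vertex that should be used there.
int patchVertexForTessCoord(float u, float v)
{
    // Only the corners are expected when all tessellation levels are 1.
    assert(u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f);

    static const int inxMap[4] = {0, 1, 3, 2};
    // Same as dot(vec2(1.0, 2.0), gl_TessCoord.xy): enumerates the corners
    // (0,0), (1,0), (0,1), (1,1) as 0, 1, 2, 3.
    const float iQuad = 1.0f * u + 2.0f * v;
    return inxMap[static_cast<int>(std::lround(iQuad))];
}
```

Walking the corners counterclockwise, (0,0), (1,0), (1,1), (0,1), gives the patch vertices 0, 1, 2, 3, so the generated triangles reuse the original quad corners in their original order.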

An alternative solution would be to use a Geometry Shader. The input primitive type lines_adjacency provides 4 vertices, which can be mapped to 2 triangles (triangle_strip). Of course this seems to be a hack, since a line with adjacency is something completely different from a quad, but it works anyway.

glDrawElements( GL_LINES_ADJACENCY, no_of_indices, GL_UNSIGNED_INT, 0 );

Geometry Shader:

#version 450

layout( lines_adjacency ) in;
layout( triangle_strip, max_vertices = 4 ) out;

in TInOut
{
    vec3 pos;
} inData[];

out TInOut
{
    vec3 pos;
} outData;

uniform mat4 u_projectionMat44;

void main()
{
    const int inx_map[4] = int[4](0, 1, 3, 2);
    for ( int i=0; i < 4; ++i )
    {
        outData.pos = inData[inx_map[i]].pos;
        gl_Position = u_projectionMat44 * vec4( outData.pos, 1.0 );
        EmitVertex();
    }
    EndPrimitive();
}

An improvement would be to use Transform Feedback to capture new buffers, containing triangle primitives.

Boiardo answered 6/3, 2018 at 18:17 Comment(6)
What is your evidence that either of these solutions (both of which employ tools that aren't exactly known for being speedy) is faster than the obvious method?Bacchanalia
Btw, if you use GL4.x level features, you could also use compute shaders to generate the new index buffer directly. Since each block of 4 indices will generate 6 (or 5 for strips with primitive restart) new indices, you can easily split the task into a convenient number of threads and workgroups, without the need to communicate or synchronize the threads at all.Quartern
@Boiardo What are your objections to Nicol Bolas' answer?Icing
@Icing The idea is genius. It is simple and requires little programming effort and no extra shaders. I don't know why I didn't think of that. All my basic approaches have been too complicated. Of course the approach only works if the quad primitives are not defined by indices. In that case the solution would be to generate a new index buffer, possibly with a compute shader, as derhass has suggested in his comment.Boiardo
@Boiardo then why don't you accept his answer? I am confusedIcing
@Icing The answer is accepted.Boiardo
