I decided to build my engine on triangle lists after reading (a while ago) that indexed triangle lists perform better because fewer draw calls are needed. Today I stumbled on 0xffffffff, which in DX is treated as a strip-cut index, so you can draw multiple strips in one call. Does this mean that triangle lists no longer hold superior performance?
It is possible to draw multiple triangle strips in a single draw call by using degenerate triangles, which have an area of zero. A strip cut is made by simply repeating the last vertex of the previous strip and the first vertex of the next strip, adding two elements per strip break (the repeated elements form four zero-area triangles between the two strips, which are not rasterized).
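As a rough sketch of the degenerate-triangle method, the helper below joins several sub-strips into one index list. The name stitchStrips and the use of 32-bit indices are assumptions made for illustration. It also assumes every sub-strip has an even number of indices, because an odd-length sub-strip would flip the winding of everything after it and would need one more repeated index.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: joins sub-strips into one triangle strip by
// repeating the last index of the previous sub-strip and the first
// index of the next one (two extra indices per strip break).
// Assumes each sub-strip has an even index count so winding is kept.
std::vector<std::uint32_t> stitchStrips(
    const std::vector<std::vector<std::uint32_t>>& strips)
{
    std::vector<std::uint32_t> out;
    for (const auto& strip : strips) {
        if (strip.empty()) continue;
        if (!out.empty()) {
            out.push_back(out.back());    // repeat last index of previous strip
            out.push_back(strip.front()); // repeat first index of next strip
        }
        out.insert(out.end(), strip.begin(), strip.end());
    }
    return out;
}
```

For the sub-strips {0,1,2,3} and {4,5,6,7} this yields 0,1,2,3,3,4,4,5,6,7, i.e. ten indices instead of eight.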
New in Direct3D 10 are the strip-cut index (for indexed geometry) and the RestartStrip HLSL function. Both can be used to replace the degenerate triangles method, effectively cutting down the bandwidth cost. (Instead of two indices for a cut, only one is needed.)
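A minimal sketch of how an index buffer with strip-cut indices could be assembled on the CPU side. The helper name joinWithStripCut and the constant name kStripCut are made up for this example; the value 0xFFFFFFFF applies to 32-bit indices (16-bit index buffers use 0xFFFF).

```cpp
#include <cstdint>
#include <vector>

// Strip-cut value for 32-bit index buffers.
constexpr std::uint32_t kStripCut = 0xFFFFFFFFu;

// Hypothetical helper: concatenates sub-strips, separating them with
// a single strip-cut index (one extra index per strip break).
std::vector<std::uint32_t> joinWithStripCut(
    const std::vector<std::vector<std::uint32_t>>& strips)
{
    std::vector<std::uint32_t> out;
    for (const auto& strip : strips) {
        if (strip.empty()) continue;
        if (!out.empty()) out.push_back(kStripCut); // restart the strip here
        out.insert(out.end(), strip.begin(), strip.end());
    }
    return out;
}
```

Unlike the degenerate-triangle approach, the sub-strips here may have any length, since each cut starts a fresh strip.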
Expressiveness
Can any primitive list be converted to an equivalent strip and vice versa? Strip to list conversion is of course trivial. For list to strip conversion we have to assume that we can cut the strip. Then we can map each primitive in the list to a sub-strip containing a single primitive, though this would not be useful.
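To illustrate the trivial direction, here is a sketch of a triangle strip to triangle list conversion. The function name stripToList is hypothetical, and the code assumes 32-bit indices with 0xFFFFFFFF as the cut value. It flips every odd triangle of a sub-strip to keep the winding consistent and drops zero-area triangles.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: expands an indexed triangle strip (optionally
// containing strip-cut indices) into an indexed triangle list.
std::vector<std::uint32_t> stripToList(
    const std::vector<std::uint32_t>& strip,
    std::uint32_t cut = 0xFFFFFFFFu)
{
    std::vector<std::uint32_t> list;
    std::size_t start = 0; // position of the first index of the current sub-strip
    for (std::size_t i = 0; i < strip.size(); ++i) {
        if (strip[i] == cut) { start = i + 1; continue; }
        if (i < start + 2) continue; // not enough indices for a triangle yet
        std::uint32_t a = strip[i - 2], b = strip[i - 1], c = strip[i];
        if (a == b || b == c || a == c) continue; // skip degenerate triangles
        if ((i - start) % 2 == 1) std::swap(a, b); // odd triangle: restore winding
        list.insert(list.end(), {a, b, c});
    }
    return list;
}
```

For the strip 0,1,2,3,cut,4,5,6 this produces the triangles (0,1,2), (2,1,3) and (4,5,6).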
So, at least for triangle primitives, strips and lists have always had the same expressiveness. Before Direct3D 10, strip cuts in line strips were not possible, so line strips and line lists were actually not equally expressive.
Memory and Bandwidth
How much data needs to be sent to the GPU? In order to compare the methods we need to be able to calculate the number of elements needed for a certain topology.
Primitive List Formula
N ... total number of elements (vertices or indices)
P ... total number of primitives
n ... elements per primitive (point => 1, line => 2, triangle => 3)
N = Pn
Primitive Strip Formula
N, P, n ... same as above
S ... total number of sub-strips
o ... primitive overlap
c ... strip cut penalty
N = P(n-o) + So + c(S-1)
primitive overlap describes the number of elements shared by adjacent primitives. In a classical triangle strip a triangle uses two vertices from the previous primitive, so the overlap is 2. In a line strip only one vertex is shared between lines, so the overlap is 1. A triangle strip using an overlap of 1 is of course theoretically possible but has no representation in Direct3D.
strip cut penalty is the number of elements needed to start a new sub-strip. It depends on the method used. Using strip-cut indices the penalty would be 1, since one index is used to separate two strips. Using degenerate triangles the penalty would be 2, since two repeated elements are needed to join two strips.
From these formulas we can deduce that it depends on the geometry which method needs the least space.
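The two formulas can be written down directly, which makes it easy to try them on concrete geometry. The function names listElements and stripElements are only illustrative; the parameters follow the symbols defined above, and S is assumed to be at least 1.

```cpp
#include <cstdint>

// Primitive list: N = P * n
constexpr std::uint64_t listElements(std::uint64_t P, std::uint64_t n)
{
    return P * n;
}

// Primitive strip: N = P(n - o) + S*o + c(S - 1)
// Assumes S >= 1 and o <= n.
constexpr std::uint64_t stripElements(std::uint64_t P, std::uint64_t n,
                                      std::uint64_t S, std::uint64_t o,
                                      std::uint64_t c)
{
    return P * (n - o) + S * o + c * (S - 1);
}
```

For example, a grid of 16 x 16 quads has 512 triangles. As a list it needs listElements(512, 3) = 1536 elements; as 16 row strips it needs stripElements(512, 3, 16, 2, 1) = 559 with strip-cut indices or stripElements(512, 3, 16, 2, 2) = 574 with degenerate triangles. In contrast, 100 unconnected triangles need 300 elements as a list but stripElements(100, 3, 100, 2, 1) = 399 as a strip.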
Caching
One important property of strips is the high temporal locality of the data. When a new primitive is assembled, each of its vertices needs to be fetched from GPU memory. For a triangle this has to be done three times. Accessing memory is usually slow, which is why processors use multiple levels of caches. In the best case the data needed is already stored in the cache, reducing memory access time. A triangle in a triangle strip reuses the last two vertices of the previous primitive, almost guaranteeing that two of its three vertices are already present in the cache.
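The effect can be made tangible with a toy model. The sketch below counts misses for a sequence of vertex references using a simple FIFO cache; both the FIFO policy and the function name countCacheMisses are assumptions for illustration, and real GPU caches may behave differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model: counts how many vertex references miss a FIFO cache
// holding at most cacheSize vertices (cacheSize 0 means no cache).
std::size_t countCacheMisses(const std::vector<std::uint32_t>& refs,
                             std::size_t cacheSize)
{
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t v : refs) {
        if (std::find(cache.begin(), cache.end(), v) != cache.end())
            continue; // hit: vertex is already cached
        ++misses;
        cache.push_back(v);
        if (cache.size() > cacheSize) cache.pop_front(); // evict oldest entry
    }
    return misses;
}
```

For a strip of four triangles referenced in the order 0,1,2, 2,1,3, 2,3,4, 4,3,5, a cache of four entries gives 6 misses for 12 references (three for the first triangle, then one per triangle), while no cache at all gives 12.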
Ease of Use
As stated above, converting a list to a strip is very simple. The problem is converting a list to an efficient primitive strip by reducing the number of sub-strips. For simple procedurally generated geometry (e.g. heightfield terrains) this is usually achievable. Writing a converter for existing meshes might be more difficult.
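For the heightfield case, a sketch of how strip indices might be generated: one sub-strip per row of quads, separated by strip-cut indices. The function name gridStripIndices, the row-major vertex layout (index = row * width + col) and the resulting winding order are assumptions of this example and would need to match the actual vertex buffer and cull mode.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t kStripCut = 0xFFFFFFFFu; // 32-bit strip-cut index

// Hypothetical helper: builds triangle strip indices for a grid of
// width x height vertices stored row by row.
std::vector<std::uint32_t> gridStripIndices(std::uint32_t width,
                                            std::uint32_t height)
{
    std::vector<std::uint32_t> indices;
    if (width < 2 || height < 2) return indices; // no triangles possible
    for (std::uint32_t row = 0; row + 1 < height; ++row) {
        if (row > 0) indices.push_back(kStripCut); // start a new sub-strip
        for (std::uint32_t col = 0; col < width; ++col) {
            indices.push_back(row * width + col);       // vertex in this row
            indices.push_back((row + 1) * width + col); // vertex in next row
        }
    }
    return indices;
}
```

A 3 x 3 vertex grid yields 0,3,1,4,2,5,cut,3,6,4,7,5,8, which is 13 indices for 8 triangles, compared to 24 indices for the same triangles as a list.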
Conclusion
The introduction of Direct3D 10 does not have much impact on the strip vs. list question. There is now equal expressiveness for line strips and a slight data reduction. In any case, when using strips you always gain the most if you reduce the number of sub-strips.
On modern hardware with pre- and post-transform vertex caches, tri-stripping is not a win over indexed triangle lists. The only case where tri-stripping still makes sense is for non-indexed primitives generated by something where the strips are trivial to compute, such as a terrain system.
Instead, you should do vertex cache optimization of indexed triangle lists for best performance. The Hoppe algorithm is implemented in DirectXMesh, or you can look at Tom Forsyth's alternative algorithm.