). At some point during the development of our engine we decided to measure which was the fastest way to draw geometries on the platform.

We started with a very simple experiment, where we measured the rendering time of some test geometries, which were converted to different “primitive types”. Our geometry would be converted and stored in 4 different ways:

  • One Triangle List (3 vertices per triangle.)
  • One Indexed Triangle List (Variable number of vertices per triangle. Duplicate vertices were removed and reused.)
  • Many Triangle Strips (2 vertices to start each strip + 1 vertex per triangle. Many draw calls.)
  • One Triangle Strip (2 vertices to start the strip + 1 vertex per triangle. Contains degenerate triangles, i.e. triangles with zero area used to glue strips together.)
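
To make the four layouts concrete, here is a rough count of how many vertices each one submits for a regular grid of quads. This is only a sketch: the function name is made up for illustration, it assumes one strip per row of quads, and it assumes strips are glued by repeating two vertices per join (a common way to create the degenerate triangles, not necessarily what our converter did).

```python
def grid_vertex_counts(cols, rows):
    """Vertices (or indices) submitted to draw a cols x rows grid of quads."""
    triangles = 2 * cols * rows

    # Triangle list: 3 vertices per triangle, nothing is shared.
    triangle_list = 3 * triangles

    # Indexed list: each unique vertex stored once, plus 3 indices per triangle.
    unique_vertices = (cols + 1) * (rows + 1)
    indices = 3 * triangles

    # Many strips: one strip per row, 2 starting vertices + 1 per triangle.
    vertices_per_row_strip = 2 + 2 * cols
    many_strips = rows * vertices_per_row_strip

    # One strip: the same rows glued together. Assumption: 2 repeated
    # vertices per join, which is enough because every row strip has an
    # even number of vertices.
    one_strip = many_strips + 2 * (rows - 1)

    return {
        "triangles": triangles,
        "triangle_list_vertices": triangle_list,
        "indexed_unique_vertices": unique_vertices,
        "indexed_indices": indices,
        "many_strips_vertices": many_strips,
        "many_strips_draw_calls": rows,
        "one_strip_vertices": one_strip,
    }


counts = grid_vertex_counts(10, 10)
for name, value in counts.items():
    print(f"{name}: {value}")
```

For a 10x10 grid (200 triangles) the single strip needs 238 vertices in one draw call, against 600 for the plain triangle list.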

Our test geometries were grids and 3D cubes at different levels of detail, with roughly 10, 50, 100, 200, 400, 600, 800 and 1000 triangles each. We also made sure that the screen area occupied by each geometry didn’t change across its different LODs. Note that on the Zeebo we were able to render at most ~10k triangles.

So we measured the rendering time, and our first results showed something like this: (Note that the numbers below are not the real results – as I obviously don’t remember them – but some numbers to illustrate what they looked like at that time).

~500 Triangles

  • One Triangle List: 4 ms
  • One Indexed Triangle List: 5 ms
  • Many Triangle Strips: 6 ms
  • One Triangle Strip: 2 ms

At first I was quite disappointed with our indexed results. I don’t remember how many duplicate vertices we had, but I remember that we were using 16-bit indices and 80-bit vertices (3x16b POS + 2x16b UV) in our tests. On the other hand, I was quite impressed with the single triangle strip results, although again I don’t remember our degenerate triangle ratio.
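
For reference, the vertex and index sizes above work out as follows. This is a small sketch of the arithmetic only: the signedness and byte order in the format string are assumptions, and the unique vertex count in the example is a made-up number, since I don’t remember the real one. It compares buffer sizes and says nothing about rendering time.

```python
import struct

# Assumed layout: 3 x 16-bit position + 2 x 16-bit UV, tightly packed.
# Signedness and byte order are guesses; the size is the same either way.
VERTEX_FORMAT = "<3h2h"
VERTEX_BYTES = struct.calcsize(VERTEX_FORMAT)  # 10 bytes = 80 bits
INDEX_BYTES = struct.calcsize("<H")            # 2 bytes = 16 bits


def triangle_list_bytes(triangles):
    """Non-indexed list: 3 full vertices per triangle."""
    return 3 * triangles * VERTEX_BYTES


def indexed_list_bytes(triangles, unique_vertices):
    """Indexed list: unique vertices once, plus 3 indices per triangle."""
    return unique_vertices * VERTEX_BYTES + 3 * triangles * INDEX_BYTES


# Hypothetical example: 500 triangles sharing 300 unique vertices.
print(triangle_list_bytes(500))
print(indexed_list_bytes(500, 300))
```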

Our results with indexed triangle lists and triangle strips made me think that we didn’t have a proper cache on the platform, but rather some kind of registers (or another trick) to reuse vertices on strips.

After that first experiment we got the impression that we should always use triangle strips. Then we started our second experiment, now using real game models (environment and character models). The results that we got for a few models were similar to those of our first experiment, but in “bad” cases we got something like this: (Again, not the real results, as you would expect.)

  • One Triangle List: 4 ms
  • (We removed the indexed test)
  • Many Triangle Strips: 11 ms
  • One Triangle Strip: 7 ms

These results showed us that for some real game models the ratio of degenerate triangles in a strip was so high that it was better to just draw a plain triangle list. We tried to improve our strips using different approaches, but it wasn’t enough. The problem was really caused by different UVs and colors (yeah, we had per-vertex color in this experiment) being applied to neighboring vertices, which generated many small strips of four vertices.
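
A quick count shows why those four-vertex strips hurt. The sketch below assumes strips with an even number of vertices, glued with two repeated vertices per join; both the helper name and that gluing scheme are assumptions for illustration.

```python
def glued_strip_stats(strip_count, vertices_per_strip):
    """Count vertices and triangles after gluing equal-sized strips.

    Assumes vertices_per_strip is even, so 2 repeated vertices per join
    are enough to keep the winding order.
    """
    real_triangles = strip_count * (vertices_per_strip - 2)
    glue_vertices = 2 * (strip_count - 1)
    total_vertices = strip_count * vertices_per_strip + glue_vertices
    # A strip with n vertices always encodes n - 2 triangles.
    total_triangles = total_vertices - 2
    degenerate_triangles = total_triangles - real_triangles
    return total_vertices, real_triangles, degenerate_triangles


strip_vertices, real, degenerate = glued_strip_stats(100, 4)
list_vertices = 3 * real  # the same triangles as a plain triangle list
print(strip_vertices, list_vertices, real, degenerate)
```

With 100 strips of four vertices, the glued strip submits 598 vertices against 600 for the triangle list, so it saves almost nothing, and 396 of its 596 triangles are degenerate.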

At that point we noticed that we would be better off splitting our geometry into a triangle list and a triangle strip: big strips would be merged into a single strip, and small strips would be converted back to triangles and merged into a single triangle list. The only question left to answer was: how many vertices make a strip big enough? The answer was something around 7 vertices.

So in the end, our build pipeline looked something like this:

  1. Generate triangle strips for the geometries.
  2. Glue all strips with 7 or more vertices into one big strip.
  3. Convert all remaining strips into a single triangle list.
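
Steps 2 and 3 can be sketched roughly like this. It is not our actual build code: vertices are represented by plain integer ids, the function names are invented for the example, and the strips are assumed to come from some existing stripifier (step 1). Gluing repeats the last vertex of the previous strip and the first vertex of the next one, plus one extra repeat when needed to keep the winding order.

```python
MIN_STRIP_VERTICES = 7  # "big enough" threshold from our measurements


def strip_to_triangles(strip):
    """Expand a strip into a flat triangle list, skipping degenerates."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if a == b or b == c or a == c:
            continue  # zero-area triangle
        # Every odd triangle in a strip has flipped winding; swap it back.
        triangles.extend((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


def glue_strips(strips):
    """Merge strips into one, joined by degenerate triangles."""
    merged = []
    for strip in strips:
        if merged:
            merged.append(merged[-1])  # repeat last vertex of previous strip
            merged.append(strip[0])    # repeat first vertex of next strip
            if len(merged) % 2 == 1:
                merged.append(strip[0])  # extra repeat keeps winding parity
        merged.extend(strip)
    return merged


def split_geometry(strips, min_vertices=MIN_STRIP_VERTICES):
    """Return (one big strip, one triangle list) for a set of strips."""
    big = [s for s in strips if len(s) >= min_vertices]
    small = [s for s in strips if len(s) < min_vertices]
    triangle_list = []
    for s in small:
        triangle_list.extend(strip_to_triangles(s))
    return glue_strips(big), triangle_list


strips = [
    [0, 1, 2, 3, 4, 5, 6],             # 7 vertices: goes into the big strip
    [10, 11, 12, 13, 14, 15, 16, 17],  # 8 vertices: goes into the big strip
    [20, 21, 22, 23],                  # 4 vertices: becomes plain triangles
]
big_strip, triangle_list = split_geometry(strips)
print(big_strip)
print(triangle_list)
```

Either output can be empty, which matches the cases where an object ends up as only a list or only a strip.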

Therefore, for a few objects we would render just a single triangle list or a single strip, but for the majority of them we would render both a list and a strip of triangles.

To conclude, although everything I said here was based on the Zeebo console, it likely applies to some mobile platforms as well (especially the ones that use Qualcomm chipsets with an Adreno GPU). On other platforms, like PC or consoles, you would probably use indexed triangles and reorder the indices for cache coherency (where in the best case you would have 2 vertices + ~0.5 vertex for each triangle).