In my not so long game developer career, I encounters lots of particle systems, either middleware or custom made. But most of them have the same issue : Memory patterns. And today’s system are really sensible to bad memory pattern. So let’s try to fix them.

Most particle system include a particle structure, containing all the data needed to update and represent a particle.

struct Particle
 
  {
 
      bool isAlive;
 
      Vector3D Position;
 
      Vector3D Speed;
 
      ...
 
  };

Most systems are made of processing nodes that are pipelined to modify properties of particles : a gravity node, a color change node, … All processing node loop over the particle table and only apply the changes if the particle is alive.

for each particle in particle_table
 
  {
 
      if( !particle.isAlive )
 
          continue;
 
   
 
      // Modify particle here
 
  }

Let’s run this code and profile it, result are quite clear :

Cache miss

Let’s analyse what happens with this pseudo code :

1. fetch the particle data memory
- if the data is not in cache, L2 miss
- data is there, ok continue
2. test the liveness
- if dead, skip to the next
- else process particle.
What happens to the cache line you just fetch if you skip the particle? Just wasted!

Even worst if each node test for liveness of the particle, 4 nodes, 4 wastes.
So what can we do to avoid the problem? Separate data frequently read from data that might be skipped. So instead of having a table of structure, we have a structure of table :

struct ParticleTable
 
  {
 
      vector<bool> LivenessTable;
 
      vector<Vector3> PositionTable;
 
      vector<Vector3> SpeedTable;
 
      ...
 
  };

Now, you don’t fetch data you will not use. Bool vectors even use a single bit per item. It also profits node, as most of them use a small part of the particle data. Gravity node won’t change the color!

But it can still be better. With the current structure, there will still exist a lot of waste. If you need a single position in a line cache, you still need to fetch the entire line. Assuming a 128 bytes cache line, a line contains about 42 positions. So to skip the line, you need to skip those 42 particles. It won’t happen that much. To improve the cache usage, the idea is to group data that are processed together.

Position, speed, angles,… are often used at the same time. Color information, tint are likely to be used by the node in question. So a new structure is created :

struct PositionData
 
  {
 
      Vector3 Position;
 
      Vector3 Speed;
 
      float ScreenSpaceAngle;
 
      ...
 
  };
 
   
 
  struct ParticleTable
 
  {
 
      vector<bool> LivenessTable;
 
      vector<PositionData> PositionDataTable;
 
      vector<ColorData> ColorDataTable;
 
      ...
 
  };

Depending on your game, those structures might change. Adapt them to your access patterns. And if possible align those structure to the SIMD alignment.

For example, it might be interesting to organize the PositionData structure like this :

struct PositionData
 
  {
 
      Vector3 Position;
 
      float ScreenSpaceAngle
 
      Vector3 Speed;
 
      float ScreenSpaceAngleSpeed;
 
      ...
 
  };

This way all position process could be done in SIMD with no difficulty. Position and speed can be loaded by a single SIMD aligned load instruction.

A particle system is just a data transform, so all care should apply on setting up the data. Most performance issues are just dependent of this setup, as usual!