I wanted to share a pair of tricks I picked up a couple of years ago, and while they’re both quite simple, they deserve to be remembered. They’re both a form of low-pass filtering, and while the common usage of low-pass filtering is probably a lot cooler, like controller input smoothing or texture compression, I’ll talk about monitoring and statistics. Oddly enough, there will be a reference to backup-tape schedules in here.
Looking at statistics over time is easy: just store everything on your hard drive with gigabytes of free space, or upload it to the cloud, or… But what if that’s not an option? Let’s say you think you’re experiencing a slow but steady decrease in frame rate and you want to examine the frame rate history over the last couple of hours. You can’t afford to save that to disk, as that on its own would probably either kill the frame rate you’re trying to monitor in the first place, or fill up your memory trying to buffer it all. If you’re on a mobile device, you’re probably in an even worse situation. Not to mention that you’d probably have to start over and run the test again for a couple of hours with the monitoring turned on.
The moving average
One of the most well-known filtering techniques is the moving average. The most naive approach is to simply store a number of values and keep averaging them. The drawback of this approach is obvious: if you want a very large window, you’ll have to store a lot of values, and the averaging is going to take some time as well. The common solution is to use an exponential window, which can be achieved by weighing the previous result together with the latest value:

avg(n) = α · x(n) + (1 − α) · avg(n − 1)
The idea here is that the previous result contains information on all previous values, with an ever decreasing weight. This becomes obvious if we expand one step:

avg(n) = α · x(n) + (1 − α) · (α · x(n − 1) + (1 − α) · avg(n − 2))
Expanding further, we’ll see that the value from k steps back contributes with a weight of α · (1 − α)^k. α is of course between 0 and 1, otherwise it’d be a strange weight.
Ok, so that works. By having a small α, we can increase the influence of previous values and widen the window we’re averaging over. Yes, theoretically old values will never cease to influence the result, but in practice, the weight will eventually cause it to be insignificant (and at some point it’ll fall victim to the limited precision of our computers).
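For concreteness, here’s a minimal sketch of such a filter; the struct and its names are mine, and seeding with the first sample rather than zero is just one way to avoid a long warm-up:

```cpp
// Exponential moving average: one stored value, one multiply-add per sample.
struct MovingAverage {
    double alpha;          // weight of the newest sample, 0 < alpha <= 1
    double value = 0.0;    // the running average
    bool   primed = false;

    explicit MovingAverage(double a) : alpha(a) {}

    void add(double sample) {
        if (!primed) {
            value = sample;  // seed with the first sample instead of
            primed = true;   // averaging against an arbitrary zero
        } else {
            value = alpha * sample + (1.0 - alpha) * value;
        }
    }
};
```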
Load average
Let’s say we’re averaging frame times using the moving average above, calculating a new value every frame. This will give you a number, and it’ll most likely be good enough for whatever you’re testing, but it’s probably not strictly what you think it is.
We’re adding frame times every frame, and displaying the inverse to get a frame rate. The important thing to note here is that the averaging window is defined in terms of number of frames, rather than in seconds. So, if we have a window of 60 frames, at 60 fps it’ll be a one second window, but at 6 fps it’ll actually be a 10 second window, tricking our mind.
What our mind prefers to work with is a fixed time window. There are two ways to go about this. Either you calculate, every frame, which coefficient to use and add the frame to the result as before; this usually ends up evaluating some exponential function, which isn’t much fun. :) Or, you count frames, reset the counter at fixed intervals, and add that count to the result. Incidentally, you’re now counting frames per second rather than seconds per frame; funny how that works out sometimes. This is actually how Linux (and probably other *nix variants as well) calculates its load average, with the notable addition that it does so using fixed point arithmetic. I seriously suggest checking it out, as it is simple yet quite brilliant.
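Here’s a sketch of the second approach: count frames, and fold the count into the average at fixed wall-clock intervals. The one-second interval and the coefficient are arbitrary picks for illustration, not anything blessed by the kernel:

```cpp
// Frame counter folded into a moving average at fixed time intervals,
// so the averaging window is defined in seconds rather than in frames.
struct FpsAverage {
    double interval = 1.0;  // seconds per measurement (arbitrary choice)
    double alpha    = 0.25; // weight of the newest measurement (arbitrary)
    double average  = 0.0;  // smoothed frames-per-second
    double elapsed  = 0.0;
    int    frames   = 0;

    // Call once per frame with the frame's delta time in seconds.
    void tick(double dt) {
        ++frames;
        elapsed += dt;
        if (elapsed >= interval) {
            double fps = frames / elapsed; // frames per second this interval
            average = alpha * fps + (1.0 - alpha) * average;
            frames  = 0;
            elapsed = 0.0;
        }
    }
};
```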
Round robin databases
This technique dawned on me while I was working as a system administrator several years ago. Our backup schedule was actually quite ingenious: we had four ‘week’ tapes, one for each weekend of the month, and eleven ‘month’ tapes, one for the first weekend of each month from February to December. In January, we’d use a ‘year’ tape rather than a month tape. The thing is, at any given time we’d have one backup a week for the last four weeks, one backup a month for the last year, and one backup a year beyond that. You don’t need to “filter” backups the same way you do numbers, as any given backup contains all changes made to the file system up to that point, but the same concept holds.
How it works with numbers
If you haven’t seen it before, check out the Wikipedia page. A round robin database is basically a set of circular buffers, one for each level or ‘decade’. When you need to reclaim previously written data, you take a whole chunk of it, aggregate it, and insert it into the level above. For each level, a stored value corresponds to a larger time span, which means that in the near past you’ll have the highest precision, and as you look further back, the data becomes more and more blurred. You cannot go back and examine old data up close, but the resolution will hopefully be enough to see how your frame rate steadily dropped over the last hour.
So why do it this way? Well, let’s assume we count frames per 100 ms, we store 32 values on each level (the levels don’t need to be the same size of course, and you may tweak them to your heart’s content, but I’ll use the same size here), and we combine four values whenever we aggregate. The first level then holds 3.2 seconds worth of 100 ms counts, the second level 12.8 seconds worth of 400 ms counts, the third 51.2 seconds worth of 1.6 s counts. At the sixth level we have almost an entire hour’s worth of history, by storing a mere 192 values, compared to the 36,000 values required to store it raw, and we still have the last couple of seconds at full resolution. Granted, the resolution at the sixth level is 102.4 seconds, but if you’re graphing the entire hour on screen, that might not be so bad (especially on a mobile device). :) At 102.4 seconds per value, there’s also plenty of time to queue it to be written to storage, without much penalty.
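Here’s a rough sketch of the structure. Unlike a classic round robin database it aggregates eagerly every four writes instead of waiting for a buffer to wrap, but the per-level resolutions come out the same; all names and the choice of plain averaging are mine. With six levels of 32 values each, this matches the numbers above: 192 stored values covering roughly an hour.

```cpp
#include <cstddef>
#include <vector>

// Round robin database sketch: one ring buffer per level, and every kFold
// writes into a level are aggregated and promoted to the level above.
class RoundRobinDb {
public:
    RoundRobinDb(std::size_t levels, std::size_t size)
        : rings(levels, Ring(size)) {}

    void add(double sample) { addToLevel(0, sample); }

    // Read access for graphing; entries are in ring order, not time order.
    const std::vector<double>& level(std::size_t i) const { return rings[i].data; }

private:
    static constexpr std::size_t kFold = 4; // values combined per aggregate

    struct Ring {
        explicit Ring(std::size_t n) : data(n, 0.0) {}
        std::vector<double> data;
        std::size_t head = 0;    // next write position
        std::size_t pending = 0; // writes since the last promotion
        double sum = 0.0;        // running sum of those writes
    };

    void addToLevel(std::size_t lvl, double sample) {
        if (lvl >= rings.size()) return;        // oldest history simply falls off
        Ring& r = rings[lvl];
        r.data[r.head] = sample;
        r.head = (r.head + 1) % r.data.size();
        r.sum += sample;
        if (++r.pending == kFold) {             // aggregate and promote upwards
            addToLevel(lvl + 1, r.sum / kFold); // average here; sum if tracking counts
            r.pending = 0;
            r.sum = 0.0;
        }
    }

    std::vector<Ring> rings;
};
```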
Aggregation
You need to pay attention to what you’re measuring when you aggregate. If you’re tracking a rate (like frames or packets per second), you’ll need to average the values. If you’re tracking a count, you’ll need to sum them up. If you’re tracking the worst frame time, you simply grab the largest value; if you’re tracking the best, you take the smallest. You see where I’m going with this: there’s more to it than simply storing the history. Using this technique you can store the average along with the best and worst values, and still only use a mere fraction of what storing the raw data would require. It might even be worth keeping it activated at all times, so you don’t have to rerun that hour long test to get the results.
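One way to keep all three together is to make the stored value a small bucket that knows how to merge with its neighbours; a hypothetical sketch, where merge is what the promotion step would call instead of a plain average:

```cpp
#include <algorithm>
#include <limits>

// One bucket keeping average, best and worst together, so a single round
// robin store can answer all three questions at once.
struct Bucket {
    double sum     = 0.0;
    long   count   = 0;
    double minimum = std::numeric_limits<double>::infinity();
    double maximum = -std::numeric_limits<double>::infinity();

    void add(double sample) {
        sum += sample;
        ++count;
        minimum = std::min(minimum, sample);
        maximum = std::max(maximum, sample);
    }

    void merge(const Bucket& other) {
        sum   += other.sum;   // keep the raw sum so the average stays exact
        count += other.count;
        minimum = std::min(minimum, other.minimum);
        maximum = std::max(maximum, other.maximum);
    }

    double average() const { return count ? sum / count : 0.0; }
};
```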
Wrap-up
So what am I using this for? This and that, actually, usually outside game development. The reason I brought it up is that I find it rewarding to see player progression over time, and I wanted to have it built into this little gadget I’m working on, where I don’t have the luxury of being able to store massive amounts of data on disk (mobile device) or online (low budget).
Note: Cross posted on my own blog, excu.se