Ever wanted to know which parts of your codebase keep you from reaching the nirvana of 60 fps smoothness? Tired of not being able to pinpoint and replicate the use case that causes your game to display strange frame time patterns on your artists’ machines? Look no further.

This is the first post in a series of four where I will go through the steps of building a simple but useful profiling library for multithreaded/multicore games. It’s not an instruction-level profiler like VTune and similar tools, but rather a library for measuring and capturing profiling data on a block/function level. It will also provide the functionality to record and analyze performance remotely over a network or by capturing the data stream to a file. The reason for reinventing yet another wheel is that I wanted something unobtrusive that was easy to use and had no dependencies.
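To make "block/function level" concrete, here is a minimal sketch of timing a single block using nothing but the C90 standard library. Everything in it is an assumption for illustration: `block_timer`, `block_timer_begin`, `block_timer_end` and the `sum_range` workload are hypothetical names, not the API of the library built in this series.

```c
#include <time.h>

/* Hypothetical type for this sketch; not the library's actual API. */
typedef struct block_timer {
    const char* name;   /* label for the measured block */
    clock_t     start;  /* clock() value when the block was entered */
} block_timer;

/* Mark the start of a named block. */
static void block_timer_begin(block_timer* timer, const char* name)
{
    timer->name  = name;
    timer->start = clock();
}

/* Return the seconds reported by clock() since block_timer_begin. */
static double block_timer_end(const block_timer* timer)
{
    clock_t now = clock();
    return (double)(now - timer->start) / (double)CLOCKS_PER_SEC;
}

/* Example block to measure: sums the integers 0..count-1. */
static unsigned long sum_range(unsigned long count)
{
    unsigned long i;
    unsigned long total = 0;
    for (i = 0; i < count; ++i)
        total += i;
    return total;
}

/* Usage: time one call and report the result through an out parameter. */
static unsigned long timed_sum_range(unsigned long count, double* seconds)
{
    block_timer timer;
    unsigned long total;

    block_timer_begin(&timer, "sum_range");
    total = sum_range(count);
    *seconds = block_timer_end(&timer);
    return total;
}
```

Note that the C standard defines `clock()` as processor time used by the program, not wall-clock time, and its resolution is often coarse, so this only illustrates the begin/end shape of a block measurement rather than a timer you would want to ship.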

In general, I want to ask fellow programmers out there who make their code available to the public to please make the effort to cut down on dependencies. Using other libraries is all well and good, but it makes using your code so much more painful, especially when these libraries in turn have other dependencies. I’ve seen countless examples of useful bits and pieces of code on the net with horrible dependency chains (Boost being my pet peeve), which make them more or less useless in my book. After all, if you released the code you probably want people to use it, so why make it more painful than it has to be?

Ok, rant over. My goals for this exercise:

  • Simple and easy to use API
  • Written in C90 for portability and ease of use in other languages
  • No external dependencies except the standard C runtime and OS-provided libraries

This first part of the series will focus on solving the core issue of any profiling code: measuring elapsed time. All code for this post (and eventually the entire profiling library) will be released to the public domain through a GitHub repository available at