A few weeks ago I was trying to track down a memory corruption bug in the game we are working on. The bug was first looked at by a co-worker, but as I’m a bit more experienced with that kind of bug, I ended up being the guy working on it. He asked me how I usually find those kinds of bugs, so instead of going through a lengthy explanation, I promised I would write it down here, so more people could benefit from it, and it would probably be better organized.
The game crashes in random code spots
This is one of the usual indicators that something is either not initialized, or being corrupted by some code writing where it shouldn’t. I know it’s not a very precise indicator, but a good clue is that the code is crashing somewhere it never should. To put it simply, somewhere it doesn’t make sense. At all. It’s one of the worst things you get when using C/C++. First, it’s useful to distinguish between memory corruption and uninitialized memory, and which behaviors I expect to see from each. Take this with a grain of salt:
- Using uninitialized memory: Data with unexpected values, even at game start up, that don’t look plausible but don’t look like data of another type either. For example, INF / NaN values in floating point variables after a few operations. If you’re working with Visual Studio, or with several build configurations, a clear indicator is the game behaving differently in debug and release builds.
- Memory corruption: Data with unexpected values, usually after a few game loop iterations. A good way to identify it is to watch specific portions of data in a memory viewer, to see if a strange value is written over them or if they change when they shouldn’t.
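The INF / NaN symptom is cheap to check for in code. Here is a minimal sketch, assuming a made-up `Transform` struct standing in for whatever game data you suspect. `all_finite`, `is_sane` and `validate_after_update` are hypothetical names for illustration, not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: true if every float in the range is finite
// (neither INF nor NaN).
inline bool all_finite(const float* values, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (!std::isfinite(values[i])) {
            return false;
        }
    }
    return true;
}

// Hypothetical game-side struct, used only for this example.
struct Transform {
    float position[3];
    float scale;
};

// True if no member of the transform holds INF or NaN.
inline bool is_sane(const Transform& t) {
    return all_finite(t.position, 3) && std::isfinite(t.scale);
}

// Example call site: run this right after the code you suspect, so the
// assert fires close to the place where the bad value first appears.
inline void validate_after_update(const Transform& t) {
    assert(is_sane(t) && "Transform contains INF or NaN");
}
```

Sprinkling a check like this after each update step narrows down where the bad value first shows up. Keep in mind that `assert` compiles out when `NDEBUG` is defined, and that fast-math style compiler options may optimize these checks away.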
The first thing I like to do is check whether I’m able to reproduce it. Most bugs of this kind never reproduce, or reproduce in a spot unrelated to where the problem is. Even so, a reproduction can give valuable clues, like which values are in memory when it crashes, or which behavior to expect on correct / incorrect execution. These bugs are really hard to chase properly, so every bit of information you get is valuable.
I need tools!
So, can we use some tools to make the process easier? Yes, of course. What I recommend checking:
- Enable static analysis on your compiler: It can catch lots of stupid bugs automatically, and it’s a cheap way to reduce the number of problems you might find in the future. If your compiler doesn’t support the feature, or you want a second opinion, I recommend the excellent cppcheck.
- Valgrind: If you’re using Linux / macOS, it’s a good option, or so I’ve heard. I’ve never used it myself, as I primarily develop on Windows. I tried to use Valgrind + WINE, as explained here, without much success. Anyway, I’ve heard so much good stuff about it that I recommend checking it out :)
- Dr. Memory: I actually tried using it on one of the most evasive bugs we had on our latest project. You can find its website here. It detects uninitialized memory, erroneous accesses, and a few more things. From the few days I used it, it seems a good tool, in combination with a few scripts / a GUI to actually check the results properly. I expect to use it again in the future :)
- DUMA: I’ve tried to use it several times in the past, without success. The idea is simple: mark memory that you shouldn’t be accessing as protected, so you get immediate errors when accessing it. You can get it here. If you actually get it working (or know a good replacement), I’ll be glad to hear about it.
- gFlags: A coworker (hey, Jorge!) told me about this a few weeks ago. The Windows Debugging Tools (32 bit version here) include a little application that can set up your game’s heap as protected, and detect accesses to parts of the heap you shouldn’t touch. It’s quite useful, as you get the exception while debugging from Visual Studio, pointing directly at where you screwed up.
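To make the “protected memory” idea behind DUMA and gFlags more concrete, here is a rough sketch of a guard page allocator. It is not how either tool is implemented. It assumes a POSIX system (`mmap` / `mprotect`), and `guarded_alloc` / `guarded_free` are hypothetical names. On Windows the equivalent calls would be `VirtualAlloc` and `VirtualProtect`.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper, POSIX only. Returns `size` writable bytes whose end
// touches an inaccessible guard page, or nullptr on failure.
inline unsigned char* guarded_alloc(std::size_t size) {
    if (size == 0) {
        return nullptr;
    }
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (size + page - 1) / page;  // round up
    const std::size_t total = (data_pages + 1) * page;        // + guard page

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return nullptr;
    }

    // The last page becomes the guard: any read or write faults immediately.
    unsigned char* guard = static_cast<unsigned char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return nullptr;
    }

    // Right-align the buffer so that p[size] is the first guarded byte.
    return guard - size;
}

// Releases a block returned by guarded_alloc; `size` must be the same value.
inline void guarded_free(unsigned char* p, std::size_t size) {
    assert(p != nullptr && size > 0);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t data_pages = (size + page - 1) / page;
    unsigned char* base = (p + size) - data_pages * page;
    munmap(base, (data_pages + 1) * page);
}
```

With a buffer from `guarded_alloc(100)`, writing to `p[100]` faults right at the offending instruction instead of silently corrupting a neighbour. The trade-offs are at least one extra page per allocation, a returned pointer that may not be suitably aligned for every type, and no protection against writes before the start of the buffer.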
Desperate ideas
Another “technique” I’ve seen used a lot (and have used myself) is just dividing your application in half, and checking if the bug still reproduces (check thoroughly, memory bugs are difficult to reproduce). Of course dividing it isn’t as trivial as it might seem, but it’s doable. This can go from just not loading certain types of assets (for example, don’t load the physics library and all related geometry) to skipping certain parts altogether. For example, if the game crashes when going from level 3 to 4, does it happen if you go directly from 2 to 4? Look for patterns!
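If the things you can switch off form an ordered list (subsystems, asset types, levels), the halving can be written as a plain binary search. This is only a sketch under strong assumptions: the bug never shows up with nothing enabled, it does show up with everything enabled, and enabling more never makes it go away. `run_once`, `reproduces` and `first_bad_count` are hypothetical names, and `run_once(n)` stands for “run the game with only the first n items enabled and report whether it misbehaved”.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Memory bugs are flaky, so try several times before calling a
// configuration clean.
inline bool reproduces(const std::function<bool(std::size_t)>& run_once,
                       std::size_t enabled, int attempts) {
    for (int i = 0; i < attempts; ++i) {
        if (run_once(enabled)) {
            return true;
        }
    }
    return false;
}

// Returns the smallest n in [1, total] for which enabling the first n
// items reproduces the bug, under the assumptions stated above.
inline std::size_t first_bad_count(
        std::size_t total,
        const std::function<bool(std::size_t)>& run_once,
        int attempts) {
    assert(total > 0);
    std::size_t lo = 0;      // assumed clean: nothing enabled
    std::size_t hi = total;  // assumed bad: everything enabled
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (reproduces(run_once, mid, attempts)) {
            hi = mid;  // the bug is already present with fewer items
        } else {
            lo = mid;  // need more items enabled to trigger it
        }
    }
    return hi;
}
```

The item at position `first_bad_count(...) - 1` is then the first suspect. It may only be the victim of a corruption caused earlier, so treat the result as one more clue, not a verdict.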
It’s also quite handy to inspect memory with a tool that lets you see what’s really there. Knowing a bit about debugging magic numbers helps. A lot.
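As a quick reference, here is a small sketch that maps some of the fill patterns used by the Visual C++ debug runtime and the Windows heap to what they usually mean. The list is not exhaustive, other compilers and platforms use different values, and `describe_fill_pattern`, `is_suspicious_pointer` and `check_pointer` are hypothetical helpers, not an existing API.

```cpp
#include <cassert>
#include <cstdint>

// Returns a short description if `value` matches a well-known fill
// pattern, or nullptr if it does not.
inline const char* describe_fill_pattern(std::uint32_t value) {
    switch (value) {
        case 0xCCCCCCCCu: return "MSVC debug: uninitialized stack memory";
        case 0xCDCDCDCDu: return "MSVC debug heap: allocated, never written";
        case 0xDDDDDDDDu: return "MSVC debug heap: freed memory";
        case 0xFDFDFDFDu: return "MSVC debug heap: guard bytes around a block";
        case 0xABABABABu: return "Windows heap: guard bytes after a block";
        case 0xBAADF00Du: return "Windows debug heap: allocated, never written";
        case 0xFEEEFEEEu: return "Windows debug heap: freed memory";
        default:          return nullptr;
    }
}

// Only the low 32 bits are compared, so on 64-bit builds this is a
// heuristic, not a proof.
inline bool is_suspicious_pointer(const void* p) {
    const auto bits = reinterpret_cast<std::uintptr_t>(p);
    return describe_fill_pattern(static_cast<std::uint32_t>(bits)) != nullptr;
}

// Example use: call before dereferencing a pointer you do not trust.
inline void check_pointer(const void* p) {
    assert(!is_suspicious_pointer(p) && "pointer looks like a fill pattern");
}
```

A crash on an address like 0xCDCDCDCD or 0xFEEEFEEE tells you right away whether you are looking at uninitialized memory or a use-after-free, which fits the distinction made earlier in the post.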
Busy, busy, busy
So I’m quite busy now. I hope this was a good starting point for those entering the exciting world of debugging memory errors </irony>.