In this post I’m going to show how you can start using Heap Profiler to examine the way your Windows applications use memory. Since as a tool it’s still in development, we’ll probably run across some rough edges, but I hope you’ll be able to see how it can be useful.
What Heap Profiler does is monitor a running application and record every call it makes to Windows’ heap management functions, to allocate and deallocate blocks of memory. Each allocation is associated with a stack trace, so using that you can typically identify the source of the allocation. Heap Profiler attempts to provide some sophisticated analysis tools to help you break down all the data and examine the way specific subsystems in your application use memory. In this post, we’ll be profiling a relatively sophisticated application (Firefox) while it is sitting idle on a webpage many of you might visit – BBC News. We’ll try to identify a few allocations that most people might call memory leaks – in this case, allocations that go away properly on exit, but accumulate over time while the browser is idle, wasting CPU cycles and potentially evicting more useful data from RAM and into swap.
To get started, first we’ll boot up Heap Profiler and configure it to profile the browser.
We fill in the location of the app we want to run, and provide any arguments and override the working directory if necessary. In this case, I specify that a specific testing profile should be used (instead of my main browsing profile), and that any existing Firefox session shouldn’t receive messages from this new session I’m starting. It’s important to avoid letting existing data and other applications interfere with your profiling data, and these two parameters accomplish that. For many applications, you’ll also want to configure your symbol server settings from the Options menu; since I compiled Firefox myself, I already have symbols in the right place, so that isn’t necessary this time. I’ve also turned on Auto Capture, so that Heap Profiler will take snapshots of the application’s heap usage periodically without me having to manually take snapshots.
At this point we’re ready to start the browser, so we click Launch.
We wait a moment for the browser to start up, and we can browse to the page we want to profile. While this is happening, Heap Profiler is paying attention to Firefox’s behavior, and taking snapshots when appropriate, so we’ve already begun to get data on how it’s using the heap. Firefox is a bit slower than usual because we’re profiling it, but it’s more than fast enough to browse around, so if we wanted to, we could profile ourselves going about our daily business in the web browser.
You may notice that the Heap Profiler window frequently says ‘Caching Symbols’. Heap Profiler makes an effort to capture all the information you need when profiling an application; this way, you can load up a saved profile weeks or months later and examine the data, even though you may not have the application anymore (or its symbols may have changed).
Since we’re looking to identify leaks when BBC News is loaded, we can let the browser run for a while now. Make sure you save the recording somewhere from the File menu, so that you have access to it later – they get stored in your Temp folder by default. For the purposes of this post, I let the BBC News site run inside Firefox for around 3 hours the other day, and I saved the recording, so I’ve got a huge (1.1GB) heap recording on my desktop that we can dig into. Let’s open it up:
As you can see, we’ve got a large number of snapshots, so we have a general idea of what Firefox’s memory usage looked like during the profile run. In this case, we’re viewing the # of heap bytes being used – the process was actually using lots of memory outside of the heap as well, for things like shared memory, hardware textures, etc – but we care about the heap right now. You can see how the memory usage slowly climbs – with occasional large spikes – and then periodically plummets dramatically. We can speculate that those huge drops are from the browser’s garbage collector freeing up unused objects that have accumulated, but let’s confirm that. Drag-select from one end of a drop to the other, and then click Diff Selection.
After a few moments, the diff viewer will open up. You’ll have access to some detailed information on how the heap was used within the selected period of time, but in our case, we want higher level information. Let’s switch to Treemap mode, and group things by function.
The allocations and deallocations are now organized into a treemap, grouped by the individual functions in their stack traces. The functions nested within those stack traces are nested as well, so you can see cases where a single function calls out to multiple functions that all allocate memory. In this diff, almost all of our heap activity is deallocations, so things are clustered in the bottom left. The biggest deallocators are large enough to have labels, and the smaller ones have labels that will appear if we mouse over them.
We can expand the window to get a better look, or use the filtering control up top to narrow the set of allocations being rendered. We can also click on a single block to zoom in on it and look at it and its nested allocations, like this:
In this case, the function we clicked on is just calling other single functions to get its work done, so we can easily follow that chain down to figure out what kind of allocation we’re dealing with. We can also see from the size information that this function was responsible for many small allocations that added up to around half a megabyte of memory.
For the purposes of this post, let’s dive into the biggest apparent leak we’ve got in this profile – the one at the bottom right, str_indexOf:
In this case, we see that a huge amount of individual allocations are accounting for around 3 megabytes of memory that was freed by this garbage collection. This suggests to us that for some reason, a frequent operation is generating tiny amounts of garbage that add up. The name of the function, and the functions it calls, give us a good idea of where to start looking, so let’s dive into the source code:
We can use viewing the source code. In this case, we can quickly find the two function calls that are allocating memory, and it makes perfect sense that they would do so:
Neither of these function calls are conditional, so that would suggest that every call to str_indexOf creates garbage! Hopefully that’s not right. Let’s dive into one of these functions to try and figure out what they do:
We find that ArgToRootedString will, if the argument is a rope string, call JSRope::flatten, and if we examine the treemap a little more, we can tell this function is responsible for all the garbage:
At this point, we now have a pretty good idea of what’s going on: The BBC News website is frequently calling String.indexOf, and because they’re using rope strings, this operation creates garbage. If you dig around in the Mozilla source code (or ask someone who knows more about the JavaScript runtime), you’ll find out that rope strings are an optimization that makes concatenation of strings (“a” + “b”) faster by representing that concatenated result as a ‘rope’ of multiple strings, one after the other. This means that the concatenation operation is faster and produces less garbage – but the downside is that some operations, like indexOf, cannot operate on ropes, and we must convert the rope into a regular, linear string. As it turns out, this one is an open bug!
I hope this gives you a bit of an idea how Heap Profiler can be used to understand memory usage in windows applications. If you have any questions or comments, feel free to share them with me. If you’d like to try it out, all you need is the free version of Visual Studio 2010 and the source code from GitHub.