There’s an oft-overlooked problem: when you build a game for actual release, you invariably get to the point (towards the end of the project) where you have to prepare many different builds: milestones, game shows, demos, digital downloads, QA builds, the final master, and so on.
Every single build is likely to be subtly different, with varying packs of levels, content, and so on.

As part of the (hopefully automated) process that prepares these builds, you need to work out which assets (aka files, things like textures, levels, shaders, whatever) are needed for each version of the game.

What we want is…

Ideally, you’d like to include the minimum required set of assets, to avoid bloat – which would lead to longer downloads, overflowed DVDs, bursting USB sticks, and tortuous build times. On the other hand, missing one tiny file (perhaps one that is only used in one rare circumstance, in the GDC Europe German build, after 5 hours of gameplay) would be disastrous – some 14-year-old in the test department is going to get SERIOUSLY STROPPY with you when the game crashes just as he’s acing the last level.

So you can’t afford to make a mistake. And of course, since this is year 3 of your magnum opus, every member of the team has been furiously checking in thousands of files, and your source control system is an absolute mess. Everyone has seen files like final_level_final_version_really_final_3_b.level checked in alongside final_level_4.level. Which one is it? I’ve no bloody idea, and it’s 4am!

What do you do?
Good question. I have no idea what YOU do, but I’ll just describe a simple trick we used on LittleBigPlanet (1 and 2).

As with all good things, the trick comes last in this post. And to get to the trick, I have to take you through a lot of preamble. Bear with me; every studio and project is different, and yet the same. We are all snowflakes: all six-sided, yet each unique. I’ll just quickly try to cover the basics of what we do, so we can get to the meat.

Preamble: Setting the stage with LBP-style dependency tracking

In LittleBigPlanet, every asset we use stores, in a little footer, a list of all the assets it depends on. So a level might depend on a mesh, which might depend on some shaders, which in turn depend on some textures, and so on. Given a single asset, a trivial program (and we have many implementations of this routine, from the game itself, to our build tools, to our servers, etc.) can find out what other files are needed when loading the top-level asset. The result is a tree of dependencies (you hope – cycles be damned!).
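To make the footer walk concrete, here is a minimal sketch of that "trivial program". It is not the LittleBigPlanet code: the FooterTable type and the CollectDependencies name are made up for illustration, and the footers are assumed to have already been read off disk into a simple in-memory map keyed by asset name.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the per-asset footers: maps each asset name
// to the names of the assets it lists as direct dependencies.
using FooterTable =
    std::unordered_map<std::string, std::vector<std::string>>;

// Returns every asset reachable from `roots`, including the roots themselves.
// The visited set means the walk still terminates if the "tree" turns out
// to contain a cycle.
std::unordered_set<std::string> CollectDependencies(
    const FooterTable& footers, const std::vector<std::string>& roots)
{
    std::unordered_set<std::string> visited;
    std::vector<std::string> pending(roots.begin(), roots.end());
    while (!pending.empty()) {
        std::string name = pending.back();
        pending.pop_back();
        if (!visited.insert(name).second) continue;  // already handled
        auto it = footers.find(name);
        if (it == footers.end()) continue;  // leaf asset: nothing listed
        for (const std::string& dep : it->second) pending.push_back(dep);
    }
    return visited;
}
```

Feeding it a single level as the only root returns the level plus everything it transitively pulls in, which is exactly the set a build would need to package for that level.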

So far so good.

But now we have a ‘bootstrapping’ problem similar to garbage collection: given a desired build of the game, how do we determine which ‘root’ assets are loadable? For us, the root-most asset of all is the game executable itself; and it loads just a few key assets – the roots – soundbanks, fonts, the master level list, and so on; which in turn fan out, through the dependency footers just described, into the thousands of assets that make up a game (and, in the case of LittleBigPlanet, the terabytes of user generated content, which incidentally is handled using the same system).

Finding the roots, attempt 1

Initially, we maintained a list of these ‘root’ assets by hand, in the code. In fact, people could just write code sort-of-like:

Texture *my_awesome_texture = LoadTexture("mega_cool.dds");

How do we get a reliable list of root assets at build-time? Find-in-files? grep?
NO.
Because lines like that would be commented out, or the filename would be sprintf’d from some hideous template, or someone would have wrapped LoadTexture into LoadTextureACoolerWay; and so on until the end of time. It was very, very hard to work out what assets were ‘reachable’ from the code, in an automated way.

Attempt 2: IDs

The next iteration was to do away with the filenames, and give every file a numerical ID. (Actually we did this from the beginning, but that spoils my story, and anyway, filenames kept creeping into the source. No! Bad programmer!) Eventually we banished all filenames entirely, and got down to IDs. Bliss. You could base the IDs on the hash of the name, or the hash of the content, or use a global numbering scheme; we do a mix of all 3, the details of which are not pertinent to this article, but are in fact very interesting. Another time.

Suffice it to say, the line above became

Texture *my_awesome_texture = LoadTexture(FILE_MEGA_COOL_DDS);

and then elsewhere, we would have a nice enum of all the files that people wanted to load:

enum
{
    FILE_MEGA_COOL_DDS = 1003,
    FILE_SACKBOY_MESH = 1004,
    // ...
};

As I say, how you track the correspondence between integer IDs and actual files in your source control is a whole-other-topic, and there are many solutions. As long as you’re consistent, you’re golden. Switching to this kind of ID scheme is a bit of a faff, but it means that you don’t have tonnes of filenames swilling around in your code and in the dependency footers described above. And your CPU will love you for not doing strcmp all the time.

Back to build packaging: Now we have a nice enum listing all the root assets, we can instruct our build system to take the enum, parse it, and build a disk image with every asset listed, along with all the assets that are required as dependencies.

Every game developer who has shipped a large game has probably been following along, nodding their head sagely, disagreeing on finer points left right and center, but hopefully mostly seeing that we’re headed in a sensible direction. Good.

Only now, we have the opposite problem: our builds are guaranteed to contain everything that they need – but also, they carry along a load of unnecessary junk.
You see, the root file list enum was hand-maintained, and just kept growing and growing, added to on an ad-hoc basis whenever any coder needed to load a file.

Doing a quick hack for TGS? Lovely! I’ll just add in FILE_TGS_SUPER_DEMO_LEVEL = 10035, load it in main.cpp and we’re good to go! Except now, we’re doomed to package that level and all its associated content, in every build ever after… because nobody will ever bother to go in and ‘clean up’ later. It just doesn’t happen, does it?

Clearly, an automated solution is/was needed. (argh epic tense confusion! This code is still used, so I’m allowed!) Simple regex/grep based parsing of the source-code, looking for actual uses of particular enum values, sounded complicated and error prone (for example, #defines can really mess things up), but luckily there is a simple trick that works beautifully – or at least, it did for us. And this is the reason for this blog post.

Attempt 3: Profit! Payoff! Etc!

The trick is, we replaced the enum with integer variables, like this:

int FILE_MEGA_COOL_DDS = 1003;
int FILE_SACKBOY_MESH = 1004;

(side note: actually, we used a custom type rather than ‘int’, so that our LoadAsset() function wouldn’t just accept any-old int, it had to be an AssetID type (which is just an int) – an example of using C++’s static type system to your advantage, to help catch silly errors at compile time. But I digress…)
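For the curious, a strong ID type along those lines might look something like the sketch below. This is a guess at the shape, not the actual LittleBigPlanet type: the struct layout and the stubbed-out LoadAsset are invented for illustration. One thing worth noting for what comes next: the globals are deliberately not const, because a const object at namespace scope has internal linkage in C++ and the compiler is free to fold it away before the linker ever sees a symbol.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical strong ID type: just an integer inside, but the explicit
// constructor stops a bare int from silently converting to an AssetID.
struct AssetID {
    std::uint32_t value;
    explicit constexpr AssetID(std::uint32_t v) : value(v) {}
};

// Compile-time check that "any old int" is rejected where an AssetID is wanted.
static_assert(!std::is_convertible<int, AssetID>::value,
              "AssetID must not be implicitly constructible from int");

// Non-const globals with external linkage, so each one is a real symbol.
AssetID FILE_MEGA_COOL_DDS{1003};
AssetID FILE_SACKBOY_MESH{1004};

// Stub loader for illustration only: a real one would go and fetch the
// asset. Here it just hands back the numeric ID it was asked for.
std::uint32_t LoadAsset(AssetID id) { return id.value; }
```

With this in place, LoadAsset(FILE_MEGA_COOL_DDS) compiles, while LoadAsset(1003) is a compile error.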

So now, we have a bunch of global ints. So what? Well, now we can use the linker to tell us which root assets are needed! All you have to do is ask it to output a map file, and make sure you have dead code/data elimination switched on. With that enabled, any symbol that is never referenced by any code gets stripped out of the final executable, and the map file lists just what is left. In other words, our build tool just needs to parse the map file, looking for FILE_xxx symbols. All the ones referring to files that are never used in this particular build will have vanished; all the ones that could ever be reached by code will be there. Magic!

As a result of this simple bit of linker-abuse, it’s possible to merrily #ifdef chunks of code in and out of use, and always be confident that the build tools will be able to find exactly the right set of assets to bundle into a particular build; never too little, and with very little ‘false-positive’ wastage.

I’ve only scratched the surface of a large and often religious topic. There are lots of obvious extensions, some appropriate for your game, some not. We implemented a few – for example, differentiating between strong and weak references to files (‘I want to check for the existence of the German version of the video, but dear build system, that does NOT mean you have to package German stuff in here, ok?’), and had other wrinkles related to UGC that made things more complex – but at its heart, using the linker in this way is a simple and fairly widely applicable idea. I hope you find it useful – we certainly did (and do).