Please wait, pausing…
With the ongoing march of technology, the CPU power and quantity of RAM available to modern games have increased exponentially with each generation of hardware. However, each time the “data transfer gap” has widened – CPU speed outstrips the ability of memory to supply data, and in turn memory capacity overwhelms the transfer rate of bulk storage media. And the end result of the latter is our good friend Mr Loading Screen. He’s one of those “friends” who turns up uninvited, hangs around longer than you’d really like, and then just when you think you’ve seen the back of him, there he is smiling in your face as you choke back bitter tears of deep-seated resentment and sorrow.
So, what can we do about that?
Well, the first port of call when optimising a load process is to look at what is actually getting done. The best kind of loading is the kind that doesn’t happen, so a good first step is to check that all the data being pulled in is actually needed – it’s remarkably easy to accidentally re-load assets which you threw away mere moments ago (to make room for loading them again), and one game I worked on would even do the opposite – loading in assets only to discard them before the level even began! (This was a consequence of the loader looking at the list of objects in the map file and grabbing all of them, unaware that the designers were placing objects for several different scenarios in the same world zone and then using a script to delete the ones they didn’t need for the current stage…)
Once you’ve isolated what data is getting pulled in, you can then do a very simple calculation – given the amount of data, and the speed of your media, you can tell what the minimum possible load time is. Often, that’s quite a scary number, because it’s both a lot higher than your goal and a lot lower than where you are now. It’s a good number to have, though, since you know that without reducing the data volume somehow (through compression, for example), or shifting the load to another source (such as a hard disc cache, if your source media is optical), you’ll never get below that. Equally, though, given a decent run at the optimisation process you should be able to get pretty close.
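That calculation really is as simple as it sounds. Here’s a minimal sketch – the figures (180MB of level data, 8MB/s of sustained optical throughput) are invented purely for illustration, so substitute your own measurements:

```cpp
// Back-of-the-envelope floor for load time: total data divided by sustained
// throughput. No amount of code optimisation gets below this number without
// shrinking the data or moving it to faster media.
#include <cstdio>

int main()
{
    const double levelDataMB   = 180.0; // illustrative: total bytes actually loaded, from your own logging
    const double sustainedMBps = 8.0;   // illustrative: measured sustained read speed of the source media
    std::printf("Absolute minimum load time: %.1f seconds\n", levelDataMB / sustainedMBps);
    return 0;
}
```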
To get close, then, what do you need to look for? In my experience, the two biggest time-wasters during loading are seek delays, and (unnecessary) processing overhead.
Seek delays come about pretty much exclusively on spinning media (hard discs, optical drives), and are caused by the fact that the disc head has to physically move to a new location to read data which does not directly follow whatever it read last. The distance the head has to move determines how long this takes, and the worst-case times can be utterly, utterly horrific – on one popular platform the worst-case disc seek is over a quarter of a second! There are lots of nasty little gotchas involved in predicting seek times, too – for example, on optical drives with dual-layer discs, switching layers involves a laser refocus, which can be very expensive, but at the opposite end of the spectrum very small seeks can sometimes be done without any head movement at all, making them fast… up until a certain threshold where suddenly performance drops off a cliff.
Another awkward factor with spinning disc media is that all modern systems are Constant Angular Velocity (CAV) based. This means that as the disc spins at a constant angular speed, the speed of the disc surface relative to the head (the linear speed) gets faster the further you get from the centre of the disc. This gives an increase in read speed, and also an increase in data density relative to lateral head movement (since the radius of the tracks increases) – so a fixed-size seek will require a smaller head movement at the edge of the disc compared to the centre.
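To put a rough number on that, here’s a quick back-of-the-envelope sketch, assuming a standard 12cm DVD whose data zone runs from roughly 24mm to 58mm radius (nominal figures – your media will vary): under CAV the read rate scales with radius, so the outer edge reads roughly 2.4 times faster than the inner edge.

```cpp
// Rough CAV speed-up from inner to outer edge: read rate scales linearly with
// radius, so the ratio of the two radii gives the relative throughput.
#include <cstdio>

int main()
{
    const double innerRadiusMM = 24.0; // approximate start of a DVD data zone
    const double outerRadiusMM = 58.0; // approximate end of the data zone
    std::printf("Outer edge reads roughly %.1fx faster than the inner edge\n",
                outerRadiusMM / innerRadiusMM);
    return 0;
}
```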
If you have the time to spare, empirical testing is definitely the way to go here – make a test program which performs a pattern of reads and seeks across the whole disc, and then chart the data you get (preferably from a number of different machines – optical drives in particular can get much slower as they get older, and quite a few consoles have shipped different revisions with different drive mechanisms, each with their own unique performance characteristics). Good data will show all sorts of interesting things, such as the threshold at which head movements become necessary, how big the on-drive cache is, and what happens during a layer change (on some setups, swapping layers without seeking can be faster than a medium-sized seek, if you can arrange for the data on the two layers to be lined up correctly).
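By way of illustration, a minimal sketch of that kind of probe might look like the following – it times a fixed-size read at regular offsets across a large test file and dumps CSV for charting. The path and stride are placeholders; on a console devkit you would use the platform’s raw read API rather than std::ifstream, and you would repeat each measurement to average out drive cache effects.

```cpp
// Probe read/seek behaviour across the media: one timed read every strideMB,
// results as CSV (offset in MB, time in ms) for charting.
#include <chrono>
#include <cstdio>
#include <fstream>
#include <vector>

int main()
{
    const char*       path     = "/path/to/huge_test_file.bin"; // placeholder: spans the area you want to probe
    const std::size_t readSize = 64 * 1024;                     // size of each probe read
    const std::size_t strideMB = 64;                            // distance between probes

    std::ifstream file(path, std::ios::binary);
    std::vector<char> buffer(readSize);

    std::printf("offset_mb,read_ms\n");
    for (std::size_t mb = 0; file; mb += strideMB)
    {
        const auto start = std::chrono::steady_clock::now();
        file.seekg(static_cast<std::streamoff>(mb) * 1024 * 1024);
        file.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const auto end = std::chrono::steady_clock::now();

        if (!file) break; // ran off the end of the test file
        const double ms = std::chrono::duration<double, std::milli>(end - start).count();
        std::printf("%zu,%.3f\n", mb, ms);
    }
    return 0;
}
```

Chart offset against time and the interesting features – cache size, the no-head-movement threshold, layer boundaries – tend to leap out at you.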
That said, the general rules of thumb are pretty common – put data which is going to be loaded together all in the same place (ideally packed into a single file so that it can be read with a single read call), and put the data which is most commonly used, or slowest to read in practice, at the outer edge of the disc where the drive is fastest. Drew Thale’s excellent post yesterday on “Real Unusual I/O Slowdowns” has not only some awesome diagrams demonstrating the dangers of seek times, but also advice on layouts for hard disc caching – go and read it now!
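The “single file” part of that doesn’t need anything clever. A minimal sketch of an offline packer might look like this – the pack and index file names and the plain-text index format are invented for illustration, and a real pipeline would also sort the entries into load order:

```cpp
// Concatenate a list of assets into one pack file and emit a name/offset/size
// index alongside it. At runtime the game opens the pack once and issues one
// large read per group of assets instead of one open/seek/read per file.
#include <fstream>
#include <iterator>
#include <vector>

int main(int argc, char** argv)
{
    std::ofstream pack("assets.pak", std::ios::binary); // illustrative file names
    std::ofstream index("assets.idx");
    long long offset = 0;

    for (int i = 1; i < argc; ++i)
    {
        std::ifstream in(argv[i], std::ios::binary);
        std::vector<char> data((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());

        pack.write(data.data(), static_cast<std::streamsize>(data.size()));
        index << argv[i] << ' ' << offset << ' ' << data.size() << '\n';
        offset += static_cast<long long>(data.size());
    }
    return 0;
}
```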
In an ideal world you want to rearrange your actual loading process to slurp up data in large chunks, but if this is impossible (especially in the last months of a project, which is often when load times first start becoming a worry), then a decent substitute is to beg/borrow/steal some spare memory and build a caching system. If you know the order files are going to be loaded in, then you can stream them from disc into a block of memory, and then redirect the normal load requests to your cache code, simply memcpy()ing (or even better, decompressing – if you have CPU time spare, reading compressed data and decompressing is usually faster than reading raw bytes) the data when it is requested by the game.
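A minimal sketch of that cache, assuming you know the file list up front, might look something like this – the names (PreloadCache, Preload, Fetch) are illustrative rather than taken from any particular engine:

```cpp
// Pre-read known files into memory while the load screen is up, then answer the
// game's normal load requests with a memcpy instead of a disc read.
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

class PreloadCache
{
public:
    // Called from the streaming side while the load screen is up.
    void Preload(const std::string& path)
    {
        std::ifstream in(path, std::ios::binary);
        m_files[path] = std::vector<char>((std::istreambuf_iterator<char>(in)),
                                          std::istreambuf_iterator<char>());
    }

    // Called from the game's existing load path; returns false on a cache miss
    // (or if the destination buffer is too small) so the caller can fall back
    // to a real disc read.
    bool Fetch(const std::string& path, void* dest, std::size_t maxSize) const
    {
        auto it = m_files.find(path);
        if (it == m_files.end() || it->second.size() > maxSize)
            return false;
        std::memcpy(dest, it->second.data(), it->second.size());
        return true;
    }

private:
    std::unordered_map<std::string, std::vector<char>> m_files;
};
```

The nice property is that the rest of the game doesn’t have to change at all – its existing load calls just get answered from memory when the data happens to be in the cache, and fall back to the disc when it isn’t.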
It’s possible to do this even if you don’t know the ordering of loads, as long as it is (vaguely) deterministic – one game I worked on many years ago had a completely intractable mess of a loading system (and 2 minute+ load times on a good day), and a certain insane genius coworker sped up the process with a two-phase system. Firstly, he built the game normally, but with a hack that printed out every file access to a text file. Then he would load each level in turn, and copy/paste the list of files. With those, his streaming system could prepare the data so that it had exactly what the game was going to ask for loaded in the order it was needed… and he could make a second build using that which actually loaded at a (vaguely) sensible speed.
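The first phase of that trick is just a logging wrapper around whatever your engine funnels file access through. A sketch, assuming a hypothetical OpenAsset() entry point and a CAPTURE_LOAD_ORDER build flag (both invented names):

```cpp
// Phase one of the two-step approach: record every file the game asks for, in
// order, so the list can be fed back into a pre-loader for the next build.
#include <fstream>

std::ifstream OpenAsset(const char* path) // hypothetical: hook your engine's real file-open call
{
#if defined(CAPTURE_LOAD_ORDER)
    // Append-only manifest of every file the game requested, in request order.
    static std::ofstream manifest("load_order.txt", std::ios::app);
    manifest << path << '\n';
#endif
    return std::ifstream(path, std::ios::binary);
}
```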
I can’t say I’d recommend this system unless you have no other choice, though – quite aside from the insane build workflow, we had a persistent bug where the game’s NPC system would crash almost 100% repeatably on one disc, seemingly at random, but not on others (or indeed on a rebuild of the same disc). Eventually we tracked it down to the fact that the system spawned male and female NPCs at random – and occasionally, when building the stream file, it would happen to fill the level with nothing but men or nothing but women… with the result that the meshes for the other gender were never loaded, and hence didn’t appear in the stream data at all, causing a crash the next time round when the game did try to reference them!
Aside from data layout, the other big issue that seems to come up frequently in connection with load times is processing overhead. This can be either essential work (unpacking data, allocating memory, etc.) or non-essential work (the sheer number of games I’ve seen in which a significant chunk of the loading time is spent just drawing the load screen is quite insane). The non-essential stuff is easy – get rid of it, or at least reduce it to a bare minimum. The essential stuff is harder, obviously – often fixing behaviour here means moving processing offline and getting the data as close to the in-memory format as possible. Parsing text files, constructing large object graphs and swizzling data are all things which can eat huge amounts of time, and should be done in the build pipeline if at all possible.
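For the “close to the in-memory format” point, the ideal is that the build pipeline writes a flat array of fixed-size records and the runtime does one read straight into the final array, with no parsing and no per-object allocation. A sketch, with an invented EntityRecord layout (a real format would also want versioning and endian handling):

```cpp
// Load a flat array of fixed-size records written by an offline tool: one read,
// straight into the final container, nothing to parse at runtime.
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

struct EntityRecord // illustrative layout, written verbatim by the offline tool
{
    std::uint32_t meshId;
    std::uint32_t materialId;
    float         position[3];
    float         rotation[4];
};

std::vector<EntityRecord> LoadEntities(const char* path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};

    const std::size_t bytes = static_cast<std::size_t>(in.tellg());
    const std::size_t count = bytes / sizeof(EntityRecord);
    in.seekg(0);

    std::vector<EntityRecord> records(count);
    in.read(reinterpret_cast<char*>(records.data()),
            static_cast<std::streamsize>(count * sizeof(EntityRecord)));
    return records;
}
```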
Another angle worth exploring with processing time is whether systems are properly optimised for the loading process. A lot of the time, performance trade-offs which make sense in-game can really hurt you during loading – for example, uploading textures to the GPU one-by-one as they are modified saves management overhead for the handful of places where it happens during gameplay, but batching as many as possible and transferring them in one go is likely to be significantly faster when loading. Another common cause of pain is memory management – often allocators are tuned to avoid performance spikes at the cost of a higher average call time, whereas during the load process it is only the total time taken that you care about. Oh, and watch out for things which try to synchronise with game frames – it’s all too common to find that huge amounts of time are being spent waiting for vsync, or because some system has a cap of at most <n> operations per frame (again, a sensible measure to avoid in-game spikes, but deeply unhelpful during loading).
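The per-frame cap case in particular is usually easy to fix once you spot it. Here’s a sketch of the sort of “loading mode” switch I mean, with invented names (TextureUploader, kInGameBudget) standing in for whatever your engine actually calls these things:

```cpp
// A per-frame throttle that gets lifted while loading: in-game it drains at most
// a few queued uploads per Tick() to avoid hitches, but during a load it empties
// the whole queue in one go.
#include <cstddef>
#include <deque>

struct PendingUpload { /* staging pointer, size, target texture, etc. */ };

class TextureUploader
{
public:
    void SetLoading(bool loading)      { m_loading = loading; }
    void Queue(const PendingUpload& u) { m_queue.push_back(u); }

    void Tick()
    {
        const std::size_t kInGameBudget = 4;                // illustrative: avoids frame spikes in play
        std::size_t budget = m_loading ? m_queue.size()     // drain everything while loading
                                       : kInGameBudget;
        while (budget-- && !m_queue.empty())
        {
            // Submit(m_queue.front()); // actual GPU upload elided
            m_queue.pop_front();
        }
    }

private:
    std::deque<PendingUpload> m_queue;
    bool                      m_loading = false;
};
```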
Finally, the best load screen is one the player doesn’t notice – if you can hide loading underneath something else (during a menu transition, for example, or a mission briefing), then it won’t matter how long you take, provided it fits within that window. Another approach is to make the load screen interactive in some way – Bayonetta is a good example of this, giving the player the opportunity to practise moves and learn combos during loading. Remember the good old days of Space Invaders on the loading screen and be creative!
And where does the title of this post come into it? Well, whatever game you’re working on now, please take solace in the fact that it cannot possibly be as bad as a project I worked on a long, long time ago in which that particular horrific message was our last-ditch (and shipping) “solution” to the fact that even the pause menu was taking an unacceptably long time to load…