I’m continuing my series on problematic I/O patterns that I’ve seen, heard about, or debugged in games and game-related situations. These are all real-world cases; they often cause unusual or unexpected behavior, and sometimes they’re just really unusual!

This time we’ll be talking about a weird edge case that I’ve seen affect several games, and that is usually noticed only by a particularly observant tester.

The Problem

A major console game provides a menu to allow you to go back and replay cutscenes that you’ve already seen. Pretty standard, right? Well, a tester has noticed that he can make these movies stutter and halt pretty much at will.

The steps to reproduce are:

  • go into the cutscene menu
  • pick the first movie, and play it all the way through
  • pick the second movie, and it will pause and stutter a few seconds into playback.

It doesn’t appear to be a problem with the movies themselves; both the first and second movies play normally during gameplay. This only happens when playing from the menu.

At first glance

A stutter this big is normally attributable to one of two things: a huge seek, or drive spin-up. Since there’s no particular reason for the drive to seek, it’s probably a spin-up.

So, let’s bring it over to the AV guy and bounce it off him. “Yes,” he says, “but the weird thing is that movie playback goes through a pretty large buffer: a couple of megabytes. Before playback begins, we pre-spool the video into that buffer, and the player doesn’t start playing until the buffer is completely full.”

That means that before movie playback started, the drive was in fact being accessed… and, furthermore, it was returning data. Several megabytes’ worth, in fact.

Huh. So maybe it didn’t spin down after all? Was it in fact a seek?

Taking another look

Let’s dig a little deeper.

Talking to the tester and running a few reproductions of our own, we can see that it only happens when the second video immediately follows the first one in the menu. If you play 1 followed by 3, no stutter. If you play 2 followed by 4, no stutter. If you play 2 followed by 3, stutter.

Okay. Now let’s look at the physical layout of the disc. Turns out that the cutscenes are all stored at the end of the disc, out of line with the game data, and they’re stored in the same order they’re listed in the menu.

Data layout of the disc. Game data is at the beginning, and cutscenes are at the end.

At this point we’ve got enough information to figure it out. The drive is indeed spinning down, but there’s an extra complication which makes it particularly unusual. Let’s answer the two main questions:

Why does the drive spin down?

Movie playback goes through a big buffer. At the end of the movie, there is a stretch of time where the buffering system has read all of the data and there’s nothing left for the disc to do.

Then there’s a little bit more time for the tester (who’s probably seen the movie hundreds of times, and might have been off doing something else) to pick up the controller and navigate through the menu to select the next movie.

This length of time (draining the buffer, plus idle time, plus menu navigation) was long enough to cause the drive to spin down.

Why doesn’t the drive spin back up when the next movie is read?

This one is interesting because it gets all the way down into the optical drive’s firmware, and how reads are processed.

Let’s try to diagram it out. Here’s a conceptual model of how the data flows:

Data flow during movie playback.

Great! Now let’s animate it and watch as the first movie is played.

(For the purposes of this diagram, I’m showing the drive cache as entirely readahead; drive caching schemes can be much more complicated, but this portrayal is more-or-less accurate in this case since the drive has been playing a linear stream of data.)

Animation of the data flow during movie playback.

Check out the state at the end of the movie:

State at the end of playback. Notice the contents of the drive cache!
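To make that end state concrete, here’s a toy Python model of the simplification described above, where the drive cache is pure read-ahead. Everything in it is hypothetical: `ReadAheadDrive`, `movie_of`, and `play_to_end` are illustrative names, the movie sizes are made up, and real firmware caching is more complicated. It streams the first movie to the end and then reports which movies the cached sectors belong to.

```python
from itertools import accumulate

# ASSUMPTION: hypothetical cutscene sizes in 2048-byte sectors, stored back to
# back in menu order (offsets are relative to the start of the cutscene area).
MOVIE_SECTORS = [5000, 4000, 6000]
MOVIE_STARTS = [0, *accumulate(MOVIE_SECTORS)][:-1]  # [0, 5000, 9000]


class ReadAheadDrive:
    """Toy drive whose cache is pure read-ahead: after serving a sector, it
    keeps the next `cache_sectors` sectors buffered."""

    def __init__(self, cache_sectors):
        self.cache_sectors = cache_sectors
        self.cached = range(0)  # sectors currently held in the drive cache

    def read(self, sector):
        hit = sector in self.cached
        # While streaming, the drive keeps reading ahead past the last request.
        self.cached = range(sector + 1, sector + 1 + self.cache_sectors)
        return hit


def movie_of(sector):
    """Index of the movie that contains `sector`, or None."""
    for index, (start, size) in enumerate(zip(MOVIE_STARTS, MOVIE_SECTORS)):
        if start <= sector < start + size:
            return index
    return None


def play_to_end(drive, movie):
    """Stream every sector of `movie` through the drive, in order."""
    start = MOVIE_STARTS[movie]
    for sector in range(start, start + MOVIE_SECTORS[movie]):
        drive.read(sector)


drive = ReadAheadDrive(cache_sectors=1024)  # ASSUMPTION: 2 MiB of cache
play_to_end(drive, 0)  # play the first movie (index 0) all the way through
print({movie_of(s) for s in drive.cached})  # {1}: only the second movie
```

Nothing from the first movie is left in the cache; every cached sector is the opening of the next movie on the disc.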

The drive cache is filled up entirely with data from the second movie, and the drive has spun down. That leads us to…

The root cause

The drive wasn’t spinning up during the initial spooling of the second movie, because playing the first movie caused the drive’s on-board cache to fill up with read-ahead data from the second movie.

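When the second movie started playing, the pre-spool hit entirely in the cache… which, in this case, meant that the disc didn’t spin back up until after the movie had started playing. The first read that missed the cache then had to wait for the spin-up, and the buffer couldn’t cover that wait, so playback stalled.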

Why didn’t it happen if the user selected any other movie? In that case, the contents of the drive cache (the beginning of the movie right after the one just played) would not have been useful, so the drive had to spin up during the initial buffer fill, before playback began.

Why didn’t it happen during normal gameplay? During the game, there’s a lot of other stuff going on immediately before the cutscene plays (level loads, texture streaming, etc.) so the drive doesn’t have a chance to spin down. The problem only occurs in the special case of playing movies from the game menus when nothing else is touching the optical drive.
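The reproduction pattern (1 then 3 is fine, 2 then 4 is fine, 2 then 3 stutters) falls out of one check: does the whole pre-spool fit inside what the drive already has cached? The sketch below is a hypothetical model, not the game’s code; `when_drive_spins_up`, the movie sizes, the cache size, and the pre-spool size are all assumptions, and it takes for granted that the drive has already spun down during the idle time.

```python
from itertools import accumulate

# ASSUMPTIONS (illustrative only): four cutscenes stored back to back in menu
# order, sizes in sectors; the drive spun down while the menu sat idle; and its
# cache holds the `cache_sectors` sectors immediately after the last read.
MOVIE_SECTORS = [5000, 4000, 6000, 4500]
MOVIE_STARTS = [0, *accumulate(MOVIE_SECTORS)][:-1]  # [0, 5000, 9000, 15000]


def when_drive_spins_up(prev, nxt, cache_sectors, prespool_sectors):
    """Classify when the spun-down drive is first forced to touch the disc
    if movie `nxt` is played right after movie `prev` finished (0-based)."""
    prev_end = MOVIE_STARTS[prev] + MOVIE_SECTORS[prev]
    cached = range(prev_end, prev_end + cache_sectors)
    first = MOVIE_STARTS[nxt]
    last = first + prespool_sectors - 1  # last sector of the pre-spool
    if first in cached and last in cached:
        # The whole pre-spool is served from the drive cache: playback starts
        # with the disc still stopped, and the first miss lands mid-movie.
        return "after playback starts (stutter)"
    # Some pre-spool read misses the cache, so the spin-up happens before
    # playback and is hidden inside the initial buffer fill.
    return "during pre-spool"


# The menu numbers movies from 1; these indices are 0-based.
for prev, nxt in [(0, 2), (1, 3), (1, 2)]:
    print(prev + 1, "->", nxt + 1, ":", when_drive_spins_up(prev, nxt, 1024, 1000))
# 1 -> 3 : during pre-spool
# 2 -> 4 : during pre-spool
# 2 -> 3 : after playback starts (stutter)
```

The same model also suggests why the size of the pre-spool matters: if it were larger than the drive cache, the tail of the pre-spool would miss and the spin-up would happen before playback.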

Solutions

Now that we understand the root cause, there are several possible solutions.

First, if there is an API to explicitly spin up the drive, you could call it before playing a movie from the menu. This is by far the simplest answer, and it would work great if the OS actually provides an API for drive power control. But alas, many operating systems do not provide this feature.

Second, you could achieve the same result as the first approach by forcing a drive spin-up indirectly. Unfortunately, there’s almost certainly no good way for you to know when the drive has spun up… so you will probably be left to create some ugly empirical algorithm. Here’s one I would suggest:

  • grab a starting timestamp
  • repeat until at least 200ms have passed:
    • open a random file
    • read a random byte
    • close the file

If the algorithm picks a byte that is already in the drive cache, the read comes back quickly and the loop repeats. If it picks a byte that isn’t in the drive cache, the drive has to spin up to read it. Either way, the algorithm stops after 200ms. (This timeout is important; consider what would happen if your game were running in emulation on some future console with its disc image mounted from solid-state storage, where no read would ever be slow!)
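Here’s a minimal Python sketch of that second approach, following the steps in the list. `nudge_drive_awake` and its `paths` argument (a list of files on the game disc) are hypothetical; on a console you would use the platform’s own file API instead. Note that a desktop OS may satisfy these reads from its own file cache before they ever reach the drive, so treat this as an illustration of the loop structure rather than something guaranteed to touch the hardware.

```python
import os
import random
import time


def nudge_drive_awake(paths, budget_s=0.2):
    """Empirical spin-up nudge: read one random byte from a random file,
    repeating until at least `budget_s` seconds have passed.

    `paths` is a hypothetical list of files on the game disc. Returns the
    number of single-byte reads issued.
    """
    if not paths:
        return 0
    reads = 0
    start = time.monotonic()  # grab a starting timestamp
    while time.monotonic() - start < budget_s:
        path = random.choice(paths)  # open a random file...
        with open(path, "rb") as f:  # ...which is closed when the block exits
            f.seek(0, os.SEEK_END)
            size = f.tell()
            if size:
                f.seek(random.randrange(size))
                f.read(1)  # read a random byte
                reads += 1
    return reads
```

Pulling `paths` from the game-data area at the start of the disc makes a miss more likely, since at this point the drive cache is full of cutscene data. A read that waits on a spin-up can run past the 200ms budget; the loop simply stops issuing new reads once the budget is spent.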

Third, if you have free space on the disc, you could simply insert padding between the movies in your ISO layout. The padding can be real data or a file of zeroes; it doesn’t matter. It just needs to be “big enough” to keep the start of the second movie out of the drive cache at the end of the first movie. 16MB is sufficient for the current generation, though I’d probably at least double that to 32MB if it were feasible.
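If you generate the ISO layout with a build script, the padding rule is easy to check mechanically. This is a sketch under assumptions: `layout_with_padding` and `readahead_reaches_next` are hypothetical helpers, the movie sizes are invented, 2048-byte sectors are assumed (the user-data size on CD-ROM Mode 1 and DVD), and the cache is modeled as pure read-ahead past the end of each movie, as in the simplified diagram. The 32 MiB default follows the “double it” suggestion above.

```python
MIB = 1024 * 1024
SECTOR_BYTES = 2048  # user-data sector size on CD-ROM (Mode 1) and DVD


def layout_with_padding(movie_bytes, padding_bytes=32 * MIB):
    """Return (offset, size) pairs for movies stored in order, with at least
    `padding_bytes` of filler between consecutive movies. Offsets are relative
    to the start of the cutscene area and rounded up to a sector boundary."""
    layout = []
    pos = 0
    for i, size in enumerate(movie_bytes):
        if i:
            pos += padding_bytes  # the padding file (zeroes or real data)
        pos = -(-pos // SECTOR_BYTES) * SECTOR_BYTES  # round up to a sector
        layout.append((pos, size))
        pos += size
    return layout


def readahead_reaches_next(layout, cache_bytes):
    """True if reading ahead `cache_bytes` past the end of any movie would
    pull the start of the following movie into the drive cache."""
    return any(off + size + cache_bytes > next_off
               for (off, size), (next_off, _) in zip(layout, layout[1:]))


sizes = [300 * MIB, 250 * MIB + 1000, 410 * MIB]  # hypothetical movie sizes
padded = layout_with_padding(sizes)
packed = layout_with_padding(sizes, padding_bytes=0)
print(readahead_reaches_next(padded, 16 * MIB))  # False
print(readahead_reaches_next(packed, 16 * MIB))  # True
```

The `cache_bytes` value is an assumed stand-in for the drive’s read-ahead window; any padding at least that large keeps the next movie out of the cache in this model.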

In the games I’ve worked on, I’ve seen both the second and third approaches used. Both worked fine and solved the problem.

What do you think?

Have you ever encountered this problem? I wouldn’t be surprised if it’s fairly common. How would you deal with it if it happened to you?