A few years ago, I read the best programming-related advice I have ever come across, in a blog post by Jamie Fristrom: “The Curtain Problem”. It has been stuck in my mind ever since, and it is not some kind of semi-philosophical mantra, it’s actually quite concrete: everything that can be switched by different parts of the game has a high potential for bugs, and bugs of the worst kind, those that keep coming back.

“Pushing state is duplicating data”

Jamie’s example was the fade-in/fade-out curtain in the game Spider-Man 2. When the player died, the screen would fade to black, then fade back in once the character had respawned. The same would happen when changing levels or starting a cut-scene. Different parts of the game had the responsibility to lower the curtain and raise it again later. Things started to get ugly when parts of the code (and scripts) had buggy code paths that could cause the curtain to stay down forever, or when things would overlap and raise the curtain while something else still needed it down.

Strictly speaking, the curtain system was bug-free, but using it was bug-prone.

I have seen that problem way too often: with parts of the GUI that should be visible or not (especially the mouse cursor), with player input that should be temporarily disabled, with background music that should play or not (or change depending on the location), and so on. If multiple parts of the code have full control over that kind of thing, problems are bound to happen.

Even worse, you might end up with stuff that checks the state before changing it, and/or restores it when it’s done. At first thought it sounds good, but consider the following example: you unlock a room and enter it, someone else enters while you are there, and you leave before this person. You won’t lock the room because there is someone inside, and the other person will leave the room unlocked because it was not locked when she arrived. It happens to me all the time.
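
To make the room example concrete, here is a minimal sketch of that "check and restore" pattern. The `Room` and `Visitor` classes are made up for illustration; they are not from Jamie's post or from any real game.

```python
class Room:
    """Hypothetical room whose lock state is pushed around by its visitors."""

    def __init__(self):
        self.locked = True
        self.occupants = 0


class Visitor:
    """Hypothetical visitor that remembers the lock state it found on arrival."""

    def enter(self, room):
        self.found_locked = room.locked  # check the state before changing it
        room.locked = False
        room.occupants += 1

    def leave(self, room):
        room.occupants -= 1
        # Restore the state we found, but never lock somebody in.
        if self.found_locked and room.occupants == 0:
            room.locked = True


room = Room()
me, other = Visitor(), Visitor()

me.enter(room)      # I found the room locked and unlocked it
other.enter(room)   # she found it unlocked
me.leave(room)      # I can't lock: she is still inside
other.leave(room)   # she doesn't lock: it wasn't locked when she arrived

print(room.locked, room.occupants)  # the room ends up empty and unlocked
```

Each visitor behaves sensibly on its own, and the room still ends up in the wrong state.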

A sensible solution is reference counting. To continue the room example, you can set a rule that says “the last one to leave locks the door”. In the physical world that works quite well, and on the programming side it solves the overlapping issues. But that doesn’t help much with the bugs: if you forget to release a reference, or if you add one twice, you are still screwed, and debugging reference counting gone wrong is a lot of fun. You could make it automatic and bulletproof with RAII, but as with all automatic systems, at some point a special case will require specific handling.
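
Here is roughly what a reference-counted curtain could look like. This is only a sketch: the `Curtain` class and its method names are my own invention, and a Python context manager stands in for what RAII would give you in C++.

```python
from contextlib import contextmanager


class Curtain:
    """Hypothetical curtain that stays down while anybody holds a reference."""

    def __init__(self):
        self._holders = 0

    @property
    def is_down(self):
        return self._holders > 0

    def acquire(self):
        self._holders += 1

    def release(self):
        if self._holders == 0:
            raise RuntimeError("curtain released more often than acquired")
        self._holders -= 1

    @contextmanager
    def held(self):
        """Scope-bound reference: released even if the body raises."""
        self.acquire()
        try:
            yield self
        finally:
            self.release()


curtain = Curtain()

with curtain.held():                  # respawn sequence wants the curtain down
    with curtain.held():              # a level change overlaps with it
        both = curtain.is_down
    still_down = curtain.is_down      # level change done, respawn still running
raised = not curtain.is_down          # last holder gone, curtain goes back up

curtain.acquire()                     # a manual code path that forgets release()
stuck = curtain.is_down               # ...and the curtain stays down forever
```

The scoped version handles the overlap nicely, but the last two lines show how little it takes to get the original bug back as soon as one caller manages the count by hand.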

So what?

Just like Jamie suggests: poll the state, don’t push it. Make the curtain determine by itself if it should be up or down, and make the door lock itself if nobody is in the room. Triggering the right transition at the right time then becomes a matter of comparing the currently required state with last frame’s.
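
A polled curtain could be sketched like this. The `GameState` fields and the `CurtainController` class are assumptions made for the example, not the actual Spider-Man 2 code; the point is only that one function decides what the curtain should be doing, and transitions fall out of comparing this frame with the previous one.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical snapshot of the things the curtain cares about."""
    player_dead: bool = False
    loading_level: bool = False
    starting_cutscene: bool = False


def curtain_should_be_down(game):
    # All the rules live here, in one place and in a visible order.
    return game.player_dead or game.loading_level or game.starting_cutscene


@dataclass
class CurtainController:
    was_down: bool = False
    transitions: list = field(default_factory=list)

    def update(self, game):
        """Call once per frame; compares the required state with last frame's."""
        wanted = curtain_should_be_down(game)
        if wanted and not self.was_down:
            self.transitions.append("fade_out")
        elif not wanted and self.was_down:
            self.transitions.append("fade_in")
        self.was_down = wanted


game = GameState()
controller = CurtainController()

controller.update(game)          # nothing going on
game.player_dead = True
controller.update(game)          # player died: fade out
game.loading_level = True
controller.update(game)          # level change overlaps: nothing new to do
game.player_dead = False
controller.update(game)          # respawned, but still loading: stay down
game.loading_level = False
controller.update(game)          # nobody needs the curtain anymore: fade in

print(controller.transitions)
```

Nobody has to remember to raise the curtain: once no condition asks for it, it comes back up on its own.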

That raises two problems: one is dependency, and the other is performance.

Dependency: what used to be an isolated service is now potentially polling things from everywhere in the game. The GUI might be calling the game to know if we are on foot or riding in a car… that sounds like the wrong way to go. But the controlling code can be decoupled from the GUI itself: as long as there is only one authority polling the state of the game to control that part of the GUI, we are fine. So it’s not quite as bad as it sounds.

Performance: polling the state might not be trivial, it can very often be a branching mess, and that process is repeated frame after frame. In contrast, when the state is triggering callbacks, the processing only occurs when something actually changes. That might become an issue, but very often those checks only need to be done once per frame, and even if they look heavy and wasteful, they might not even show up on a profiler. And even if they do, remember that it’s always easier to optimize code that works than to fix optimized code that doesn’t. On top of that, logic designed for state polling is a good candidate for careful memoization.
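
As an illustration of the memoization idea, here is a small sketch using the background music case. The `Snapshot` fields and track names are invented for the example; the only requirement is that the decision is a pure function of an immutable, hashable snapshot, so that `functools.lru_cache` can safely reuse earlier answers.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)  # frozen makes the snapshot hashable, so it can be a cache key
class Snapshot:
    """Hypothetical inputs for the music decision."""
    on_foot: bool
    in_menu: bool
    zone: str


@lru_cache(maxsize=8)
def music_track(snap):
    # Imagine a much bigger branching mess here.
    if snap.in_menu:
        return "menu_theme"
    if not snap.on_foot:
        return "car_radio"
    return f"{snap.zone}_ambience"


# Five frames: three on foot in the harbor, then two in a car.
frames = [Snapshot(True, False, "harbor")] * 3 + [Snapshot(False, False, "harbor")] * 2
tracks = [music_track(snap) for snap in frames]

info = music_track.cache_info()
print(info.misses, info.hits)  # the rules only ran when the inputs changed
```

The "careful" part is that this only works if everything the decision depends on is in the snapshot; anything read from outside it would make the cache return stale answers.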

But the biggest advantage of it all is that you keep all the logic in one place, with a clear view of the order in which the checks are performed, and that makes debugging it almost trivial. Instead of looking for something missing, you look for something wrong, and that makes a huge difference. And another added benefit is that testing that kind of code won’t require mocking callbacks.

That being said, I just noticed I forgot to lock my car again.