The Falling Frame

Jill answered the knock at the door to find Al behind it. “Oh, hey Al, come on in. Tim’s in the living room.”

Al went back to where his friend Tim was squatting on the floor, fitting a bit into a variable-speed drill and fiddling with the chuck. “Oh, hey Al. What brings you by?”

“Oh, I just wanted to borrow your… hey, what are you doing?”

“Well Al, I’m going to mount this frame on the wall. It keeps falling down, but Jill really likes it here.”

“By driving screws through the frame? Seems a little extreme.”

“Well, I have this putty, I think I can make it match…”

“Care if I take a look?” Al inspected the nail currently in the wall and saw the multiple holes around it, including one that looked as if its nail had been pulled downward. “What’s the story here? Looks like you’ve tried a few things.”

“Well, I used to have it hanging just on a brad, but every few weeks I’d come through here on my way for morning coffee and see the picture on the floor. So I switched to a bigger nail and it’s come down again. I figure I’ll fix it for good.”

“Wait, hold on. You say you always find it this way in the morning?”

“As far as I remember.”

“And you found it on the floor *this* morning?”

“Yes.”

“Why didn’t you see it last night?”

“Well, I went up to bed through here, sure, but it was dark. I had watched an old war movie on the tube and then went up to Jill.” He fired up the drill to check the bit. “Ug ug ugh.”

“Just a second, there, maniacal dentist. Isn’t this wall shared with the family room?”

“Yeah, why?”

“Hold on just a second.” Al walked around through the threshold to the room beyond. “Tim, I think you ought to come in here for a second.”

Tim followed his friend through. “This is the wall in question, right Tim?”

“Yep.”

“And this is where you were watching that old movie last night?”

“Yep.”

“Probably one with lots of explosions and such?”

“Yep. The Dirty Dozen, in fact.”

“And would you say that the subwoofer was thumping along pretty good amongst all the noise?”

“Yeah, I guess so.”

“This subwoofer here. The one touching the wall. The wall that a somewhat heavy frame keeps falling off of?” Al dragged the subwoofer an inch away from the wall. “Why don’t you pop another nail in that wall and rehang the frame on it, see what happens.”

“Well, I’ll be darned…”

Today’s little parable comes to remind us of one fundamental fact about programming, or more specifically, about debugging: you haven’t fixed a bug until you understand its cause.

Over the course of my career, I’ve run into dozens of types of bugs, and have probably committed a fair number of them myself. There are off-by-one errors, underflow errors, uninitialized data errors, constructor argument dependency errors (rare, but they happen), plain old logic mistakes, simple typos, and any number of issues which compilers have grown better at catching over the years, such as unintentional assignments inside conditionals.

New hardware has brought all manner of fun new bugs to learn to diagnose, and new opportunities to foul up: DMA chaining issues, for instance, or the host of hard-to-find Heisenbugs, such as unusual race conditions, that occur when multiple processors or threads share resources.

Now, I’ve seen lots of techniques to deal with bugs, some quite… eccentric. I think my favorite was a guy who, encountering unusual data inside of his local variables, would literally rearrange the order in which functions appeared in his source file until the bug disappeared. For what were most likely uninitialized variables, that is certainly the least promising method I’ve ever heard of. Almost all of these methods proceeded from the same flaw: the belief that the disappearance of the symptom indicated a solution to the problem.
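
To make that flaw concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function names and the sample data are made up and don't come from any real codebase. It takes one of the simplest bugs there is, an off-by-one index, and shows one "fix" that merely makes the symptom go away next to one that addresses the cause.

```python
def last_index_buggy(items):
    # Hypothetical bug: off by one. len(items) is one past the last valid index.
    return len(items)


def get_last_symptom_patch(items):
    # Hides the symptom: the IndexError is swallowed, so nothing crashes,
    # but the caller silently gets None instead of the last element.
    try:
        return items[last_index_buggy(items)]
    except IndexError:
        return None


def get_last_root_cause_fix(items):
    # Addresses the cause: the last valid index is len(items) - 1.
    return items[len(items) - 1]


attempts = ["brad", "bigger nail", "screws"]
print(get_last_symptom_patch(attempts))   # prints None
print(get_last_root_cause_fix(attempts))  # prints screws
```

Both versions stop the crash, but only the second returns the right answer, because only the second was written knowing where the bad index came from.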

In the end, I’ve come to believe that only one thing is guaranteed to be the same in any successful solution to a bug: you have to understand the cause of the bug first. Even as hardware changes and our approaches to development change too, I think this is one thing that will remain constant throughout my engineering career: to solve a bug, you need to comprehend its root cause. In this case, as so often in engineering, it is better to proceed from knowledge than from hope.

To do otherwise… well, to do otherwise you risk being a fool any day of the year.

Brett Douville is the lead systems programmer at Bethesda Game Studios, maker of Fallout 3 and the forthcoming TESV: Skyrim. He blogs occasionally at @brett_douville.