Why Names Matter

A common motif in many fantasy settings is the idea that naming something gives you power over it. And nowhere is that more true, in my opinion, than when writing code.

Names in code are simultaneously the most and least important thing. After all, a large part of the compiler/assembler’s job is to take all of the names of variables, functions and classes and get rid of them in favour of addresses, indices or register numbers. It’s like humans giving names to cats – no self-respecting cat is ever going to actually use the name a mere human gave them, even if occasionally they will deign to show interest when called by it.

So, the names don’t matter. Except when they do.

The reason they do, of course, is that only a relatively small proportion of the average program is written for the compiler’s benefit. The rest – the comments, the whitespace, the helper functions… – are there for the programmer. They’re there to help you. They are your friends.

So why is it that, as programmers, we tend to treat our friends so badly? After all, if we were more caring, things like this wouldn’t have happened to me (or happened to other people because of me):

A large codebase where both “matrix” and “Matrix” (entirely different classes) existed in the global namespace, and did approximately, but not exactly, the same thing, in almost entirely different ways (most notably, one was 16-byte aligned and the other… well… was 16-byte aligned approximately 1/4 of the time, shall we say? You can imagine the hilarity that caused).
One project which had world cells and streaming cells and audio cells and collision cells and no-one could ever tell them apart in conversation without detailed cross-examinations of every statement made.
A codebase where every single object had its own namespace, and inside that namespace was a static function simply called Init(). Throw a hefty dose of using statements into the mix and the only realistic way to figure out what any given call to Init() actually did was to breakpoint it in the debugger, step in and see where you landed.
A large function consisting of about 7 nested loops each using a set of variables that started out in the outermost loop with names like “x” and “y” and then acquired (approximately) one extra letter per level of indentation, giving “x”, “dx”, “dx2”, “dddx” and so on… which I had to convert to MIPS assembler.

The big problem with this is that in the short term, and in some cases the medium term, it’s easy to get away with it. When you’re working on the code you can relatively easily remember what goes where, and if something clashes with a module elsewhere then… well, as long as it isn’t a compile error it’s easy to overlook. The problem arises when someone else needs to look at the code, or when in a year’s time you come to work on it again, or even simply when a designer comes around to ask if the “cell” object in the editor is used for streaming or collision or both. And only then does the true scale of the problem become apparent – all those little questions and doubts and fears slowly build up until you realise that you’re wasting huge amounts of time looking things up, guessing, clarifying documentation… the only way you will ever feel clean again is to take the code in question and Kill it with Fire.

So, what can be done to avoid flamethrowers in the office? Well, my belief is that as with many code design-related problems, the only ultimate solution is to keep worrying about it. Not constantly, not to the point where it becomes a factor in stress-related hair loss, but when designing any new system, or even just writing a new function, I think to myself:

1) Is it unique?

This generally trumps all other concerns. If two things have the same name, they get mixed up. Things getting mixed up is about as bad as it can get, and opens the door to All Sorts of Secondary Unpleasantness to boot. On a couple of projects I’ve worked on we actually had a naming whiteboard, on which people would write any new terms they came up with (and a one-line definition) – this worked wonders in keeping people away from overlapping names, and avoided too many “excuse me but WTF does this thing do?” moments.

2) Does this name make sense?

Ideally, people seeing the name should understand what it refers to… however, this comes second to uniqueness – an invented or arbitrary yet unique name (see the use of “Widget” by UI toolkits for an example of this) is almost always preferable to a correct yet overused one.

Similarly, consider the use of language – for example, function names should generally be verbs, whilst variables and types are nouns (or at a stretch adjectives). Sometimes a tiny subtlety can make all the difference – for example, Vector.Normalise() is a verb and therefore logically performs an action on the object itself, whereas Vector.Normalised() is an adjective and should probably return the normalised vector, leaving the object itself unaltered.

Arguably, GetNormalised() might well be better in the general case (as it is more obvious at a glance) – however, maths classes are one place where the readability of the resulting code is also very strongly impacted by the length of names, as people will often stack many operations onto a single line of code, and hence shorter may well be better, even at the cost of some linguistic ambiguity.
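To make the verb/adjective distinction concrete, here is a minimal sketch. The Vector2 type and everything in it are hypothetical, invented purely for illustration; the point is only that the grammar of the name can line up with the const-ness of the method.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 2D vector; the type and its members are illustrative only.
struct Vector2 {
    float x;
    float y;

    float Length() const { return std::sqrt(x * x + y * y); }

    // Verb: performs the action on this object, so it is non-const.
    void Normalise() {
        const float len = Length();
        if (len > 0.0f) {
            x /= len;
            y /= len;
        }
    }

    // Adjective: describes a result, so it is const and returns a copy.
    Vector2 Normalised() const {
        Vector2 copy = *this;
        copy.Normalise();
        return copy;
    }
};

// Usage sketch: the name alone tells the caller which object changes.
inline void NormaliseExample() {
    Vector2 v{3.0f, 4.0f};
    const Vector2 unit = v.Normalised();  // v is left untouched
    assert(v.x == 3.0f && v.y == 4.0f);
    assert(std::fabs(unit.Length() - 1.0f) < 1e-5f);
    v.Normalise();                        // v itself is now unit length
    assert(std::fabs(v.Length() - 1.0f) < 1e-5f);
}
```

Marking the adjective form const also means the compiler will complain if someone later makes it modify the object, which is a cheap way of keeping the name honest.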

3) Does it fit with any existing naming schemes?

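This is largely the principle of least astonishment: a new name should follow the conventions the surrounding code already uses, so that nobody is surprised by what it does.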

Mixing metaphors is also to be avoided – an object you Engage() but then Deactivate() is just confusing all-round. Get() and Set() should do what they say on the tin without side-effects. Likewise I always try to steer clear of negatives when describing boolean variables or results, because once someone has to write things like “if (!IsNotValid())” you can guarantee there’ll be a bug as a result.
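As a rough sketch of those three points together (a matched verb pair, side-effect-free accessors and a positively-phrased boolean), something like the following would do. The Shield class and all of its members are made up for the example.

```cpp
#include <cassert>

// Hypothetical object; every name here is illustrative.
class Shield {
public:
    // Matched pair from a single metaphor: Activate()/Deactivate(),
    // rather than Engage()/Deactivate().
    void Activate()   { active_ = true; }
    void Deactivate() { active_ = false; }

    // Positive predicate: callers write if (IsActive()) or if (!IsActive()),
    // never a double negative such as if (!IsNotActive()).
    bool IsActive() const { return active_; }

    // Get/Set touch only the value they name, with no hidden side effects.
    int  GetStrength() const       { return strength_; }
    void SetStrength(int strength) { strength_ = strength; }

private:
    bool active_ = false;
    int  strength_ = 0;
};

// Usage sketch.
inline void ShieldExample() {
    Shield shield;
    shield.Activate();
    shield.SetStrength(5);          // does not change the active state
    assert(shield.IsActive());
    assert(shield.GetStrength() == 5);
    shield.Deactivate();
    assert(!shield.IsActive());
}
```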

4) Is there any additional information I should provide here?

I generally think of the name as the thing that people will see in the IDE, and the sole bit of information they will (hopefully) read – especially in the modern age of autocomplete, where people often hunt for functions and variables simply by trawling the list of names in a menu. Is there anything really, really important to tell that person? “size_in_bytes” is wordier but so much more useful than simply “size”, or how about the oft-used but still-invaluable SomeFunction_DoNotEVERCallThis() pattern? (otherwise known as “the closest games programmers ever get to using private: properly”)
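For instance, a buffer description might carry its units in every name. This is only a sketch with invented names (PixelBuffer and its fields are not from any real library), but it shows how little is left to guess when the autocomplete list is all someone reads.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical buffer description; the struct and field names are illustrative.
struct PixelBuffer {
    std::size_t width_in_pixels;
    std::size_t height_in_pixels;
    std::size_t bytes_per_pixel;

    // The unit is in the name, so nobody has to ask "bytes or pixels?".
    std::size_t size_in_pixels() const {
        return width_in_pixels * height_in_pixels;
    }
    std::size_t size_in_bytes() const {
        return size_in_pixels() * bytes_per_pixel;
    }
};

// Usage sketch: a 4x2 buffer with 4 bytes per pixel.
inline void SizeExample() {
    const PixelBuffer buffer{4, 2, 4};
    assert(buffer.size_in_pixels() == 8);
    assert(buffer.size_in_bytes() == 32);
}
```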

As a side-note, whilst not wanting to start a religious war about notations, my personal opinion is that with the current state of autocomplete/Intellisense/etc in IDEs, putting actual type information in variable/function names is redundant, and simply asking for pain when it becomes necessary to change the type later.
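A small illustration of the kind of pain I mean, using invented struct names and assuming the common convention that a “dw” prefix promises a 32-bit value:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record. The "dw" prefix conventionally promises a 32-bit value,
// but in this sketch the field was widened to 64 bits and the name never updated.
struct FileInfoPrefixed {
    std::uint64_t dwSize;  // prefix and type now disagree
};

// Same data, named for what it means rather than how it is stored.
struct FileInfoPlain {
    std::uint64_t size_in_bytes;
};

static_assert(sizeof(FileInfoPrefixed::dwSize) == 8,
              "the dw prefix no longer matches the storage");

// Usage sketch: the plain name stays true whatever the storage type becomes.
inline void PrefixExample() {
    const FileInfoPlain info{5000000000ULL};  // would not fit in 32 bits
    assert(info.size_in_bytes > 0xFFFFFFFFULL);
}
```

Renaming dwSize everywhere it is used is exactly the chore the prefix guarantees you will eventually face; the plain name needs no such change.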

5) Am I going to regret it later?

There’s always some point where I’ll end up bashing my head off a table because of a naming decision, and since I know that really hurts, it’s worth avoiding where possible. Assuming that the above are all in order, most of these will be technical issues – like the pitfall of using “Handle” or “Socket” in C++ as the name for just about anything, for example (in particular, Windows has a really, really nasty habit of using #defines on names like these, resulting in the strangest compile errors imaginable). Or in C# naming a property with a name that has funny qualifiers and/or only makes sense to someone reading the code – lots of things (such as the XML Serialiser and PropertyGrid components) will use that as the name to expose to the end-user.
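The macro problem is easy to reproduce in miniature. In the sketch below the #define is a hypothetical stand-in for a line buried in a platform header (it is not copied from any real one), and the stringify helpers exist only to reveal the name the compiler ends up seeing.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for a line in a platform header. This macro is an assumption made
// for the sake of a portable example, not a quote from any real header.
#define Handle HandleA

// The author typed "Handle", but after preprocessing the type is HandleA.
struct Handle {
    int id;
};

// Stringify after macro expansion to show the name the compiler actually sees.
#define STRINGIFY_RAW(x) #x
#define STRINGIFY_EXPANDED(x) STRINGIFY_RAW(x)

inline const char* NameSeenByCompiler() {
    return STRINGIFY_EXPANDED(Handle);
}

// Usage sketch: within one file everything still compiles, which is why the
// problem tends to surface only when another file skips that header.
inline void MacroExample() {
    Handle h{7};
    assert(h.id == 7);
    assert(std::strcmp(NameSeenByCompiler(), "HandleA") == 0);
}
```

Any file that includes the offending header sees HandleA and any file that does not sees Handle, so the errors show up at the boundary between the two and mention a name nobody ever typed.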

If I were a better person, I’d include “using UK English spellings” here, as they just seem to end up causing trouble (especially “colour”, which I suspect many of my current and past teammates would cheerfully punch me over the use of). However, I’m not a better person, and am (largely) unrepentant on this one.

I’m sure there are more good rules-of-thumb that people have come up with (comments thread, anyone?), but that’s my shortlist.

#AltDevBlogADay

Ben Carter
