How many times have you seen names like:

actor_bone_arm_left_elbow_upper01
terrainChunk01
level1section2Zone5

I bet you’ve never stopped and really looked at those names for more than a second. If you had, you might have noticed something that is quite obvious once you see it.
All those names carry their most unique information towards the end of the string.

I’ve always been fascinated by this, especially as I’ve seen more and more strings used as instance names. We know that almost every string compare function in existence works from left to right, so placing our most unique information at the far right only increases the amount of redundant work we do to throw out incorrect matches: every failed compare has to walk the whole shared prefix before it reaches a character that differs.
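To make that redundant work concrete, here’s a minimal sketch in C++ that counts how many characters a naive left-to-right compare has to look at before it can reject a mismatch. The function name chars_examined is my own, purely for illustration, and real compare routines (which may check lengths first or compare several bytes at a time) will differ in the details.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Simplified model of a left-to-right compare (an assumption, not any
// particular library's implementation): returns how many characters are
// examined before the compare can stop.
std::size_t chars_examined(const std::string& a, const std::string& b)
{
    const std::size_t limit = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < limit; ++i) {
        if (a[i] != b[i]) {
            return i + 1;  // i matching characters, plus the one that differs
        }
    }
    return limit;  // no difference found within the shorter string
}
```

Under this model, telling “terrainChunk01” apart from “terrainChunk02” costs 14 character checks, while the hypothetical reordered names “01terrainChunk” and “02terrainChunk” are told apart after 2.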

“We should just reverse all our strings!” I hear some people say. Well, that won’t always help. What we’re interested in is the distribution of unique information in the string, and that is determined not by the string itself but by the data set the string is used to search. Take the “level1section2Zone5” string, for example. If our data set contains more instances identified by “Zone5” than by “section2” or “level1”, we may not see an improvement from simply rearranging the chunks of information the string contains, because “Zone5” is then the chunk that tells instances apart the least.
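One way to see how much the data set matters is to measure it. The sketch below, again in C++ and again using names I’ve made up (chars_examined, average_reject_cost), averages the number of characters a simple left-to-right compare examines while rejecting every non-matching name in a set. It’s a rough model under those assumptions, not a profile of any real engine.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of a left-to-right compare: how many characters are
// examined before the compare can stop.
std::size_t chars_examined(const std::string& a, const std::string& b)
{
    const std::size_t limit = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < limit; ++i) {
        if (a[i] != b[i]) {
            return i + 1;  // matched prefix plus the differing character
        }
    }
    return limit;
}

// Average number of characters examined while rejecting every name in
// `names` that is not equal to `query`. Returns 0 if nothing is rejected.
double average_reject_cost(const std::vector<std::string>& names,
                           const std::string& query)
{
    std::size_t total = 0;
    std::size_t rejected = 0;
    for (const std::string& name : names) {
        if (name == query) {
            continue;  // a successful match is not a rejection
        }
        total += chars_examined(name, query);
        ++rejected;
    }
    return rejected == 0 ? 0.0 : static_cast<double>(total) / rejected;
}
```

Take a hypothetical data set of five names, “level1section2Zone1” through “level1section2Zone5”. Searching it for “level1section2Zone5” averages 19 characters per rejection, and reordering the chunks to “Zone5section2level1” style brings that down to 5. Now take five names that differ only by level, “level1section2Zone5” through “level5section2Zone5”. The original order averages 6 characters per rejection, and the same reordering pushes it up to 19. Same names, same trick, opposite result.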

The solution is to sit down with whoever will be consuming these strings (artists/animators/programmers) and discuss what kind of names they need. Work with them to design names that not only make sense from their perspective but also yield the lowest average number of compares before a mismatch is rejected, while keeping the strings readable and as short as possible.

Not every use of string identifiers needs a tricky naming convention. In a lot of cases the actual number of compares, or the size of the data set, is small enough not to matter. But it’s a handy trick to have for when you need the extra performance and don’t want to drop back to using CRCs.

As long as you can afford the memory, strings are the preferred choice for instance names. They make debugging extremely simple and you won’t ever have to worry about a hash collision or keeping the order of your information deterministic. All you need to do is be careful about how you construct your names.