Instapaper Text

What documentation?

Imagine a situation where you have just been employed by a game development company with a large code base. You might setup your computer, adjust your chair, get some coffee and then start synchronizing your source code repositories. This might take a long while so you decide to take a look at the company documentation while your downloading. So you roll up your sleeves and start digging. You dig and you dig, but you just can’t seem to find anything useful. Puzzled you go to ask either the CTO or the technical director whether they would have some information as to were the documentation is lurking. So you enter the room of either person and candidly ask your question, only to be met with a blunt answer stating that there is none.

You return to your desk, shoulders slumped, and sip some of that now cold coffee. You decide to wait for the source code update to finish. It now seems that you have to rely on code commenting to figure out how the system works.

Finally the update finishes and you start digging the source code. To your amazement and horror you notice that not only is there no separate documentation on the system but there’s also no documentation to be found in the code either!

Let us push that nightmare aside for a while and discuss what is going on in here. The situation is common not only in game development but in other kind of software development as well. It comes down to company culture in many cases. The companies I have worked for expect you to develop quality features sometimes in a tight schedule. Leaders are often focused on implementing features themselves and don’t necessarily oversee things such as code quality. Those are implicitly expected of all skilled developers.

In addition to code quality we have the issue of documentation. A lot of my colleagues of past and present might shift uneasily at their desks now. Let’s face it, most developers do not like writing documentation. Even Scrum has built into it the idea of only producing the minimal amount of documentation possible. While this does mean minimal in that it should still serve its purpose, it’s easily taken as a reason not to write documentation at all. You have code changing all the time anyway, so why should you write documentation that will be outdated the next day?

Why should we document?

The story I told is not uncommon by any means. It’s been a relatively constant topic of discussion for the whole of my working life, in fact. Why I think it’s so important is that you have people coming in and going out over the lifespan of a company. You could say it’s the people that make up the company but it’s code that is left behind by those people that forges the future of the company. This is because you can’t do a rewrite of the whole product every time you have people leaving or joining the company. So the company is stuck with what the person did before leaving.

Every time you write a single line of code you do it with a plan inside your head. You have an idea of what constructs are required and how they relate to one another. You basically form a whole imaginary world inside your head. This world is then projected into text using the programming language grammar.

Such descriptions using a programming language often becomes quite complicated. This is why you should use methods and well named variables to break down the individual expressions and constructs into understandable bits. For instance, you could write the whole simulation code of a car as a single block but it would make more sense to split it into smaller methods, each of which define a single step in the algorithm.

Such well structured code that is split into sensible bite sized portions that are labeled in a coherent manner can be called clean code. I like to think of you having to understand the main points of what the code does on the first read through.

However, clean code is not always enough. If you were making modifications to the code then clean code will help you considerably, but often you are just using the code in some way. This is especially evident when using a 3rd party library that has a specific set of entry points and data structures. You might not even have the code for it. You have no practical way of finding out what the system does without having samples on every single possible thing you can do with it or having comprehensive documentation.

The same goes for internal technology that is used in a 3rd party like manner. This is the case when using an internal engine, for instance. Documentation allows for people without deep knowledge of the engine internals to just dive in and start using it. Without documentation you could hope someone else already has an answer to your question or you could just start trying things out. I have also noticed that often the methods and classes really can be used in the way that feels intuitive, but you’re never really sure. This tends to create a lot of stress. Another thing that is made more difficult by missing documentation is feature design and work estimation. For every feature, you have to wade through the code trying to find the correct classes and methods to use.

As I explained above, you project your mental image of the system using the grammar of a programming language. The programming language is good at expression what is being done but what it’s not so good at is explaining why. Often you make choices that might not seem to make sense to the reader of the code. This might even result in the person “fixing” the code. Having the reasoning of that choice documented in some way can be very important.

We also often tend to forget about documentation having the ability to aggregate information and make it simpler to read. It’s often not just about whether the code is documented but also about how fast you can find the information you need. Say, for instance, you are in the situation I described above. You are a newcomer and you want to start doing something productive. The thing you could do is either go find some documentation, which is often scarce, or you could find someone who knows about how the system works. In the second option this person might then explain you the main architecture of the code and the guidelines for you to follow when developing or using the code, if that was what you asked. You feel satisfied and start working. Then a new guy comes in and does the same thing…

What I think is worth remembering here is that most people will ask the same questions, if that piece of information is missing. Being asked questions often means the required information is cumbersome to find or it does not exist at all. In this case you are using about X times the time you would have spent writing down the information you are being asked for, where X is the number of people asking.

Different types of documentation

You should not write documentation just for the sake of documentation. A piece of documentation has to always have a reader, otherwise it is useless. I like to categorize documentation into two groups: architecture and detailed. Note that we are talking about documentation targeted to the developer and not documentation like tool manuals and marketing material.

I would classify architecture documentation as the most important one because it describes the shape language for the code: the different sub-systems, coding conventions, threading rules, testing guidelines, patterns to use, non-functional requirements, etc. Overall, it works both as an introduction to the system as well as a guideline to using and modifying it. Architecture documentation is high level documentation used for controlling the overall direction of the project. It forms the boundaries, if you will, within which development is done. This is precisely why I think it’s the most important piece of documentation: How can you be sure people go in the right direction if there is no direction defined?

Whether you require detail documentation is a bit more complicated. This includes documentation about individual classes, methods and sometimes algorithms too. My view on this is that you should not overshoot but neither should you neglect it.

Whether or not you require detail documentation depends on the type of code you are writing. If you are modifying an algorithm and you can be expected to understand its inner workings and you are able to understand the algorithm by reading the code, then you do not necessarily need additional documentation. Algorithms are usually contained in some manner and have few entry points. Some times, as was mentioned above, however, you might have made a choice when implementing the algorithm that might not initially make sense to the reader. This is where detailed documentation is important, to explain the “why” of those choices.

Now, if you are accessing an entry point of an algorithm and you have to understand the algorithm to know how to use the entry point then either you don’t have clean code or you are missing documentation. To determine which is the case you should analyze whether it is possible to re-factor the entry point so that it is self explaining. It this proves too difficult, the entry point is missing documentation. There might also be some simple methods that get or set data whose names would become too complicated in which case documentation is also in order.

I have also personally heard comments about code documentation requiring too much discipline to work. It’s just something that comes in your way when you implementing something and you have no time for anything extra. What is often forgotten, however, is that code is most of often shared property and multiple people are going to work on the code you have written. Implementing a new system requires design and implementation. Working on an existing one means understanding how it works, then designing and implementing the modification. Let me put that into two equations:

X + Y = time

X + Y + Z = time

The second equation is the one for modifying an existing feature and is very much impacted by the component Z that describes the time you have to use to learn the system. The more technical debt you leave behind the bigger Z will be for the next person coming to work on your code.

If we now break down Z into components Z1 and Z2. Z1 stands for the time used to learn what the actual code you are modifying does while Z2 describes the time you need to study related APIs. Clean code is the best way to minimize Z1 but Z2 might come from re-usable APIs like core engine code that you are incorporating into the existing code. To minimize Z2 you have to minimize the time it takes to learn what those APIs do.

The documentation process

Documentation is best written in small pieces. The difficulty in writing good documentation comes from the fact that you as the implementer are most likely not going to spot where there is information missing. This is where we come to the concept of code review.

Code reviews are the best method for finding places where documentations is required as the person reviewing the code should be able to understand what he/she is reading. If that is not the case, additional documentation or code cleanup is required. Personally I think both peer review and reviews done by the lead programmer are in order. In peer review you also share knowledge minimizing silent information, where some people know something others have no clue of.

There are multiple tools for technically performing code review. I have experience using a product called Review Board, which I can generally recommend. Regardless of the tools you use, code review should be built into the company culture. Choosing the tools is easy, getting the process running is the hard part.

Another thing that is a clear indicator of missing information is different people asking you the same questions time and time again. Documenting your answers will both save you the time of answering the same questions multiple times but also makes it possible for people to get to that information even when you are out of reach.

How should documentation be done, then? Personally I like wikis and code commenting. Wikis are good for documenting software architecture but code commenting is good for detailed information like API documentation. Also, you should exhibit some discipline. Documentation should be built into your way of working. Every feature should be reviewed and missing information should be incorporated either by cleaning the code, commenting the code or writing about it in the architecture documentation. The order is important. You should first consider cleaning and only do commenting if it proves too cumbersome and defeats the purpose. If the information is missing on a larger scale (which systems take part in managing the game world, for instance) then the architecture documentation should be updated.

Conclusions

Documentation requires discipline, but so does programming itself. It is indicated by the very process of using a well structured programming language to describe a system that is to work in a deterministic fashion to perform a certain set of tasks. Documentation should not be thought of as a necessary evil that is to be defeated by forcing everyone to write silly amounts of it. It should contain information that is difficult to describe otherwise, information that people need in order to be effective at what they do.

In my view, there is no single right way to perform documentation. Some people say it should all be incorporated into the code, some say there should be no commenting at all and clean code is enough while others like to have the whole set. My tip is that you should do what works best for you as a company. The bottom line being that documentation is about improving the efficiency of everyone in the company, it should not be there just for your nuisance.

#AltDevBlog

Timo Heinapurola
Follow @heinapurola

What documentation?

Why should we document?

Different types of documentation

The documentation process

Conclusions

#AltDevBlog

Timo Heinapurola Follow @heinapurola

What documentation?

Why should we document?

Different types of documentation

The documentation process

Conclusions

Timo Heinapurola
Follow @heinapurola