Friday, June 24, 2011

Against manual formatting

Programming languages are practically always expressed in text, and thus they provide the programmer with a lot of flexibility in the exact sequence of characters used to represent any one program. Examples of manual formatting are:
  • How many blank lines to put between different program elements
  • Whether to put the body of an if on the same line or a new line
  • Whether to sort public members before private members
  • Whether to sort members of a class top-down or bottom-up, breadth-first or depth-first

I've come to think we are better off not taking advantage of this flexibility. It takes a significant amount of time, and for well-written code, its benefits are small.

First consider the time. The first place manual formatting takes time is in the initial entry of code. Programmers get their code to work, and then they have to spend time deciding what order to put everything in. It's perhaps not a huge amount of time, but it is time nonetheless.

The second place manual formatting takes time is when people edit the code. The extract method refactoring takes very little time for a programmer using an IDE, but if the code is manually formatted, the programmer must then consider where to place the newly created method. Deciding where the new method goes will take much longer than creating it did.

Worse, sometimes the presentation the first programmer used no longer makes sense after the second programmer's edits. In that case, the second programmer has to come up with a new organization. They also have to spend time evaluating whether a new organization is worthwhile at all; to do that, they first have to spend time with the existing format trying to make it work. There is time being taxed away all over the place.

Meanwhile, what is the benefit? I posit that in well-written code, any structural unit should have on the order of 10 elements within it. A package should have about 10 classes, a class should have about 10 members, and a method should have about 10 statements. If a class has 50 members, it's way too big. If a method has 50 statements, it, too, is way too big. The code will be improved if you break the large units up into smaller ones.

Once you've done that, the benefit of manual formatting becomes really small. If you are talking about a class with ten members, who cares what order they are presented in? If a method has only 5 statements, does it matter how much spacing there is between them? Indeed, if the code is auto-formatted, then those 5 statements can be mentally parsed especially quickly. The human mind is an extraordinary pattern matcher, and it is fastest at matching patterns it has seen many times before.
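
To make the auto-sorting idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the function name `sort_members` and the ordering rule (public methods before private ones, then alphabetical) are hypothetical choices, not a description of any particular tool. It uses the standard `ast` module, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def sort_members(source: str) -> str:
    """Return `source` with the methods of every class in a fixed order."""
    tree = ast.parse(source)
    method_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            methods = [m for m in node.body if isinstance(m, method_types)]
            # Docstrings, fields, and nested classes keep their original order.
            others = [m for m in node.body if not isinstance(m, method_types)]
            # Hypothetical rule: public names first, then alphabetical.
            methods.sort(key=lambda m: (m.name.startswith("_"), m.name))
            node.body = others + methods
    # Unparsing also normalizes spacing, so it doubles as a crude formatter.
    return ast.unparse(tree)


if __name__ == "__main__":
    before = (
        "class Account:\n"
        "    def withdraw(self, n):\n"
        "        self._check(n)\n"
        "    def _check(self, n):\n"
        "        pass\n"
        "    def deposit(self, n):\n"
        "        pass\n"
    )
    print(sort_members(before))
```

With a rule like this, a freshly extracted method needs no placement decision: it lands wherever the rule puts it, and two programmers who write the same members in different orders end up with the same text. One caveat of this particular sketch is that `ast.unparse` discards comments, so a real tool would need a parser that preserves them.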

I used to argue that presentation is valuable because programs are read more than written. However, then I tried auto-formatting and auto-sorting for a few months, and it was like dropping a fifty-pound backpack that I'd been carrying around. Yes, it's possible to walk around like that, and you don't even consciously think about it after a while, but it really slows you down. What I overlooked in the past was that lexical formatting is not the only thing that can improve the presentation of a program. Instead of carefully formatting a large method, good programmers already divide large methods into smaller ones. Once this is done, manual formatting just doesn't have much left to do. So don't bother. Spend the time somewhere that has a larger benefit.

1 comment:

Stephen Haberman said...

Makes sense. It reminds me of the stories I've heard of Smalltalk, where the program was never in a ".file" form, and always an AST in the image. Even their version control was about the changes to the AST.

Which makes some of the oddities of file-based diffs go away--e.g. merely reordering methods no longer shows the entire method body as having been added/removed.