Wednesday, December 28, 2011

All software has bugs

John Carmack has a great article up on his experience with bug-finder software such as Coverity and PC-Lint. One of his observations is this:
The first step is fully admitting that the code you write is riddled with errors. That is a bitter pill to swallow for a lot of people, but without it, most suggestions for change will be viewed with irritation or outright hostility. You have to want criticism of your code.
He feels that the party line for bug finders is true, that you may as well catch the easy bugs:
The value in catching even the small subset of errors that are tractable to static analysis every single time is huge.
I agree. One reason it is easier to talk with more experienced software developers is that they take this view for granted. When I talk to newer developers, or to non-engineers, they seem to think that if we spend enough time on something, we can remove all the bugs. That isn't possible for any body of code larger than a few thousand lines. Removing bugs is more like purifying water: you can only manage the contaminants, not eliminate them. Thus, software quality should be thought of in terms of effort versus reward.

I also have found the following to be true:

This seems to imply that if you have a large enough codebase, any class of error that is syntactically legal probably exists there.
An example I always come back to is the Olin Shivers double word finder. The double word finder scans a text file and detects occurrences of the same word repeated twice in a row, which is usually a grammatical mistake in English. I have started running it on any multi-page paper I write, and it almost always finds at least one such instance that is legitimately an error. If an error can be made, it will be, so almost any automatic detector is going to find real errors. Another one that jibes with me is:
NULL pointers are the biggest problem in C/C++, at least in our code.
I once surveyed the forty most recently fixed bugs on the Squeak bug tracker, and the largest single category was null dereferences. They significantly outnumbered type errors, that is, bugs where one type (e.g., a string) was used where another was intended (e.g., an open file).
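The double word finder mentioned above is a good illustration of how cheap a useful automatic detector can be. What follows is not Shivers's original program, just a minimal Python sketch under a few assumptions of my own: a "word" is a run of `\w` characters, the two copies must be separated only by whitespace (possibly a line break), and case is ignored. The function name `find_double_words` is hypothetical.

```python
import re
import sys

# Hypothetical sketch of a double word finder (not Olin Shivers's original code).
# A "word" is a run of \w characters; the two copies must be separated only by
# whitespace (which may include a line break), and case is ignored, so
# "The the" is reported but "end. End" is not.
DOUBLE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_double_words(text):
    """Return (line_number, repeated_text) for each doubled word in text."""
    hits = []
    for match in DOUBLE_WORD.finditer(text):
        # 1-based line number of the first copy of the word.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(0)))
    return hits


def main(argv):
    # Usage: python doubleword.py paper.txt [more.txt ...]
    for path in argv[1:]:
        with open(path, encoding="utf-8") as f:
            for line_number, repeated in find_double_words(f.read()):
                print(f"{path}:{line_number}: {repeated!r}")


if __name__ == "__main__":
    main(sys.argv)
```

Matching across line breaks matters in practice: a doubled word split over two lines is exactly the kind that slips past proofreading.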

I do part ways with Carmack on the relative value of bug finders:

Exhortations to write better code plans for more code reviews, pair programming, and so on just don’t cut it, especially in an environment with dozens of programmers under a lot of time pressure.
If we were to candidly rank methodologies for improving quality, I'd put "write better code" above "use bug finders." In fact, I'd put it second, right after regression testing. I could be wrong, but my intuition is that there are a number of low-effort ways to improve software before it is submitted, and the benefits are often substantial. Practices like "use a simpler algorithm" and "read your diff before committing" add just minutes to each patch but often save over an hour of post-commit debugging from a repro case.

All in all it's a great read on the value of bug finding tools. Highly recommended if you care about high-quality software. HT John Regehr.
