Tuesday, March 29, 2011

When stateful code is better

One branch of functional-programming enthusiasts have long strived to eliminate state from programming. By doing so, you end up with program code that supports equational reasoning. If you know a=b in one part of the code, then you can freely replace a by b and b by a anywhere else in the code. Since there's no state, the program will still behave the same. It's good stuff.

Nonetheless, state is essential for good code in most languages. You don't want to live without it. Take a moment to consider the more practical of functional programming languages, and see how programmers in those languages have voted with their feet. ML, the most popular typed, strict functional language, includes ref cells in the core language. Haskell, the most popular typed, lazy functional language, includes not just the state monad but also UnsafeIO. Lisp and Scheme, the most popular untyped functional languages, shamelessly include state everywhere. It's a clean sweep. Functional programmers are using languages that have state.

Why is this? Let me describe a couple of programming problems that any practical language needs to be able to solve. Both of these problems are easy with state and hair-pulling without. Any language without state will give programmers a tough time with these problems, so such languages don't become popular. The two problems are: logging, and the model-view architecture.

With logging, what you'd like to do is write your program as normal and then insert log messages here and there in the code. As much as possible, you want to avoid disturbing the core logic of the program with the logging behavior. Solving this problem using state is so easy it's hard to even talk about. All you do is use that state, typically an external file system. Every time you want to log something, write the message to the state.

What if you want to log without state? In that case, you have to pass around the log as a parameter to every function in the program that might want to log anything. Every function gets an extra parameter which is the state of the log before the function call. Such functions must also return that extra parameter, after updating, when they finish. This approach has two large problems. First, it requires pervasive changes throughout the code base to pass around the latest version of the log. Second, it's highly error prone. For example, the following function logs two messages but accidentally discards the first one:
def doTwoThings(log) = {
val log1 = write(log, "about to do first thing")
doFirstThing()
val log2 = write(log, "about to do second thing")
doSecondThing()
return log2
}
This kind of problem can probably be prevented with linear types, and many functional language researchers would observe this and go do a bunch of research on linear types. Until they come up with something, your best bet is to use state.

A second example is event handling in the model-view architecture that is so pervasive in practical code. In the model-view architecture, you write the program in two layers: one layer for the core model of the software and one layer for the view. Views have a pointer to their models, and whenever the model changes they update themselves. This way, the model code stands alone and can be analyzed and unit tested without needing a user interface. The view, meanwhile, focuses on user interfaces, and can be tested on its own if you stub out the model. It's a fine architecture, well worth its popularity. Here's the challenge for stateless programming: how do you update the model in response to an event?

In a stateful language, what programmers can do is mutate the model in place. Every pointer from the view to the model will still be valid, so the view doesn't need any changes to its structure. Again, it's so simple it is hard to even talk about.

Now consider a stateless language. In a stateless language, you must not only update the model, but must also update any view object that refers to any part of the model that changed. Likewise, you have to update any view object that has a reference to any such view object, transitively. There's no theoretical bar from programming like this. However, your process-event object ends up taking a view as an argument, just so that it can update all the pointers from the view to the updated model. This approach is tedious and error-prone in the same way as with stateless logging. It's very easy to leave some parts of the view pointing to old parts of the model. If you do that, things will mostly work, but there will be stale data in parts of the view.

In general, stateless code is usually better. However, I can't escape believing that for logging and for the model-view architecture, it's the stateful version that is best. These problems share the aspect that there are references from one component (main application, view) to a separate component (log file, model), and the second component is undergoing change. By letting the reference be stateful, the two components can work with each other at arm's distance. Contrary to its usual reputation, state in such cases is a help rather than a hindrance for building useful abstractions.

1 comment:

Runar said...

Or you could just use a monad. With the State monad, you manipulate (immutable) state, without having to pass it around explicitly.

With the ST monad, you can indeed use a mutable log while staying purely functional.

Stateful doesn't necessarily imply side-effects.