Saturday, December 29, 2012

Does IPv6 mean the end of NAT?

I frequently encounter a casual mention that, with the larger address space in IPv6, Network Address Translation (NAT)--a mainstay of wireless routers everywhere--will go away. I don't think so. There are numerous reasons to embrace path-based routing, and I believe the anti-NAT folks are myopically focusing on just one of them.

As background, what a NAT router does is multiplex multiple private IP addresses behind a single public IP address. From outside the subnet, it looks like the NAT router is a single machine. From inside the subnet, there are a number of machines, each with its own IP address. The NAT router allows communication between the inside and outside worlds by swizzling IP addresses and ports as connections go through the router. That's why it is a "net address translator" -- it translates between public IPs and private IPs.
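
To make the swizzling concrete, here is a toy Python sketch of the kind of translation table a NAT router keeps. The addresses, port numbers, and class name are made up for illustration; this is not real router code.

    # Toy illustration of NAT translation (not real router code): the router
    # rewrites (private address, private port) pairs to (public address,
    # public port) on the way out, and reverses the mapping for replies.

    PUBLIC_IP = "203.0.113.7"            # the one address the outside world sees

    class NatTable:
        def __init__(self):
            self.outbound = {}           # (private ip, private port) -> public port
            self.inbound = {}            # public port -> (private ip, private port)
            self.next_port = 40000       # next free public-side port

        def translate_out(self, private_ip, private_port):
            key = (private_ip, private_port)
            if key not in self.outbound:
                self.outbound[key] = self.next_port
                self.inbound[self.next_port] = key
                self.next_port += 1
            return PUBLIC_IP, self.outbound[key]

        def translate_in(self, public_port):
            # A reply arrives addressed to PUBLIC_IP:public_port; send it back
            # to whichever internal machine opened the connection.
            return self.inbound.get(public_port)

    nat = NatTable()
    print(nat.translate_out("192.168.1.10", 51515))   # ('203.0.113.7', 40000)
    print(nat.translate_in(40000))                    # ('192.168.1.10', 51515)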

My first encounter with NAT was to connect multiple machines to a residential ISP. It was either a cable company or a phone company; I forget which. The ISP in question wanted to charge extra for each device connected within the residential network. That is, if you connect two computers, you should pay more than if you connect one computer. I felt, and still feel, that this is a poor business arrangement. The ISP should concern itself with where I impose costs on it, which is via bandwidth. If I take a print server from one big box and move it onto its own smaller computer, then I need a new IP address, but that shouldn't matter at all to the ISP. By using NAT--in my case, Linux's "masquerading" support--the ISP doesn't even know.

This example broadens to a concern one could call privacy. What an organization does within its own network is its own business. Its communication with the outside world should be through pre-agreed protocols that, to the extent feasible, do not divulge decisions that are internal to the organization. It shouldn't matter to the general public whether each resident has their own machine, or whether they are sharing, or whether the residents have all bought iPads to augment their other devices.

For larger organizations, privacy leads to security. If you want to break into an organization's computer infrastructure, one of the first things you want to do is to feel out the topology of the network. Unless you use NAT at the boundary between your organization's network and the general internet, you are exposing your internal network topology to the world. You are giving an attacker an unnecessary leg up.

You could also view these concerns from the point of view of modularity. The public network protocol of an organization is an interface. The internal decisions within the organization are an implementation. If you want everything to hook up reliably, then components should depend on interfaces, not implementations.

Given these concerns, I see no reason to expect NAT to go away, even given an Internet with a larger address space. It's just sensible network design. Moreover, I wish that the IETF would put more effort into direct support for NAT. In particular, the NAT of today is unnecessarily weak when it comes to computers behind different NATing routers making direct connections with each other.

It is an understatement to say that not everyone agrees with me. Vint Cerf gave an interview earlier this year where he repeatedly expressed disdain for NAT.

"But people had not run out of IPv4 and NAT boxes [network address translation lets multiple devices share a single IP address] were around (ugh), so the delay is understandable but inexcusable."

Here we see what I presume is Cerf's main viewpoint on NAT: it's an ugly mechanism that is mainly used to avoid address exhaustion.

Later in the interview, the interviewer asks about security directly:

"One of the benefits of IPv6 is a more direct architecture that's not obfuscated by the address-sharing of network address translation (NAT). How will that change the Internet? And how seriously should we take security concerns of those who like to have that NAT as a layer of defense?"

"Machine to machine [communication] will be facilitated by IPv6. Security is important; NAT is not a security measure in any real sense. Strong, end-to-end authentication and encryption are needed. Two-factor passwords also ([which use] one-time passwords)."

I respectfully disagree with the comment about security. I suspect his point of view is that you can just as well use firewall rules to block incoming connections. Speaking as someone who has set up multiple sets of firewall rules, I can attest that they are fiddly and error prone. You get a much more reliable guarantee against incoming connections if you use a NAT router.

In parting, let me note a comment in the same interview:

"Might it have been possible to engineer some better forwards compatibility into IPv4 or better backwards compatibility into IPv6 to make this transition easier?"

"We might have used an option field in IPv4 to achieve the desired effect, but at the time options were slow to process, and in any case we would have to touch the code in every host to get the option to be processed... Every IPv4 and IPv6 packet can have fields in the packet that are optional -- but that carry additional information (e.g. for security)... We concluded (perhaps wrongly) that if we were going to touch every host anyway we should design an efficient new protocol that could be executed as the mainline code rather than options."

It is not too late.

Thursday, December 27, 2012

Windows 8 first impressions

An acquaintance of mine got a Windows 8 machine for Christmas, and so I got a chance to take a brief look at it. Here are some first impressions.

Windows 8 uses a tile-based UI that was called "Metro" during development. As a brief overview, the home page on Windows 8 is no longer a desktop, but instead features a number of tiles, one for each of the machine's most featured applications. Double-clicking on a tile causes it to maximize and take the whole display. There is no visible toolbar, no visible taskbar, no overlapping of windows. In general, the overall approach looks very reasonable to me.

The execution is another story. Let me hit a few highlights.

First, the old non-tiled desktop interface is still present, and Windows drops you into it from time to time. You really can't avoid it, because even the file browser uses the old mode. I suppose Microsoft shipped it this way due to concerns about legacy software, but it's really bad for users. They have to learn not only the new tiles-based user interface but also the old desktop one. Thus it's a doubly steep learning curve compared to other operating systems, and it's a jarring user experience as the user goes from one UI to the other.

An additional problem is a complete lack of guide posts. When you switch to an app, you really switch to the app. The tiles-based home page goes away, and the new app fills the entire screen, every single pixel. There is no title bar, no application bar, nothing. You don't know what the current app is except by studying its current screen and trying to recognize it. You have no way at all to know which tile on the home page got you here; you just have to remember. The UI really needs some sort of guide posts to tell you where you are.

The install process is bad. When you start it, it encourages you to create a Microsoft account and log in using that. It's a long process, including an unnecessary CAPTCHA; this process is on the critical path and should be short and simple. Worse, though, I saw it outright crash at the end. After a hard reboot, it went back into the "create a new account" sequence, but after I re-entered all the information from before, it hit a dead end and said the account was already in use on this computer. This error state is bad in numerous ways. The installer shouldn't even have jumped into the create-account sequence with an account already present. Worse, the error message indicates that the software knows exactly what the user is trying to do. Why show an error message rather than simply logging them in?

Aside from those three biggies, there are also a myriad of small UI details that seem pointlessly bad:

  • The UI uses a lot of pullouts, but those pullouts are completely invisible unless you know the magic gesture and the magic place on the screen to use them. Why not include a little grab handle off on the edge? It would use a little screen space and add some clutter, but for the main pullouts the user really needs to know they are there.
  • In the web browser, they have moved the navigation bar to the bottom of the screen. This breaks the expectations of anyone who has ever used another computer or smart phone. In exchange for those broken expectations, I can see no benefit; it's the same amount of screen space either way.
  • The "support" tile is right on the home page, which is a nice touch for new users. However, when you click it the first time, it dumps you into the machine registration wizard. Thus, it interrupts your already interrupted work flow with another interruption. It reminds me of an old Microsoft help program that, the first time you ran it, asked you about the settings you wanted to use for the search index.

On the whole, I know I am not saying anything new, but it strikes me that Microsoft would benefit from more time spent on their user interfaces. The problems I'm describing don't require any deep expertise in UIs. All you have to do is try the system out and then fix the more horrific traps that you find. I'm guessing the main issue here is in internal budgeting. There is a temptation to treat "code complete" as the target and to apportion your effort toward getting there. Code complete should not be the final state, though; if you think of it that way, you'll inevitably ship UIs that technically work but are full of land mines.

Okay, I've tried to give a reasonable overview of first impressions. Forgive me if I close with something a little more fun: Windows 8 while drunk.

Saturday, December 15, 2012

Recursive Maven considered harmful

I have been strongly influenced by Peter Miller's article Recursive Make Considered Harmful. Peter showed that if you used the language of make carefully, you could achieve two very useful properties:
  • You never need to run make clean.
  • You can pick any make target, in your entire tree of code, and confidently tell make just to build that one target.

Most people don't use make that way, but they should. More troubling, they're making the same mistakes with newer build tools.

What most people do with Maven, to name one example, is to add a build file for each component of their software. To build the whole code base, you go through each component, build it, and put the resulting artifacts into what is called an artifact repository. Subsequent builds pull their inputs from the artifact repository. My understanding of Ant and Ivy, and of SBT and Ivy, is that those build-tool combinations are typically used in the same way.

This arrangement is just like recursive make, and it leads to just the same problems. Developers rebuild more than they need to, because they can't trust the tool to build just the right stuff, so they waste time waiting on builds they didn't really need to run. Worse, these defensive rebuilds get checked into the build scripts, so as to "help" other programmers, making build times bad for everyone. On top of it all, even for all this defensiveness, developers will sometimes fail to rebuild something they needed to, in which case they'll end up debugging with stale software and wondering what's going on.

On top of the other problems, these manually sequenced builds are impractical to parallelize. You can't run certain parts of the build until certain other parts are finished, but the tool doesn't know what the dependencies are. Thus the tool can't parallelize it for you, not on your local machine, not using a build farm. Using a standard Maven or Ivy build, the best, most expensive development machine will peg just one CPU while the others sit idle.

Fixing the problem

Build tools should use a build cache, emphasis on the cache, for propagating results from one component to another. A cache is an abstraction that allows computing a function more quickly based on partial results computed in the past. The function, in this case, is for turning source code into a binary.

A cache does nothing except speed things up. You could remove a cache entirely and the surrounding system would work the same, just more slowly. A cache has no side effects, either. No matter what you have done with the cache in the past, a given query will give back the same value in the future.
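
To pin down the property I mean, here is a minimal Python sketch of a cache over a pure function. It is my own illustration, not any particular tool's API: the answer depends only on the query, never on what was asked before.

    # A cache over a pure function: removing it changes nothing but speed,
    # and the same query always yields the same answer, regardless of history.

    def cached(pure_fn):
        memo = {}
        def wrapper(key):
            if key not in memo:
                memo[key] = pure_fn(key)   # miss: compute and remember
            return memo[key]               # hit: hand back the old result
        return wrapper

    @cached
    def compile_source(source_text):
        # Stand-in for "turn source code into a binary".
        return "binary-for(" + source_text + ")"

    assert compile_source("foo.c") == compile_source("foo.c")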

The Maven experience is very different from what I describe! Maven repositories are used like caches, but without having the properties of caches. When you ask for something from a Maven repository, it very much matters what you have done in the past. It returns the most recent thing you put into it. It can even fail, if you ask for something before you put it in.

What you want is a build cache. Whereas a Maven repository is keyed by component name and version number (and maybe a few more things), a build cache is keyed by a hash code over the input files and a command line. If you rebuild the same source code but with a slightly different command, you'll get a different hash code even though the component name and version are the same.
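
As a rough sketch of what such a key could look like--the details here are my own invention, not a specification of any existing tool--you can hash the command line together with the contents of every input file:

    # Sketch of a build-cache key: a hash over the command line plus the
    # contents of every input file. Change one byte of any input, or one
    # flag on the command line, and the key changes.

    import hashlib

    def build_cache_key(command_line, input_files):
        h = hashlib.sha256()
        h.update(" ".join(command_line).encode())
        for path in sorted(input_files):      # sort so the order is stable
            h.update(path.encode())
            with open(path, "rb") as f:
                h.update(f.read())
        return h.hexdigest()

    # e.g. build_cache_key(["scalac", "-deprecation"], ["Main.scala", "Util.scala"])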

To make use of such a cache, the build tool needs to be able to deal sensibly with cache misses. To do that, it needs a way to see through the cache and run recursive build commands for things that aren't already present in the cache. There are a variety of ways to implement such a build tool. A simple approach, as a motivating example, is to insist that the organization put all source code into one large repository. This approach easily scales to a few dozen developers. For larger groups, you likely want some form of two-layer scheme, where a local check-out of part of the code is virtually overlaid over a remote repository.
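
Here is a rough sketch, again my own invention rather than a description of any existing tool, of how a cache-aware build step could see through the cache. It reuses the build_cache_key sketch from above, and the fields on target are purely illustrative.

    # Sketch of a cache-aware build step: build the dependencies first,
    # compute the key over the final command and inputs, and only run the
    # command on a miss. The fields on `target` are illustrative only.

    import subprocess

    cache = {}   # key -> path of the built artifact (an in-memory stand-in)

    def build(target):
        for dep in target.deps:
            build(dep)                        # recurse through the dependency graph
        key = build_cache_key(target.command, target.inputs)   # from the sketch above
        if key not in cache:                  # miss: really run the build command
            subprocess.run(target.command, check=True)
            cache[key] = target.output
        return cache[key]                     # hit: reuse the artifact as-is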

Hall of fame

While the most popular build tools do not have a proper build cache, there are a couple of lesser known ones that do. One such is the internal Google Build System. Google uses a couple of tricks to get the approach working well for themselves. First, they use Perforce, which allows having all code in one repository without all developers having to check the whole thing out. Second, they use a FUSE filesystem that allows quickly computing hash codes over large numbers of input files.

Another build tool that gets this right is the Nix build system. Nix is a fledgling build tool that began as a Ph.D. project at Utrecht University. It's available as open source, so you can play with it right now. My impression is that it has a good core but that it is not very widely used, and thus that you might well run into sharp corners.

How we got here

Worth pondering is how decades of build tool development have left us all using such impoverished tools. OS kernels have gotten better. C compilers have gotten better. Editors have gotten better. IDEs have gotten worlds better. Build tools? Build tools are still poor.

When I raise that question, a common response I get is that build tools are simply a fundamentally miserable problem. I disagree. I've worked with build tools that don't have these problems, and they simply haven't caught on.

My best guess is that, in large part, developers don't know what they are missing. There's no equivalent, for build tools, of using Linux in college and then spreading the word once you graduate. Since developers don't even know what a build tool can be like, they instead work on adding features. Thus you see build tool authors advertising that they support Java and Scala and JUnit and Jenkins and on and on and on with a very long feature list.

Who really cares about features in a build tool, though? What I want in a build tool is what I wrote to begin with, what was described over a decade ago by Peter Miller: never run a clean build, and reliably build any sub-target you like. These are properties, not features, and you don't get properties by accumulating more code.