Wednesday, November 24, 2010

Floating-point computation: if you don't care what you compute

There's an old saw among programmers that "I can make this program run arbitrarily fast, so long as you don't care what it produces". When you put it like that, it's obviously a bad idea: you always care about the output, even for a random number generator. In practice, this question comes up all the time, just not stated as starkly. Nowhere is this situation more common than with floating-point computations.

William Kahan has written a windy tour of error accumulation for floating-point and posted it online. The content is much like a blog's, but he's publishing it as an ever-growing PDF file. Here are a few of the interesting take-aways:
  • There's no "mindless" way to remove floating-point error from your program. It takes some work.
  • Using high-precision floating-point is remarkably effective in practice. Take whatever size floating-point number you think you need, double the number of bits, and then compute over those. (See the sketch just after this list.)
  • Running your program in different rounding modes is remarkably effective at detecting whether your program is unstable. If rounding at the 53rd bit of precision makes any difference at all in your program's output, then your implementation is so unstable that you are probably generating bad results.
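To see the second bullet in action, here is a minimal sketch in Scala (my own example, not Kahan's). One caveat: the JVM offers no rounding-mode control over its primitive floats and doubles, so the third bullet's rounding-mode trick is harder to try from Java-family languages.

    // Sum 0.1 a million times, once in single and once in double
    // precision. The exact answer is 100000.
    object PrecisionDemo {
      def main(args: Array[String]): Unit = {
        var singleSum = 0.0f
        var doubleSum = 0.0
        for (_ <- 1 to 1000000) {
          singleSum += 0.1f
          doubleSum += 0.1
        }
        println("float:  " + singleSum) // roughly 100958.34, off by almost a thousand
        println("double: " + doubleSum) // roughly 100000.0000013, off in the 12th digit
      }
    }

Single precision loses the answer almost immediately; doubling the bits pushes the error out past any digit you are likely to care about.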

Thinking on these problems, I have a new-found love of interval arithmetic. As Kahan discusses in detail, interval arithmetic gives you a proven bound on how large your error can be. As he also discusses, though, if you mindlessly implement a mathematical formula exactly the way it is written, your intervals tend to be outrageously large. It is common, for example, for the proven error bars to double after every computation.

My question is: why don't mainstream languages support interval arithmetic? If they did, then programmers could include asserts in their code about the size of the error bars at each step of a computation. Initially, these assertions would all fail, due to the intervals being way too large. However, with some cleverness and patience, programmers could get the intervals much smaller. Over time, programmers would build up a toolbox of techniques that apply in most situations. It would fit right in with how software engineering is performed nowadays.
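To make that concrete, here is a toy interval type with the kind of error-bar assertion I have in mind. This is my own sketch in Scala, not an existing library, and it cheats in one known way: a faithful implementation would round lower bounds down and upper bounds up after every operation, which the JVM gives you no way to do with primitive doubles.

    // A bare-bones interval type: a value known to lie in [lo, hi].
    case class Interval(lo: Double, hi: Double) {
      require(lo <= hi, "malformed interval")
      def width = hi - lo
      def +(that: Interval) = Interval(lo + that.lo, hi + that.hi)
      def -(that: Interval) = Interval(lo - that.hi, hi - that.lo)
      def *(that: Interval) = {
        val ps = List(lo * that.lo, lo * that.hi, hi * that.lo, hi * that.hi)
        Interval(ps.min, ps.max)
      }
    }

    object ErrorBars {
      def main(args: Array[String]): Unit = {
        val x = Interval(0.99, 1.01) // a measurement known to within 0.01
        val z = x - x                // mathematically, exactly 0
        println(z)                   // about Interval(-0.02, 0.02): width doubled
        // The assertion style proposed above. It fails as written, and
        // keeps failing until the programmer rewrites x - x into the
        // exact Interval(0.0, 0.0) -- which is exactly the point.
        assert(z.width <= x.width, "error bars grew: " + z.width)
      }
    }

The x - x blow-up is the mindless-implementation problem Kahan describes: naive interval evaluation doesn't know the two operands are the same value, so the proven bound doubles even though the true error is zero.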

Programming this way sounds like it would take longer than how I usually implement numerics. What's the alternative, though? Graphics programs aside, don't we want to produce correct output, not just output that looks good? Imagine if a tech-savvy client asked how accurate our program's output was. We'd have to say, "I don't really know."

Friday, November 19, 2010

Can the FCC touch cable?

Brian Stelter says no:
There is little the Federal Communications Commission can say about Fox News or MSNBC since the channels are on cable, not delivered over the broadcast airwaves.

True, but don't rest too easy. The FCC has whatever authority the U.S. Congress gives it. It has been lobbying for explicit authority over cable and the Internet, and it will get it if the public wants it to.

We have a chance to leave behind the bad old days where U.S. citizens are "protected" from seeing anything that D.C. folks would consider objectionable. Getting there requires that the FCC not regulate the new networks, and it really wants to.

Tuesday, November 16, 2010

Mac computers for programmers?

I temporarily switched to an Apple Mac laptop, but after about two years I switched back to Linux. Here's why.

My original reasoning was that Macs support all the Unix goodness I value for programming, and furthermore they have better shell features such as network configuration and file managers. True, they have a higher price, and they are closed source, but it's worth spending money on good tools, and I don't plan to hack the kernel or the graphics engine, right? After a long sequence of individual observations, I came to realize this summary is not very accurate.

Software support. The software support can be summed up as: everything I need is available on Macs, none of it works better, some of it works worse, and a lot of it needs to be installed manually. In detail:
  • The X server is crashy. I lost a lot of time trying to make X apps work decently, but they just don't.
  • Eclipse works identically to Linux.
  • Java works identically, though for the longest time Apple was stuck at version 1.5 when the rest of the world was using 1.6.
  • There's no Debian or Ubuntu, but there is Fink. Fink has several essential packages, most importantly LaTeX. However, it's an order of magnitude smaller than the Linux distributions it mimics. I like software distributions. They provide a great way for software developers to help each other. With Apple, you get what Apple gives you, and the third-party market is much skimpier.
  • GNU Emacs works fine. I experimented with Aquamacs for a long time, trying to do better than Emacs on Linux, but I found it had a lot of quirky problems.
  • Scripts work just like in Linux.
  • Scala and Scheme Shell work fine.
  • I tried the built-in calendar, email, and contacts support. Ultimately, however, I switched to the Google equivalents, which are much better.
  • The built-in word processor and spreadsheet are just OpenOffice, the same as on Linux.
Operating system shell. Apple is famed for building great user interfaces. I'm coming to question that, though. Sometimes they make something easy, and sometimes they make it impossible and then pretend you'd never want to do it anyway. It's as if they optimized for demos and didn't do any user studies. Over time, I found myself replacing one chunk of the shell after another. Specifically:
  • Display management I will concede: it works great on a Mac. If I unplug an external monitor from a Mac, the display readjusts to use just the built-in monitor. If I unplug an external monitor on Linux, it stays in that state, and it can be really difficult to get the display set up properly again.
  • Suspending to disk and to RAM is about the same on both. I've had times when my Mac wouldn't resume, and I've had times when resuming from suspend-to-RAM took longer than fully booting a Linux machine from scratch.
  • VPN software works better on Linux, in my experience. The Internet was developed on ancestors of Linux, so Linux has good networking in its genes.
  • The file manager on Linux is superior. I pulled my hair out trying to find a way to manipulate archive files (tar, zip, jar) with the Mac Finder. There's nothing built in for it on a Mac, and I tried 2-3 extensions, but none of them worked well. The Finder acts like archive files don't exist.
  • The default terminal program on a Mac is terrible. There's a decent replacement named iTerm, and iTerm is about on par with gnome-terminal.
  • The window manager doesn't do multiple desktops. I hear there are extensions for that, but I never braved trying one.
  • The window manager shifts focus at the granularity of an application, and it has a notion of "application" that is terrible for programmers. If you click on any terminal window, they all come forward. If you click on any browser window, they all come forward. Likewise for hiding: you hide one terminal, they all go away. In general, the window manager is optimized for the way a graphics designer or an audio engineer works: 3-4 apps at a time at the most. When I program, I have 3-4 browser windows and 3-4 terminal windows, each with their own tabs internally.
  • The app launcher bar is okay, but I prefer how the Gnome launcher has a row of mini-icons plus a nice hierarchical menu of the whole system's software. I have tons of software installed, so a row of gigantic icons doesn't work.
  • Selecting wifi networks works well, but then it also does on Linux nowadays.
  • Network sharing is better on Macs. I use network sharing on a laptop maybe once a year, but when I do, it's easier on a Mac. On Linux, I have to install some extra software, pray my wifi driver has AP Mode, and read some docs on how to configure it all.

I would sum up the system software as sometimes much better and often much worse. For several things that work great out of the box on Linux, I have to install replacements on a Mac just to reach parity.

When I add up all of the individual observations above, I notice that there is hardly anything I particularly like about Macs, and there is quite a lot I have to replace just to get up to par. The applications I use aren't any better, and are sometimes worse. Surprisingly, the system software isn't particularly good, either. I vastly prefer Ubuntu to Fink -- it's just a larger, richer community. Also, while I don't hack the kernel or graphics software myself, it is nice to be able to apply patches that third-party people have written.

Macs are good computers for programmers, to be sure, but I'd still give Linux the edge.

Thursday, November 11, 2010

Two good reads on digital copyright

David Friedman raises an excellent thought experiment: what if the web had come first, rather than printed documents?
If the web had come first, issues of copyright and credit would have applied only to the rare case where someone chose to copy instead of to link. Indeed, the relevant laws and norms might never have developed, since the very fact that what you were reading was a quote rather than a link, written by the quoter rather than the quotee, would be sufficient reason not to trust it.

I agree. The model we have is at odds with what makes sense on the Internet, and the Internet is already a much more important vehicle of communication than any print medium. We should adjust our law to make sense for the Internet and let print gracefully decline as the preeminent way to share content.

Friedman's post is apropos for my own blog surfing, because I just now read Lawrence Lessig's For the Love of Culture that he posted back in January. It's a rich subject, so let me give two punchlines. Here's one:
Before we continue any further down this culturally asphyxiating road, can we think about it a little more? Before we release a gaggle of lawyers to police every quotation appearing in any book, can we stop for a moment to consider whether this way of organizing access to culture makes sense? Does this complexity get us something we would not get under the older system? Does this innovation in obsessive control produce any new understanding? Is it really progress?
Whether he is overstating things depends on your point of view. If you are Google, then all current law is just a handshake with the president away from being changed to something else. It took Google to pull off Google Print. Larry and Sergey couldn't have done it alone when they were students, because it violated a thick cobweb of law, regulation, and copyright agreements. It's a disturbing state of affairs. Google Print involves an incomprehensible mess of legal agreements, but worse, the next hundred bright ideas about content sharing just aren't going to get off the ground.

How to arrange things differently is a big topic. Lessig has an important starting point in this comment:
We are about to change that past, radically. And the premise for that change is an accidental feature of the architecture of copyright law: that it regulates copies.
Focusing on copies is awkward when, on a computer, copies are ubiquitous. Computationally, making a copy is cheaper than actually displaying the content.

There are a lot of approaches we could use other than controlling the right to copy. As two examples, charging for performances and charging for access to a large archive are both possibilities. The first step, though, is to recognize that we have a problem.

Funmath notation for "calculating" on paper

I was recently pointed to the Funmath project. Funmath is a mathematical notation that is meant to make manipulation on paper be fast and reliable. It is headed by Raymond Boute.

I don't understand the notation well enough to say how well it works, but I most certainly appreciate the goal. If you learn the rules of algebraic and calculus notation, you can fearlessly race through rewrites as fast as you can move your pencil. The notation means exactly what it looks like, and the rewrites can be done through simple mental pattern matching. By contrast, many other parts of math notation don't work so well. If a formula involves existential or universal quantifiers, set comprehensions, summations, lists, or even functions, then there is a lot of "it means what I mean" in the notation people use. Each step of a derivation has to be very carefully considered, because you can't just pattern match. You have to do a deep parse and figure out exactly which of the possible options each bit of notation really means.

Boute has an interesting historical explanation of this difference. Algebraic notation developed before the dawn of computers, when people computed on paper all the time. As a result, the notation evolved to support paper computation. Logical notation, on the other hand, was never under the same pressure to support computation on paper. Interest in it has risen in the last few decades, but almost everyone involved is entering their formulas directly into computers. There was never a period when logical notation was used more heavily on paper than on computers, so a step was skipped.

In attempting to fill in that gap, Boute has applied design rules that are familiar in programming language design. Particularly interesting to me is that he handles all variable-binding forms by using a single notation for function literals. This is exactly like the function literals that get bandied about in programming-language design circles. In math notation, as in programming languages, having a lightweight syntax for function literals means you can cut out a lot of syntactic forms. In Funmath, summation and quantifiers don't get special syntax. They are library functions that take other functions as arguments. Summation takes a function and maps it to the sum of its range. Quantifiers take a function and map it to a truth value.
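The flavor is easy to imitate in any language with lightweight function literals. Here is my own Scala rendering of the idea; this is the shape of it, not Funmath's actual notation.

    object BindingForms {
      // Summation as an ordinary library function: it takes a domain
      // and a function, and returns the sum of the function's range.
      def sum(domain: Seq[Int])(f: Int => Double): Double =
        domain.map(f).sum

      // Quantifiers likewise: they map a predicate to a truth value.
      def forall(domain: Seq[Int])(p: Int => Boolean): Boolean = domain.forall(p)
      def exists(domain: Seq[Int])(p: Int => Boolean): Boolean = domain.exists(p)

      def main(args: Array[String]): Unit = {
        // "The sum of i*i for i from 1 to 10": no special syntax,
        // just a function literal handed to an ordinary function.
        println(sum(1 to 10)(i => i * i))         // 385.0
        println(forall(1 to 10)(i => i * i >= i)) // true
        println(exists(1 to 10)(i => i % 7 == 0)) // true
      }
    }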

If Funmath is as advertised, then it is as much a step up for logic notation as Arabic numerals were over Roman. I wonder whether anyone outside the core research group has given it a try.

Type checkers aren't just error detectors

Type checking is frequently put forward as a way to find errors that otherwise would have slipped through testing. I've never found this a very compelling argument, though. Before software is released, or even a patch committed to the shared repository, the developers go through various kinds of effort to convince themselves that the latest changes to the software behave correctly. There are a myriad of ways to do that, ranging across code inspection, testing, and formal methods, but whatever ways are chosen, the software doesn't move forward until it's been verified. At that point, not that many type errors can realistically remain.

There are larger advantages to having a type checker. Let me describe three.

Data Modelling. A lot of what programmers do is develop a model of the data their programs work on. All modelling approaches I know, bar none, involve using types. For example, ORM, UML, XML DTDs, Protobuf schemas, and SQL schemas all use types prominently. If you have types in the programming language, then the program itself can embed parts of the data model explicitly.
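For example, a few lines of Scala state part of a data model as directly as a SQL schema or a DTD would (the domain is invented for illustration):

    // The type checker now knows the shape of the data: a customer
    // has exactly one payment method, of exactly two possible kinds.
    sealed trait PaymentMethod
    case object Cash extends PaymentMethod
    case class CreditCard(number: String, expiresYear: Int) extends PaymentMethod

    case class Customer(id: Long, name: String, payment: PaymentMethod)

Because the trait is sealed, a pattern match that forgets one of the payment kinds draws a compiler warning, much like the consistency checks the external schema tools provide.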

A Second Perspective. In all software groups I have been a part of, most of the software written gets far less code review than would be beneficial. Software gets much better just from having people scan their eyes across it and think about it, but practicing engineers dread spending the time to do it. A type checker makes programmers understand their code in two ways: its logical behavior and its type signatures. In doing so, it forces them to do a little bit more code review.

Adding rigidity. Typed languages have a higher percentage of sensible programs in them. Just like it's easier to dial a TV than a radio, it's easier to code in a typed language: you can turn a radio dial to just any old location, but a TV makes you choose among a small, discrete number of settings. When writing new program code, you can write a rough sketch and then use the type checker to guide you on the details. For refactoring, you can make an initial change and let the IDE guide you to the others. It's much faster than rerunning the test suite constantly or filtering through raw text searches.
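As a small example of the refactoring case, invented for illustration: suppose prices, stored so far as bare Doubles, need to become currency-tagged amounts.

    object Prices {
      // After price changes from Double to Money, the compiler flags
      // every expression that still treats a price as a bare number:
      // a complete to-do list for the change, with no extra test runs
      // or raw text searches.
      case class Money(amount: BigDecimal, currency: String)

      case class LineItem(description: String, price: Money)

      def subtotal(items: List[LineItem], currency: String): Money =
        Money(items.map(_.price.amount).sum, currency)
    }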

Wednesday, November 3, 2010

Which city will first have auto autos?

Automatic automobiles, that is. Robin Hanson asks:
So a huge upcoming policy question is: when will what big cities manage to coordinate to change road law to achieve these huge auto-auto economic gains? Thirty years from now we may look back and lament that big city politics was so broken that no big cities could manage it. Or perhaps history will celebrate how the first big city to do it dramatically increased its importance on the world scene.

A good question. The commenters point to Brad Templeton's site, where he has done a lot of work to analyze just that question.

I'll ask a milder question than Robin. Where's the first city where we can even drive [sic] one of these cars at all? Ted Turner, are you reading? Wouldn't you like to ride around Atlanta in an auto auto?

Copyright law versus audio archives

The U.S. Library of Congress writes:
"Were copyright law followed to the letter, little audio preservation would be undertaken. Were the law strictly enforced, it would brand virtually all audio preservation as illegal," the study concludes, "Copyright laws related to preservation are neither strictly followed nor strictly enforced. Consequently, some audio preservation is conducted."
More at OS News, which has a link to the 181-page study by the Library of Congress.

Hat Tip to James Robertson.

I'd be a lot more comfortable if the U.S. Congress simply passed reasonable legislation to begin with, but I don't hold out hope for that. What does give me hope, however, is that cheap technology indirectly lets all sorts of common-sense copying activity become de facto allowed.

Whatever paper fantasies Congress puts out, they aren't really going to lock up everyone who makes a mix tape or sets up a home media server. Historically, the tape recorder, the photocopier, and the VCR did wonders for fair use. Going forward, DRM-free Linux and Android computers can work similar magic for digital content.