Wednesday, December 22, 2010

That's not how it works

James Robertson shares this depressing quote from the FCC:
"A commercial arrangement between a broadband provider and a third party to directly or indirectly favor some traffic over other traffic in the connection to a subscriber of the broadband provider (i.e., 'pay for priority') would raise significant cause for concern," the Commission then elaborates. This is because "pay for priority would represent a significant departure from historical and current practice."

Follow the link for analysis.

Let me focus just on this part. The FCC, here, joins the ranks of those who think the Internet is a star topology. The apparent model is that there's an Internet, and then everyone plugs their computer into the Internet. When one user routes a packet to another user, it takes two hops: one to the center node, and one to the other user. Everything that happens within this mythical center node is abstracted away.

As an aside, the FCC also presents a view of the Internet where a handful of providers are sending broadcasts to the masses. Individuals don't contract for Internet services. They are "consumers", and they "subscribe" to the feeds. Leave that aside for now.

The Internet is not a star topology, but a general network. When you send a packet to someone else, it usually takes a dozen or two hops to get to them. How fast it gets to them depends enormously on the intermediate nodes that are taken along the way. I used to play around with traceroute and watch just what routes the packets take under various circumstances. I saw some particularly striking examples when I worked on a Department of Defense bulletin board and watched how packets route between a university network and a DoD machine. Let's just say the routes favored security over latency. They'd go a LONG way in order to go through carefully controlled choke points.
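
To make the contrast concrete, here is a toy sketch that counts hops in the two models. Both graphs and every node name in them are made up for illustration; they are not real routes, and a real path would come from something like traceroute.

```python
from collections import deque

def hops(graph, src, dst):
    """Breadth-first search: number of links on a shortest path from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable from src

# The star model: everyone plugs into one mythical center node.
star = {"alice": ["internet"], "bob": ["internet"], "internet": ["alice", "bob"]}

# A (still simplified, hypothetical) general network: a chain of intermediate nodes.
path = ["alice", "isp-a", "regional-a", "backbone-1",
        "backbone-2", "regional-b", "isp-b", "bob"]
general = {name: [] for name in path}
for a, b in zip(path, path[1:]):
    general[a].append(b)
    general[b].append(a)

print(hops(star, "alice", "bob"))     # two hops in the star model
print(hops(general, "alice", "bob"))  # one hop per intermediate link
```

Every one of those intermediate links belongs to somebody, and each is a place where connectivity can be better or worse.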

Because the Internet works this way, people who provide Internet services work hard to make sure their servers are well connected with respect to their users. For example, if you want to provide service to British folks, then you really want to get a server up on the island. It wasn't so long ago that all major ftp sites had clones in the UK. Sending data across the English Channel, much less the Atlantic Ocean, was just horrendously slow. When you install an extra server in the UK, you must pay for it.

Relocating a server is just one option. It's also possible to lease network connections between where your server is and where you want the IP traffic to route to. When you do that, you will have to pay whomever you are leasing the bandwidth from.

In short, if you want better connectivity, you have to pay for it. The more you pay, the better the connectivity you get. What the FCC calls a disturbing development is a hair split away from how things already work. They seem to be riding on the notion of whether you pay a broadband provider or some other entity. I fail to see what a big difference it makes.

Let's try a few thought experiments and compare them to the star-topology model. Suppose Netflix pays Comcast to let them install some servers in the same building as a major Comcast hub. Is anything wrong with that? I don't see why. They'll get better bandwidth, but they're paying for all the expenses. Similarly, suppose Netflix, on their own dime, installs new network fiber from their data center to a major Comcast hub. Is there anything wrong with that? Again, I don't see it. After Netflix lays that network, would there be anything wrong with Comcast plugging into it and routing traffic to and from it? Again, I can't see how it would help users for them to decline.

Where the FCC seems to draw the line is when you go past barter and use more fungible resources. What if, instead of Netflix installing new network fiber itself, it pays Comcast to do it? And what if, instead of Comcast laying new fiber for each customer, they split the cost over different customers, giving more access to those who pay more? From the FCC's view, this goes from totally normal to something they've never seen in the past. From my view, this is how things work already. You pay more to get more bandwidth.

I wish the FCC would just abandon trying to regulate Internet service. I want a neutral network, but I don't see how the FCC is going to do anything but hurt. I want the Internet we have, not something like broadcast TV, cable, wired telephony, or cellular telephony. I don't think it is a coincidence that the Internet is both less regulated and far more neutral than these other networks.

Friday, December 17, 2010

Every paper and book on our laptops?

Dick Lipton speculates on that question:
Today there are applications like Citeseer that contain about one million papers. The total storage for this is beyond the ability of most of us to store on our laptops. But this should change in the near future. The issue is that the number of papers will continue to grow, but will unlikely grow as fast as memory increases. If this is the case then an implication is that in the future we could have all the technical papers from an area on our own devices. Just as realtime spelling is useful, realtime access to technical papers seems to be a potentially exciting development.[...]
Right now there are too many books, even restricted to a subfield like mathematics, to have all of them stored on a single cheap device. But this large—huge—amount of memory could easily become a small one in the future.

I agree for Citeseer, and I agree for the local library. Very soon, if not already, we will have laptops that can hold the entirety of Citeseer and the entirety of the local library's collection of books. I was impressed when I looked at the file sizes for Project Gutenberg. Shakespearean plays take a few tens of kilobytes, and the largest archive they supply is a dual-sided DVD with over 29,000 books. I still remember the shock when I looked at a directory listing on their web site and the file sizes looked so small I thought the software must be broken.

As an aside, I wish I could say that the Association for Computing Machinery thought this way. Their current thinking is that they'll have an online digital library that they take a toll on. If they really wanted to help science, they'd mail you a pre-indexed thumb drive you can load into your laptop and have all papers up until that date. I would bet that someone in physics works this out long before the ACM does. Who knows, though.

All this said, papers and books are backward looking. Nowadays, papers and books are developed as electronic content and then, only at delivery time, printed onto paper. An increasing amount of interesting material is simply never printed at all. Want a copy of the Scala Language Specification? It's essentially a book, but you won't find it at the local library. Over time, printed word is becoming a niche application. You only need it for reading something in depth, or if you want to physically hand it to someone. For the former, print on demand works more and more frequently, and for the latter, the number of times it happens is decreasing. As well, electronic ink just keeps getting better.

From the perspective of interesting words, as opposed to printed papers and books, it will take longer before personal computers can hold all the, ahem, material that is out there. It includes not just papers and books written by mathematicians, but also forum messages, blog posts, and even Facebook and Twitter messages written by all manner of people. Perhaps even then we are already at the point where our machines have enough storage, but it's certainly a lot more data than just for Citeseer and the library.

Of course, most people are only interested in a tiny fraction of all that information. Perhaps Dick Lipton really only cares about math papers from famous mathematicians. If the precise data interesting to someone can be identified, then the storage requirements for keeping a personal copy are much more reasonable, and in fact we probably are already there. However, identifying that subset of the data is, in general, entirely non-trivial.

Wednesday, December 15, 2010

Checked in debugging and profiling?

Engineers of all walks insert extra debugging probes into what they're building. The exterior surface of the artifact is often silent about what goes inside, so the extra probes give engineers important extra information. They can use that information to more quickly test theories about what the artifact is doing. They use these tests to improve performance, debug faulty behavior, or to gain extra assurance that the artifact is behaving correctly.

Software engineering is both the same and different in this regard. Software certainly benefits from extra internal probes, and for all of the standard reasons. A common way to insert such probes is to insert debug or trace messages that get logged to a file. If the messages have a timestamp on them, then they help in profiling for performance. For fine-grained profiling, log messages can be so slow as to affect the timing. In such a case, the timing data might be gathered internally using cheap arithmetic operations, and then summary information emitted at the end. This is all the same, I would imagine, in most any form of engineering.
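
As a minimal sketch of that last technique, the probe below accumulates nanosecond counts in memory and only formats a summary at the end. The class name `CheapProfiler`, the label, and the workload are all made up for illustration; the only library call assumed is Python's `time.perf_counter_ns`.

```python
import time
from collections import defaultdict

class CheapProfiler:
    """Accumulates timings with integer additions; no logging on the hot path."""

    def __init__(self):
        self.total_ns = defaultdict(int)
        self.calls = defaultdict(int)

    def record(self, label, start_ns):
        # Two integer additions per probe, instead of formatting and writing a log line.
        self.total_ns[label] += time.perf_counter_ns() - start_ns
        self.calls[label] += 1

    def summary(self):
        # Formatting happens once, after the timed work is finished.
        return [
            f"{label}: {self.calls[label]} calls, {self.total_ns[label] / 1e6:.3f} ms total"
            for label in sorted(self.total_ns)
        ]

profiler = CheapProfiler()

def hot_loop(n):
    """Hypothetical workload with a probe around each iteration."""
    total = 0
    for i in range(n):
        start = time.perf_counter_ns()
        total += i * i
        profiler.record("square_and_add", start)
    return total

result = hot_loop(1000)
for line in profiler.summary():
    print(line)
```

Even this probe has a cost, since reading the clock is not free, so for very fine-grained work the probe can still distort what it measures.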

What's different with software is that it's soft. Software can be changed very easily, and then changed back again. If you're building something physical, then you can change the spec very easily, but building a new physical device to play with involves a non-trivial construction step. With software, the spec is the artifact. Change the spec and you've changed the artifact.

As such, a large number of debugging and profiling probes are not worth checking into version control and saving. Several times now I've watched a software engineer work on a performance problem, develop profiling probes as they do so, and then check in those probes when they are finished. The odd thing I've noticed, however, is that usually that same programmer started by disabling or deleting the stuff checked in by the previous programmer. Why is that? Why keep checking in code that the next guy usually deletes anyway?

This question quantifies over software projects, so it's rather hard to come up with a robust reason why it happens. Let me hazard two possible explanations.

One explanation is that it's simply very easy to develop new probes that do exactly what is desired. When the next engineer considers their small build-or-buy decision--build new probes, or buy the old ones--the expense of rebuilding is so small that the buy option is not very tempting.

The other explanation is that it's a problem of premature generalization. If you build a profiling system based on the one profiling problem you are facing right now, you are unlikely to support the next profiling problem that comes up. This is only a partial explanation, though. I see new profiling systems built all the time when the old one could have been made to work. Usually it's just easier to build a new one.

Whatever the reason, I am currently not a fan of checking in lots of debugging and profiling statements into version control. Keep the checked-in code lean and direct. Add extra probes when you need them, but take them back out before you commit. Instead of committing the probes to the code base, try to get your technique into your group's knowledge base. Write it up in English and then post it to a wiki or a forum. If you are fortunate enough to have a Wave server, post it there.

Tuesday, December 14, 2010

Typing arithmetic in Datalog

Unlike in imperative languages, arithmetic in Datalog can execute in different orders. If you write a formula z=x+y in an imperative language, then the meaning is that you compute x, you compute y, and then you add them together to produce z. If you want to understand the typing of such an expression, you first find the types of x and y, and then look at the numeric conversion rules to see what the type of z must be. If x is a 32-bit integer and y is a 64-bit floating-point number, then z will be a 64-bit floating-point number.

In Datalog, there are additional options for using the same formula. One option is to do as in an imperative language, compute x and y, and then use the arithmetic formula to compute z. However, you could just as well compute x and z and then subtract to find y. In total there are four different ways to use the formula: you could use it to compute x, y, or z, or if all three are already computed, you could add them up and verify that they are in a proper relation with each other. How can we reason about the typing implications of a formula that can be used in all of these ways?

Call such a constraint an arithmetic type constraint. Such a constraint must treat x, y, and z symmetrically, so think of them as being in a set. That is, an arithmetic type constraint is defined by a set of variables. If the variables are x, y, and z, then the arithmetic constraint for them could be written arith({x,y,z}).

It appears to work well to treat an arithmetic constraint as equivalent to the following set of individual constraints:

  // constraints for arith({x,y,z}) are:
  x <= lub(y, z)
  y <= lub(x, z)
  z <= lub(x, y)
In these constraints, "<=" means subtype-of, and "lub" means least upper bound. The lub of two types is the smallest type that is a supertype of both of them. For numeric types, humor me and think of subtyping as including valid numeric upcasts, so int32 is a subtype of float64. This post is already too long to get into subtyping versus numeric conversion.

Try these constraints on a few examples, and they appear to work well. For example, if x and y are already known to be int32, then these constraints imply that z is no larger than lub(int32,int32), which is just int32. As another example, if x is an int32 but y and z are completely unknown, then these constraints give no new bound to anything. As a final example, if all three variables already have a known type, but one of them is a larger type than the other two, then that larger one can be shrunk to the lub of the other two.
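
The three examples can be replayed with a small sketch. Everything here is an assumption made for illustration: the types form a simple chain, "unknown" sits at the top to stand for a variable with no bound yet, and `solve_arith` is a made-up name for a loop that keeps tightening upper bounds until nothing changes.

```python
# Hypothetical chain of numeric types, smallest first. "unknown" is the top
# element and means "no bound known yet".
CHAIN = ["int8", "int32", "int64", "float64", "unknown"]
RANK = {name: i for i, name in enumerate(CHAIN)}

def lub(*types):
    """Least upper bound: in a chain, simply the largest type."""
    return max(types, key=RANK.__getitem__)

def glb(a, b):
    """Greatest lower bound: in a chain, simply the smaller type."""
    return min(a, b, key=RANK.__getitem__)

def solve_arith(bounds):
    """Tighten each variable's upper bound to the lub of the others, to a fixpoint."""
    bounds = dict(bounds)
    changed = True
    while changed:
        changed = False
        for v in bounds:
            others = [bounds[w] for w in bounds if w != v]
            tightened = glb(bounds[v], lub(*others))
            if tightened != bounds[v]:
                bounds[v] = tightened
                changed = True
    return bounds

# x and y known to be int32: z is bounded by int32.
print(solve_arith({"x": "int32", "y": "int32", "z": "unknown"}))
# Only x known: nothing new is learned.
print(solve_arith({"x": "int32", "y": "unknown", "z": "unknown"}))
# All known, one larger than the other two: the larger one shrinks.
print(solve_arith({"x": "int32", "y": "int32", "z": "float64"}))
```

The loop terminates because bounds only ever move down a finite chain.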

In general, these constraints will only allow the types of three variables to be in one of the following four combinations:

  • All three types are completely unknown.
  • One type is known but the other two are unknown.
  • All three types are the same.
  • Two of the types are the same, and the third type is a subtype of the other two.
The first two cases will eventually lead to a compile-time error, because the programmer has introduced variables but not provided a way to reliably bind those variables to values. The third option is a normal, happy case, where there are no numeric conversions being applied. The fourth option is most interesting, in my view, because there is an operation between two types and the result is the lub of those two types. Notably absent from the list is any combination where all three types are different.
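
The claim about the allowed combinations can be checked by brute force, at least under one assumption: that the types form a simple chain, with "unknown" at the top. The chain below is hypothetical, and the check says nothing about type systems where some types are incomparable.

```python
from itertools import product

# Hypothetical chain of types, smallest first, with "unknown" as the top element.
CHAIN = ["int8", "int32", "int64", "float64", "unknown"]
RANK = {name: i for i, name in enumerate(CHAIN)}

def satisfies(x, y, z):
    """True when each type is <= the lub of the other two (lub is max in a chain)."""
    ranks = [RANK[x], RANK[y], RANK[z]]
    return all(
        ranks[i] <= max(ranks[j] for j in range(3) if j != i)
        for i in range(3)
    )

# Enumerate every assignment of types to (x, y, z) and keep the consistent ones.
solutions = [t for t in product(CHAIN, repeat=3) if satisfies(*t)]

# No consistent assignment uses three different types.
all_different = [t for t in solutions if len(set(t)) == 3]
print(all_different)
```

Put another way, in a chain the constraints hold exactly when the largest type appears at least twice, which covers all four cases in the list.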

As one final wrinkle, it's arguably more programmer friendly to always perform arithmetic on at least 32-bit if not 64-bit values. Otherwise, when a programmer writes 127+1, they might get 8-bit arithmetic and thus an overflow. To make arithmetic work that way, it's possible to add a minimum type to an arithmetic constraint. For example, arith({x,y,z},int64) would be an arithmetic constraint over x, y, and z, and a minimum result type of 64-bit integers. This would be equivalent to the following combination of constraints:

  // constraints for arith({x,y,z}, int64) are:
  x <= lub(y, z, int64)
  y <= lub(x, z, int64)
  z <= lub(x, y, int64)
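
Here is a sketch of the effect on the 127+1 example, using the same kind of hypothetical chain of types and a made-up `solve_arith` helper that takes an optional minimum type. It assumes both literals start out typed as int8 and the result is unknown.

```python
# Hypothetical chain of numeric types, smallest first; "unknown" is the top.
CHAIN = ["int8", "int32", "int64", "float64", "unknown"]
RANK = {name: i for i, name in enumerate(CHAIN)}

def lub(*types):
    """Least upper bound: in a chain, simply the largest type."""
    return max(types, key=RANK.__getitem__)

def glb(a, b):
    """Greatest lower bound: in a chain, simply the smaller type."""
    return min(a, b, key=RANK.__getitem__)

def solve_arith(bounds, minimum=None):
    """Tighten each bound to lub(others, minimum) until nothing changes."""
    bounds = dict(bounds)
    floor = [minimum] if minimum is not None else []
    changed = True
    while changed:
        changed = False
        for v in bounds:
            others = [bounds[w] for w in bounds if w != v]
            tightened = glb(bounds[v], lub(*others, *floor))
            if tightened != bounds[v]:
                bounds[v] = tightened
                changed = True
    return bounds

# z = 127 + 1, with both literals assumed to be int8.
literals = {"x": "int8", "y": "int8", "z": "unknown"}
print(solve_arith(literals))            # z is squeezed down to int8
print(solve_arith(literals, "int64"))   # z is only bounded by int64
```

Without the minimum, the result is forced into 8 bits. With it, the result has room to be a 64-bit integer while the operands keep their small types.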

Thursday, December 9, 2010

Published literature as fencing?

"Any discourse will have to be peer-reviewed in the same manner as our paper was, and go through a vetting process so that all discussion is properly moderated," wrote Felisa Wolfe-Simon of the NASA Astrobiology Institute. "The items you are presenting do not represent the proper way to engage in a scientific discourse and we will not respond in this manner."
Felisa Wolfe-Simon is responding here to attacks on a paper she recently published. This is a widely held view, that science takes place on some higher plane of discourse. In this view, ordinary speech is not enough to move the discussion forward. You must go through the publication process just in order to state your counter-argument. Science then progresses by an exchange of one high-minded paper after another.

Hogwash. This romantic picture has no relation to science in the fields I am familiar with.

A killer mismatch between this picture and reality is that counter-arguments are not publishable. If someone publishes the results of a horribly botched experiment, it would serve science to dissect that experiment and show the problem. However, there aren't any peer-reviewed journals to publish it in. If you take the quoted stance seriously, then you must believe it's not proper to criticize published research at all.

A second mismatch is that, in the fields I am familiar with, nobody in the field learns a major new result through the publication process. When someone has a new idea, they talk to their colleagues about it. They speak at seminars and workshops. They write messages to mailing lists about it. They recruit students to work on it, and students post all over the place. Everyone knows what everyone is working on and the way they are doing it. Everyone knows the new ideas long before they have any evidence for them, and they learn about the new pieces of evidence pretty quickly as well.

Researchers debate all right, but not via publication. They email lists. They write each other. They give public speeches attacking each other's ideas. Others in the field do all of the same, and they are often more convincing due to being less invested in the conclusions.

In short, declining to participate in discussions outside the publication process is often presented as some sort of high ground. This is a backwards and dangerous notion. It means that you are not defending your ideas in the forums that convince the experts.

Wednesday, November 24, 2010

Floating-point computation: if you don't care what you compute

There's an old saw among programmers that "I can make this program run arbitrarily fast, so long as you don't care what it produces". When you put it like that, it's obviously a bad idea: you always care about the output, even for a random number generator. In practice, this question comes up all the time, just not stated as starkly. Nowhere is this situation more common than with floating-point computations.

William Kahan has written a windy tour of error accumulation for floating-point and posted it online. The content is much like a blog, but he's publishing it as an ever-growing PDF file. Here are a few of the several interesting take-aways:
  • There's no "mindless" way to remove floating point error from your program. It takes some work.
  • Using high-precision floating-point is remarkably effective in practice. Take whatever size floating-point number you think you need, double the number of bits, and then compute over those.
  • Running your program in different rounding modes is remarkably effective at detecting whether your program is unstable. If rounding at the 53rd bit of precision makes any difference at all in your program's output, then your implementation is so unstable that you are probably generating bad results.

Thinking on these problems, I have a new-found love of interval arithmetic. As Kahan discusses in detail, interval arithmetic gives you a proven bound on how large your error can be. As he also discusses, though, if you mindlessly implement a mathematical formula exactly the way it is written, your intervals tend to be outrageously large. It is common, for example, for the proven error bars to double after every computation.

My question is: why don't mainstream languages support interval arithmetic? If they did, then programmers could include asserts in their code about the size of the error bars at each step of a computation. Initially, these assertions would all fail, due to the intervals being way too large. However, with some cleverness and patience, programmers could get the intervals much smaller. Over time, programmers would build up a toolbox of techniques that apply in most situations. It would fit right in with how software engineering is performed nowadays.

Programming this way sounds like it would take longer to implement numerics than what I am used to. What's the alternative though? Graphics programs aside, don't we want to implement correct output, not just output that looks good? Imagine if a tech-savvy client asked how accurate our program's output was. We'd have to say, "I don't really know".

Friday, November 19, 2010

Can the FCC touch cable?

Brian Stelter says no:
There is little the Federal Communications Commission can say about Fox News or MSNBC since the channels are on cable, not delivered over the broadcast airwaves.

True, but don't rest too easy. The FCC has whatever authority the U.S. Congress gives it. It has been lobbying for explicit authority over cable and the Internet, and it will get that authority if the public wants it to.

We have a chance to leave behind the bad old days where U.S. citizens are "protected" from seeing anything that D.C. folks would consider objectionable. Getting there requires that the FCC not regulate the new networks, and it really wants to.

Tuesday, November 16, 2010

Mac computers for programmers?

I temporarily switched to an Apple Mac laptop, but after about two years I switched back to Linux. Here's why.

My original reasoning was that Macs support all the Unix goodness I value for programming, and furthermore they have better shell features such as network configuration and file managers. True, they have a higher price, and they are closed source, but it's worth spending money on good tools, and I don't plan to hack the kernel or the graphics engine, right? After a long sequence of individual observations, I came to realize this summation is not very accurate.

Software support. The software support can be summed up as: everything I need is available on Macs, none of it works better, some of it works worse, and a lot of it needs to be installed manually. In detail:
  • The X server is crashy. I lost a lot of time trying to make X apps work decently, but they just don't.
  • Eclipse works identically to Linux.
  • Java works identically, though for the longest time they were stuck at version 1.5 when the rest of the world was using 1.6.
  • There's no Debian or Ubuntu, but there is Fink. Fink has several essential packages, most importantly Latex. However, it's an order of magnitude smaller than the Linux distributions it mimics. I like software distributions. They provide a great way for software developers to help each other. With Apple, you get what Apple gives you and the third-party market is much skimpier.
  • Gnu Emacs works fine. I experimented with Aquamacs for a long time, to try and do better than on Linux, but I found it had a lot of quirky problems.
  • Scripts work just like in Linux.
  • Scala and Scheme Shell work fine.
  • I tried the built-in calendar, email, and contacts support. Ultimately I switched to Google equivalents, however, which are much better.
  • The built-in word processor and spreadsheet are just OpenOffice, the same as on Linux.
Operating system shell. Apple is famed for building great user interfaces. I'm coming to ask why that is, though. Sometimes they make something easy, and sometimes they make it impossible and then try to pretend like you'd never want to do it. It's as if they optimized for demos and didn't do any user studies. Over time, I found myself replacing one chunk of the shell after another. Specifically:
  • Display management I will concede: it works great on a Mac. If I unplug an external monitor from a Mac, the display mode readjusts to display just on the built-in monitor. If I unplug an external monitor on Linux, it stays in that state, and it can be really difficult to get the display fixed back up.
  • Suspending to disk and to ram is about the same on both. I've had times where my Mac wouldn't resume, and I've had times when resuming a suspend-to-ram took longer than fully booting a Linux machine from scratch.
  • VPN software works better on Linux, in my experience. The Internet was developed on ancestors of Linux, so it has good networking in its genes.
  • The file manager on Linux is superior. I pulled my hair out trying to find a way to manipulate archive files (tar, zip, jar) with the Mac finder. There's nothing built in for it on a Mac, and I tried 2-3 extensions but none of them worked well. They act like archive files don't exist.
  • The default terminal program on a Mac is terrible. There's a decent replacement named iTerm, and iTerm is about par with gnome-terminal.
  • The window manager doesn't do multiple desktops. I hear there are extensions for that, but I never braved trying one.
  • The window manager shifts focus at the granularity of an application, and it has a notion of "application" that is terrible for programmers. If you click on any terminal window, they all come forward. If you click on any browser window, they all come forward. Likewise for hiding: you hide one terminal, they all go away. In general, the window manager is optimized for the way a graphics designer or an audio engineer works: 3-4 apps at a time at the most. When I program, I have 3-4 browser windows and 3-4 terminal windows, each with their own tabs internally.
  • The app launcher bar is okay, but I prefer how the Gnome launcher has a row of mini-icons plus a nice hierarchical menu of the whole system's software. I have tons of software installed, so a row of gigantic icons doesn't work.
  • Selecting wifi networks works well, but then it also does on Linux nowadays.
  • Network sharing is better on Macs. I use network sharing on a laptop maybe once a year, but when I do, it's easier on a Mac. On Linux, I have to install some extra software, pray my wifi driver has AP Mode, and read some docs on how to configure it all.

I would sum up the system software as sometimes much better and often much worse. For several things that work great out of the box on Linux, I have to install add-ons on a Mac just to match them.

When I add up all the above individual things, I notice that there is hardly anything I particularly like about Macs, and there is quite a lot where I have to replace things to get it up to par. The applications I use aren't any better, and are sometimes worse. Surprisingly, the system software isn't particularly good, either. I vastly prefer Ubuntu to Fink -- it's just a larger, richer community. Also, while I don't hack the kernel or graphics software myself, it is nice to be able to apply patches that third-party people have written.

Macs are good computers for programmers, to be sure, but I'd still give Linux the edge.

Thursday, November 11, 2010

Two good reads on digital copyright

David Friedman raises an excellent thought experiment: what if the web had come first, rather than printed documents?
If the web had come first, issues of copyright and credit would have applied only to the rare case where someone chose to copy instead to link. Indeed, the relevant laws and norms might never have developed, since the very fact that what you were reading was a quote rather than a link, written by the quoter rather than the quotee, would be sufficient reason not to trust it.

I agree. The model we have is at odds with what makes sense on the Internet, and the Internet is already a much more important vehicle of communication than any print media. We should adjust our law to make sense for the Internet and let print gracefully decline as the preeminent way to share content.

Friedman's post is apropos for my own blog surfing, because I just now read Lawrence Lessig's For the Love of Culture that he posted back in January. It's a rich subject, so let me give two punchlines. Here's one:
Before we continue any further down this culturally asphyxiating road, can we think about it a little more? Before we release a gaggle of lawyers to police every quotation appearing in any book, can we stop for a moment to consider whether this way of organizing access to culture makes sense? Does this complexity get us something we would not get under the older system? Does this innovation in obsessive control produce any new understanding? Is it really progress?
Whether he is overstating things depends on your point of view. If you are Google, then all current law is just a handshake with the president away from being changed to something else. It took Google to pull off Google Print. Larry and Sergey couldn't have done it alone when they were students, because it violated a thick cobweb of law, regulation, and copyright agreements. It's a disturbing state. Google Print involves an incomprehensible mess of legal agreements, but worse, the next hundred bright ideas about content sharing just aren't going to get off the ground.

How to arrange things differently is a big topic. Lessig has an important starting point in this comment:
We are about to change that past, radically. And the premise for that change is an accidental feature of the architecture of copyright law: that it regulates copies.
Focusing on copies is awkward when, on a computer, copies are ubiquitous. Computationally, making a copy is cheaper than actually displaying the content.

There are a lot of approaches we could use other than controlling the right to copy. As two examples, charging for performances and charging for access to a large archive are both possibilities. The first step, though, is to recognize that we have a problem.

Funmath notation for "calculating" on paper

I was recently pointed to the Funmath project. Funmath is a mathematical notation that is meant to make manipulation on paper be fast and reliable. It is headed by Raymond Boute.

I don't understand the notation well enough to say how well it works, but I most certainly appreciate the goal. If you learn the notation and rules of algebra and calculus, you can fearlessly race through rewrites as fast as you can move your pencil. The notation means exactly what it looks like, and the rewrites can be done through simple mental pattern matching. To contrast, all manner of other parts of math notation don't work so well. If a formula involves existential or universal quantifiers, set comprehensions, summations, lists, or even functions, then there is a lot of "it means what I mean" in the notation people use. Each step of a derivation has to be very carefully considered, because you can't just pattern match. You have to do a deep parse and figure out exactly which of the possible options each bit of notation really means.

Boute has an interesting historical explanation of this difference. Algebraic notation developed before the dawn of computers, so people computed on paper all the time. As a result, the notation evolved to support paper computation. Logical notation, on the other hand, hasn't been as pressing for computation on paper. Interest has risen in the last few decades, but almost everyone involved is entering their formulas directly into computers. There was never a period of time when logical notation was more heavily used on paper than on computers, so a step was skipped.

In attempting to fill in that gap, Boute has applied design rules that are familiar in programming language design. Particularly interesting to me is that he handles all variable-binding forms by using a single notation for function literals. This is exactly like the function literals that get bandied about in programming-language design circles. In math notation, as in programming languages, having a lightweight syntax for function literals means you can cut out a lot of syntactic forms. In Funmath, summation and quantifiers don't get special syntax. They are library functions that take other functions as arguments. Summation takes a function and maps it to the sum of its range. Quantifiers take a function and map it to a truth value.
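
The programming-language version of that idea is easy to sketch. What follows is not Funmath notation, just my own illustration in Python: a function with a finite domain is modeled as a dict built by a made-up helper `fn`, and summation and the quantifiers are ordinary functions that take such a function as their argument.

```python
def fn(domain, body):
    """A finite function: the domain is explicit, the body is a function literal."""
    return {x: body(x) for x in domain}

def summation(f):
    """Maps a function to the sum of the values it takes."""
    return sum(f.values())

def forall(f):
    """Maps a truth-valued function to True when it holds everywhere on its domain."""
    return all(f.values())

def exists(f):
    """Maps a truth-valued function to True when it holds somewhere on its domain."""
    return any(f.values())

# The sum of n^2 for n in 1..4, with no special summation syntax.
print(summation(fn(range(1, 5), lambda n: n * n)))
# "For all n in 1..4, n > 0" and "there exists n in 1..4 with n > 3".
print(forall(fn(range(1, 5), lambda n: n > 0)))
print(exists(fn(range(1, 5), lambda n: n > 3)))
```

The only binding form in sight is the function literal. Everything else is a library call.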

If Funmath is as advertised, then it is as much a step up for logic notation as Arabic numerals were over Roman. I wonder if anyone outside the core research group has given it a try?

Type checkers aren't just error detectors

Type checking is frequently put forward as a way to find errors that otherwise would have slipped through testing. I've never found this a very compelling argument, though. Before software is released, or even a patch committed to the shared repository, the developers go through various kinds of effort to convince themselves that the latest changes to the software behave correctly. There are a myriad of ways to do that, ranging across code inspection, testing, and formal methods, but whatever ways are chosen, the software doesn't move forward until it's been verified. At that point, not that many type errors can realistically remain.

There are larger advantages to having a type checker. Let me describe three.

Data Modelling. A lot of what programmers do is develop a model of the data their programs work on. All modelling approaches I know, bar none, involve using types. For example, ORM, UML, XML DTDs, Protobuf schemas, and SQL schemas all use types prominently. If you have types in the programming language, then the program itself can embed parts of the data model explicitly.

A Second Perspective. In all software groups I have been a part of, most of the software written gets far less code review than would be beneficial. Software gets much better just by having people scan their eyes across it and think about it, but practicing engineers dread spending the time to do it. A type checker forces programmers to understand their code two ways: the logical behavior, and the type signatures. By doing so, it forces programmers to do a little bit more code review.

Adding rigidity. Typed languages have a higher percentage of sensible programs in them. Just like it's easier to dial a TV than a radio, it's easier to code in a typed language. You can't turn a TV dial to just any old location; you have to choose among a small, discrete number of settings. When writing new program code, you can write a rough sketch and then use the type checker to guide you on the details. For refactoring, you can make an initial change and let the IDE guide you to the other ones. It's much faster than if you have to run the test suite more often or have to filter through raw text searches.

Wednesday, November 3, 2010

Which city will first have auto autos?

Automatic automobiles, that is. Robin Hanson asks:
So a huge upcoming policy question is: when will what big cities manage to coordinate to change road law to achieve these huge auto-auto economic gains? Thirty years from now we may look back and lament that big city politics was so broken that no big cities could manage it. Or perhaps history will celebrate how the first big city to do it dramatically increased its importance on the world scene.

A good question. The commenters point to Brad Templeton's site, where he has done a lot of work to analyze just that question.

I'll ask a milder question than Robin. Where's the first city where we can even drive [sic] one of these cars at all? Ted Turner, are you reading? Wouldn't you like to ride around Atlanta in an auto auto?

Copyright law versus audio archives

The U.S. Library of Congress writes:
"Were copyright law followed to the letter, little audio preservation would be undertaken. Were the law strictly enforced, it would brand virtually all audio preservation as illegal," the study concludes, "Copyright laws related to preservation are neither strictly followed nor strictly enforced. Consequently, some audio preservation is conducted."
More at OS News, which has a link to the 181-page study by the Library of Congress.

Hat Tip to James Robertson.

I'd be a lot more comfortable if the U.S. Congress simply passed reasonable legislation to begin with. I don't hold out hope for it. What does give me hope, however, is that cheap technology indirectly allows all sorts of common-sense copying activity to become de facto allowed.

Whatever paper fantasies Congress puts out, they aren't really going to lock up everyone who makes a mix tape or sets up a home media server. Historically, the tape recorder, the photocopier, and the VCR did wonders for fair use. Going forward, DRM-free Linux and Android computers can work similar magic for digital content.

Saturday, October 30, 2010

Learning can be measured

I've served as a teacher in a number of roles. I've taught math in a private high school, and I've taught undergraduate and graduate computer science. I also tutored constantly throughout all of my own schooling. Based on this experience, I'd like to emphasize one stance in the discussions that are going around about education reform this (and every) election season.

Learning can be measured.

Teachers know how to test their students to see whether they're learning what is intended. When I taught trigonometry and linear algebra, it really wasn't that hard to figure out which students were able to do it and which weren't. I gave them sample problems, gave them an hour, and then looked at how they did. This gave tremendous insight into what the capabilities of the people in the class were. Any teacher who can't do this is basically failing at their job. It's just part of what teachers do.

Standardized tests are also pretty good. Granted, they have their problems. The questions leave little room for the grader to use judgment, and the graders don't have any information about the students beyond what is on the test. However, standardized tests also have benefits. The questions are much better devised and worded. They probe the student's skills in more ways, so that answers to the questions more clearly indicate how the student is doing. The test makers have a larger view of their field than any individual teacher, so they avoid the temptation to grind an ax about some particular sub-sub-sub-topic. Additionally, the same lack of judgment that the graders have means that the grades are more objective. It is a more subtle story than I should get into in this post, but suffice to say that an apple a day for your teacher really does make a difference. Standardized tests can pierce through the reputation bubbles within a school and see how each student is really performing.

As it works out in practice, I have to say that standardized tests are quite good at measuring knowledge level, possibly even better than the home-grown tests. Most of my experience with standardized tests is at the high school level, but in that experience they're pretty good. I and my fellow students got exactly the grades that would be expected based on what we knew: we did well on standardized tests in our best areas, and we did badly in areas we didn't know so well. Further, from the talking I've done with more experienced high school teachers, they believe the tests, too. They can, more often than not, guess the exact grade on a scale of 1-5 that any student will get on an AP exam.

In short, measuring learning isn't too hard if you are willing to use standardized tests. Look at how the students do at the beginning and end of the year, and you'll know how well the teacher taught them.

I believe most teachers would agree with all of the above, but they say the opposite when it comes to measuring teachers themselves. I suppose no one likes oversight.

Friday, October 29, 2010

Scientific medicine

Thorfinn of Gene Expression has a great post up on the difficulty of generating knowledge, even in a relatively hard science like medicine:
Doctors believe in breaking fevers, though there is no evidence that helps. Flu shots also don’t seem to work. I’ve also mentioned how uclers came to be declared a disease due to “stress”, when in fact they were clearly due to bacterial infection. Meanwhile, several large-scale tests of medicine use — from the RAND insurance study, or the 2003 Medicare Drug expansion — find minimal evidence that more medicine leads to better health.
I think our body of medical knowledge does illustrate how hard it can be to generate reliable knowledge, even in cases when we can easily run numerous experiments on a randomized basis.

Softer sciences have an envy of the hard sciences. Their researchers envy how reliable the experimental results are in a physics or chemistry experiment. In the hard sciences, it's possible to do controlled experiments where all of the relevant variables are controlled. Further, the models are simple enough that there aren't a host of alternative models that can explain any experiment. For example, if your theory is that the acceleration due to gravity is the same for all masses of objects, and your experiment is consistent with that theory, it's hard to come up with any simpler theory that would explain the same thing. "It doesn't matter" is already as simple as it gets.

I spent a lot of time with the Learning Sciences group at Georgia Tech. While they put an admirably high effort into careful experimental validation of their tools, methods, and theories, they were quite frank that the experimental data were hard to draw inferences from. They could describe a situation, but they couldn't reliably tell you the why of a situation.

The problem is that even with randomized trials, there are so many variables that it's hard to draw any strong conclusions. There is always a plausible explanation based on one of the uncontrolled variables. For learning sciences, a particularly troublesome variable is the presence of an education researcher in the process. Students seem to always do better when there's an experimenter present. Take away the experimenter, and the whole social dynamic changes, and that has a bigger effect than the particular tool. Seymour Papert's Mindstorms is a notorious example. Papert paints a beautiful picture of students learning deep things in his Logo-based classrooms, a picture that has inspired large numbers of educators. I highly recommend it to any would-be teacher. However, nobody can replicate exactly what he describes. It seems you need Papert, not just his tools, and Papert is darned hard to emulate.

All too often we focus on a small effect that is dwarfed by the other variables. The teacher, the software engineer, and the musician are more important than the tools. In how many other areas of knowledge have we fallen into this trap? We ask a question that seems obviously the one to ask--Logo, or Basic? Emacs, or vi? Yet, that question is framed so badly that we are doomed to failure no matter how good our experiments are. We end up comparing clarinets to marimbas, and from that starting point we'll never understand harmony and rhythm.

Thursday, October 28, 2010

Against an Internet Blacklist

There is a bill in the U.S. Senate to set up a blacklist for American citizens:
The main mechanism of the bill is to interfere with the Internet's domain name system (DNS), which translates names like "" or "" into the IP addresses that computers use to communicate. The bill creates a blacklist of censored domains; the Attorney General can ask a court to place any website on the blacklist if infringement is "central" to the purpose of the site.

To draw an analogy, this is like ordering someone's phone line to be disconnected based on a simple court order. It's not a good plan even if it were limited to sites that were clearly infringing copyright. Shouldn't the site owner get a day in court before their access is cut off?

Needless to say, I don't think we should have a DNS blacklist in America. We shouldn't adopt totalitarian information control just to prop up the current crop of companies that are in industry. Indeed, why should we work so hard to prop up yesterday's business models, anyway? We may as well try to bring back the horse and buggy.

Saturday, October 23, 2010

Visa restrictions strike again

The band Incognito is not visiting Atlanta this year:
We did all I could to make this happen, but my band and I were not given the deposits that were agreed and after much toing a froing severe delays to our arrangements and our visa applications has made our deadlines impossible.

American visa requirements are holding hostage all sorts of beneficial social activity. They stop economic, intellectual, and in this case cultural improvement.

Fay and I had a wonderful time hearing Incognito play in Switzerland. They are an extraordinarily international band, having members from several different continents. Maybe they'll have better luck next year getting past the American border control.

Tuesday, October 5, 2010

The simple way to use CUPS from Windows

In the hopes of steering someone else away from a tar pit, don't bother messing around with Samba if all you want to do is print over the network to a Linux box. Just do two simple things:

  1. Set the printer up using CUPS, the standard print software for Linux.
  2. In the Windows "Add Printer" dialog, click the option that allows a URL, and paste in the URL to your printer. It will be like: http://<computer-ip>:631/printers/<cups-printer-name> . When it asks for a printer maker, specify "Generic" "MS Publisher Imagesetter".
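As a small illustration of the URL pattern in step 2, the sketch below assembles it from its two parts. The helper name cupsPrinterUrl, the address 192.168.1.20, and the printer name laserjet are all made up for the example; substitute your own values. Escaping the printer name with encodeURIComponent is a precaution assumed here, not something the steps above require.

```javascript
// Hypothetical helper: builds the URL to paste into the Windows "Add Printer" dialog.
function cupsPrinterUrl(host, printerName) {
  // Port 631 and the /printers/ path come straight from the pattern in step 2.
  return "http://" + host + ":631/printers/" + encodeURIComponent(printerName);
}

// Example values only; these are not from any real setup.
const examplePrinterUrl = cupsPrinterUrl("192.168.1.20", "laserjet");
console.log(examplePrinterUrl); // http://192.168.1.20:631/printers/laserjet
```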

Gah! I lost over an hour fooling around with Samba docs, because I hadn't realized Windows supports CUPS directly.

Hat tip to the BSDpants blog. They write:

So, I know!, Windows means samba and there's a port named "cups-samba". This must be just what I'm looking for. ... Well, it was kind of what I was looking for. But, it was too much trying to do things the Windows way. A little painful. What I discovered in digging around was that I didn't need cups-samba or even samba. CUPS by default leaves a port open for printing and you can use IPP to print directly to a CUPS printer via the network, even from Windows, and using generic, Windows Postscript printer drivers already on your Windows machine (probably).

I never understood why Windows network printing was any more complicated than this. Just have a standard protocol, and then any client can speak to it. All you should have to do on the client is paste in a URL.

Even now, I note that Windows is dying to have you select a driver. What's up with "MS Publisher Imagesetter"? I wish they'd let go of the concept. If the user has joined the 1990s and specified a URL for their printer, that should be all they have to say about the matter. Anything so specialized that a URL alone is not enough could use some other mechanism.

Thursday, September 30, 2010

In praise of foreign workers

It's election season in the U.S., and the television is filled again with advertisements discussing foreign workers.

This is a large subject, but let me emphasize one thing: the ads have it backwards for software jobs. I want foreigners working with me. They are valuable members of the teams I've been on, and they create as many jobs as they take.

Whenever one of my coworkers has visa trouble, it's a real harm to the team. We lose lots of time, often days if not weeks, just due to the person filing papers, making phone calls, and travelling to offices. The national offices involved are far from friendly about the whole thing, either. They often keep bank hours, and they sometimes require presence in person. If you make any little mistake, the letters don't say, "You seem to have forgotten to file form IS-1042-T. Could you please resubmit it?" They are more like, "Get out, you rotten terrorist scum! If you aren't gone by tomorrow, your assets will be seized." This all leads to a situation where the person isn't in the best frame of mind to do good work.

Supposedly the point of this is to protect American workers. The economics behind that doesn't apply to software, however. Most of the programmers I've worked with have a backlog of valuable work 5-10 times larger than what they are currently doing, work they would take on if only they had a clone. When a foreign worker comes to the U.S. to work on computers, they don't knock someone else out of a job. They do one of those things that was previously being left on the table.

Moreover, having more people in the industry means that we all get smarter. They enrich the intellectual community. Smarter programmers are more productive, and more productive programmers make higher wages. Without foreign workers, we aren't as capable as we could be.

In short, I truly wish that most all barriers to foreign workers would be dropped in my industry. They're based on xenophobia and bigotry, and I'm embarrassed every time one of my coworkers must deal with it. If someone can get a computer job in the U.S., then let them come. They expand the pie by far more than they consume.

Sunday, September 12, 2010

It really was just about Flash

There are a number of platform wars going on right now, on various classes of computers. One of them is over the market for applications on consumer mobile devices. At the OS level, there are Android, iOS, Windows, RIM, and others. There are also cross-OS platforms, such as HTML and Flash. It's a good time to be on the buying side of a mobile device. Extraordinary levels of effort are being put into making each platform appeal to users.

Sometimes, though, the moves are not in consumers' interest. Apple's ban of alternate programming languages on the iPhone is just such a move. Jobs can say all he likes that Flash apps are inherently bad, but few truly agree. A more precise statement is that Flash, many feel, isn't the best possible tool in general. Programmers, however, are more important than the specific tools. I'm sure that the best Flash apps that were banned are better than the worst apps currently being allowed. If the app store simply focused on quality itself, rather than implementation technology, then iPhone users would get an improved selection of apps to install.

Jobs knows this, and so he hasn't really been blocking all alternate languages from his platform. Just the Flash ones:
Other cross-platform compiler makers had had no such trouble, even during the monthslong stretch when the now-obsolete Apple policy had supposedly been in effect. Both Appcelerator and Unity Technologies, which sell iOS programming tools, stressed on Thursday that developers using their compilers had been able to get ported programs into the App Store since April.

Sick stuff. Happily, as word came out, the legality of the approach is starting to fray. Apple needs to either explicitly and specifically block Flash--thus facing anti-trust issues--or drop the bogusly general block. They've now chosen to drop the general ban, which is really the best thing for users.

Sunday, September 5, 2010

The most important problem in computer science

Richard Lipton chooses The Projector Problem:
I believe that we are sorely in need of an Edison who can invent a projector system that actually works.

Hear, hear. What makes this so hard? I've had a lot of opportunity to muse on it while waiting at the beginning of talks while people fiddle with projector and laptop settings. Here are the main problems that have come to mind:

  • It takes an obscure manual intervention to turn on the projector output. Thinkpads have the best option, but it's not saying much: you hold down Fn and press F5. On other systems, you have to fish around in the UI for the screen settings dialog. Given the importance of this problem, I think it deserves a large button right next to the VGA port.
  • The laptop doesn't detect the projector's resolution. I'm sure it can, because CRTs have had this ability for eons. It's vanishingly rare that anyone will want a resolution other than the projector's max resolution, but for some reason the screen settings UIs don't just do that for you. In many cases they don't even gray out the settings that aren't going to work on that projector.
  • The settings UIs are universally terrible for switching to projection mode. On an Apple, you are given a list of 10-20 resolutions, almost all of which are bad ideas. On Windows, you have to click to a separate tab to even get to the place where you can turn on projector output and modify resolution. The NVidia UI on Linux, meanwhile, takes the horror to a new level. It would take a small novel to describe it all, so let me just mention that it involves knowing what "Twinview" is. Thanks, NVidia. You took what should be a trivial problem and instead of just making it work, you are making it a teaching moment for customers to learn your brand names.
  • If you use an Apple product, you additionally have the problem of finding the right dongle before you can plug in. These things are like pencils and pens: no matter how much you replenish the supply, they keep disappearing. Once one conference room loses one of its dongles, people start borrowing them between rooms, so they all share in the pain. I tried carrying my own, but that doesn't work, because eventually I loan it out and it disappears. There must be some alternate universe that is collecting all the Apple dongles from this one. They really just disappear.

This doesn't seem like a terribly challenging problem, really. It just takes a laptop maker to consider it a problem worth solving. A laptop maker could, if it chose, have a VGA port with a big "Project" button next to it. When pressed, it would switch to mirrored display mode at the max resolution the projector supports, and it would pop up a dialog asking if everything looks ok. If the user clicks Yes, that's it -- done. If the user clicks no, it would switch back to the previous resolution and drop the user into a settings dialog.
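The behavior of such a button is simple enough to sketch in a few lines. Everything below is hypothetical: the function names, the shape of the mode objects, and the looksOk callback (which stands in for the confirmation dialog) are invented for illustration and do not correspond to any real display API.

```javascript
// Pick the mode with the most pixels from the list the projector reports.
// Assumes the list is non-empty.
function maxResolution(modes) {
  return modes.reduce((best, m) =>
    m.width * m.height > best.width * best.height ? m : best);
}

// What pressing the "Project" button would do.
// current: the display state before the button was pressed.
// looksOk: stands in for the "does everything look ok?" dialog.
function pressProjectButton(current, projectorModes, looksOk) {
  const proposed = { mirrored: true, resolution: maxResolution(projectorModes) };
  if (looksOk(proposed)) {
    return { display: proposed, openSettingsDialog: false }; // Yes: done
  }
  // No: go back to the previous state and hand the user the settings dialog.
  return { display: current, openSettingsDialog: true };
}

const reportedModes = [
  { width: 800, height: 600 },
  { width: 1024, height: 768 },
];
const beforePress = { mirrored: false, resolution: { width: 1440, height: 900 } };

const accepted = pressProjectButton(beforePress, reportedModes, () => true);
const rejected = pressProjectButton(beforePress, reportedModes, () => false);
console.log(accepted.display.resolution.width); // 1024
console.log(rejected.openSettingsDialog);        // true
```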

Would any laptop maker care to do this, or are you all going to keep working on those gimmicky CD player buttons?

Monday, August 30, 2010

Patents as Mutual Assured Destruction

The best way I can understand the popularity of software patents is that they protect incumbent companies from newcomers. Large incumbent companies accumulate patents, they use them to litigate against small newcomers who have no patents of their own, and they form patent-sharing agreements with each other to prevent the same thing from happening to them. Occasionally it comes back to bite one of the incumbents, but for the most part they seem to believe it comes out in their favor.

One place this arrangement fails, though, is if one of the incumbents decides not to play for the long term. See, the reason incumbents don't sue each other over patents is that they fear the counter-suit. It's classic M.A.D.: mutual assured destruction.

However, what if an incumbent is on their way out of the computer business, either because they are shifting focus or because they are retiring? Well, in that case, the fear of a counter-suit would be nonexistent, wouldn't it? Count me in as one who thinks Paul Allen's recent actions suggest he is planning to retire, or at the very least get out of computers. My next best guesses are that he is trying to make some sort of point, or that he is simply unsavvy about the software industry. Neither of these sounds especially likely.

Wednesday, August 11, 2010

Rubik's Cube's difficulty cracked

I'm late to notice this, but recently a team proved that a Rubik's Cube can always be solved in 20 moves or less, regardless of its initial configuration. It had already been proven, back in 1995, that at least one initial configuration requires 20 moves to solve. Thus, all positions can be solved in 20 moves, and some positions require the full 20.

What is particularly interesting is that these researchers found the 20-move bound by having a computer solve all of the roughly 4.3E19 initial positions exhaustively. The best proof that didn't do an exhaustive search only proved an upper bound of 22 moves.
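For a sense of the size of that search space, the count of positions can be reproduced with the standard counting argument, which is not spelled out above: 8 corners that can be permuted and twisted, 12 edges that can be permuted and flipped, with the last twist and the last flip forced, and only half of the permutation pairings reachable. A quick sketch with BigInt arithmetic:

```javascript
// Factorial over BigInt, so that nothing overflows or rounds.
function factorial(n) {
  let result = 1n;
  for (let i = 2n; i <= BigInt(n); i += 1n) {
    result *= i;
  }
  return result;
}

// 8 corners: 8! orderings, 3 twists each, the last twist determined by the rest.
const cornerArrangements = factorial(8) * 3n ** 7n;

// 12 edges: 12! orderings, 2 flips each, the last flip determined by the rest.
const edgeArrangements = factorial(12) * 2n ** 11n;

// Corner and edge permutations must have matching parity, hence the division by 2.
const cubePositions = (cornerArrangements * edgeArrangements) / 2n;

console.log(cubePositions.toString()); // 43252003274489856000
```

That is about 4.3E19.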

I wonder if a human-digestible proof will be found for the 20-move upper bound, or if we'll be left with computers generating the stronger proof? At any rate, this is yet one more problem where a mathematical result depends crucially on some very heavy computation.

Thursday, July 8, 2010


People participating in online forums are better off being identified by pseudonyms rather than by their legal names. This is pretty engrained in me after many years of participating in such forums, so it takes some soul searching to explain. Let me try and distill out three points.

First, people have multiple parts of their lives, and they don't want them to mix. There are many reasons why this is good, but at the very least let's observe that this is how most people arrange their lives. There's work, and there's play. On the Internet, pseudonyms allow these separate lives to be separated more effectively.

Second, it fights prejudice. What makes prejudice so bad is not just that people are judged wrongly, but that they are judged wrongly using information that really should be irrelevant. Using pseudonyms means that this irrelevant information can be completely non-present. If your name is Julie or Juan or Duk-Kwan, you can expect to get a different--unfairly different--reaction if people learn your name, and thus your probable gender or ethnicity.

Finally, let me emphasize that pseudonyms are not anonymous. They are actual names, and they accumulate a reputation just like any other name. "Tom Cruise" is a pseudonym, but it's a name that has a very strong reputation (of one sort or another). So it goes with online pseudonyms, as well.

Given this, readers won't be surprised that I oppose Blizzard's trend toward using a "real" ID, "real" meaning a name on the credit card that pays for an account. Already, if you want to participate in cross-server chat on their games, you have to expose your credit-card name to everyone on your cross-server friends list. Now they are talking about changing the official forums to use credit-card names rather than character names.

The idea seems to be that if people post under their credit-card names rather than their Warcraft character names, then they'll post better content to the forums. I don't agree this is a sufficient reason for the change, and I don't even think they are going to get the result they hope for.

Aside from all this heavy stuff, why in the world is a fantasy online computer game going this way? Grey Shade says it best:
But that’s it, you get it? That’s why I play. That’s why my friends play. Because we like to come home from a long day of being John Smith or Jane Doe and get on the computer and MURDER SOME REALLY AWESOME INTERNET DRAGONS.

UPDATE: Blizzard cancelled enforced real names on the forums, and said they are going to strive to prevent real names leaking in-game for people who want that. Good choices! Crisis averted. Everyone can go back, now, to killing Internet dragons.

Thursday, July 1, 2010

A foreign film I'd love to see

Java 4 ever is a hilarious trailer for a made up movie. It has all the cliches from a Hollywood warm-human-tale kinda movie, but instead of being about forbidden cross-sect love, it's about open computer standards.

Beware that the trailer is R rated at one point.

HT Ted Neward

Tuesday, June 29, 2010

Wrapping code is slow on Firefox

UPDATE: Filed as bug 576630 with Mozilla. It would be great if this slowdown can be removed, because wrapping chunks of code in a function wrapper is a widely useful tool to have available.

I just learned, to my dismay, that adding a single layer of wrapping around a body of JavaScript code can cause Firefox to really slow down. That is, there are cases where the following code takes a second to load:

// ... a long series of function and var definitions ...

Yet, the following equivalent code takes 30+ seconds to load:

(function() {
// ... the same function and var definitions ...
})();

This is disappointing, because wrapping code inside a function is a straightforward way to control name visibility. If this code defines a bunch of new functions and vars, you might not want them all to be globally visible throughout a browser window. Yet, because of this parsing problem on Firefox, simply adding a wrapper function might not be a good idea.

After some investigation, I found that the problem only arises when there are a lot of functions defined directly inside a single other function. Adding another layer of wrapping gets rid of the parse time problem. That is, the following parses very quickly:

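(function() {
(function() {
// ... first chunk of the definitions ...
})();
(function() {
// ... second chunk of the definitions ...
})();
})();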

Of course, to use this approach, you have to make sure that the cross-references between the statements still work. In general this requires modifying the statements to install and read properties on some object that is shared among all the chunks.
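Here is a minimal sketch of what that modification might look like, with two tiny chunks standing in for the real ones. The names myModule, shared, double, and quadruple are made up for the example. Returning the shared object from the outer wrapper is optional and is done here only so that there is something to call from outside.

```javascript
var myModule = (function () {
  // The one object that every chunk can see (hypothetical name).
  var shared = {};

  // Chunk 1: defines a function and installs it on the shared object.
  (function () {
    function double(x) { return x * 2; }
    shared.double = double;
  })();

  // Chunk 2: cannot see chunk 1's locals, so it reads the property instead.
  (function () {
    function quadruple(x) { return shared.double(shared.double(x)); }
    shared.quadruple = quadruple;
  })();

  return shared;
})();

console.log(myModule.quadruple(3)); // 12
console.log(typeof double);         // "undefined": nothing leaked into the global scope
```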

Example Code and Timings

I wrote a Scala script named genslow.scala that generates two files: test.html and module.html. Load the first page in Firefox, and it will cause a load of the second file into an iframe. An alert will pop up once all the code is loaded saying how long the load took.

There are three variables at the top of the script that can be used to modify module.html. On my machine, I get the following timings:

default: 1,015 ms
jslink: 1,135 ms
wrapper: 34,288 ms
wrapper+jslink: 52,078 ms
wrapper+jslink+chunk: 1,188 ms

The timings were on Firefox 3.6.3 on Linux. I only report the first trial in the above table, but the pattern is robust across hitting reload.

Wednesday, June 23, 2010

Mass Effect for Xb^H^H Windows

I just got Mass Effect for Windows, but after reading the README file, I fear for the computer it will be installed on:

Game Known Issues
In Mass Effect you will occsaionally find elevators that connect different
locations. While riding in an elevator the game is loading significant amounts
of information and modifying data. We recommend against saving the game after
an elevator is activated until the player departs the elevator. Saving during
elevator trips can occasional cause unusual behaviors.

Okay, I can see that being hard to fix. Load/save systems are often tricky, and being between zones would only make it worse. It goes on, though:

Mass Effect does not run on a system using a GMA X3000 video card, a general
protection fault error appear after double clicking the start icon.

Um, wow. That's it? It just doesn't work if you have this card?

Mass Effect does not run optimally on the Sapphire Radeon x1550 series of video
cards. We recommend that Mass Effect is not played on a system with this video

Or that one?

Mass Effect does not run optimally on the NVIDIA GeForce 7100 series of video
cards. We recommend that Mass Effect is not played on a system with this video

That one, either? Methinks they should list the cards it does work with, and on the box, not in a README file.

Mass Effect does not run optimally on a computer with a Pentium 4 CPU with a
FSB below 800 MHz under Windows Vista. We recommend that Mass Effect is not
played on a system with this CPU and operating system combination.

Err, okay. This kinda goes along with "minimum system requirements".

The the NVIDIA 8800 Series of video cards can require significant time (30
seconds or more) to change resolutions. This is due to a required
recalculation of thousands of video shaders.

"Required". As if they couldn't have precomputed shaders for the 10-20 most common resolutions. As if any other game has this problem.

After reading this, I wasn't confident. Sure enough, I get a General Protection Fault on startup. As extra weirdness, it reports a "file not found" exception from within some graphics library.

Overall, I guess what the developers did is make the Xbox version first, and then make a half-hearted attempt to port to Windows. If I'd realized how flaky this is, I probably would have passed it over.

Friday, June 18, 2010

Commutative? Associative? Idemflabitical?!

Guy Steele has a noble goal in mind for the Fortress library:

I have noticed that when I give talks about Fortress and start talking about the importance of algebraic properties for parallel programming, I often see many pairs of eyes glaze over, if only a bit. It occurs to me that maybe not everyone is conversant or comfortable with the terminology of modern albegra (maybe they had a bad experience with the New Math back in the 1960s, a fate I barely escaped), and this may obscure the essential underlying ideas, which after all are not that difficult, and perhaps even obvious to the average programmer when explained in everyday terms.

It's a good goal. Using technical terms often obscures the real point one is trying to make. Worse, technical terms often dress up a claim to sound like it says much more than it does. For all kinds of reasons, it is better to use well-known informal terms whenever they will work.

Nonetheless, I am not so sure about changing terms like commutative and associative in the Fortress library. It looks like a case where the hard part is not in the terminology, but in the underlying theory itself. Once a programmer understands the theory well enough to work with the library, they'll almost certainly know the standard formal terms anyway.
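To show the kind of property at stake, here is a minimal Python sketch. It is not anything from the Fortress library; the `chunked_reduce` helper is a hypothetical stand-in for a parallel runtime that splits a reduction into independent chunks. The chunked result matches the sequential one only when the operator is associative.

```python
from functools import reduce
import operator


def chunked_reduce(op, items, chunk_size):
    """Reduce each chunk on its own, then combine the partial results.

    This imitates how a parallel runtime might divide the work. It agrees
    with a plain left-to-right reduce only when `op` is associative.
    """
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    partials = [reduce(op, chunk) for chunk in chunks]
    return reduce(op, partials)


numbers = list(range(1, 9))

# Addition is associative, so regrouping the work changes nothing.
sequential_sum = reduce(operator.add, numbers)
chunked_sum = chunked_reduce(operator.add, numbers, 3)

# Subtraction is not associative, so regrouping changes the answer.
sequential_diff = reduce(operator.sub, numbers)
chunked_diff = chunked_reduce(operator.sub, numbers, 3)
```

Whatever you call the property, a programmer has to understand why the second pair disagrees before they can safely hand an operator to a parallel reduction.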

A similar issue comes up for Scala, where writers of very flexible libraries end up working with rather complicated static types. In such cases, there is no getting around understanding how the Scala type system works. The Scala collections library is deep mojo.

That doesn't mean the complexity must leak out to users of the library, however. In both cases, the designers of core libraries must think hard, because they are working with deep abstractions. If the designers do well, then users of these libraries will find everything working effortlessly.

Saturday, June 12, 2010

Pinker defends electronica

Steven Pinker just wrote a great defense of electronic communication. The highlight for me:

For a reality check today, take the state of science, which demands high levels of brainwork and is measured by clear benchmarks of discovery. These days scientists are never far from their e-mail, rarely touch paper and cannot lecture without PowerPoint. If electronic media were hazardous to intelligence, the quality of science would be plummeting. Yet discoveries are multiplying like fruit flies, and progress is dizzying.

The same can be said for software engineering. Email and instant messaging give huge productivity increases. In a nutshell, they help people work together.

On the other hand, I don't agree with this part:

The effects of consuming electronic media are also likely to be far more limited than the panic implies. Media critics write as if the brain takes on the qualities of whatever it consumes, the informational equivalent of “you are what you eat.” As with primitive peoples who believe that eating fierce animals will make them fierce, they assume that watching quick cuts in rock videos turns your mental life into quick cuts or that reading bullet points and Twitter postings turns your thoughts into bullet points and Twitter postings.

I believe that what you spend your time mentally consuming strongly affects how you think about things and how you come at new things you encounter. However, we shouldn't blame the media, but the content. Watching Bruno and watching the Matrix get a person thinking about entirely different things, but they come through the same medium. Likewise, reading Fail Blog and reading Metamodern put the mind in entirely different places, even though they're both blogs.

Saturday, June 5, 2010

Evidence from successful usage

One way to test an engineering technique is to see how projects that tried it have gone. If the project fails, you suspect the technique is bad. If the project succeeds, you suspect the technique is good. It's harder than it sounds to make use of such information, though. There are too few projects, and each one has many different peculiarities. It's unclear which peculiarities led to the success or the failure. In a word, these experiments are natural rather than controlled.

One kind of information does shine through from such experiments, however. While they are poor at comparing or quantifying the value of different techniques, they at least let us see which techniques are viable. A successful project requires that all of the techniques used are at least tolerable, because otherwise the project would have fallen apart. Therefore, whenever a project succeeds, all the techniques it used must at least be viable. Those techniques might not be good, but they must at least not be fatally bad.

This kind of claim is weak, but the evidence for it is very strong. Thus I'm surprised how often I run into knowledgeable people saying that this or that technique is so bad that it would ruin any project it was used on. The most common example is that people love to say dynamically typed languages are useless. In my mind, there are too many successful sites written in PHP or Ruby to believe such a claim.

Even one successful project tells us a technique is viable. What if there are none? This question doesn't come up very often. If a few people try a technique and it's a complete stinker, they tend to stop trying, and they tend to stop pushing it. Once in a while, though....

Once in a while there's something like synchronous RPC in a web browser. The technique certainly gets talked about. However, I've been asking around for a year or two now, and I have not yet found even one good web site that uses it. Unless and until that changes, I have to believe that synchronous RPC in the browser isn't even viable. It's beyond awkward. If you try it, you won't end up with a site you feel is launchable.

Friday, May 28, 2010

"Free" as in not really

Apple is apparently going to start rejecting GPL apps, but the reason isn't what I expected.

The sticking point was that the App Store's terms of service says that a piece of software downloaded from the store can only be used on five devices. But the FSF said that the terms of service impose numerous legal restrictions on the use and distribution of GNU Go that are forbidden by GPLv2 section 6:

So, the FSF considers an application not redistributable enough even when it is available for free via the iPhone store. The quibble seems to be that the receiver of the software should be able to further redistribute the code for free, rather than merely telling people that they can download it themselves from the app store.

In general, the GNU license isn't all that "free" in any common definition of the word. It seems pretty darned free to me if anyone who has an iPhone at all is able to download the software, run it all they like, and even go grab the source code. It's hard for me to see this as anything other than the FSF trying to get negotiating leverage and make itself more important. The problem is, their efforts to gain power involve steps that are against their mission. To promote free software, they're seeking power, and to seek power, they're blocking the distribution of free software.

Open source is not for everyone. However, if you really want to give away software, it seems to me it should be given away in some simple, intuitive way. Either public domain it, or, if that seems too radical, use a clearly free license such as the MIT license.

Wednesday, May 19, 2010

Microsoft's revenge

CNET reports that Microsoft is suing an obscure company I have never heard of for stealing Microsoft's patented ideas. The patents in question?

The patents cover a variety of back-end and user interface features, ranging from one covering a "system and method for providing and displaying a Web page having an embedded menu" to another that covers a "method and system for stacking toolbars in a computer display."

These patents are about routine programming, not about novel ideas that deserve over a decade of exclusive use. They shouldn't have even been granted. Yet, not only have they been granted, but similarly groundless patents have been upheld in the past. Who knows? Maybe this case will hold up, too.

I challenge anyone to read up on how patents are supposed to help the public, and then compare that to how they are actually working.

Sunday, May 16, 2010

That's my niece

The TCPalm found little Carolina just as adorable as her family does. I'm not sure I get what a good vs. bad bug is, but who cares? Any excuse to play with lady bugs.

Tuesday, May 11, 2010

"Stringly Typed" Code

One of several programming slang terms posted on Global Nerdy:

A riff on strongly-typed. Used to describe an implementation that needlessly relies on strings when programmer- and refactor-friendly options are available.

I really hate stringly typed code. It's convenient for a while, but it grows all kinds of weird bugs over time. Almost every conceivable corner case in the format [sic] tends to be broken, because they have to be reimplemented in every bit of code that processes the string.

I usually call it out as Alan Perlis's "stark data structure", but "stringly typed code" has a much better ring to it.
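Here's a small Python sketch of how the corner cases creep in. The `host:port` format and the names `port_of`, `Endpoint`, and `port_of_typed` are made up for illustration. The stringly typed version works for the obvious input and falls over on an IPv6 address, while the structured version never has to parse anything.

```python
from dataclasses import dataclass


def port_of(endpoint: str) -> int:
    """Stringly typed: every caller re-parses the 'host:port' format."""
    return int(endpoint.split(":")[1])


@dataclass(frozen=True)
class Endpoint:
    """Structured alternative: the format is decoded once, at the edge."""
    host: str
    port: int


def port_of_typed(endpoint: Endpoint) -> int:
    return endpoint.port


# The obvious case works fine.
ipv4_port = port_of("example.org:8080")

# An IPv6 host contains colons of its own, so the naive split picks up
# an empty field and int() rejects it.
try:
    port_of("::1:8080")
    ipv6_failed = False
except ValueError:
    ipv6_failed = True

# The structured version has no format to get wrong.
typed_port = port_of_typed(Endpoint(host="::1", port=8080))
```

Multiply that one split by every function that touches the string and you get the weird bugs.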

Drawing a line in the sand

Australia continues to play with the idea of censoring the parts of the Internet its people can see:

Governments and organisations around the world are intently watching Australia as the Federal Government continues to peddle the proposed ISP-level Internet filter, former GetUp executive director and AccessNow founder, Brett Solomon, has revealed.

I don't know exactly how this will play out. It will be an ugly blow by blow as various parties wrestle for influence over this new source of power.

The longer trend is clearer, however. There will be ever more ideas about what to filter, and only rarely will anything be taken off the list. Further, while it might initially take a large supermajority to get something on the list, eventually everyone will get used to the idea of a pervasive filter. At that point, more like a simple majority will suffice.

Whatever is being said now, there will be quite a lot of things people work to add to the filter list over time. Here are a few off the top of my head:

  • Distasteful content with no artistic merit.
  • Content felt to be harmful to children.
  • Activity of suspects in major crime.
  • Activity of people capable of committing a crime (i.e. everyone).
  • Motion of copyright-protected material such as Hollywood movies.
  • Activity deemed bad for society, such as gambling.
  • Below-quota viewing of material deemed good for society, such as acclaim for the wonders of one's own country.

There's an even worse problem than that the list tends to grow. As more and more people are needed to enforce these filters, the quality of the people involved will decline. When there are ten wire taps a year, and the court cases authorizing them are publicized, the public can scrutinize all the agents involved and hold them to a high standard. It's like the death penalty in U.S. states--every case is hyperanalyzed. However, once the filtering is routine and pervasive, it takes a small army of employees to implement it. It becomes more like the income tax service or the police forces, where there are so many people involved that there are many bad apples.

This is a case where there's no good solution but to draw a line in the sand and stick to principles. Each extra step of filtering looks okay on its own and will have proponents saying it makes everyone better off. However, the end result is a relatively repressive regime, one where neighbors spy on each other and decide what is appropriate for each other to view. If you think that creativity, insight, and entrepreneurship are important for society, then this is an order where society is missing something vital.

The historical place we draw the line in the United States is that people have free speech. All citizens can say what they like, and other citizens can listen to whomever they like. It is not up to the government, much less to random neighbors, what material is fit for citizens to view. If I lived in Australia, I would fight for just that line. Speech over the Internet is still speech, and it should be sacrosanct.

HT James Robertson.

Saturday, May 8, 2010

Closed platforms are not illegal

I think open operating systems are good for the world. They unleash the creative energy of the world to create new applications that the core maintainers would never dream up. Wikis, blog software, web browsers, and TeX are just the tip of an iceberg of ground-breaking applications that weren't developed, licensed, approved, or even known about by the maintainers of the operating systems they ran on.

However, quite a few closed operating systems have been good for the world, too. I dare say gaming consoles have made human life better. If that sounds too fun and not puritan enough, then think about GPS navigators. Heck, just think about consumer devices that have software. If you want to put software in a Ford, you can't just hand out memory sticks. You have to talk with Ford, and if they say no, that's that.

Additionally, closed systems are ordinary. Nobody is particularly up in arms about Sony disallowing arbitrary development for Playstations. Nobody has said that Ford is being anti-competitive to control the software on its system. So why Apple?

I can speculate about why people are upset about Apple, but there is a more important point to establish: it's their right to close their system if they want. It's difficult to predict whether the net result will be a better or worse system, but at any rate it's their system. If you think they should do something different, then try to persuade them. Let's stop shy of force, however. We shouldn't allow major government figures to just step in and rough up any company they please so long as they have a mob of people cheering them on.

This is not to say that I see no role for law in the software world. There are plenty of basic business principles that it is tremendously helpful for the government to enforce. Contract, ownership, non-disclosure agreements, copyright -- these are all useful institutions to build and enforce. Another area that is ripe for action is open document initiatives: insist that government agencies release documentation only in open formats that are easily reimplemented and widely available.

When it comes to application platforms, however, it's premature. The area is just too dynamic. Expect a decade at the minimum for a major government to work out a reasonable system of law and enforcement for something new. Minimum. I would expect more like 20-30 years, and ideally there would be several approaches tried in different parts of the world before a final choice is made about how the thing should be done. For now, leave application platforms alone. If the government wants to do anything, uphold platform owners' rights to be gatekeepers for the content on them. Platform owners can opt to do otherwise, but if they don't, everything will still be fine. Once in a while, we'll even see a platform like the Wii: thoroughly closed, yet very much good for the world.

Let the flowers bloom. The iPhone is still budding, and we don't even know yet how it's going to fare once the initial fad runs out.

Saturday, April 24, 2010

SEC interested in smart contracts?

According to Jayanth Varma, the SEC is considering specifying parts of contracts using programs:

We are proposing to require that most ABS issuers file a computer program that gives effect to the flow of funds, or “waterfall,” provisions of the transaction. We are proposing that the computer program be filed on EDGAR in the form of downloadable source code in Python....

It's vindication for agoric computing!

I share the same question as Prof. Varma, however. Isn't it problematic that Python is ill-specified? It is surely convenient to write and read, but a contract needs to be especially rigorously defined. Using a program to describe a contract has the potential to be very reliable, but only if the programming language is particularly well defined.
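For anyone who hasn't seen one, here is roughly what a waterfall might look like as a program. This is only a sketch under my own assumptions: the tranche names, the amounts, and the `run_waterfall` function are invented, and a real filing would encode far more detailed provisions. Note the use of exact fractions rather than floating point, which is one small way to keep the result from depending on the implementation.

```python
from fractions import Fraction


def run_waterfall(available, tranches):
    """Pay tranches in order of seniority until the cash runs out.

    `tranches` is a list of (name, amount_due) pairs, most senior first.
    Returns a dict of payments, plus whatever is left as 'residual'.
    """
    payments = {}
    remaining = Fraction(available)
    for name, amount_due in tranches:
        paid = min(remaining, Fraction(amount_due))
        payments[name] = paid
        remaining -= paid
    payments["residual"] = remaining
    return payments


# Hypothetical deal: 100 of cash against 135 of obligations.
payments = run_waterfall(
    100,
    [("senior", 70), ("mezzanine", 40), ("junior", 25)],
)
```

With these made-up numbers the senior tranche is paid in full, the mezzanine tranche gets the remaining 30, and the junior tranche gets nothing.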

There are much better specified alternatives. Some have suggested Java as well-specified, and I'd agree, so long as the precise libraries used are also delineated. A more tempting candidate for me would be Scheme or SML. In addition to being well-defined, functional languages lack overriding and thus yield programs that are easier to analyze. Object-oriented languages, by allowing every method to be overridden, yield tremendous power and convenience but make it harder to prove definitive statements about a program. Just imagine trying to say something about E=mc^2 when all of E, m, c, squaring, multiplication, and equality can be overridden.
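The overriding worry is easy to demonstrate in Python itself. In this sketch the `Slippery` class is a deliberately perverse invention of mine: it overrides multiplication, exponentiation, and equality, so the formula "holds" no matter what values go in.

```python
class Slippery:
    """A value whose arithmetic and equality have all been overridden."""

    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Multiplication ignores both operands.
        return Slippery(0)

    def __pow__(self, exponent):
        # Squaring does nothing at all.
        return self

    def __eq__(self, other):
        # Everything is equal to everything.
        return True


# With ordinary numbers the equation is checked honestly: 1 is not 2 * 9.
honest = (1 == 2 * 3 ** 2)

# With overridden operators the very same expression claims to hold.
E, m, c = Slippery(1), Slippery(2), Slippery(3)
overridden = (E == m * c ** 2)
```

A reader of the second expression can conclude nothing without first tracking down every definition involved.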

Some commenters in the above blog article suggest spreadsheets as being a closer match to how financial analysts would like to work. Spreadsheets, it is claimed, would allow the analysts to examine intermediate results. That's very interesting, but are there any spreadsheet-friendly languages that are particularly well specified? All the ones I've encountered are defined by their implementation.

Thursday, April 22, 2010

ACTA released

Americans tend to viciously debate every little detail of our own national policy. Yet, I've encountered very little interest in ACTA, an international trade negotiation that is just as potent as the much-maligned DMCA.

For any interested, there's a draft of ACTA now available to the public. Techdirt is underwhelmed:

Most of the language that critics have grown familiar with (making the bypassing of copy protection illegal even in cases of fair use, making copies of a large quality of content illegal even if no money is exchanged, mandating that ISPs become copyright nannies) remain at the heart of the ACTA. The agreement's central thrust continues to be to foist clearly dysfunctional, unreliable, and draconian U.S. DMCA-style copyright enforcement policies upon other countries.

I have to admit, I'd rather see something with some more principles to it, rather than simply all the major content holders pulling up a chair to the table and divvying up the world. A good one to start with is that if a citizen buys and owns a copy of something, they can make further copies for personal use. I don't see how that kind of right could ever even be proposed at a gathering like ACTA.

Monday, April 12, 2010

Meta-platforms make bad apps?

John Gruber has a claim that Steve Jobs appears to endorse.

I can see two arguments here. On the one side, this rule should be good for quality. Cross-platform software toolkits have never — ever — produced top-notch native apps for Apple platforms. Not for the classic Mac OS, not for Mac OS X, and not for iPhone OS. Such apps generally have been downright crummy. On the other hand, perhaps iPhone users will be missing out on good apps that would have been released if not for this rule, but won’t now. I don’t think iPhone OS users are going to miss the sort of apps these cross-platform toolkits produce, though.

I used to believe this. I thought about Tcl/Tk apps, which are always fumbly and unpolished, versus native Apple and Windows apps, which at least sometimes have slick UIs. Therefore, cross-platform toolkits are bad. It was a hasty generalization, however. Applications based on web frameworks, Java, or .Net are much better than those based on Tcl/Tk. They look just as good as native apps, and they fit the native look and feel just fine. Thus it comes down to an issue of functionality, and on functionality the native apps don't compete very well.

I spent a couple of years recently trying to use a Mac laptop and live the Apple dream of everything being compatible with everything. One by one, I grudgingly stopped using almost all of the native Apple software in favor of cross-platform software. Specifically:

  • Mail, contacts, and calendar form a trio that take good advantage of interop. However, the mail program can't thread, which I ultimately admitted is a show stopper for anyone participating on mailing lists. The calendar program is okay, but I can't share events on it with other people, which again is a really important bit of functionality. I switched to web-based versions of both of these, after which the built-in contacts program had nothing to interoperate with. Thus I switched to a web-based contacts solution as well.
  • For editing documents, I tried to love the built-in NeoOffice. It's not really any better than stock builds of OpenOffice, though. Ultimately I stopped using OpenOffice, because there's no good problem for it to solve. For simple documents I used web-based tools, with their improved sharing and publishing affordances. For documents where I work harder, I use LaTeX.
  • For software development in Java, IntelliJ, NetBeans, and Eclipse are the heavy hitters. There wasn't even a native app for me to try. Apple supplies Xcode, but I never really tried it. I'm not sure it even supports Java at all.
  • For all other software development, Emacs is so good I never even considered an alternative.
  • For web browsing, I initially tried to like Safari, and it's pretty good. However, I ended up wanting to use Firefox for a lot of things, for reasons I can't remember now. Ultimately, I ended up using Chrome.
  • For command-line access, I initially used the built-in Apple terminal program. I later switched to a different Apple-specific program called iTerm.

As can be seen, in almost every case, the best of breed software for a Mac laptop is built on a meta-platform--exactly the kind of meta-platform Jobs is banning from iPhones. Good thing he hasn't done so for Apple laptops, or they'd instantly become third-rate for working professionals! The native apps are okay for very basic usage, and they are sometimes prettier, but they're woefully underpowered once you try to do anything with them.

This emperor has no clothes. Anyone who currently believes that native Apple-specific software is better, ask yourself: what software do you use day to day? Is Apple-specific software really any better, or is it just cutesy demoware that is good enough if the main thing you do with a computer is show it to people?

I'm checking out of that game. If you use your computer to get work done, then you need good application software, not just a good OS. For that, you want a large pool of candidate software to draw from.

Apple trying to block choice of source language?

Much is being said about Apple's newly launched attempt to stop programmers from writing in a language it doesn't approve of. Lambda the Ultimate is a good place to find lots of links.

This is a large grab. Apple is trying to control not only the technology used in the distributed software for an application, but also the methods used to make it. The requirements on those methods are not even that they be good methods; restricting the programming language does little to enforce good programming by itself. And the languages allowed aren't especially great for good programming.

Jobs claims--in the few terse comments Apple has made about this move--that he wants to prevent the iPhone from being commoditized. This is a standard lock-in kind of approach to doing well in business, but it's slightly odd given Apple's reputation for distributing quality software. One thing we can be sure of: if programmers aren't allowed to use good tools, they aren't going to make good software.

It's unclear whether this will be enforceable. Among other reasons, practically all software makes some use of its non-primary language, but presumably Apple doesn't want to ban all software from their market. Any configuration language is a programming language. The expression language of a spreadsheet is a programming language. If you do a web search for "cell phones -iphone", you're writing the query in a programming language. Heck, if the developers use ant or make or Maven to build the program, they're writing their build rules in a programming language.
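To see how little it takes to end up with a language, here's a toy interpreter for that search query, sketched in Python. The `matches` function and its rules are my own simplification (bare words must appear, words prefixed with a minus sign must not); no real search engine is this simple.

```python
def matches(query, document):
    """Evaluate a tiny query language against a document.

    Bare terms must appear in the document; terms starting with '-'
    must not appear. Matching is by whole word, ignoring case.
    """
    words = set(document.lower().split())
    for term in query.lower().split():
        if term.startswith("-"):
            if term[1:] in words:
                return False
        elif term not in words:
            return False
    return True


hit = matches("cell phones -iphone", "Cheap cell phones for sale")
miss = matches("cell phones -iphone", "iPhone and other cell phones")
```

The first document matches and the second does not. Anything that ships a search box like this has an interpreter for a second language inside it.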

Little languages are just part of how software is written.

Wednesday, April 7, 2010

Some sanity on network neutrality

PC World writes that the courts have ruled against the FCC in its attempt to stop Comcast from throttling BitTorrent connections:

The FCC lacked "any statutorily mandated responsibility" to enforce network neutrality rules, wrote Judge David Tatel.

As I wrote before, I didn't understand how the FCC had this authority:

Part of my curiosity is exactly how the FCC has the jurisdiction to do this. The last I heard, network neutrality was voted down in Congress. Did it come up again with no one noticing, or is the FCC just reaching again?

I want a neutral network, but I don't see much our governments can do to help. I'm glad to hear that at the very least, the FCC must await enabling legislation to do any of this nonsense. It means that U.S. agencies are still to some degree bound by law.

It's a good thing in this case, too. The FCC wanted to prevent Comcast from throttling traffic. How is this really going to work out profitably in the long run?

The problem is that plenty of throttling is not just acceptable, but a good idea. Eliminate the bad throttling, and lots of good will be thrown out, too. The clearest example is that the diagnostic packets used by ping and traceroute should surely be prioritized higher than packets carrying regular data. Another is that streaming video should likely be deprioritized so that it doesn't hog the whole pipe. Another is that any individual node that is spamming would be good to throttle down. Another is that a large network might have some parts of its network better connected than others; to prevent the low-bandwidth areas from being clogged, they might want to throttle incoming data into the well-connected parts. There are really quite a lot of legitimate uses for throttling.
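The first two examples amount to a priority queue. Here's a minimal Python sketch of the idea. The traffic classes, their ranking, and the `drain` function are assumptions of mine for illustration, not a description of how any real router or ISP schedules packets.

```python
import heapq
from itertools import count

# Hypothetical traffic classes; a lower number is sent sooner.
PRIORITY = {"diagnostic": 0, "data": 1, "video": 2}


def drain(packets):
    """Return packets in send order.

    Packets are ordered by class priority, and first-come first-served
    within a class (the arrival counter breaks ties).
    """
    arrival = count()
    heap = [(PRIORITY[kind], next(arrival), kind, payload)
            for kind, payload in packets]
    heapq.heapify(heap)
    sent = []
    while heap:
        _, _, kind, payload = heapq.heappop(heap)
        sent.append((kind, payload))
    return sent


sent = drain([
    ("video", "frame-1"),
    ("data", "page"),
    ("diagnostic", "ping"),
    ("video", "frame-2"),
])
```

The ping goes out first and the video frames go last, in their original order. A rule that forbids all preferential treatment forbids even this.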

Those examples are all compelling and ordinary, but it gets worse when we consider possible future business models. For example, suppose an ISP of the future openly advertises that it prefers some traffic over others, e.g. that it allocates sufficient bandwidth for VOIP that there are never any glitches in a conversation held over their network. Such a company could provide a great service to the general public, but good luck getting it past an FCC that categorically disallows any form of throttling. Regulatory agencies tend to clamp down an industry to work the way it currently works; new ways of doing things can't get off the ground.

Again, though, I'm all for trying to make a more neutral network. It's just that the U.S. government is so terrible with everything computer that its solutions are worse than the problems. If I desperately had to think of something for our governments to do, I'd suggest looking into last-mile issues: decouple Internet service from the service of physically hooking up at the last mile.

IP-level neutrality is not the biggest issue, though. It's already pretty good. A far bigger concern is the constant race between walled gardens and open systems. It doesn't matter that you can send data to any IP if the only IP that matters is Apple's. The best cure for that is for customers to pay attention to lock-in.

Thursday, April 1, 2010

April Fool Picks

Cappuccino has solved the JavaScript memory management problem:

Lately, though, we’re beginning to realize that we didn’t quite go far enough. Memory issues have long plagued JavaScript developers. Because the garbage collector is opaque to the developer, and nothing like “finalize” is provided by the language, programmers often find themselves in situations where they are forced to hold on to an object reference for too long (or forever) creating a memory leak.

The Topeka search engine has some new auto suggests.

The Google Web Toolkit can now compile Quake for your browser.

The U.S.A. is leading the charge on transparent governance and on intellectual property.

Wednesday, March 3, 2010

Content carriers should not be liable

The liability held by content carriers is a long-simmering political battle that the general public would do well to be aware of. We face a clear choice between establishing some basic principles and rights versus turning over an increasingly important part of our lives to be divided up by moneyed interests.

A content carrier is any business that specializes in the delivery of information from one person to another. They include low-level delivery of data, such as Internet service, radio operators, and postal service. Increasingly, they include higher-level services that organize data, such as search engines (Google, Bing), video sharing (YouTube), special interest groups, and classified ads (Craigslist, eBay). All of these services share the property that the business transmitting or organizing the data has no affiliation with the people and organizations posting data into the service.

The question is as follows. If someone posts something illegal on any of these services, who is liable?

A simple example is that a contractor might bill a client for more than they advertised. A tree remover mass-mails an advertisement saying they will remove trees for $100/tree. You hire them, they remove a single tree, and then they tell you the bill is $200. This tree guy has just committed false advertising.

If that advertisement goes through the postal mail, then no one is liable for the false advertising except the tree guy. A laundry list of enabling companies is not liable at all:

  • The company that xeroxed the fliers
  • The company that made the copy machines used to make the copies
  • The postmen who delivered the mail from the post office to each addressee
  • The postal service itself
  • The company that made the mailboxes
  • The company that made the software he wrote the fliers with
  • The company that made the computer that software runs on

Obvious, right? We punish the person who committed the crime.

Now consider the same situation if the communication is electronic. Suppose that instead of a mass mailing, it was a web site that listed the false price. Look at all the different groups involved that someone or another is trying to make liable:

  • The web site hosting the web page, for allowing the content to go up.
  • The ISP of the guy who posted the bad advertisement, for failing to cut him off.
  • The ISP of the web hosting service, for letting the web service stay online.
  • The ISP of the individuals who encountered the web site, because they did not filter out the bogus site.

We should stick to our principles and say no to all of this nonsense. Content carriers should emphatically not be liable for policing the content. The police work easily overwhelms the work needed to do the job properly, thus shutting down advanced content carriers in their infancy. Just as bad, we end up with a society where everyone is the police, and they are all policing each other all the time. It's the electronic equivalent of having postal mail opened before it is delivered.

Charging the person who performed the crime is plenty. We don't sue the postman for carrying false advertising, nor should we sue Craigslist when a "woman" turns out to be a transsexual. Whenever it's not possible to catch the guy that actually did the crime, let's not waste our time going after the content carriers. They are bringing us into a new era of human society, and they can't do it if they also have to be a police force.