Sunday, August 23, 2009

Ethics oversight of computer science research

Most people wouldn't be aware, but in American universities, computer science research gets substantial oversight to ensure projects treat human subjects properly. I have no objection to this in principle, but the way it is done in practice is beyond useless. It's downright harmful. The Institutional Review Blog covers this issue continuously, and is a regular read of mine. Let me summarize the issue for anyone new to it.

The core problem is that computer science, lacking its own ethics review boards, has ended up under institutions that were designed with medical research in mind. The three main requirements for any approved medical research project are that every participant gives informed consent, that the project is designed such that it will further science, and that the risks to the participants are in line with the scientific results. So far so good.

Added into the mix is discrimination law. That probably starts to sound strange, especially given that practically any science project will do better to draw from an understood subset of the population rather than the whole. However, it makes sense for medical research. Since the approval process for drugs takes years, often the only way a citizen can legally get access to a new drug is to take part in a study. Thus, medical research is typically not just a scientific inquiry, but also an early adopter program.

This oversight drapes thick cobwebs over any project it applies to. The researcher must not only get consent, but do so in a way that can be archived and proved after the fact. Additionally, the researcher must extensively document the research design, the rationale that it will contribute to the field, the risks that subjects will face, and the counter-measures the project is taking. Some risk is practically unavoidable, and the researcher must document why they believe this is so. It adds up to quite a large proposal packet, entirely separate from the proposal that got the research funded.

Every bit of these proposal documents will be nitpicked by internal review boards. At a medical institution, the review boards will doubtless have substantive issues to consider. At an engineering university, however, the boards rarely find a true ethical problem to consider. As a result, they have no purpose unless they snipe at details and at low-status researchers. In my experience they are happy to do so. Worse, because the review boards are institute-wide, a review board for an engineering university almost certainly includes no one familiar with the field of any particular proposal. The boards aren't competent to evaluate the substantive parts of the proposal even if they try.

You can see how this would cause trouble for the progress of computer science. Human subjects in computer science research mostly just try out tools, respond to questionnaires, and take part in interviews. The risks are minuscule, but researchers are required to brainstorm possible risks anyway, just to follow form. Given the minuscule risks, there isn't much of a disclaimer for participants to consent to, but nonetheless it's necessary to draw one up and archive the proof of consent. The early adopter aspect is absent for academic research, because such researchers usually post their tools online, but nonetheless the researcher must carefully avoid discrimination issues. The latter is often difficult. For example, you can't discriminate against children, but children can't give consent. Finally, given all of these other non-issues, there is no remaining point in having review boards evaluate the research methods and potential scientific contribution. They do, however, and aside from their incompetence to do so, they become an effective political weapon for any high-status researcher who wants to shut down a young upstart.

In short, I believe engineering universities have gone wrong in adopting ethical oversight procedures copied from medical science. No university wants to openly say it has no ethics oversight, and apparently they are filling that void by mimicking what is done for medical research.

There are many ways to do better than this while still having ethical oversight. As one possibility, we could collectively define a category of research projects where the human subjects interaction is limited to trying out tools, taking questionnaires, and participating in interviews. Such a category should cover the bulk of human subjects research in many fields including computer science. Further, the issues would be generally the same for every research project in this category, so general-purpose practices could be developed rather than requiring each project to reinvent and defend its own procedures.

Granted, it's all academic to me. Being in industry, if I want to know what my customers think, I simply go ask them. I call them, I email them, I send them an instant message. Really, that's all an interview is, but American universities seem to have forgotten this.

Friday, August 21, 2009

Daily life of Swedes and Danes

Bryan Caplan's impressions about Danes and Swedes generally match mine about Danes and Swiss:
One of the most striking things about Denmark and Sweden: Almost everyone is overqualified for his job. The guy who sells train tickets doesn't just punch buttons and collect cash; he knows his regional transit network like the back of his hand, and eagerly helps you plan your trip.

I'd add to that generalization that both waiters and grocery store clerks are astoundingly well qualified compared to what I've seen in Atlanta and Greenville. In my stomping grounds, these are jobs for teenagers, and the grownups you see taking them are people struggling out of some kind of hardship. In Denmark and Switzerland, when you get to the checkout you tend to see a row of middle-aged people, all sharply dressed, talking back to you with considerable poise. It's weird, and as Bryan points out, it's a waste of talent.

That said, I must take issue with one of Bryan's observations. I don't know about Sweden, but Aarhus looks good, and Lausanne is the most beautiful place I've seen.

Tuesday, August 18, 2009

Optimizing not just for size, but compressed size

Ray Cromwell has come up with a new optimization goal for a compiler, and two examples of it:

I was drawn to a reversible transform the browser already includes support for: gzip compression, and decided to ask the question: what effect does the large-scale structure of the JS output code have on the DEFLATE algorithm of GZIP which is used to serve up compressed script? The answer it turns out, is substantial.

In short, he has found two semantics-preserving program transformations that have low impact on raw output code size, but have a big impact on what gzip can do with that output. One of them is to rename variables in a stable order, instead of the random order GWT previously used. The second is to reorder the top-level function definitions so that textually similar functions are near each other. In combination, these lead to a whopping 20% reduction in the compressed size of GWT's "Showcase" sample app.
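
To get a feel for why the second transformation matters, here is a toy sketch in Python. It is not Ray's algorithm and has nothing to do with GWT's actual compiler; the generated "functions", the names `make_functions` and `compressed_size`, and the crude "sort by body text" clustering are all my own assumptions for illustration. The idea it relies on is that DEFLATE can only refer back to text within a 32 KB window, so a near-duplicate function only helps compression if it sits close by.

```python
import random
import zlib


def make_functions(families=200, members=4, body_len=400, seed=0):
    """Build toy JS-like functions (hypothetical data, not real GWT output).

    Each family shares one long pseudo-random string, so members of the
    same family are textually similar to each other but not to others.
    """
    rng = random.Random(seed)
    functions = []
    for fam in range(families):
        shared = "".join(rng.choice("0123456789abcdef") for _ in range(body_len))
        for m in range(members):
            name = f"f{fam}_{m}"
            body = f'var s="{shared}";return {m};'
            functions.append((name, body))
    return functions


def compressed_size(functions):
    """Concatenate the functions in the given order and DEFLATE the result."""
    source = "\n".join(f"function {name}(){{{body}}}" for name, body in functions)
    return len(zlib.compress(source.encode("ascii"), 9))


functions = make_functions()

# Baseline: similar functions scattered far apart, mostly beyond the window.
shuffled = functions[:]
random.Random(1).shuffle(shuffled)

# Crude stand-in for "textually similar functions near each other":
# sorting by body text puts members of the same family side by side.
clustered = sorted(shuffled, key=lambda f: f[1])

shuffled_size = compressed_size(shuffled)
clustered_size = compressed_size(clustered)
print("shuffled order, compressed bytes: ", shuffled_size)
print("clustered order, compressed bytes:", clustered_size)
```

Both orderings contain exactly the same functions and the same number of raw bytes; only the order differs. On this artificial input the clustered order should compress noticeably better, though the size of the gap here says nothing about what a real program would see.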

The algorithms he has tried look like the tip of an iceberg of possibilities, yet the benefits are already huge.

Wednesday, August 12, 2009

Microsoft bitten by a software patent

The injunction (PDF), which becomes effective in 60 days, prohibits Microsoft from selling future Word products that allegedly use the patented technology. It also enjoins Microsoft from testing, demonstrating, marketing or offering support for those future products.

Davis also ordered Microsoft to pay i4i more than $290 million in damages.

- Nick Eaton at The Microsoft Blog


Large corporations such as Microsoft like software patents because they are a barrier to entry for new companies. Large companies can cross-license with each other. Small companies, meanwhile, don't have their own patent portfolio, so they can be sued out of existence. Occasionally the tables are turned, as in this case, but apparently Microsoft still thinks it's worth it, at least as of 2007.

From every perspective other than that of the largest software companies, software patents are failing. The above patent should never have been granted. Worse, a company with the legal resources of Microsoft should have had no trouble proving prior art and overturning the patent. As much as I relish the irony of this case, it isn't just, and I hope it is overturned on appeal. There is no sense in which i4i's inventions are being protected. It's simply legal warfare.

I should write more about software patents. As a place to start, the meandering manifesto of the League for Programming Freedom has several highlights that have influenced me.