Wednesday, December 30, 2009

Run-time types

Guy Steele ruminated a few weeks ago about run-time types:

When Bill Joy and I were working on The Java Language Specification 14 years ago, we wrestled with the distinction between "types" and what many people were calling "run-time types". We found that the term "run-time types" was causing some confusion, in part because some people seemed to think (a) that an object could have a compile-time type as well as a run-time type, and/or (b) a variable could have a run-time type as well as a compile-time type, and/or (c) that any type could serve as either a compile-time type or a run-time type. In the end, we tried to abolish the term "run-time type" in Java by abolishing it in the language specification.

This issue is coming up for the GWT compiler as we enrich the type hierarchy used within the compiler, only worse. Not only are there types in the source language that differ from the types available at run time, but there is a third type hierarchy used internally by the compiler. How do we make sense of all these type systems?

Steele and Joy have tried to eliminate the very notion of run-time types. However, as the post above indicates, they still haven't found a satisfactory replacement for it. One attempt is to use "class" instead of "run-time type", but how are we supposed to think about the run-time type of an array? Worse, what can GWT do, when it has not two but three type systems of interest? Steele wants a second word to replace "run-time type". GWT would need a third word.

With some trepidation, let me defend why I think run-time types are a useful concept. I have a feeling I am wandering into a quagmire, given the unease of others who have tried it, but so far I don't see exactly where the problem is. I'll give two motivations, and then I'll give some specific details on how I think of the type systems relating to each other.

The first reason to have run-time types is that it's the most obvious thing to call them. Java has a stark distinction between the types in the source language and the types you can test at run time. Thus, there are two systems of type-like things involved. At the same time, both systems are very type-like. They have a subtyping hierarchy, and they have an inclusion relation where some values are in the type and others are not. Thus, we are talking about type-like things that only exist at run time. It's simple English to call these things run-time types.

The second reason is that it works out well in formal language semantics. In Igarashi and Pierce's paper describing the erasure model for compiling Java, you start with a program using source-level types and "erase" parts of the program. For example, type List<String> becomes simply List; the String part is erased. After erasure, you still have a well-typed program. There's nothing special about it. The type system used for the new program is different than the type system for the old program, but it's still a type system in every way. Thus, run-time types appear to be well founded theoretically.

Based on motivations like these, I like to think of run-time types as a first-class concept. More generally, I think of all three of the type systems the GWT compiler sees as plain old type systems.

The trick, then, is how to think of the relations between these type systems. First of all, the GWT compiler largely ignores the Java source-level types. Those are modeled faithfully, and they are available during user-supplied code generation. However, as soon as all code has been loaded or generated, the types are erased, just like in a Java compiler that targets bytecode. Thus, the bulk of the compiler doesn't even see source-level types.

More interesting is the interpretation of the compiler's internal type system. I think of it as supplying extra information that the compiler knows to be true. For example, the internal type system has a "null type" that isn't available in either the source language's type system or the run-time type system. Whenever the compiler can prove that an expression will always evaluate to null, it can replace that expression's type with the null type. Whenever the compiler sees an expression of type null, it can use that for further optimization. For example, if it sees "foo == null ? a : b", and the type of "foo" is the null type, it can drop the "b" and replace it by "foo; a". It knows that "foo==null" will always be true.
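Here is a minimal sketch of that rewrite. It is written in JavaScript rather than the GWT compiler's own Java, and it uses a made-up miniature AST: the node shapes, the `NULL_TYPE` marker, and the `simplify` function are all hypothetical and are not GWT internals.

```javascript
// Hypothetical miniature AST; none of these names come from the GWT compiler.
var NULL_TYPE = 'null';

function simplify(expr) {
  // Pattern: (operand == null) ? thenExpr : elseExpr
  if (expr.kind === 'conditional' &&
      expr.test.kind === 'equalsNull' &&
      expr.test.operand.type === NULL_TYPE) {
    // The test is always true, so the else branch is dead code.
    // Keep the operand for its side effects, then evaluate the
    // then branch: this is the "foo; a" form.
    return { kind: 'sequence', exprs: [expr.test.operand, expr.thenExpr] };
  }
  // Anything else is left untouched.
  return expr;
}
```

The operand is kept rather than dropped because evaluating it may have side effects even though its value is known to be null.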

The main mind-bender I have encountered with this approach has to do with the fact that compiler types can't be fully checked at run time. What is the meaning of a cast expression where the target type is something that won't be checked at run time? How should it be compiled? Should we feel compelled to add extra run-time checks corresponding to the enriched internal type system?

My current answer is to simply avoid the whole question. It's presently an invariant of the GWT compiler that every cast expression and every instanceof check tests against a run-time type. Problem solved. This approach is a cruder version of the "dcast" predicate used in Igarashi and Pierce. Looking forward, it would also be possible to add a second kind of cast operation, one that the compiler has proven will always succeed. Such a cast can cast to an arbitrary internal type, because it won't be emitted into the final output. So far, though, such a cast hasn't been strictly necessary.

Overall, run-time types look not only useful, but well founded and intuitive. They're types, and they're present at run time.

Tuesday, December 29, 2009

Browsers giving up on repeatedly failing script tags

Today's browser oddity is that Chrome instantly fails a script tag download if the requested URL has already failed a couple of times. It stops issuing network requests and fails the download before even trying. Probably other WebKit-based browsers do the same; I didn't check.

I can see why this behavior would make sense if you think of script tags as part of the static content of a page. If there are ten script tags requesting the same resource, you don't want to issue ten failing requests. However, it caught me by surprise, because I'm trying to use script tags as a way of downloading content over the web.

Browsers sure are a messy platform.

Wednesday, December 23, 2009

More software patent extortion

"Microsoft lost a patent case involving a company called I4i in May, after a jury ruled that Microsoft infringed one of i4i's patents with a custom XML feature found in Word. In August an injunction was placed on sales of Word pending the appeal, which did not go in Microsoft's favor Tuesday."
- from CNET

Sad news. Patent 5,787,449 has been upheld.

On the one hand, Microsoft deserves what it gets for supporting software patents. On the other, the patent system is just terrible. Not only does this patent cover boring and well-known techniques, it was actually upheld in court.

Tuesday, December 15, 2009

Detecting download failures with script tags

Matt Mastracci has done some experimentation and found that most browsers provide some callback or another for indicating that a script tag has failed to download. This is very interesting, because script tags don't have to follow the Same Origin Policy. Here is my replication, for the possible aid of anyone else wandering in these murky woods.

Available callbacks

The simplest callback is the onerror attribute. It can be attached to a script tag like this:

script.onerror = function() {
  /* code here is called if the download fails */
};

For completeness, there is also an onload attribute. It's analogous to onerror except that it indicates success rather than failure. It can be attached to a script tag like this:

script.onload = function() {
  /* code here is called if the download succeeded */
};

Finally, IE supports onreadystatechange, similarly to the XHR attribute of the same name. The supplied callback will be invoked as the download progresses. The state of the download can be queried via the readyState attribute, which will reach state 'loaded' and/or 'complete'.

script.onreadystatechange = function () {
  if (script.readyState == 'loaded') {
    script.onreadystatechange = function () { }; // prevent duplicate calls
    /* error handling code goes here */
  }
};


I used the test page to see which of the three events are fired on several browsers.

Loading a bad page:
Firefox 3.5: onerror
Safari 4: onerror
Chrome 4: onerror
IE 7: onreadystatechange
IE 8: onreadystatechange

Loading a good page:
Firefox 3.5: onload
Safari 4: onload
Chrome 4: onload
IE 7: onreadystatechange (if not cached)
IE 8: onreadystatechange (if not cached)


The onerror attribute works on all browsers but IE. For IE, onreadystatechange is available. Conveniently, no browser supports both of them, so a handler hooked up to both of them will fire exactly once.

A complication on IE is that onreadystatechange doesn't differentiate whether the download succeeded or not. A successful download of a non-cached script looks the same as a failed download. Thus, any code using onreadystatechange needs to check for itself whether the download succeeded.

Followup: Order of evaluation versus onreadystatechange

On IE, if onreadystatechange indicates the download is complete, in what circumstances should the loading be considered to have failed?

I did a followup test where the loaded code (exists.js) does a window.alert. That way, I can see which happens first: the alert, or the onreadystatechange callback. On both IE7 and IE8, the alert happens first. That means if the script sets a global flag to true once it loads, the onreadystatechange callback can check it and reliably determine whether the download has succeeded.
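Putting the pieces together, a single handler can be hooked up to all three callbacks and consult the flag on IE. The following is only a sketch based on the observations above. The names `watchScript` and `hasLoaded` are made up for illustration; `hasLoaded` stands for whatever check the page uses to see that the loaded code has set its global flag.

```javascript
// Sketch only: watchScript and hasLoaded are hypothetical names.
// `script` is a script element (or any object with the same properties).
// `hasLoaded` returns true once the loaded code has set its global flag.
function watchScript(script, hasLoaded, onSuccess, onFailure) {
  var done = false;

  function finish(ok) {
    if (done) { return; }  // report at most once
    done = true;
    script.onload = script.onerror = script.onreadystatechange = null;
    if (ok) { onSuccess(); } else { onFailure(); }
  }

  // Non-IE browsers: onload and onerror tell success and failure apart.
  script.onload = function () { finish(true); };
  script.onerror = function () { finish(false); };

  // IE: the same event fires for success and failure, so check the flag.
  script.onreadystatechange = function () {
    if (script.readyState == 'loaded' || script.readyState == 'complete') {
      finish(hasLoaded());
    }
  };
}
```

A caller would create the script element, call `watchScript` on it, and only then append the element to the page, so that no event can fire before the handlers are attached.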

Test script

<title>Test page</title>

<script>
function loadFile(url) {
  var head = document.getElementsByTagName('head').item(0);
  var script = document.createElement('script');
  script.src = url;
  script.onload = function() {
    window.alert("onload called");
  };
  script.onerror = function() {
    window.alert("onerror called");
  };
  script.onreadystatechange = function () {
    if (script.readyState == 'loaded') {
      script.onreadystatechange = function () { };
      window.alert("onreadystatechange (" + script.readyState + ")");
    }
  };
  head.appendChild(script);
}

function good() {
  loadFile("exists.js");
}
function bad() {
  loadFile("missing.js"); // assumed name: any URL that fails to download
}
</script>

<input type="button" value="good" onclick="good()">
<input type="button" value="bad" onclick="bad()">

Friday, December 11, 2009

Tail call optimization vs. growable stacks

Guy Steele has written an interesting post about tail calls being needed to implement certain object-oriented designs:

In this blog post we extend one of his examples in order to make a completely different point: object-oriented programming languages need tail calls correctly implemented, not just as a "trivial and optional" optimization of the use of stack space that could be achieved by using iteration statements, but in order to preserve object-oriented abstractions.

I agree as far as it goes, and it's a very interesting example. However, it doesn't go far enough. What is really needed to make Guy's example work well is stacks that can grow beyond their initial fixed size.

In the example, a set is built using a sequence of "adjoin" operations that add one element to the set at a time. The adjoined set is implemented as a linked list. Whether or not this particular API needs implementation as a linked list, I can certainly agree that for some problems the implementer would be pushed into using such a structure.

Given this implementation, the relevant implementation of the "contains" operation looks like this:

contains(y: ZZ) = if (y = x) then true else s.contains(y) end

First check the head of the list, then recurse into the tail of the list. The problem is that if the list is large, this recursion will overflow any fixed-size stack. The proposed solution is tail call optimization, which keeps the stack from growing.
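To make the effect concrete, here is a sketch in JavaScript rather than Fortress. The list representation and the names `adjoin`, `containsRec`, and `containsLoop` are my own stand-ins. The first function transcribes the recursive definition; the second is roughly what tail call optimization turns it into.

```javascript
// Hypothetical linked-list set: each node holds one element `x`
// and the rest of the set `s`; null is the empty set.
var EMPTY = null;
function adjoin(s, x) { return { x: x, s: s }; }

// Direct transcription: the recursive call is in tail position.
function containsRec(set, y) {
  if (set === EMPTY) { return false; }
  return set.x === y ? true : containsRec(set.s, y);
}

// What tail call optimization effectively produces: the call becomes
// a jump back to the top, so the stack does not grow.
function containsLoop(set, y) {
  while (set !== EMPTY) {
    if (set.x === y) { return true; }
    set = set.s;
  }
  return false;
}
```

Many JavaScript engines do not eliminate tail calls, so on those engines `containsRec` can be expected to run out of stack on a long enough list, while `containsLoop` handles any length.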

I agree, but tail call optimization doesn't go far enough for this sort of problem. Look in the very same example at the "contains" operation for "union" objects:

contains(y: ZZ) = s1.contains(y) OR: s2.contains(y)

This operation first checks the first set and then checks the second set. This time neither call is a tail call. Through careful rewriting, one could be made tail recursive, but not the other. To make them both tail recursive would require really butchering the clean algorithm and data structure as they stand.

To sum up the situation, tail call optimization helps for recursing over deeply linked lists, but not for recursing over deeply linked trees. It looks rather narrow to me.
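For comparison, here is one way to search such a tree without deep recursion when the language offers neither tail calls nor growable stacks: move the stack onto the heap by hand. This is a sketch in JavaScript with a made-up node representation, and it is exactly the kind of rewriting that obscures the original two-line definitions.

```javascript
// Hypothetical set representation:
//   { kind: 'empty' }
//   { kind: 'adjoin', x: element, s: restOfSet }
//   { kind: 'union', s1: firstSet, s2: secondSet }
function containsTree(set, y) {
  var pending = [set];           // explicit, heap-allocated stack
  while (pending.length > 0) {
    var cur = pending.pop();
    if (cur.kind === 'adjoin') {
      if (cur.x === y) { return true; }
      pending.push(cur.s);
    } else if (cur.kind === 'union') {
      pending.push(cur.s2);      // visited second
      pending.push(cur.s1);      // visited first
    }
    // an 'empty' node contributes nothing
  }
  return false;
}
```

The traversal order matches the recursive version, but the per-class `contains` methods have been collapsed into one function that knows about every kind of set.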

As an aside, I agree with Gilad Bracha that you may as well implement tail call optimization, but the motivation for me is not deeply nested data structures. Instead, one application I think about is to support state machines as implemented via recursion. A related application is continuation passing style. These examples need tail call optimization to keep stack usage from growing in proportion to the program's run time, so I certainly have nothing against tail call optimization.
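As an illustration of the state machine case, here is a small sketch in JavaScript. The machine itself (an even/odd counter of 'a' characters) is made up for the example. Each state would naturally tail-call the next one; since many JavaScript engines do not optimize tail calls, the sketch has each state return a thunk instead, and a trampoline loop runs them. The trampoline is a workaround standing in for real tail call optimization, not the same thing.

```javascript
// Hypothetical two-state machine: does the input contain an even
// number of 'a' characters? Each state returns either a final answer
// or a thunk (a zero-argument function) for the next step.
function even(input, i) {
  if (i === input.length) { return true; }
  return function () {
    return input.charAt(i) === 'a' ? odd(input, i + 1) : even(input, i + 1);
  };
}

function odd(input, i) {
  if (i === input.length) { return false; }
  return function () {
    return input.charAt(i) === 'a' ? even(input, i + 1) : odd(input, i + 1);
  };
}

// Runs the thunks in a loop, so stack depth stays constant no matter
// how many transitions the machine makes.
function trampoline(step) {
  while (typeof step === 'function') { step = step(); }
  return step;
}
```

With proper tail calls, `even` and `odd` could call each other directly and the trampoline would be unnecessary.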

What, though, can be done about recursion on deeply nested data structures? Barring programmers from doing so, as Java does, is a real constraint. Both the Scala compiler and the GWT compiler require users to specify very large stack sizes so as to support recursing over large abstract syntax trees. Most of the time the trees aren't that deep and the large stacks are wasted memory. Once in a while even the very large stack sizes aren't big enough. It's a real mess.

A more thorough solution to the problem would be to allow stacks to grow and shrink at run time. Some languages implement this feature by putting their stacks on the heap, and others, such as Go, have a stack that is a linked list of traditional contiguous stacks:

To make the stacks small, Go's run-time uses segmented stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time allocates (and frees) extension segments automatically. The overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number.

By providing Go with growable stacks, Rob Pike stands in contrast to many other language implementers. I believe most implementers are reluctant because of the extra overhead growable stacks add to subroutine invocation. With a traditional fixed-size stack, a subroutine call can simply allocate a stack frame and race ahead. If stacks grow and shrink, then the allocation code must check whether the stack really has space left. If it doesn't, it needs to grow the stack. Additionally, a better implementation should probably shrink stacks, too. That means that the subroutine return code needs some overhead as well.
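A toy model may help show where those two checks sit. This sketch is in JavaScript, uses arrays in place of raw memory, and has nothing to do with how Go's run-time is actually written; `SEGMENT_SIZE`, `SegmentedStack`, and its fields are all invented for illustration.

```javascript
// Toy model only: all names are made up, and a real run-time manages
// raw memory rather than JavaScript arrays.
var SEGMENT_SIZE = 4;

function SegmentedStack() {
  this.top = { slots: [], prev: null };  // the current segment
  this.segments = 1;
}

SegmentedStack.prototype.push = function (frame) {
  // The check every call pays for: is there room in this segment?
  if (this.top.slots.length === SEGMENT_SIZE) {
    this.top = { slots: [], prev: this.top };  // grow: link a new segment
    this.segments++;
  }
  this.top.slots.push(frame);
};

SegmentedStack.prototype.pop = function () {
  // The check every return pays for: has this segment emptied out?
  if (this.top.slots.length === 0 && this.top.prev !== null) {
    this.top = this.top.prev;                  // shrink: drop the segment
    this.segments--;
  }
  return this.top.slots.pop();
};
```

Both checks are a single comparison on the common path, which is the cost the question of overhead comes down to.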

It's not entirely clear that this overhead is non-zero, much less large. For example, the calling convention code could be written so as to write a stack frame even when the space is getting tight, and only to trap into the "grow the stack" subroutine once the stack frame has been written all the way out. With such an arrangement, a modern CPU should be able to do the stack check in parallel to its work to write out the stack frame, so long as the two operations are carefully written to avoid data dependence. To avoid overhead for returning from a subroutine, a frame could be periodically placed on the stack that indicates that it's time for the stack to shrink. The frames could be inserted using similar techniques to the parallelizable calling convention to avoid overhead on most subroutine calls. Given such frames, most subroutine returns would do nothing special. Only returns that fall into one of these frames would need to do anything, and the frames can be set up such that their saved instruction pointer points into the "shrink the stack" subroutine.

That's one theoretical implementation strategy, but it's enough to suggest that growable stacks can be very inexpensive indeed. I will be very interested in how the performance numbers work out for Go as the language matures.

Tuesday, November 24, 2009

Evaluating a scientific theory

I mentioned there is a second issue I'd like to raise about the leaked emails between Phil Jones and Michael Mann: how can scientific theories be evaluated by non-specialists?

It is often claimed that they can't, but I don't agree. One of the distinguishing characteristics of a scientific theory is that it makes objective claims that can be evaluated by anyone. You don't have to wear the right regalia or attune yourself with a spirit walk. The theory should make objective claims, and the experiments should be repeatable.

For that matter, full-time workers in an area have challenges compared to an outsider. One is that they tend to be highly specialized, and thus not familiar with the arguments supporting the foundational issues of a field. For example, practically no one working on programming language design spends a lot of time really studying why it is that computers have proven useful in the larger economy. It's just taken for granted. A second problem is that anyone making a living is highly biased in favor of any theory that will keep that living going. PL folks are eager to accept any reason that PL is important.

So outsiders are not only capable, but an important part of the process. What should they do, though? Let me describe two strategies.

First, anyone can evaluate the ultimate predictions of a theory. You don't have to be a Galileo to observe the results of his famous falling bodies experiment. Closer to my world, you don't have to be a football expert to know that the University of Georgia's football team is doing poorly this year. It takes a specialist to make a better theory of gravity, and it takes a specialist to make a good football team, but anybody can look at the results and see whether they are any good or not.

A second approach is to explore the logical implications and structure of a theory. As a simple example, a theory that contradicts itself is clearly problematic. Any theory that concludes both A and not A is a broken theory. More commonly, a theory makes an obscure claim A that implies a more accessible conclusion B. If someone tells you about some interesting property of phlogiston, try to find out what observable implications it has on real fires. It might be that the implication B is plainly false, in which case again the theory is broken. More frequently, B is contentious but not certainly false. Proponents need to either go further out on a limb and accept B, or they need to weaken A so that it no longer implies B. It takes a specialist to choose which way to go, but anybody can evaluate the whole package of logical implications.

That's two ways to evaluate a scientific theory from the outside. Overall, I believe such evaluation is not only possible, but that it's necessary for a scientific field to be healthy.

Players make poor referees

The recently leaked emails between Phil Jones and Michael Mann raise a number of issues about scientific progress. I'd like to address two of them.

As background, the emails are between major researchers and activists in the climate change debate. Here is a sample of what has observers excited:
In one e-mail, the center's director, Phil Jones, writes Pennsylvania State University's Michael E. Mann and questions whether the work of academics that question the link between human activities and global warming deserve to make it into the prestigious IPCC report, which represents the global consensus view on climate science.

"I can't see either of these papers being in the next IPCC report," Jones writes. "Kevin and I will keep them out somehow -- even if we have to redefine what the peer-review literature is!"

Here we see two people influential with the IPCC conspiring to eject papers that conflict with their preferred conclusions. As a result, we cannot believe that the IPCC is giving a balanced summary of the research that is outstanding, thus undermining what the IPCC claims to do.

What to make of it? What I'd like to emphasize in this post is that it's not bad, by itself, that Jones and Mann are taking sides. The problem is that they are trying to wear two hats, two hats that are particularly incompatible.

To make an analogy, think of scientific claims as sports teams. How do you find out whether a particular sports team is any good? Really, there's no other way than to field the team against other sports teams that are also believed to be good. No amount of bravado, no amount of popularity, is really going to convince an unbiased observer that the team is really good. Ultimately, it needs to play against good teams and win.

The tricky part is here: What counts as playing against a good team? To resolve this, sports have rules that are laid out to be as objective as possible, and they have referees adjudicate the games to make sure the rules are followed. Referees are monitored to make sure that they are applying the rules correctly and fairly, but since the rules are objective, this is a relatively straightforward task. The team players, meanwhile, can try a variety of strategies and techniques. It's hard to judge whether the strategies and techniques are good by themselves, but it's not hard at all to tell who won a fairly refereed sports game.

Bringing it back to science, if Jones and Mann are to be faulted, it's because they are claiming to act as referees even though they are actively taking sides. I don't know the particulars of how the IPCC is organized nor of what influence these two have in it, but it doesn't take a specialist to know that players make poor referees.

Monday, November 23, 2009

David Kravets on ACTA

David Kravets writes in Wired that he is not real happy with ACTA, especially with the way it's being developed:
Dan Glickman, the MPAA’s chairman, informs lawmakers that millions of film-related jobs are in peril because of internet piracy. Simply put, those who don’t back the proposed Anti-Counterfeiting and Trade Agreement don’t support intellectual property rights, he wrote.

“Opponents of ACTA are either indifferent to this situation, or actively hostile toward efforts to improve copyright enforcement worldwide,” Glickman wrote.

That’s an insultingly black-and-white viewpoint. It’s also not an accurate description of the treaty’s critics.

I am not hostile to intellectual property in general, but I oppose ACTA. I simply think we should take it as a constraint that we not give up basic liberties like being able to copy a CD for a friend. I don't think we necessarily have to protect incumbents as technology improves, but in this case copyrights can still be protected and made profitable. For example, we could loosen copyright but tighten rights on public performance.

More fundamentally, this issue should be discussed in public, and with representation by people trying to devise a better way. The ACTA negotiations appear to be between two groups of people: national officials and current copyright holders. I expect the first group is offering police work, the second group taxes, and they'll close the deal once they've agreed on how much of each.

Friday, November 20, 2009

David Pollak is happy about using Scala

David Pollak is happy about using Scala:

I've been a JVM guy since '96, so finding a language that was on the JVM was a plus for me. I was looking for a statically typed language with high performance, but with the syntactic economy of Ruby. I bounced around a couple of language listing sites and found Scala. Three years ago, I fell in total love with Scala. That love continues today.

I can't say I blame him.

Thursday, November 12, 2009

Google releases the Go language

The Go language's unique feature is very fast builds of large projects, due largely to careful support for intermediate libraries exporting a minimal API. Other aspects are: it is garbage collected, it has actors-like concurrency, it has objects, it uses structural typing for objects, it has updated syntax, and its syntax is downright quirky. It also leaves out a lot, including: inheritance, parametric types, and operator overloading.

Tuesday, November 10, 2009

Google releases Closure Compiler

From the announcement:
Closure Compiler is a JavaScript optimizer that compiles web apps down into compact, high-performance JavaScript code. The compiler removes dead code, then rewrites and minimizes what's left so that it will run fast on browsers' JavaScript engines. The compiler also checks syntax, variable references, and types, and warns about other common JavaScript pitfalls.

There's also a large library and a UI templating system.

Friday, November 6, 2009

Cory Doctorow on ACTA

Cory Doctorow is worried about what he finds. For example:

ISPs have to proactively police copyright on user-contributed material. This means that it will be impossible to run a service like Flickr or YouTube or Blogger, since hiring enough lawyers to ensure that the mountain of material uploaded every second isn't infringing will exceed any hope of profitability

Sites like YouTube have enough clout that I don't expect them to simply be shut down outright. On the contrary, their backing companies will be invited to the negotiating table, where they will help craft a treaty that has enough exceptions for them to operate.

That's great for YouTube, but there is a chilling problem for future innovation. It is unlikely that the next great network service will comply with whatever exceptions are carved out. Thus, if ACTA goes through, large swaths of potential society-changing services will end up requiring an act of Congress to even get off the ground.

The Anti-Counterfeiting Trade Agreement (ACTA)

I believe copyright should be rethought in light of computers. We rethought it for the printing press, and computers are enough of a reason to rethink it again. Naturally, large incumbent companies, such as the ones represented by the RIAA, want to increase law enforcement enough that they can continue their business models even though the content is digitized and is distributed over the Internet.

That said, I even more strongly believe that the issue should be discussed in public, not worked out behind closed doors. The public should not be treated as masses of sheep on a gaming board. The Anti-Counterfeiting Trade Agreement (ACTA) is being worked out, right now, in just that way. The only people invited are major government agents and their personal choices.

Naturally, the details of ACTA are hard to come by, but the emphasis seems to be on imposing liability on the third parties that distribute content. That's worrisome in general, because improvements in content distribution are one of the key ways that the Internet can continue to advance human well-being. For that matter, retiring dinosaur business models is also key to improving human well-being. I don't see anything good for the public coming from these negotiations. Perhaps that's why they are being held so quietly.

If the Obama Administration wants to prove its commitment to government transparency, there is an opportunity here.

Thursday, October 29, 2009

Worst reason ever to restrict a .com

As reported in the consumer and trade press this past week, Amazon.com, Wal-Mart.com, and Target.com have engaged in a price war in the pre-sale of new hardcover bestsellers, including books from John Grisham, Stephen King, Barbara Kingsolver, Sarah Palin, and James Patterson. These books typically retail for between $25 and $35. As of writing of this letter, all three competitors are selling these and other titles for between $8.98 and $9.00.

That's what the American Booksellers Association claims is going to harm American citizens. Somehow, to me, low book prices sound like a huge success, not just for the companies involved, but for all reading people.

HT Overlawyered

Tuesday, October 27, 2009

FCC to regulate the Internet after all?

While network neutrality is a good goal, I don't see how to achieve it by turning the Internet into a totalitarian domain. It would be like achieving free speech by having the FCC constantly monitoring everyone. Unfortunately, the FCC is moving to do just this.

Part of my curiosity is exactly how the FCC has the jurisdiction to do this. The last I heard, network neutrality was voted down in Congress. Did it come up again with no one noticing, or is the FCC just reaching again? If I had known that legislation had come back to life I would have taken part in the debate.

Separately, the reason the FCC has chosen is just as weak as any of the others that have been put forward:
FCC Chairman Julius Genachowski said the rules are needed to ensure that broadband subscribers can access all legal Web sites and services, including Internet calling applications and video sites that compete with the broadband companies' core businesses.

Let's count the problems:

  1. Web sites are vitally interested in letting customers get to them. There are powerful forces in play already that will prevent access from being blocked. Skype is doing rather well, without the FCC's help. It's not even an American company!
  2. This problem hasn't happened before.
  3. Very powerful companies have tried. AOL, Prodigy, Compuserve, and MSN all tried to limit their users' access to the general Internet, and their customers simply left.
  4. There are credible attempts to divide up the Internet today, but the FCC isn't addressing them. Think of walled gardens like Facebook or Apple's iTunes. The FCC is fighting the last decade's battle.

Legal systems are very complicated programs run by millions of devious agents. If they are to do what we want from them, they have to be simple and straightforward, and even then, we might not get what we tried for.

Sunday, October 25, 2009

Geocities to be archived

Scott and his Archive Team are working to rescue GeoCities by downloading as much of its content as possible — which they estimate to be around ten terabytes. These historians recognize GeoCities as having played a critical role in the development of the Internet.

Neat! HT to James Robertson.

By the way, this bit of history is news to me:
...GeoCities' free hosting space became the home for thousands of sites built around thematically oriented "neighborhoods": conservation, fashion, military, sports, finance, travel, and more.

I had no idea about GeoCities' effort to try and organize these sites into neighborhoods. That interesting idea certainly didn't work out. Things have instead stuck with the way the WWW was originally envisioned: structure is induced by the links. Locales are simply cliques of sites that heavily interlink with each other, much like fields of knowledge are induced by authors who read each other and heavily refer to each other. There is a place for making that structure explicit, but it's already well enough handled by giant clearing-house blogs that try to link to every relevant site in an area. There's nothing much left for the self-slotted neighborhoods idea to help.

Again, kudos to the Archive Team for archiving GeoCities. Researchers will have a good time in the future looking at these early web sites.

Tuesday, October 20, 2009

Misunderstanding what "exempt" means

One of the more absurd parts of the current way universities approach ethics review of research is the concept of "exempt". Many projects are categorically considered "exempt" from review because they are so obviously not a risk to humans that there is no need for review. For example, if all you are doing is taking a survey, your research is exempt. If all you do is examine test results in a classroom and draw inferences, your research is exempt.

Nonetheless, it's recommended that universities require review even for research that is exempt from review (emphasis mine):
The regulations do not specify who at an institution may determine that research is exempt under 45 CFR 46.101(b). However, OHRP recommends that, because of the potential for conflict of interest, investigators not be given the authority to make an independent determination that human subjects research is exempt.

This is extraordinary paranoia if taken literally. The non-exempt categories are bad enough. The exempt categories are easy to understand and have even more minimal risks. It's not like someone is going to misjudge whether their project is just giving out surveys when in fact it also involves injecting drugs into the students.

Further, some amount of personal judgment is an unavoidable part of any system of law or regulation. Even if every project is reviewed, who is to interpret the conclusions of the review rulings? Either the researcher has to interpret the rulings, or they need someone to make a further ruling on how to interpret the rulings. Assuming research actually happens, this recursion must cease and the researcher must eventually follow through with turning on a light switch or tying their shoes without directly getting review.

Additionally, how many researchers really comply? For example, looking at test results to draw conclusions about your students is exempt, which therefore means you are supposed to get it reviewed by your IRB. How many people really do this, though? In practice, requiring review of exempt research turns it even more into a system of selective enforcement -- a system that will essentially favor the higher-clout researchers over the lower-clout ones.

The saddest part of all this to me is that there is no one standing up for the advancement of knowledge. The higher-clout researchers have no problems with the extra regulations because they can just ask the reviewers to pass them. The regulations are thus a benefit to them, because they can leave competing researchers to be tangled in the cobwebs. Where, in all of this, is anyone who is interested in real learning? I don't see it, and I'm not even sure where a good place to start would be. The inmates are running the asylum.

HT the Institutional Review Blog

Thursday, October 15, 2009

FTC to go after bloggers

The FTC is planning to start cracking down on bloggers who endorse a product without disclosing any interests they have in the product. The Citizen Media Law Project has a good analysis of the FTC's precise plans.

I have some more basic questions about this terrible development. Let me describe the main three.

First, shouldn't a question like this be decided in Congress? We aren't talking about the fine details of what the FTC will go after, but a major new category of speech they are going to limit. I would think such an issue should be decided in Congress. Has that already happened and I simply missed it? Where was the debate? The public consideration?

Second, what is really so special about the Internet version of these activities? I asked the same question about the Communications Decency Act, and I never found a satisfactory answer. Yes, it's terrible to mistreat children, but why do we need new law just because you do it over the Internet? Likewise, we have a carefully developed system for dealing with false advertisement and libel. What precisely should be different about these acts when they involve the Internet?

That raises the third issue. Why do we want a third party, the FTC, to bring these cases? It's a wonderful check on legal abuse when part of the burden of a case is to prove that the accuser has been personally harmed by the accused. It eliminates many frivolous cases, and it allows for meaningful settlements to be worked out. By contrast, if the FTC is supposedly standing up for the public, it's hard for them to make a fair settlement, because they don't really know what the amorphous public would settle for if they were actually asked.

For cases of libel, it seems utterly obvious that the entity who should bring the case is the one who was wrongly discredited by the speech in question. For any other party to do so is pure nosiness. For cases where an advertiser didn't disclose their financial incentives, I grant it's hard to identify who precisely is harmed and should bring the case. For that very reason, however, I don't see the appeal of that sort of law. If no identifiable person is harmed, then let the speakers speak. People who are shills for some corporation will see their reputation discredited rapidly without needing the cops sicced on them.

Overall, it's a historical oddity that the FTC has been allowed to police the content of communication. That's normally not allowed in the U.S., due to the First Amendment, but the FCC argued that broadcast is different because it is pervasive. It was a weak argument to begin with, but it's simply absurd for the Internet and for cable television. Nonetheless, organizations strive to survive, so now we see the FTC trying to maintain this branch of their activity as new forms of communication come out.

We shouldn't allow it. The FTC is supposed to report to the people, via our Congress. I hope Congress decides to exercise their oversight. We shouldn't let our speech rights erode just because the FTC wants something to do with their resources.

Sunday, October 11, 2009

One place you need Java source code rather than Java byte code

For the most part, Java tools work with byte code, not source code. If you dynamically load code in Java, it's byte code you will load. If you write a debugger for Java, the unit of single stepping will be the byte code. When a web browser downloads Java code to run, it downloads a "jar" of byte code. If you optimize Java code, you optimize from jars of byte code to better jars of byte code. If you run Findbugs to find errors in your code, you'll be running it across byte code.

So why not the Google Web Toolkit? One simple reason: GWT emits optimized JavaScript code. Byte code adds two challenges that aren't present for Java source code, yet it doesn't have the interop benefits that people hope for from byte code.

The first problem is that byte code has jumps rather than structured loops. If GWT reads from byte code, it would have to find a way to translate these to JavaScript, which has structured loops but does not have a general jumping instruction. This would be a game of trying to infer loops where possible, and then designing a general-case solution for the remaining cases. It would be a lot of work, and the output would suffer.
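To make the "general-case solution" concrete, here is a minimal sketch in Python rather than JavaScript. It is not GWT code; the block labels and function names are invented for illustration. It emulates jump-style control flow with a dispatch loop over a "program counter" variable, which is the usual fallback when a loop cannot be inferred, and puts it next to the structured loop the programmer presumably wrote.

```python
def run_blocks(n):
    """Sum 0..n-1 using only jump-style control flow, emulated with a dispatch loop."""
    pc = "entry"              # name of the basic block to run next (hypothetical labels)
    i = total = 0
    while True:
        if pc == "entry":
            i, total = 0, 0
            pc = "test"       # unconditional jump
        elif pc == "test":
            pc = "body" if i < n else "exit"   # conditional jump
        elif pc == "body":
            total += i
            i += 1
            pc = "test"       # jump back to the loop test
        elif pc == "exit":
            return total


def run_structured(n):
    """The same computation written with a structured loop."""
    total = 0
    for i in range(n):
        total += i
    return total


print(run_blocks(5), run_structured(5))
```

Both functions compute the same result, but the dispatch version pays for a label comparison on every block transition and hides the loop from any later optimization, which is the sense in which the output would suffer.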

The second problem is that byte code does not include nesting expressions. Each byte code does one operation on one set of inputs, and then execution proceeds to the next operation. If this is translated directly to JavaScript, then the result would be long chains of tiny statements like "a = foo(); b = bar(); c = a + b". It would be a lot of work to translate these back to efficient, nesting expressions like "c = foo() + bar()". Until that work got to a certain sophistication, GWT's output would suffer.
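As a rough illustration of what "translating back" involves, here is a minimal Python sketch. It is not GWT's implementation; the statement format and the `inline_temps` helper are assumptions made up for this example. It handles only the easy case where a temporary is assigned once and read once, and it ignores the evaluation-order and side-effect hazards a real compiler would have to check.

```python
from collections import Counter


def inline_temps(stmts, result):
    """Fold single-use temporaries into one nested expression.

    stmts is a list of (target, tokens) pairs in execution order, where
    tokens is a list of strings. This format is hypothetical.
    """
    # Count how many times each name is read across all statements.
    uses = Counter(tok for _, tokens in stmts for tok in tokens)
    defs = {}
    for target, tokens in stmts:
        expanded = []
        for tok in tokens:
            if tok in defs and uses[tok] == 1:
                # Substitute the temporary's definition at its single use.
                sub = defs.pop(tok)
                expanded.append(f"({sub})" if " " in sub else sub)
            else:
                expanded.append(tok)
        defs[target] = " ".join(expanded)
    return f"{result} = {defs[result]}"


stmts = [
    ("a", ["foo()"]),
    ("b", ["bar()"]),
    ("c", ["a", "+", "b"]),
]
print(inline_temps(stmts, "c"))  # prints "c = foo() + bar()"
```

Even this toy version needs a use count and a substitution pass. Handling temporaries that are read more than once, or calls whose order matters, is where the real sophistication would come in.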

Finally, one must ask what the benefit would be. Certainly it would not be possible to run non-Java languages in this way. For the most part, such languages have at least one construct that doesn't map directly to byte code. In most cases, that mapping uses reflection in some way, and GWT doesn't even support reflection. To support such languages via byte code, GWT would have to reverse engineer what those byte codes came from.

Once you reach the point of reverse engineering the low-level format to infer what was intended in the high-level format, every engineer must ask if we couldn't simply use a higher-level format to begin with.

Monday, October 5, 2009

Comments on old articles

I don't think it is terribly helpful to comment on very old articles. I like blogs with comments because they keep the author honest and because they provide relevant pointers to other readers of the blog. Comments on new articles work well for that. Comments on old articles, however, aren't seen by regular readers of the blog. They are only seen by the blog author and by the other people who commented on the same article. That goal is better served by plain old email.

As such, I am going to reject comments on articles more than a month old, as a matter of policy. I would like to make that automatic, but I don't see how in Blogger's settings. Does anyone know how? For now, I'll just do it manually.

My apologies to people whose comments are dropped by this policy. I do read them, and I take them as a suggestion on what to write about.

Friday, October 2, 2009

Science versus activism

The theories of scientific progress I have read involve attacking theories mercilessly and keeping the simplest ones that stand up. Thus, the rationale for ejecting Mitchell Taylor from the Polar Bear Specialist Group (PBSG) is detrimental to science:

I do believe, as do many PBSG members, that for the sake of polar bear conservation, views that run counter to human induced climate change are extremely unhelpful. [...] I too was not surprised by the members not endorsing an invitation.

Gee, I would think that, for the sake of polar bear conservation, it is important to learn the truth.

On further reading, however, I'm not sure the PBSG is a scientific organization. From skimming their web site, they sound more like a U.N. committee or an activist group. Such groups try to organize action, not to learn.

Wednesday, September 30, 2009

DLL hell as a job description

I've had a hard time explaining my views on package distribution. Let me try to wrap it up this way: DLL hell is a job description. Dealing with it is work that has to be done, and enough of it to fill a full-time job. DLL hell becomes really problematic when that work falls on end users who aren't even software experts.

Break that down. First of all, I don't see how the root problem can be avoided. Large software is built modularly. The software is broken into components that are maintained and advanced by independent teams. No matter how hard they try, and no matter what advanced component system they use, those teams will always introduce unexpected incompatibilities between different versions of different components. I am highly sympathetic to WCOP's goal of controlling external dependencies, though I see it as a goal that can only be partially fulfilled.

Second, there is a role to play in finding versions of components that are compatible with each other. My favorite approach is that taken by Linux distributions, where within one distribution there is only one version of each component. Less centralized approaches are also plausible, but they have severe difficulties. An example would be Eclipse Plugins. However, in that example, it mostly appears to work well only when the functionality added is minor and the plugins being added don't interact with each other very much.

With all of that foundation, the DLL hell of Microsoft Windows is much easier to talk about: End users are playing the same role as those making a Linux distribution. Even computer specialists will resent this undesired work, and most users of Windows are not computer specialists. The only good way I see to fix this is to shift the distribution-making work onto some other group. That's challenging with Windows software being sold per copy, but perhaps it can be made to work with enough red tape. Alternatively, perhaps Windows could move over to a subscription model where the application bits can be freely copied. If the bits could be freely copied, then Windows distributions could sprout just as easily as Linux distributions have.

Wednesday, September 23, 2009

Public access to publicly funded research

Bravo to those supporting the Federal Research Public Access Act. If it becomes law, then publications resulting from publicly funded research would have to be made available to the general public.

The specific law makes sense as a part of accountability for public funds. If public funds are spent on research, then the public can rightfully demand that it sees the results of that research.

Additionally, it's just good for the progress of knowledge. We progress faster when any random person can take part in the scholarly debate taking place in journals. Currently, anyone interested has to either pay the dues or physically trek over to a library that has.

In addition to this act passing, it would be nice if the Association for Computing Machinery stopped hiding its sponsored conferences' publications behind a pay wall. The ACM is supposed to support the advancement of computing knowledge, not tax it.

Monday, September 21, 2009

Exclusively live code

Dead-for-now code splitting has the compiler divide up the program into smaller fragments of code that can be downloaded individually. How should the compiler take advantage of this ability? How should it define the chunks of code it divides up?

Mainly what GWT does is carve out code that is exclusively live once a particular split point in the program has been activated. Imagine a program with three split points A, B, and C. Some of the code in the program is needed initially, some is needed when only A has been activated, some is needed when only B has been activated, and some is needed once both are activated, etc. Formally, these are written as L({}), L({A}), L({B}), and L({A,B}). The "L" stands for "live", the opposite of "dead". The entire program is equivalent to L({A,B,C}), because GWT strips out any code that is not live under any assumptions.

The code exclusively live to A would be L({A,B,C})-L({B,C}). This is the code that is only needed once A has been activated. Such code is not live if B has been activated. It's not live when C has been activated. It's not live when both B and C together are activated. Because such code is not needed until A is activated, it's perfectly safe to delay loading this code until A is reached. That's just how the GWT code splitter works: it finds code exclusive to each split point and only loads that code once that split point is activated.

That's not the full story, though. Some code isn't exclusive to any fragment. Such code is all put into a leftovers fragment that must be loaded before any of the exclusive fragments. Dealing with the leftovers fragment is a significant issue for achieving good performance with a code-split program.
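The set arithmetic above can be sketched in a few lines of Python. This is a toy model, not GWT's splitter: the program is reduced to a hypothetical dependency graph, each split point is assumed to contribute a fixed set of root functions, and L(S) is approximated as plain reachability from the initial roots plus the roots of the activated split points. All names are invented for the example.

```python
def live(deps, initial_roots, split_roots, activated):
    """Return L(activated): everything reachable once those split points have fired."""
    work = list(initial_roots)
    for sp in activated:
        work.extend(split_roots[sp])
    seen = set()
    while work:
        item = work.pop()
        if item not in seen:
            seen.add(item)
            work.extend(deps.get(item, ()))
    return seen


def split(deps, initial_roots, split_roots):
    """Carve the program into initial, exclusive, and leftovers fragments."""
    all_points = set(split_roots)
    everything = live(deps, initial_roots, split_roots, all_points)
    initial = live(deps, initial_roots, split_roots, set())
    # Exclusive to sp: live with every split point, minus live with all but sp.
    exclusive = {
        sp: everything - live(deps, initial_roots, split_roots, all_points - {sp})
        for sp in all_points
    }
    # Whatever is neither initial nor exclusive to one split point is left over.
    leftovers = everything - initial - set().union(*exclusive.values())
    return initial, exclusive, leftovers


deps = {
    "main": ["util"],
    "showA": ["drawA", "shared"],
    "showB": ["drawB", "shared"],
    "showC": ["drawC", "util"],
}
initial, exclusive, leftovers = split(
    deps, ["main"], {"A": ["showA"], "B": ["showB"], "C": ["showC"]}
)
print(sorted(initial), sorted(exclusive["A"]), sorted(leftovers))
```

In this example `shared` is needed by both A and B, so it is exclusive to neither and lands in the leftovers fragment, while `util` is already part of the initial download.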

That's the gist of the approach: exclusively live fragments as the core approach plus a leftover fragment to hold odds and ends. It's natural to ask why, and indeed I'm far from positive it's ideal. It would be more intuitive, at least to me, to focus on positively live code such as L({A}) and L({B,C}). The challenge so far has been to come up with such a scheme that generalizes to lots of split points. It's easy to dream up strategies that cause horrible compile time or bad web-browser caching behavior or both, problems that the current scheme doesn't have. The splitting strategy in GWT is viable, but there might well be better ones.

Wednesday, September 9, 2009

Microsoft sticks to their guns on synchronous code loading

Microsoft's Doloto has been reannounced, and it sounds like they are planning to stick to synchronous code loading:
Profiling information is used to calculate code coverage and a clustering strategy. This determines which functions are stubbed out and which are not and groups functions into batches which are downloaded together, called clusters.

I tentatively believe this approach can produce a somewhat reasonable user experience. It has some unfortunate problems, though, due to unpredictable functions being stubbed out. Any call to a stubbed out function, whichever ones the compiler chooses, will result in a blocking network operation to download more code. Whenever this happens:

  • Unrelated parts of the application are also paused, not just the parts that need the missing code.
  • There is no way for the programmer to sensibly deal with a download failure. The entire application has to halt.
  • There is no way to give a status update to the user indicating that a download is in progress.

These problems are fundamental. In practice, matters are even worse. On many browsers, a synchronous network download will lock up the entire browser, including tabs other than the one that issued the request. Locking an entire browser does not make for a good user experience. It does not make people feel like they are in good hands when they visit a web site.

GWT avoids these problems by insisting that the application never blocks on a code download. The request of a code download returns immediately. If a network download is required, then its success or failure is indicated with an asynchronous callback. Until that callback is triggered, the rest of the application keeps running.
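The shape of that contract can be sketched with a toy loader. The sketch is in Python and is only a stand-in: in GWT itself the request is made with GWT.runAsync and a RunAsyncCallback that has onSuccess and onFailure methods, whereas the `Loader` class, its `pump` method, and the fragment names below are invented for illustration.

```python
from collections import deque


class Loader:
    """Toy stand-in for a browser event loop plus a code fetcher (hypothetical)."""

    def __init__(self, available):
        self.available = available   # fragment names the "server" can deliver
        self.pending = deque()       # requests issued but not yet completed
        self.log = []

    def run_async(self, fragment, on_success, on_failure):
        # Queue the request and return immediately; the caller never blocks.
        self.pending.append((fragment, on_success, on_failure))

    def pump(self):
        # Simulate the network finishing each request and firing its callback.
        while self.pending:
            fragment, on_success, on_failure = self.pending.popleft()
            if fragment in self.available:
                on_success()
            else:
                on_failure(RuntimeError(f"could not load {fragment}"))


loader = Loader(available={"editor"})
loader.run_async(
    "editor",
    lambda: loader.log.append("editor ready"),
    lambda err: loader.log.append(f"editor failed: {err}"),
)
loader.run_async(
    "charts",
    lambda: loader.log.append("charts ready"),
    lambda err: loader.log.append(f"charts failed: {err}"),
)
loader.log.append("still responsive")   # runs before either callback fires
loader.pump()
print(loader.log)
```

The failure callback is what gives the programmer a place to handle a failed download, and the gap between the request and the callback is where a status indicator can be shown.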

Sunday, August 23, 2009

Ethics oversight of computer science research

Most people wouldn't be aware, but in American universities, computer science research gets substantial oversight to ensure projects treat human subjects properly. I have no objection to this in principle, but the way it is done in practice is beyond useless. It's downright harmful. The Institutional Review Blog covers this issue continuously, and is a regular read of mine. Let me summarize the issue for anyone new to it.

The core problem is that computer science, lacking its own ethics review boards, has ended up under institutes that were designed with medical research in mind. Three main requirements on any approved medical research project are: every participant gives informed consent, the project is designed such that it will further science, and the risks to the participants are in line with the scientific results. So far so good.

Added into the mix is discrimination law. That probably starts to sound strange, especially given that practically any science project will do better to draw from an understood subset of the population rather than the whole. However, it makes sense for medical research. Since the approval process for drugs takes years, many times the only way a citizen can legally get access is to take part in a study. Thus, medical research is typically not just a scientific inquiry, but also an early adopter program.

This oversight drapes thick cobwebs over any project it applies to. The researcher must not only get consent, but do so in a way that can be archived and proved after the fact. Additionally, the researcher must extensively document the research design, the rationale that it will contribute to the field, the risks that subjects will face, and the counter-measures the project is taking. Some risk is practically unavoidable, and the researcher must document why they believe this is so. It adds up to quite a large proposal packet, entirely separate from the proposal that got the research funded.

Every bit of these proposal documents will be nitpicked by internal review boards. At a medical institution, the review boards will doubtless have substantive issues to consider. At an engineering university, however, the boards rarely find a true ethical problem to consider. As a result, they have no purpose unless they snipe at details and at low-status researchers. In my experience they are happy to do so. Worse, because the review boards are institute-wide, a review board for an engineering university almost certainly includes no one familiar with the field of any particular proposal. The boards aren't competent to evaluate the substantive parts of the proposal even if they try.

You can see how this would cause trouble for the progress of computer science. Human subjects in computer science research mostly just try out tools, respond to questionnaires, and take part in interviews. The risks are minuscule, but researchers are required to brainstorm up possible risks, anyway, just to follow form. Given the minuscule risks, there isn't much of a disclaimer for participants to consent to, but nonetheless it's necessary to draw one up and archive the proof of consent. The early adopter aspect is absent for academic research, because such researchers usually post their tools online, but nonetheless the researcher must carefully avoid discrimination issues. The latter is often difficult. For example, you can't discriminate against children, but children can't give consent. Finally, given all of these other non-issues, there is no remaining point for review boards to evaluate the research methods and potential scientific contribution. They do, however, and aside from their incompetence to do so, they become an effective political weapon for any high-status researcher that wants to shut down a young upstart.

In short, I believe engineering universities have gone wrong to adopt ethical oversight procedures copied from medical science. No university wants to openly say they have no ethics oversight, and apparently they are filling that void by mimicking what is done for medical research.

There are many ways to do better than this while still having ethical oversight. As one possibility, we could collectively define a category of research projects where the human subjects interaction is limited to trying out tools, taking questionnaires, and participating in interviews. Such a category should cover the bulk of human subjects research in many fields including computer science. Further, the issues would be generally the same for every research project in this category, so general-purpose practices could be developed rather than requiring each project to reinvent and defend their own procedures.

Granted, it's all academic to me. Being in industry, if I want to know what my customers think, I simply go ask them. I call them, I email them, I send them an instant message. Really, that's all an interview is, but American universities seem to have forgotten this.

Friday, August 21, 2009

Daily life of Swedes and Danes

Bryan Caplan's impressions about Danes and Swedes generally match mine about Danes and Swiss.
One of the most striking things about Denmark and Sweden: Almost everyone is overqualified for his job. The guy who sells train tickets doesn't just punch buttons and collect cash; he knows his regional transit network like the back of his hand, and eagerly helps you plan your trip.

I'd add to that generalization that both waiters and grocery store clerks are astoundingly well qualified compared to what I've seen in Atlanta and Greenville. In my stomping grounds, these are jobs for teenagers, and the grownups you see taking them are people struggling out of some kind of hardship. In Denmark and Switzerland, when you get to the checkout you tend to see a row of middle-aged people, all sharply dressed, talking back to you with considerable poise. It's weird, and as Bryan points out, it's a waste of talent.

That said, I must take issue with one of Bryan's observations. I don't know about Sweden, but Aarhus looks good, and Lausanne is the most beautiful place I've seen.

Tuesday, August 18, 2009

Optimizing not just for size, but compressed size

Ray Cromwell has come up with a new optimization goal for a compiler, and two examples of it:

I was drawn to a reversible transform the browser already includes support for: gzip compression, and decided to ask the question: what effect does the large-scale structure of the JS output code have on the DEFLATE algorithm of GZIP which is used to serve up compressed script? The answer it turns out, is substantial.

In short, he has found two semantics-preserving program transformations that have low impact on raw output code size, but have a big impact on what gzip can do with that output. One of them is to rename variables in a stable order, instead of the random order GWT previously used. The second is to reorder the top level function definitions so that textually similar functions are near each other. In combination these lead to a whopping 20% reduction in the compressed size of GWT's "Showcase" sample app.
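The reordering effect is easy to reproduce on synthetic data. The sketch below is in Python and is not Ray's algorithm or GWT's output: the "functions" are random hex strings standing in for function bodies, and the sizes are picked so that each function's near-twin is either right next to it or roughly 80 KB away, which is beyond DEFLATE's 32 KB back-reference window.

```python
import random
import zlib

rng = random.Random(0)


def fake_body(length=2000):
    # Random hex text stands in for a function body; purely synthetic.
    return "".join(rng.choice("0123456789abcdef") for _ in range(length))


# 40 "templates", each shared by two near-identical functions.
templates = [fake_body() for _ in range(40)]
variant_a = [f"function a{i}(){{{t}}}\n" for i, t in enumerate(templates)]
variant_b = [f"function b{i}(){{{t}}}\n" for i, t in enumerate(templates)]

# Layout 1: all a-variants, then all b-variants, so twins are far apart.
scattered = "".join(variant_a + variant_b)
# Layout 2: each function sits right next to its twin.
clustered = "".join(a + b for a, b in zip(variant_a, variant_b))


def gz_size(text):
    # zlib's default settings use the same 32 KB DEFLATE window as gzip.
    return len(zlib.compress(text.encode("ascii"), 9))


print("raw sizes:", len(scattered), len(clustered))
print("compressed sizes:", gz_size(scattered), gz_size(clustered))
```

The two layouts contain exactly the same bytes in a different order, so any difference in compressed size comes purely from where the similar text sits. Real JavaScript is far less extreme than this construction, so the gain there should be expected to be smaller.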

The algorithms he has tried look like the tip of an iceberg of possibilities, yet the benefits are already huge.

Wednesday, August 12, 2009

Microsoft bitten by a software patent

The injunction (PDF), which becomes effective in 60 days, prohibits Microsoft from selling future Word products that allegedly use the patented technology. It also enjoins Microsoft from testing, demonstrating, marketing or offering support for those future products.

Davis also ordered Microsoft to pay i4i more than $290 million in damages.

- Nick Eaton at The Microsoft Blog

Large corporations like Microsoft like software patents because they are a barrier to entry for new corporations. Large companies can cross-license with each other. Small companies, meanwhile, don't have their own patent portfolio, so they can be sued out of existence. Occasionally the tables are turned, as in this case, but apparently Microsoft still thinks it's worth it, at least as of 2007.

From all other perspectives than that of the largest software companies, software patents are failing. The above patent should have never been granted. Worse, a company with the legal resources of Microsoft should have had no trouble proving prior art and overturning the patent. As much as I relish the irony of this case, it isn't just, and I hope it is overturned on appeal. There is no sense that i4i's inventions are being protected. It's simply legal warfare.

I should write more about software patents. As a place to start, the meandering manifesto of the League for Programming Freedom has several highlights that have influenced me.

Saturday, July 25, 2009

Dead-for-Now (DFN) Code Splitting

Demand-loading of program code is an old idea that is very important for web applications. Ajax applications typically have 2-4 megabytes of JavaScript code, and it takes enough time to download and parse that code that there are large user-visible pauses. Web sites need to go lightning fast whenever possible, and one tool in the toolbox is to load only part of the code to begin with and load the rest later.

This idea isn't new, of course. Unix loads programs on demand, one page at a time. When you start an application on Unix, the OS loads only its first page. Whenever execution goes off of that page onto a new page, it traps into the OS to load a new page. On machines too memory-constrained for that to work, a manual-control variation called memory overlays is often used. Also, on the web, Java applets attempted to solve this problem by loading one class at a time. Unlike the prior two approaches, the class-by-class demand loading of Java applets failed to ever become practical, and it was abandoned in favor of loading at the jar granularity.

For about a year now, I've been maintaining GWT's implementation of demand loading, which was designed by Bruce Johnson. One of Bruce's key ideas was to break with the page-by-page loading done for applications loading from disks. Java applets initially stuck to that model, but there are severe problems with it, essentially equivalent to the problems of synchronous XHR that I previously posted about. Bruce is in the "yes, it's evil" camp, and I agree. The problems are just too hard to overcome. Therefore, a key part of GWT's demand loading system is that whenever you request code that might not be available yet, you supply a callback that is invoked asynchronously.

Another key part is that deployed GWT apps always go through a whole-program compile. As a result, GWT's code splitter can use static analysis as part of the system. A common static analysis is the identification of dead code, code that is part of a program but will never run. We like to say that GWT identifies code that is "dead for now."

Tuesday, July 14, 2009

The Fable of the Fable

Gene Expression is normally an interesting blog, but they have
now joined the ranks of Qwerty basher bashing.
They start by being merely incorrect:

In 1990, Stan Liebowitz and Stephen Margolis wrote an article
detailing the history of the now standard QWERTY keyboard layout
vs. its main competitor, the Dvorak Simplified Keyboard. In brief,
the greatest results in favor of the DSK came from a study that was
never officially published and that was headed by none other than
Dvorak himself. Later, when researchers tried to devise more
controlled experiments, the supposed superiority of the DSK mostly

They then go from this bad footing to launch stink bombs like this one:

Shielded from the dynamics of survival-of-the-fittest, all manner of
silly ideas can catch on and become endemic. In this case, the
enduring popularity of the idea is accounted for by the
Microsoft-hating religion of most academics and of geeks outside the
universities. For them, Microsoft is not a company that introduced the
best word processors and spreadsheets to date, and that is largely
responsible for driving down software prices, but instead a folk devil
upon which the cult projects whatever evil forces it can dream
up. Psychologically, though, it's pretty tough to just make shit up
like that. It's easier to give it the veneer of science -- and that's
just what the ideas behind the QWERTY and Betamax examples were able
to give them.

To take this backwards, I read the "Fable" article they link to back in the 90s,
and I found it uncompelling. I went on to spend a month retraining with
Dvorak, because I expect to be typing for at least another decade. I believe
the benefits of Dvorak are modest, but given the enormous amount of
typing I expect to do, it will add up.

Microsoft never entered the picture except in a positive way. It occurred to me
at the time that, unlike with the computers of my childhood, computers
with operating systems like Microsoft's make it exceptionally easy to switch
the layout to Dvorak. Far from believing that Microsoft locked in Qwerty,
I have considered them part of the revolution. For me, Microsoft
does not explain the mystery. It deepens it.

The case for Dvorak looks good. The famous "Fables" article
mostly argues only that Dvorak's study, from nearly a century
ago, is by itself not very compelling. It ignores substantial other
evidence that has accumulated since then. For one, the fastest
typist in the world prefers Dvorak. Further, the evidence is
overwhelming that Dvorak reduces finger travel distance
when typing English text. By Occam's Razor, we should
assume until proven otherwise that the longer finger travel
of Qwerty implies slower speed and greater wear and tear.

At this point, it looks like Dvorak really is better. The economic
study of how we got here, far from being based on a fable,
is a legitimate scientific probe into a surprising aspect of
our world.

Thursday, June 25, 2009

Not a good job for the FTC

According to the Associated Press:

New guidelines, expected to be approved late this summer with possible modifications, would clarify that the [Federal Trade Commission] can go after bloggers - as well as the companies that compensate them - for any false claims or failure to disclose conflicts of interest.

Such activity seems out of place for American society. We normally allow wide latitude in speech about products, and because of that we get robust protection against biased claims. While people can claim any number of things I would rather they didn't, it is equally easy for others to rebut those claims. Each individual who cares then holds a personal court to weigh the arguments they have heard. It's a beautiful system in theory, and it seems to have worked well in practice. Societies where speech is monitored for impropriety, in addition to being rather unpleasant, tend to have individuals with worse information. The censors damp down everything, not just the bad information.

Given this rich history, what is the argument for doing things differently for blogs? After such good experience with free speech, why flirt now with gossip police?

My best guess is that the FTC is fighting to expand its scope. Especially if broadcast TV declines, which seems inevitable given its many competitors, the FTC might fear having a number of censors on hire but no content to monitor. That would be bad for the FTC, but it looks good for the public.

Thursday, June 4, 2009

Can you spot Frankie?

He's skittish, and he hides all the time. However, he's not very good at it.

Friday, May 29, 2009

Google Wave Announced

Google Wave was announced yesterday.  If you haven't yet, check it out.  It provides a hosted messaging service that, with a few simple features, supports all of: email, instant messaging, and collaborative document editing.  It also has an open extension mechanism, so it can be used as a substrate for specialized applications, like blogging, photo album sharing, and issue tracking.

I haven't perused all the web information myself, but I hear that the Tech Crunch article is quite good.  Also, if you have some time, the demo is mind-blowing.

Features and vision aside, I must say they made an excellent choice in implementation technology.

Saturday, May 23, 2009

Star Trek Rocks (spoiler free)

The new Star Trek movie is just great.  If you liked the original series, go see this.  If you liked making fun of the original series, still go see it.  Just be careful what you say to your Trekkie friends, because they might bite.

The new movie has all the old characters back, but this time with more life and before their rough edges were sanded off.  McCoy is more paranoid than ever.  Spock is more infatuated with logic than ever, yet less equipped due to being just out of college.  Scotty is crazier and drunker than ever.

Somehow they all come off as much less stiff this time around.  In the original series, you could almost see the rod going up their backs--whooooaaa, Trekkies.... take it easy.  They looked like they could put on a snappy 50s-era suit and be right at home carrying a briefcase and shaking hands.  This time, they have big gestures.  They move around.  When Scotty is needed in engineering, or Chekov is needed in the transporter room, they RUN there.  Except Spock.  He's an uptight nerd with pursed lips and a bad poker face, and it stands out wonderfully as the guy you don't want at a party.  My favorite?  The security guards are no longer polite.  You meet them as they start a bar brawl and beat the crap out of someone you like.  Later, when the stuff hits the fan, you're glad they are there.

The fight scenes are upgraded, too.  Kirk still has his signature endless haymakers--gut and face and gut and face.  The others have branched out, though.  Best is that one of them fights with a sword.  All movies should have ninjas or samurais, if not both, and now Star Trek has a futuristic samurai.

I'll admit that the movie is a rather strong cheese, difficult at times to get down.  I love that they say all their old lines, straight, but those lines have been jokes for so long that it's hard to stay in the movie when they say them.  Many scenes are way beyond believable, and you just have to go with it.  Chance encounters, gauntlets of fire, diseases with effects that aren't physically possible--they're all there, and you have to simply enjoy the cheese.

Overall, I give it a hearty thumbs up.  Go see Star Trek if you haven't already.

Monday, May 11, 2009

One cat, a chihuahua, and a turtle

After a sudden decline, our cat Einstein had to be put to sleep after fifteen years.  He had been stable but slow-moving for months.  Last week, soon after a slight change in his meds, he hit a hard enough bump in the road that he just stopped eating.

Deciding how long to give him before letting him go was an awfully hard decision.  This cat has been with Fay through multiple life phases.  We wanted to give him every chance possible, but not to draw things out and make him miserable.  We did our best.

Here he is playing with the baby cat:

Here he is concentrating on something he heard outside:

Bye bye, kittie.  Our place is emptier now.

Friday, April 3, 2009

A high-level language on the JVM

Alex Payne: I think programmers who’ve never worked with a language with pattern matching before should be prepared to have that change their perceptions about programming. I was talking to a group of mostly Mac programmers, largely Objective-C developers. I was trying to convey to them that once you start working with pattern matching, you’ll never want to use a language without it again. It’s such a common thing that a programmer does every day. I have a collection of stuff. Let me pick certain needles out of this haystack, whether its based on a class or their contents, it’s such a powerful tool. It’s so great.

Robey Pointer: I wanted to talk a bit more about starting to use Scala. It definitely wasn’t a flippant choice we made over a few beers one night. We actually agonized over it for quite a while. Maybe not agonized, but certainly discussed it for a long time. One of the biggest draws for us to Scala as opposed to another language, was that once you’d started writing in a really high level language like Ruby, it can be difficult and kind of annoying to go back to a medium level language like Java, where you have to type a lot of code to get the same effect. That was a really big draw for us. With Scala we could still write this really high level code, but be on the JVM.

That's from an Artima interview with the guys behind Twitter about their evaluation of Scala.

It sounds about right to me.  Scala is adventuresome in the high-level features it gives you access to, e.g. pattern-matching, higher-order functions, and mixins.  However, it runs and is integrated with the JVM, so you can always use a Java library or write in low-level Java if you run into performance issues or need some API that Scala doesn't provide.

Saturday, March 14, 2009

SMTP relaying from OS/X

I like to relay my laptop's mail through a personal mail relay, so that no matter where I send mail from, it has a tunnel to a mail server that can send it on to the outside world.  I previously posted the details for doing this on a Debian laptop.  Here's what I have found for an OS/X laptop.

There are two things that I have found that should be changed to use the external relay: the built-in mail reader and Postfix.  The mail reader can be configured through its preferences GUI.  I don't remember all the individual steps, but I remember it being straightforward.  The first time you send email, you'll be warned about an untrusted certificate.  In that dialog, you can specify both that the certificate is good for sending email through, and that the certificate is in general legitimate.

The other thing that might need modifying, depending on what you do with your laptop, is Postfix.  From some web searching, I found this page by Michael Prokop.  In short, the magic contents of /etc/postfix/ are as follows:

relayhost =
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/smtp_auth
smtp_sasl_security_options = noanonymous

Note that if your mail relay runs on port 26, you can put the port number on the relayhost line, like this:

relayhost =

You then have to set up /etc/postfix/smtp_auth with your client password information.

Overall, it takes a few minutes to set it all up, but you end up with your laptop being able to send email from pretty much any Internet connection.  I'm posting it here in case anyone else finds it convenient.

Friday, March 13, 2009

Ivan Krstic on security-friendly languages

Ivan Krstic writes:
Now, while I’m already grossly overgeneralizing, I think the first group is almost useless, the second group is almost irrelevant, and the third group is absolutely horrible at explaining what the hell they’re talking about.
He and the commenters then give some good introductory links.

Wednesday, March 4, 2009

Installing top-level code with JavaScript's eval

JavaScript is wonderfully dynamic, so it is odd that its eval function is so unportable.  I already knew that it was tricky if not impossible to use eval to install code in an arbitrary nested scope.  Today I learned that even the simple case of installing code into the global scope is different on each browser. Here's what I found after some digging around on the web and some experimentation.

First, there are a lot of web pages discussing this topic.  Here's one of the first ones I read, that tipped me off that there is a well-known problem:

The following page also discusses the problem, but has a really good collection of comments:
UPDATE: Prototype has gone through the same issue and come to conclusions similar to mine.  Here is a page with all the bike shedding:

Based on reading these and on tinkering on different web browsers, here are some techniques that look interesting:
  1. window.eval, what I tried to begin with
  2. window.eval, but with a with() clause around it.  Some people report better luck this way.
  3. window.execScript, a variant of window.eval
  4. window.setTimeout
  5. adding a script tag to the document

What I did in each case was try to use the technique to define a function foo() at the global scope, and then try to call it.  I tested these browsers, which I happen to have handy:
  1. Safari/Mac 3.1.1
  2. Firefox/Mac 3.0.6
  3. Firefox/Linux
  4. Firefox/Windows 3.0.3
  5. IE 6.0.2900.xpsp_sp3_gdr.080814-1236 updated to SP3
  6. Chrome

Here are the browsers where each technique works.  I lump together the Firefoxes because they turn out to behave the same on all platforms:
  1. window.eval: FF
  2. window.eval with with: FF
  3. window.execScript: IE, Chrome
  4. window.setTimeout: Chrome, FF, Safari
  5. script tag: IE, Chrome, FF, Safari

Some observations from these results:


  1. The window.execScript function is available on IE and Chrome, and when present it does the right thing.
  2. The window.eval function only works as desired on Firefox.
  3. Adding a with(window) around the window.eval does make a difference, but I couldn't get it to do precisely what is needed for GWT.  In particular, GWT does not have a bunch of "var func1,func2, func3" declarations up front, but such vars are assumed in some of the other web pages I read.
  4. I could not find a synchronous solution for Safari.  Instead, setTimeout and script tags work, but they won't load the code until a few milliseconds have gone by.
  5. Script tags work on all browsers.
  6. Surprisingly, I couldn't get setTimeout to work on IE.  From some web browsing, it looks like the setTimeout callback might run in the wrong scope, but I didn't investigate far.  On IE, execScript is a better solution for the present problem.
Based on these, the following chunk of code is one portable way to install code on any of the major browsers.  It uses execScript if it's available, and otherwise it adds a script tag.
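if (window.execScript) {
  // IE and Chrome: runs the code synchronously at global scope
  window.execScript(script)
} else {
  var tag = document.createElement("script")
  tag.type = "text/javascript"
  tag.text = script
  // attach the tag so the browser runs it (body is used here; head works too)
  document.body.appendChild(tag)
}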

The Code
Here is the code for the above examples, for anyone who wants to know the details and/or to try it for themselves.

The wrapper script is as follows:
function installFoo() {
  var script = "function foo() { alert('hi') }"
  // varying part
  foo()
}

For the versions that install the code asynchronously (setTimeout or script tags), I changed the foo() line to be:
window.setTimeout(function() { foo() }, 100)

The "varying part" is as follows for each way to load the code.  Note that some of them include a gratuitous reassignment of window to $w; that's how I first ran the test and I don't want to go back and redo all of those.

// window.eval
window.eval(script)

// window.execScript
window.execScript(script)

// window.eval with a with
var $w = window
with($w) { $w.eval(script) }

// setTimeout
window.setTimeout(script, 0)

// script tag
var tag = document.createElement("script")
tag.type = "text/javascript"
tag.text = script
document.body.appendChild(tag)

Friday, February 6, 2009

Why not inject dependencies "manually"?

I just read the first half of the Guice User's Guide.  I stopped there because the motivation already left me hanging, and the very first feature I saw seems at odds with dependency injection.

I agree with the general sentiment of the user's guide: don't use new so much, and don't even use static factories.  Instead, "inject" a class's dependencies via constructor parameters.  That way, the class is abstracted over both the service implementation and where the service comes from.  This style of programming is already emphasized in at least the Joe-E and Scala communities.  I like it.  OO designers like it.  PL developers like it.

However, I don't understand the difficulty in doing this "manually".  The guide gives this lovely example of substituting a mock service in a test case:

public void testClient() {
  MockService mock = new MockService();
  Client client = new Client(mock);
}

So far so good.  Here we see that the payoff of moving the new ServiceImpl() out of Client is that whoever constructs a Client can instantiate the service in an unusual way.  Where I get lost is in the instantiation of the normal production version of the client.  The manual gives this code sample:

public static class ClientFactory {
  private ClientFactory() {}
  public static Client getInstance() {
    Service service = ServiceFactory.getInstance();
    return new Client(service);
  }
}

Where did these two factories come from?  I find them odd, because one of the beauties to me of dependency injection is that it composes well.  In most cases, the production version would know exactly which concrete client and service to instantiate.  In the remaining cases, the code assembling them should itself have parameters influencing how to construct things.  Percolating this idea to the top level of the application, the parameters to the top-level application factory would be precisely those things configurable via configuration files and command-line arguments.
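
To make the "manual" style concrete, here is a minimal sketch of what I have in mind. Client, Service, ServiceImpl, and MockService are the names from the guide; everything else (App, assemble, the greeting parameter, the fetch and go methods) is hypothetical and only there for illustration.

```java
// Top-level assembly: the one place that chooses concrete classes.
class App {
  // Hypothetical top-level factory. Its parameters are exactly the things
  // a user can configure, here a single string.
  static Client assemble(String greeting) {
    Service service = new ServiceImpl(greeting);
    return new Client(service);
  }

  public static void main(String[] args) {
    // The command-line argument flows down through constructor parameters.
    String greeting = args.length > 0 ? args[0] : "hello";
    System.out.println(assemble(greeting).go());
  }
}

interface Service {
  String fetch();
}

// Production implementation; what it returns is an assumption of this sketch.
class ServiceImpl implements Service {
  private final String greeting;
  ServiceImpl(String greeting) { this.greeting = greeting; }
  public String fetch() { return greeting; }
}

// Test double that records whether it was used.
class MockService implements Service {
  boolean called = false;
  public String fetch() { called = true; return "mock"; }
}

// Client never names a concrete Service; it only sees the interface.
class Client {
  private final Service service;
  Client(Service service) { this.service = service; }
  String go() { return "client got: " + service.fetch(); }
}
```

Run with no arguments, this prints "client got: hello". A test would skip assemble entirely and write new Client(new MockService()), just as in the guide's test case. No static factories are needed in either path.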

So far my experience matches this intuition.  The Scala compiler is implemented in a dependency-injection style.  Almost all modules find out about their dependencies by having a reference to the dependency passed in at construction time.  The Scala group has not taken advantage of this with testing mocks, even though that would seem straightforward.  However, I can attest that it was straightforward to reassemble the components to make an X10 variant of the compiler.

Overall, I believe that dependency injection is a good style.  However, I have not yet seen an example where it leads to extraneous code.  At least, the example in the guide isn't very good.  Further, the one tool feature I managed to read before being turned away by motivation was the ability to designate a "default" implementation of an interface.  Isn't this a temptation, though, not to use a dependency-injection style?  I would think if you are injecting dependencies, then any code constructing an implementation would already know enough to choose which one to construct.

Probably there is a breakdown somewhere in this argument.  Tools get popular for a reason.  At the least, though, the counter-argument is not documented in any easily findable place.

Monday, February 2, 2009

Private dependencies in Java

In any system of components, the more contexts you want a component to work in, the fewer dependencies you want it to have. Using a Java jar as a component format works pretty well, but it does have a weird non-obvious dependency. Java jars heavily use the global, classloader-wide name space of package and top-level class names. Thus, whenever a jar includes a class name at the top level, that jar can only be used in tandem with other jars that do not have a same-named class at the top scope. This is a dependency, but it looks a little backwards at first. Most dependencies require that the context provide something. This one requires that the context not provide something. This weirdness has repercussions that are analogous to the problems of programming with lots of global variables.

The most common cases of this issue are addressed by using DNS-based package names. This approach doesn't work, though, for diamond-shaped dependencies. Imagine that component Lib is used by both A and B, and that both A and B are used by an application App. App then depends on Lib through two different paths: App->A->Lib and App->B->Lib. There are several ways to cope with this multiple dependency, and it's not solvable in general. A good component distribution system can help you manage the difficulties, but it won't solve it outright.

Within this set of problematic cases, there is a large subset where it would actually work fine for A and B to get their own separate copies of Lib. Having two copies around has its down sides, such as the memory use and program size of the resulting bundle being larger. However, the robustness of the overall application is frequently higher because both A and B are using the version of Lib that they were developed against. For those cases where the trade off looks good, one question is, how can it be implemented in Java?

I know of two general approaches, and would love to know what else is out there.

One approach is to rename A's preferred version of Lib to use different global names than it originally did. All packages in Lib are renamed to somewhere else, typically underneath A's chunk of the global DNS-based name space. Simultaneously, all references in A to something in Lib are updated to point to the new location. This approach is supported by a tool called Jarjar. To make a long story short, we've tried this approach with some of GWT's third-party dependencies, and we have roughly a 50% success rate so far. For some libraries, such a renaming Just Works.

Other libraries have trouble, because somewhere there is some class reference that Jarjar doesn't know how to rename. For example, Jetty includes XML configuration files that have class names embedded in them, and Jarjar doesn't rename those by default. Arguably it is questionable style to program this way, but for one reason or another some libraries do.

This leads to a second approach: load the library in a separate class loader from its client. That is, instead of A accessing Lib directly, it creates a class loader and loads Lib in that separate class loader. This approach has the beauty of reliably putting the implementation of Lib in a separate name space from the rest of the application. However, for it to work, there must be an interface to the library still included in the main application. This leads to two significant limitations of the approach. First, the library must have a small enough interface that it is practical to maintain all of these interfaces. Second, the interfaces still have the dependency hell problem, because they will be in the main package and can conflict with each other.
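
Here is a rough sketch of the class loader approach, using the standard java.net.URLClassLoader. The PrivateLibLoader class and its two helper methods are hypothetical names of my own, and the jar URLs and implementation class name are whatever the caller supplies. It assumes the library jar is not also on the parent's class path, since otherwise the usual parent-first delegation would load the parent's copy instead.

```java
import java.net.URL;
import java.net.URLClassLoader;

class PrivateLibLoader {
  // Loads implClassName from libJars in its own class loader and creates an
  // instance with the no-argument constructor. The parent loader supplies the
  // shared interface types, so the caller can cast the result to an interface
  // that lives in the main application.
  static Object newInstance(URL[] libJars, String implClassName, ClassLoader parent)
      throws ReflectiveOperationException {
    ClassLoader loader = new URLClassLoader(libJars, parent);
    return loader.loadClass(implClassName).getDeclaredConstructor().newInstance();
  }

  // True if the named class can be found through the given loader.
  static boolean canSee(ClassLoader loader, String className) {
    try {
      loader.loadClass(className);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // A loader with no jars and no parent sees only the JDK's bootstrap
    // classes, not the application's own classes.
    ClassLoader empty = new URLClassLoader(new URL[0], null);
    System.out.println(canSee(empty, "java.lang.String"));  // prints true
    System.out.println(canSee(empty, "PrivateLibLoader"));  // prints false
  }
}
```

The main method only demonstrates the name space separation: each loader answers for its own set of names. In real use, A would call newInstance with the URL of its private copy of Lib and its own class loader as the parent, then cast the result to the interface that both sides share.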

It's a nasty problem in general, one that it would be great for a language to address more thoroughly. I know that the OSGi system used by Eclipse has some support for this approach, with its heavy use of interfaces and subsidiary class loaders. I don't know the details of how it works, though, because it comes in the context of a bunch of other component-related features that I have never had time to study.