Wednesday, December 30, 2009

Run-time types

Guy Steele ruminated a few weeks ago about run-time types:

When Bill Joy and I were working on The Java Language Specification 14 years ago, we wrestled with the distinction between "types" and what many people were calling "run-time types". We found that the term "run-time types" was causing some confusion, in part because some people seemed to think (a) that an object could have a compile-time type as well as a run-time type, and/or (b) a variable could have a run-time type as well as a compile-time type, and/or (c) that any type could serve as either a compile-time type or a run-time type. In the end, we tried to abolish the term "run-time type" in Java by abolishing it in the language specification.


This issue is coming up for the GWT compiler as we enrich the type hierarchy used within the compiler, only worse: not only are there types in the source language that differ from the types available at run time, but there is also a third type hierarchy used internally by the compiler. How do we make sense of all these type systems?

Steele and Joy have tried to eliminate the very notion of run-time types. However, as the post above indicates, they still haven't found a satisfactory replacement for it. One attempt is to use "class" instead of "run-time type", but how are we supposed to think about the run-time type of an array? Worse, what can GWT do, when it has not two but three type systems of interest? Steele wants a second word to replace "run-time type". GWT would need a third word.

With some trepidation, let me defend why I think run-time types are a useful concept. I have a feeling I am wandering into a quagmire, given the unease of others who have tried it, but so far I don't see exactly where the problem is. I'll give two motivations, and then I'll give some specific details on how I think of the type systems relating to each other.

The first reason to have run-time types is that it's the most obvious thing to call them. Java has a stark distinction between the types in the source language and the types you can test at run time. Thus, there are two systems of type-like things involved. At the same time, both systems are very type-like. They have a subtyping hierarchy, and they have an inclusion relation where some values are in the type and others are not. The second of these systems, then, consists of type-like things that exist only at run time. It's simple English to call them run-time types.
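
To make the distinction concrete, here is a minimal Java sketch of my own (not from Steele's post). The variable's source-level type is List<String>, while the object's run-time type is plain ArrayList, and only the latter can be tested at run time:

import java.util.*;

class RunTimeTypeDemo {
  public static void main(String[] args) {
    List<String> names = new ArrayList<String>();    // source-level type: List<String>
    System.out.println(names.getClass());            // run-time type: class java.util.ArrayList
    System.out.println(names instanceof ArrayList);  // instanceof tests a run-time type: prints true
    // System.out.println(names instanceof List<String>);  // rejected by javac:
    //   List<String> is a source-level type only, not a run-time type
  }
}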

The second reason is that it works out well in formal language semantics. In Igarashi and Pierce's paper describing the erasure model for compiling Java, you start with a program using source-level types and "erase" parts of the program. For example, type List<String> becomes simply List; the String part is erased. After erasure, you still have a well-typed program. There's nothing special about it. The type system used for the new program is different than the type system for the old program, but it's still a type system in every way. Thus, run-time types appear to be well founded theoretically.
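
As a rough illustration in Java terms (mine, not the formal treatment in the paper), the erased program replaces List<String> with plain List and carries a cast wherever the lost information is needed, and it still type-checks:

import java.util.*;

class ErasureDemo {
  static String beforeErasure() {
    List<String> xs = new ArrayList<String>();  // source-level type system
    xs.add("hello");
    return xs.get(0);                           // no cast needed
  }

  static String afterErasure() {
    List xs = new ArrayList();                  // erased: just List
    xs.add("hello");
    return (String) xs.get(0);                  // erasure leaves a cast here (hand-written in this sketch)
  }
}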

Based on motivations like these, I like to think of run-time types as a first-class concept. More generally, I think of all three of the type systems the GWT compiler sees as plain old type systems.

The trick, then, is how to think of the relations between these type systems. First of all, the GWT compiler largely ignores the Java source-level types. Those are modeled faithfully, and they are available during user-supplied code generation. However, as soon as all code has been loaded or generated, the types are erased, just like in a Java compiler that targets bytecode. Thus, the bulk of the compiler doesn't even see source-level types.

More interesting is the interpretation of the compiler's internal type system. I think of it as supplying extra information that the compiler knows to be true. For example, the internal type system has a "null type" that isn't available in either the source language's type system or the run-time type system. Whenever the compiler can prove that an expression will always evaluate to null, it can replace that expression's type with the null type. Whenever the compiler sees an expression of type null, it can use that for further optimization. For example, if it sees "foo == null ? a : b", and the type of "foo" is the null type, it knows that "foo == null" will always be true, so it can drop the test and the "b" branch and replace the whole expression with "foo; a".
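
Here is the shape of that rewrite at the Java source level, as a hand-written before/after sketch rather than actual compiler output; alwaysNull() stands in for any expression the compiler has proven to evaluate to null:

class NullTypeDemo {
  static Object alwaysNull() { return null; }  // the compiler proves the result is always null

  static String before(String a, String b) {
    Object foo = alwaysNull();                 // internal type of foo: the null type
    return foo == null ? a : b;                // the test is statically true
  }

  static String after(String a, String b) {
    Object foo = alwaysNull();                 // "foo" is kept for its evaluation
    return a;                                  // the test and "b" are dropped
  }
}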

The main mind-bender I have encountered with this approach has to do with the fact that compiler types can't be fully checked at run time. What is the meaning of a cast expression where the target type is something that won't be checked at run time? How should it be compiled? Should we feel compelled to add extra run-time checks corresponding to the enriched internal type system?

My current answer is to simply avoid the whole question. It's presently an invariant of the GWT compiler that every cast expression and every instanceof check tests against a run-time type. Problem solved. This approach is a cruder version of the "dcast" predicate used in Igarashi and Pierce. Looking forward, it would also be possible to add a second kind of cast operation, one that the compiler has proven will always succeed. Such a cast can cast to an arbitrary internal type, because it won't be emitted into the final output. So far, though, such a cast hasn't been strictly necessary.

Overall, run-time types look not only useful, but well founded and intuitive. They're types, and they're present at run time.

Tuesday, December 29, 2009

Browsers giving up on repeatedly failing script tags

Today's browser oddity is that Chrome instantly fails a script tag download if the requested URL has already failed a couple of times. It stops issuing network requests and fails the download before even trying. Probably other WebKit-based browsers do the same; I didn't check.

I can see why this behavior would make sense if you think of script tags as part of the static content of a page. If there are ten script tags requesting the same resource, you don't want to issue ten failing requests. However, it caught me by surprise, because I'm trying to use script tags as a way of downloading content over the web.

Browsers sure are a messy platform.

Wednesday, December 23, 2009

More software patent extortion

"Microsoft lost a patent case involving a company called I4i in May, after a jury ruled that Microsoft infringed one of i4i's patents with a custom XML feature found in Word. In August an injunction was placed on sales of Word pending the appeal, which did not go in Microsoft's favor Tuesday."
- from CNET

Sad news. Patent 5,787,449 has been upheld.

On the one hand, Microsoft deserves what it gets for supporting software patents. On the other, the patent system is just terrible. Not only does this patent cover boring and well-known techniques, it was actually upheld in court.

Tuesday, December 15, 2009

Detecting download failures with script tags

Matt Mastracci has done some experimentation and found that most browsers provide some callback or another for indicating that a script tag has failed to download. This is very interesting, because script tags don't have to follow the Same Origin Policy. Here is my replication, for the possible aid of anyone else wandering in these murky woods.

Available callbacks

The simplest callback is the onerror attribute. It can be attached to a script tag like this:

script.onerror = function() {
  /* code here is called if the download fails */
};


For completeness, there is also an onload attribute. It's analogous to onerror except that it indicates success rather than failure. It can be attached to a script tag like this:

script.onload = function() {
  /* code here is called if the download succeeded */
};


Finally, IE supports onreadystatechange, similarly to the XHR attribute of the same name. The supplied callback will be invoked as the download progresses. The state of the download can be queried via the readyState attribute, which will reach state 'loaded' and/or 'complete'.

script.onreadystatechange = function() {
  if (script.readyState == 'loaded') {
    script.onreadystatechange = function() { }; // prevent duplicate calls
    /* error handling code goes here */
  }
};


Results

I used the test page below (the full test script is at the end of this post) to see which of the three events fire in several browsers.

Loading a bad page:
Firefox 3.5: onerror
Safari 4: onerror
Chrome 4: onerror
IE 7: onreadystatechange
IE 8: onreadystatechange

Loading a good page:
Firefox 3.5: onload
Safari 4: onload
Chrome 4: onload
IE 7: onreadystatechange (if not cached)
IE 8: onreadystatechange (if not cached)


Analysis

The onerror attribute works on all browsers but IE. For IE, onreadystatechange is available. Conveniently, no browser supports both of them, so a handler hooked up to both of them will fire exactly once.

A complication on IE is that onreadystatechange doesn't differentiate whether the download succeeded or not. Downloading a non-cached version looks the same as a download failure. Thus, any code using onreadystatechange needs to check whether the download succeeded or not.

Followup: Order of evaluation versus onreadystatechange

On IE, if onreadystatechange indicates the download is complete, in what circumstances should the load be considered to have failed?

I did a followup test where the loaded code (exists.js) does a window.alert. That way, I can see which happens first: the alert, or the onreadystatechange callback. On both IE7 and IE8, the alert happens first. That means if the script sets a global flag to true once it loads, the onreadystatechange callback can check it and reliably determine whether the download has succeeded.


Test script

<head>
<title>Test page</title>

<script>
function loadFile(url) {
  var head = document.getElementsByTagName('head').item(0);
  var script = document.createElement('script');
  script.src = url;
  script.onload = function() {
    window.alert("onload called");
  };
  script.onerror = function() {
    window.alert("onerror called");
  };
  script.onreadystatechange = function() {
    if (script.readyState == 'loaded') {
      script.onreadystatechange = function() { };
      window.alert("onreadystatechange (" + script.readyState + ")");
    }
  };
  head.appendChild(script);
}

function good() {
  loadFile("exists.js");
}
function bad() {
  loadFile("bad.js");
}
</script>
</head>

<body>
<input type="button" value="good" onclick="good()">
<input type="button" value="bad" onclick="bad()">
</body>

Friday, December 11, 2009

Tail call optimization vs. growable stacks

Guy Steele has written an interesting post about tail calls being needed to implement certain object-oriented designs:

In this blog post we extend one of his examples in order to make a completely different point: object-oriented programming languages need tail calls correctly implemented, not just as a "trivial and optional" optimization of the use of stack space that could be achieved by using iteration statements, but in order to preserve object-oriented abstractions.


I agree as far as it goes, and it's a very interesting example. However, it doesn't go far enough. What is really needed to make Guy's example work well is stacks that can grow beyond their initial fixed size.

In the example, a set is built using a sequence of "adjoin" operations that add one element to the set at a time. The adjoined set is implemented as a linked list. Whether or not this particular API needs implementation as a linked list, I can certainly agree that for some problems the implementer would be pushed into using such a structure.

Given this implementation, the relevant implementation of the "contains" operation looks like this:

contains(y: ZZ) = if (y = x) then true else s.contains(y) end


First check the head of the list, then recurse into the tail of the list. The problem is that if the list is large, this recursion will overflow any fixed-size stack. The proposed solution is tail call optimization, which keeps the stack from growing.

I agree, but tail call optimization doesn't go far enough for this sort of problem. Look in the very same example at the "contains" operation for "union" objects:

contains(y: ZZ) = s1.contains(y) OR: s2.contains(y)


This operation first checks the first set and then checks the second set. This time neither call is a tail call. Through careful rewriting, one could be made tail recursive, but not the other. To make them both tail recursive would require really butchering the clean algorithm and data structure as they stand.
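
Recast in Java, the two cases look roughly like this (my own sketch of the structure, with invented class names; Java does not optimize tail calls, but the shape of the calls is what matters here):

interface IntSet {
  boolean contains(int y);
}

class Adjoin implements IntSet {
  final int x;
  final IntSet s;
  Adjoin(int x, IntSet s) { this.x = x; this.s = s; }
  public boolean contains(int y) {
    if (y == x) return true;
    return s.contains(y);  // tail position: TCO alone keeps the stack flat
  }
}

class Union implements IntSet {
  final IntSet s1, s2;
  Union(IntSet s1, IntSet s2) { this.s1 = s1; this.s2 = s2; }
  public boolean contains(int y) {
    // The call on s1 is not in tail position: the "or" still has work to do
    // after it returns. Careful rewriting can put one of the two calls in
    // tail position, but not both, so a deep tree of unions still needs a
    // deep stack even with TCO.
    return s1.contains(y) || s2.contains(y);
  }
}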

To sum up the situation, tail call optimization helps for recursing over deeply linked lists, but not for recursing over deeply linked trees. It looks rather narrow to me.

As an aside, I agree with Gilad Bracha that you may as well implement tail call optimization, but the motivation for me is not deeply nested data structures. Instead, one application I think about is to support state machines as implemented via recursion. A related application is continuation passing style. These examples need tail call optimization to keep the stack from growing in proportion to the program's run time, so I certainly have nothing against tail call optimization.
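
For example, a two-state machine written as mutually recursive methods looks like this (a sketch of mine; in Java, which lacks TCO, the stack grows with the number of transitions, which is exactly the problem TCO removes):

class Blinker {
  // Each state finishes by handing control to the next state with a call in
  // tail position. With TCO the stack stays one frame deep for any number of
  // transitions; without it, each transition adds a frame.
  static void on(long transitionsLeft) {
    if (transitionsLeft == 0) return;
    off(transitionsLeft - 1);  // tail call
  }

  static void off(long transitionsLeft) {
    if (transitionsLeft == 0) return;
    on(transitionsLeft - 1);   // tail call
  }
}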

What, though, can be done about recursion on deeply nested data structures? Barring programmers from doing so, as Java does, is a real constraint. Both the Scala compiler and the GWT compiler require users to specify very large stack sizes so as to support recursing over large abstract syntax trees. Most of the time the trees aren't that deep and the large stacks are wasted memory. Once in a while even the very large stack sizes aren't big enough. It's a real mess.

A more thorough solution to the problem would be to allow stacks to grow and shrink at run time. Some languages implement this feature by putting their stacks on the heap, and others, such as Go, have a stack that is a linked list of traditional contiguous stacks:

To make the stacks small, Go's run-time uses segmented stacks. A newly minted goroutine is given a few kilobytes, which is almost always enough. When it isn't, the run-time allocates (and frees) extension segments automatically. The overhead averages about three cheap instructions per function call. It is practical to create hundreds of thousands of goroutines in the same address space. If goroutines were just threads, system resources would run out at a much smaller number.


By providing Go with growable stacks, Rob Pike stands in contrast to many other language implementers. Most implementers' reluctance, I believe, is due to the extra overhead growable stacks add to subroutine invocation. With a traditional fixed-size stack, a subroutine call can simply allocate a stack frame and race ahead. If stacks grow and shrink, then the allocation code must check whether the stack really has space left, and grow the stack if it doesn't. Additionally, a better implementation should probably shrink stacks, too, which means the subroutine return code needs some overhead as well.

It's not entirely clear that this overhead is non-zero, much less large. For example, the calling convention code could be written so as to write a stack frame even when the space is getting tight, and only to trap into the "grow the stack" subroutine once the stack frame has been written all the way out. With such an arrangement, a modern CPU should be able to do the stack check in parallel to its work to write out the stack frame, so long as the two operations are carefully written to avoid data dependence.

To avoid overhead for returning from a subroutine, a frame could be periodically placed on the stack that indicates that it's time for the stack to shrink. The frames could be inserted using similar techniques to the parallelizable calling convention to avoid overhead on most subroutine calls. Given such frames, most subroutine returns would do nothing special. Only returns that fall into one of these frames would need to do anything, and the frames can be set up such that their saved instruction pointer points into the "shrink the stack" subroutine.

That's one theoretical implementation strategy, but it's enough to suggest that growable stacks can be very inexpensive indeed. I will be very interested in how the performance numbers work out for Go as the language matures.