Thursday, June 2, 2011

Martin Odersky on the state of Scala

Martin Odersky talked today at ScalaDays about the state of Scala. Here are the highlights for me.

Tools and Documentation

He doesn't think documentation is a big issue at this point. People used to complain to him about it all the time, but that has largely died out. I know that the demand for documentation was a large part of his motivation to write a book about Scala. Perhaps it helped.

He still thinks IDE support is an important problem. However, he claims the Eclipse support has gotten good enough for him to switch his compiler hacking from Emacs to Eclipse. That's high praise--Emacs users don't switch! I will have to try it again.

He emphasizes binary compatibility. In practice, libraries need to be recompiled when a new version of Scala comes out, because inevitably some trait or another has gained a new method. He has a number of ways to address this. He has long talked about tools that detect and fix problems by analyzing the bytecode, and that work is going to be emphasized at Typesafe. Also new today: he plans to designate stable releases of Scala that remain supported for a number of years and never receive binary-incompatible changes.
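
To make the recompilation problem concrete, here is a minimal hypothetical sketch of my own (the Logger trait and its versions are invented, not anything from the talk). Adding even a small convenience method to a published trait changes what the compiler expects of every class that mixes it in:

    // Hypothetical library, version 1.0
    object v1 {
      trait Logger {
        def log(msg: String): Unit
      }
      // A client class compiled against version 1.0
      class ConsoleLogger extends Logger {
        def log(msg: String): Unit = println(msg)
      }
    }

    // Hypothetical library, version 1.1: one concrete convenience method added
    object v2 {
      trait Logger {
        def log(msg: String): Unit
        def warn(msg: String): Unit = log("WARN: " + msg) // new in 1.1
      }
    }

    // A ConsoleLogger class file produced against 1.0 carries no
    // implementation of warn, so code linked against 1.1 can fail at
    // run time (e.g. with an AbstractMethodError) when calling it,
    // until the client is recompiled. The exact behavior depends on
    // how the compiler encodes traits.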

He also pointed out that style checking tools would be helpful in larger development groups. It's a good point. Such tools don't take a lot of code, but I guess nobody has gotten interested enough in the problem to whip one up.

Functional programming

Martin went through an extended example based on a 2000 study comparing programming languages. In the study, students implemented a programming problem in one of seven different programming languages. The study is interesting on its own if you haven't read it before, and among other things shows that there is much more variability among programmers than among programming languages. However, we can learn something about programming languages by comparing either the best or the median solutions in each language.

Scala shines on the programming problem used in the study, and it's because of Scala's excellent support for applying functions across collections. Such facilities don't work well unless the language has a concise notation for functions. Here is the bottom line for several of the solutions:

  • Students using compiled languages: 200-300 lines of code
  • Students using scripting languages: ~100 lines of code
  • Martin's solution in Scala: 20-30 lines of code
  • A Java master's solution in Java: over 100 lines of code

I will attribute the "Java master" if I can find a reliable source, but Martin showed the Java example and it looked pretty reasonable at a glance. The reason it is so long compared to the Scala solution is that instead of using collection operations, it defines several recursive methods that record their progress in extra parameters to the methods. I've written a lot of code like that in the last few years. I think about the beautiful functional solution, and then I start over with an imperative solution because inner classes in Java require ever so much boilerplate.
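
To give a flavor of the difference, here is a rough sketch in that collection-heavy style. It is my own reconstruction of the study's problem (finding the dictionary words that encode a string of digits), not Martin's code, and the helper names are invented:

    object PhoneWords {
      // Letter-to-digit assignment from the study (assumes alphabetic words).
      val charCode: Map[Char, Char] = Map(
        'e' -> '0',
        'j' -> '1', 'n' -> '1', 'q' -> '1',
        'r' -> '2', 'w' -> '2', 'x' -> '2',
        'd' -> '3', 's' -> '3', 'y' -> '3',
        'f' -> '4', 't' -> '4',
        'a' -> '5', 'm' -> '5',
        'c' -> '6', 'i' -> '6', 'v' -> '6',
        'b' -> '7', 'k' -> '7', 'u' -> '7',
        'l' -> '8', 'o' -> '8', 'p' -> '8',
        'g' -> '9', 'h' -> '9', 'z' -> '9')

      // The digit string a word encodes to.
      def wordCode(word: String): String = word.toLowerCase.map(charCode)

      // Index the dictionary by digit string: one groupBy instead of a
      // hand-built lookup structure.
      def index(words: Seq[String]): Map[String, Seq[String]] =
        words.groupBy(wordCode)

      // All ways to split a digit string into dictionary words. The
      // recursion is still there, but the collections carry the partial
      // results instead of extra accumulator parameters.
      def encode(idx: Map[String, Seq[String]], digits: String): Set[List[String]] =
        if (digits.isEmpty) Set(Nil)
        else
          (for {
            split <- 1 to digits.length
            word  <- idx.getOrElse(digits.take(split), Nil)
            rest  <- encode(idx, digits.drop(split))
          } yield word :: rest).toSet
    }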

Parallelism

Martin talked a bit about his latest thinking on making use of multiple cores, a problem that has obsessed programming-language research for several years now. One principle he emphasized is that people are much better at finding one solution that works than at finding all the potential problems that non-determinism can cause. Thus, he's interested lately in programming-language constructs that are parallel yet still deterministic. That's a tough principle to achieve in an expressive language! It rules out actors (!), agents, and software transactional memory alike, because they all have state, and that state can evolve differently depending on the non-deterministic choices the implementation makes.

Aside from the general talk on the challenges of parallelism, Martin talked a bit about the parallel collections in Scala. They're better than I realized. Their implementation uses fork-join with work stealing, rather than blindly creating lots of threads. As an additional twist, they adaptively choose a chunk size based on how much work stealing appears to be happening: if no work stealing is observed, every worker must already be busy, so the chunk size is increased.

To demonstrate how the collections can help, he made two tweaks to the phone-number solution to switch from sequential to parallel collections. After doing so, the program ran 2.5 times faster on a 4-core machine. One can imagine doing better than 2.5 times faster, but it's a very good start given how very easy the change was.
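
The change he described is of roughly this shape. This is a hedged sketch rather than the program from the talk; in Scala 2.9 the .par conversion was built into the standard library, while in current Scala it comes from the separate scala-parallel-collections module:

    // Requires the scala-parallel-collections module on recent Scala;
    // on Scala 2.9-era releases the .par conversion was built in.
    import scala.collection.parallel.CollectionConverters._

    object ParDemo {
      def main(args: Array[String]): Unit = {
        val words = Vector("scala", "days", "parallel", "collections")

        // Sequential version
        val seqLengths = words.map(_.length)

        // Parallel version: one call to .par, and the rest is unchanged.
        // Under the hood the parallel collections use fork-join with
        // work stealing and adaptive chunk sizes.
        val parLengths = words.par.map(_.length)

        println(seqLengths)
        println(parLengths)
      }
    }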

Domain-specific languages

Martin emphasized that Scala is excellent for DSLs. However, he points out, and I agree, that embedded DSLs in Scala mesh so well with the language that they are essentially just Scala code. I vastly prefer this style of DSL to the style where the DSL is embedded but its constructs don't map cleanly onto constructs in the host language. When the code does something different from what it appears to do, all kinds of weird bugs can arise. Whenever a DSL doesn't embed in a straightforward way, I'd prefer to make it an external DSL with a plain old parser and interpreter.
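
As a toy illustration of what "essentially just Scala code" means, here is a small sketch of my own (Scala 2 syntax, not from the talk): the lines read like a little specification language, but each one desugars to an ordinary method call on an ordinary object:

    object SpecDsl {
      class Clock(var hours: Int) {
        def advance(n: Int): Clock = { hours = (hours + n) % 24; this }
      }

      // The "DSL": an implicit class that lets a plain string start a check.
      implicit class SpecWord(description: String) {
        def should(body: => Unit): Unit = {
          print(description + " ... ")
          try { body; println("ok") }
          catch { case e: AssertionError => println("FAILED: " + e.getMessage) }
        }
      }

      def main(args: Array[String]): Unit = {
        // Reads like a DSL, but it is just SpecWord("...").should { ... }
        "a clock wraps around at midnight" should {
          assert(new Clock(23).advance(2).hours == 1)
        }
      }
    }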

1 comment:

James Iry said...

I think Martin said the Java expert was Josh Bloch in the talk. Perhaps when the slides are posted we can confirm.