Martin Odersky talked today at ScalaDays about the state of
Scala. Here are the highlights for me.
Tools and Documentation
He doesn't think documentation is a big issue at this point. People
used to complain to him about it all the time, and that has largely
died out. I know that this demand for documentation was a large
motivation for him to develop
a book about Scala. Perhaps
it helped.
He still thinks IDE support is an important problem. However, he
claims the Eclipse support has gotten good enough for him to switch
his compiler hacking from Emacs to Eclipse. That's high praise--Emacs
users don't switch! I will have to try it again.
He emphasizes binary compatibility. In practice, libraries need to be
recompiled when a new version of Scala comes out, because inevitably
some trait or another has a new method in it. He has a number of ways
to address this. He's long talked about tools to detect and fix
problems by analyzing the bytecodes, and that work is going to be
emphasized at TypeSafe. Additionally, new today is that he plans to
designate stable releases of Scala that stay valid for a number of
years and never have binary-incompatible changes.
He also pointed out that style checking tools would be helpful in
larger development groups. It's a good point. Such tools don't take a
lot of code, but I guess nobody has gotten interested enough in the
problem to whip one up.
Functional programming
Martin went through an extended example based on
a
2000 study comparing programming languages. In the study,
students implemented a programming problem in one of seven different
programming languages. The study is interesting on its own if you
haven't read it before, and among other things shows that there is
much more variability among programmers than among programming
languages. However, we can learn something about programming languages
by comparing either the best or the median solutions in each language.
Scala shines well on the programming problem used in the study, and
it's because of Scala's excellent support for running functions across
collections. Such facilities don't work well unless the language has a
concise notation for functions.
Here is the bottom line on several different solutions:
- Students using compiled languages: 200-300 lines of code
- Students using scripting languages: ~100 lines of code
- Martin's solution in Scala: 20-30 lines of code
- A Java master's solution in Java: over 100 lines of code
I will attribute the "Java master" if I can find a reliable source, but
Martin showed the Java example and it looked pretty reasonable at a
glance. The reason it is so long compared to the Scala solution is
that instead of using collection operations, it defines several
recursive methods that record their progress in extra parameters to
the methods. I've written a lot of code like that in the last few years.
I think about the beautiful functional solution, and then I start over
with an imperative solution because inner classes in Java require ever
so much boilerplate.
Parallelism
Martin talked a bit on his latest thinking on making use of multiple
cores, a problem that has obsessed programming-language research for
several years now. One principle he emphasized is that people are much
more able to find one solution that works than at finding all the
potential problems that can occur due to non-determinism. Thus, he's
interested lately in programming-language constructs that are parallel
yet still deterministic. That's a tough principle to achieve in an
expressive language! It rules out all of actors (!), agents, and
software-transactional memory, because they all have state, and the
state can change differently depending on the non-deterministic
choices the implementation makes.
Aside from the general talk on the challenges of parallelism, Martin
talked a bit about the parallel collections in Scala. They're better
than I realized. Their implementation uses fork-join
with work stealing, rather than blindly creating lots of threads.
As an additional twist, they adaptively choose a chunk
size based on how much work stealing appears to be happening. If there
is no work stealing, then every node must be busy, so increase the
chunk size.
To demonstrate how the collections can help, he made two tweaks to the
phone-number solution to switch from sequential to parallel
collections. After doing so, the program ran 2.5 times faster on a
4-core machine. One can imagine doing better than 2.5 times faster,
but it's a very good start given how very easy the change was.
Domain-specific languages
Martin emphasized that Scala is excellent for DSLs. However, he points
out, and I agree, that embedded DSLs in Scala mesh so well that they
are essentially just Scala code. I vastly prefer this style of DSL
to the style where you embed the DSL but the constructs of the DSL
don't map well to constructs in the host language. Since the code is
doing something different from what it looks like it does, all kinds
of weird bugs can arise. Whenever working on a DSL that doesn't embed
in a straightforward way, I'd prefer to make it an external DSL with
a plain old parser and interpreter.