Friday, June 27, 2014

Edoardo Vacchi on attribute grammars

I previously wrote that predictable performance is a practical challenge for using attribute grammars in real work. It does little good to quickly write the first version of a compiler pass if you then spend hours debugging oddball performance problems.

Edoardo Vacchi wrote me the following in response. I agree with him: having an explicit evaluation construct, rather than triggering attribute contributions automatically, is likely to make performance more predictable. UPDATED: edited the first paragraph as suggested by Edoardo.
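
To make the contrast concrete before the letter itself, here is a minimal Scala sketch of the two styles. This is purely my own illustration; Neverlang's actual API surely differs:

    // AG style: an attribute is a memoized function; the framework decides
    // when, and how often, it gets computed.
    class AgNode(children: List[AgNode]) {
      lazy val depth: Int =
        if (children.isEmpty) 0 else children.map(_.depth).max + 1
    }

    // Explicit-eval style: an attribute is a plain variable written by a
    // semantic action; re-computation happens only on an explicit eval.
    class EvalNode(children: List[EvalNode]) {
      var depth: Int = 0
      def eval(): Unit = {
        children.foreach(_.eval())  // descend into each child explicitly
        depth = if (children.isEmpty) 0 else children.map(_.depth).max + 1
      }
    }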

Hi,

This is Edoardo Vacchi from Università degli Studi di Milano (Italy). For my PhD thesis I'm working on a language development framework called "Neverlang" [1,2]. Neverlang is an ongoing project of Walter Cazzola's ADAPT Lab; I am involved with its latest incarnation, "Neverlang 2".

I stumbled upon an (old) blog post of yours about attribute grammars [3], and I would be interested to know whether you know of any "authoritative" references I could cite with respect to the points that you raise, with particular attention to point (3), "unpredictable performance," and, in part, to point (2), caching.

The Neverlang model resembles that of simple "compiler-compilers" like Yacc, where attributes behave more like variables than functions, so they are generally computed only once. In Neverlang, attributes can also be re-computed using the `eval` construct, which descends into a child node and re-evaluates the corresponding semantic action.

On the one hand, the need for an explicit `eval` makes it less "convenient" than regular AG-based frameworks; on the other hand, I believe it gives better predictability. Although the focus of the framework is not performance but modularity, I think "predictability" would better motivate this design choice.

Thanks in advance,

[1] http://link.springer.com/chapter/10.1007%2F978-3-642-39614-4_2#page-1
[2] http://dl.acm.org/citation.cfm?id=2584478
[3] http://blog.lexspoon.org/2011/04/practical-challenges-for-attribute.html

Edoardo Vacchi is a PhD student at Walter Cazzola's ADAPT-Lab, a research lab at Università degli Studi di Milano that investigates methods and techniques for programming language development and for software adaptation and evolution. Walter Cazzola is an associate professor at UniMi whose research concerns software and language engineering. More information about Neverlang can be found at http://neverlang.di.unimi.it.

Tuesday, June 3, 2014

My analysis of the Swift language

Apple has put out Swift, which sounds like a nice language overall. Here is my flagrantly non-humble opinion about how its features line up with what I consider modern, well-established aspects of programming language design.

The good

First off, Swift includes range-checked integer arithmetic! Unless you explicitly ask for wraparound, any overflow will cause an exception. I was just commenting yesterday on what a tough problem this is for current programming languages.
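
For contrast, checked arithmetic on the JVM is available only as a library call, not as the default behavior. A minimal Scala sketch using Java 8's Math.addExact:

    // Overflow raises an exception instead of silently wrapping around.
    val big = Int.MaxValue
    try println(Math.addExact(big, 1))
    catch { case _: ArithmeticException => println("overflow caught") }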

It has function types, nested functions, and closures, along with numerous abbreviated syntaxes for writing closures. This is all heartening, and I hope it will stick going forward, much the way lexical variable binding has stuck. Closures are one of those features that are quite helpful and have little downside once your language has garbage collection.

Swift's closures can assign to variables in an outer scope. That's the right way to do things, and I find it painful how much Java's designers struggle with this issue. As a technical detail, I am unclear on what happens if a closure captures a variable but does not modify it. What ought to happen is that any read from it sees the latest value, not the value at the time the capture happened. However, the Closures section of the Language Guide suggests that the compiler will capture just the initial value in this case. I believe this is misguided and will cause as many traps as it fixes; for example, suppose the programmer captures a counter but never increments it from inside the closure itself. The closure should still see the current count, not a stale snapshot. The motto here should be: you don't know what the programmer meant, but you know what they wrote.
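
For illustration, here is a minimal Scala sketch of the semantics I am arguing for; Scala closures capture the variable itself, not a snapshot of its value:

    var counter = 0
    val report = () => s"count = $counter"  // captures the variable, not its value

    counter += 1
    counter += 1
    println(report())  // prints "count = 2", not "count = 0"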

Type inference is quite welcome. I don't know what more to say than that developers will take advantage of it all the time, especially for local variables.

Tuple types are a small touch that comes up in practical programming all the time. How many times have you wanted to return two values from a function, and had to design a class for it or otherwise pervert your design?
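
A quick Scala sketch of the convenience; the minMax function is invented for the example:

    // Return two values without defining a one-off wrapper class.
    def minMax(xs: List[Int]): (Int, Int) = (xs.min, xs.max)

    val (lo, hi) = minMax(List(3, 1, 4, 1, 5))
    println(s"min = $lo, max = $hi")  // min = 1, max = 5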

Enumerations seem good to include in the language. Language designers often seem to think that enums are already handled by other language features, and therefore should not be included. I respect that, but in this case, it's a simple feature that programmers really like to use. Java's enums are baroque, and none of the several Datalog dialects I have worked on include enums at all. I miss having language support for a closed set of named integers. It's easy to support and will be extremely popular.
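
Scala, as I write this, has no dedicated enum construct either; here is one common encoding, as a sketch:

    // A closed set of named values via a sealed class and case objects.
    sealed abstract class Color(val id: Int)
    object Color {
      case object Red   extends Color(0)
      case object Green extends Color(1)
      case object Blue  extends Color(2)
      val values = List(Red, Green, Blue)
    }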

As an interesting trick, keyword arguments to functions are supported, but you have to opt in. That's probably a good combination. Keyword arguments are quite useful in cases where you have a lot of parameters, and sometimes that legitimately happens. However, it's unfortunate to afflict all functions with keyword arguments, because the keywords become part of the API. By making the feature opt-in, it is there for the functions that can use it.
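
Scala's named arguments make a useful comparison, though they are always available rather than opt-in. A sketch, with an invented makeWindow function:

    // With several parameters, named arguments make the call site self-describing.
    def makeWindow(width: Int, height: Int, resizable: Boolean = true,
                   title: String = ""): String =
      s"$title: ${width}x$height, resizable = $resizable"

    makeWindow(width = 800, height = 600, title = "Main")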

Including both structs and classes looks initially redundant, but it's quite helpful to have a value type that encompasses multiple other values. As an example, the boxed Integer type in Java would be much better as a struct than as a class.
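
Scala has no structs, but an immutable case class approximates value semantics; a sketch:

    // Structural equality and no mutation: close to value-type behavior,
    // even though the object itself lives on the heap.
    case class Point(x: Double, y: Double)

    val a = Point(1, 2)
    val b = Point(1, 2)
    println(a == b)  // true: compared by value, not by reference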

Extensions look valuable for programming in the large. They let you make an existing class fit into a new framework, and they let you add convenience methods to an existing class. Scala uses its implicit conversions for extensions, but direct support for extensions also makes a lot of sense.
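
Here is how the implicit-conversion version looks in Scala, as a sketch; RichWords and wordCount are invented names:

    object StringExtensions {
      // Adds a convenience method to String without touching its definition.
      implicit class RichWords(val s: String) extends AnyVal {
        def wordCount: Int = s.split("\\s+").count(_.nonEmpty)
      }
    }

    import StringExtensions._
    println("the quick brown fox".wordCount)  // prints 4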

The way option chaining works is a nice improvement on Objective C. In Objective C, any access to nil returns nil. In practice, programmers are likely better off with getting an error when they access nil, as a form of design by contract: when something goes wrong, you want the program to stop at that point, not some indefinite time later. Still, sometimes you want nil propagation, and when you do, Swift lets you just put a "?" after the access.
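
Scala's Option works the same way, with propagation opt-in rather than the default; a sketch:

    case class Address(city: String)
    case class User(address: Option[Address])

    val user: Option[User] = Some(User(None))
    // Propagation only when you ask for it; a bare .get would fail fast instead.
    val city: Option[String] = user.flatMap(_.address).map(_.city)  // None, no error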

Weak references are helpful in any language with automatic memory management, but they look especially helpful in a language with reference-counting memory management. I don't follow why there are also "unowned" references, except that the designers didn't want your code to get polluted with ! dereferences. Even so, I would think this is a case of do or do not: if you are worried about ! pollution, which is a legitimate concern, then simply don't require the !.

As an aside, this is the reason I am not sure pervasive null is as bad as often claimed. In practical code, there are a lot of cases where a value is optional in general but, in a specific context, is known to be present. In such a case, you are just going to dereference it, and possibly suffer a null-pointer error if you were wrong. As such, programmers are guided into a style where they just insert dereferences until the compiler shuts up, which makes the code noisy without increasing practical reliability.

The dubious

Swift looks very practical and workable, but there are some issues I think could have been done better.

Single inheritance seems like a step backward. The linearization style of multiple inheritance has proven helpful in practice, and it eliminates the need for a separate "interface" or "protocol" feature. Perhaps the designers feel that C++'s multiple inheritance went badly and are avoiding multiple inheritance like the plague? I used to think that way, but it's been multiple decades since C++'s core design, and there are better designs for multiple inheritance nowadays.
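
Scala's traits are one example of the linearization style; a minimal sketch:

    // Super calls resolve along the linear order Service -> Timestamped -> Logging.
    trait Logging { def log(msg: String): Unit = println(msg) }
    trait Timestamped extends Logging {
      override def log(msg: String): Unit = super.log(s"[now] $msg")
    }
    class Service extends Timestamped

    new Service().log("started")  // prints "[now] started"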

Swift doesn't appear to include persistent data structures. This is the one feature I miss the most when I don't get to program in Scala, and I don't know why it isn't catching on more widely in other languages. Developers can add their own collection types, but since the new types aren't standard, you end up having to convert to standard types whenever you call into another library.
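
For anyone who hasn't used them, a quick Scala sketch of what "persistent" means here:

    // "Updating" returns a new collection; the old one stays valid, and the
    // two share most of their structure internally.
    val v1 = Vector(1, 2, 3)
    val v2 = v1.updated(0, 99)

    println(v1)  // Vector(1, 2, 3), unchanged
    println(v2)  // Vector(99, 2, 3)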

The automatic immutability of collections assigned to constants looks driven by the lack of persistent collections. It's better to support both features independently: let variables be either mutable or not, and let collections be mutable or not. All four combinations are very useful.
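
In Scala the two choices are independent; all four combinations, as a sketch:

    import scala.collection.mutable

    val a = List(1, 2, 3)                // immutable binding, immutable collection
    var b = List(1, 2, 3)                // mutable binding, immutable collection
    val c = mutable.ListBuffer(1, 2, 3)  // immutable binding, mutable collection
    var d = mutable.ListBuffer(1, 2, 3)  // mutable binding, mutable collection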

Deinitialization, also known as finalization, looks like a throwback to me. In a system with automatic memory management, you don't want to know precisely when your memory is going to get freed. As such, you can't count on deinitializers running soon enough to be useful. Thus, you always need a fallback plan of deallocating things manually. Once you deallocate manually, though, deinitializers become just a debugging technique. It's better to debug leaks using a tool than with a language feature.

In-out parameters seem like a step backward. The trouble is that most functions use only in-parameters, so when a programmer sees a function call, their default assumption is that the callee will not modify the arguments. It can lead to bad surprises if a parameter gets modified at all. Out-parameters are unusual enough that it's better to be explicit about them, for example by taking a mutable collection as an argument.
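
A sketch of the explicit version in Scala, with an invented collectErrors function:

    import scala.collection.mutable

    // The mutation is visible in the parameter's type, not hidden at the call site.
    def collectErrors(line: String, errors: mutable.Buffer[String]): Unit =
      if (line.isEmpty) errors += "blank line"

    val errors = mutable.Buffer[String]()
    collectErrors("", errors)
    println(errors)  // ListBuffer(blank line)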

Custom precedence (and associativity) is likely to go badly. We discussed this in detail, over the course of days, for X10, because X10 is a scientific language that really benefits from a rich set of operators. One problem with user-defined precedence is that it's hard to scope: you want to scope the operators themselves, not their implementations, because parsing happens before method lookup. It's also tough on programmers if they have to learn a new precedence table for every file of code they read. All in all, we concluded that Scala had struck a reasonable set of trade-offs here: have a built-in precedence table with a huge number of available operators, and have library designers simply choose from the existing operators.
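
In Scala, an operator's precedence is fixed by its first character, so a library author picks a symbol with the right binding strength rather than declaring a precedence. A sketch:

    // * binds tighter than +, exactly as in built-in arithmetic.
    case class Vec(x: Double, y: Double) {
      def +(o: Vec) = Vec(x + o.x, y + o.y)
      def *(k: Double) = Vec(x * k, y * k)
    }

    val v = Vec(1, 2) + Vec(3, 4) * 2.0  // Vec(7.0, 10.0)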

I see no exceptions, which is likely to be a nuisance to programmers if they are truly missing. Sometimes you want to tear down a whole chunk of computation without exiting the whole program. In such cases, exceptions work very well. Maybe I just missed it.

Integer types are hard to get right, and I am not sure Swift has chosen a great approach. It's best to avoid unsigned types, and instead to have untyped operations that can apply to typed integers. It's also best to avoid having low-precision operations, even if you have low-precision storage. Given all of the above, you don't really need explicit conversions any more. Java's integer design is quite good, with the exception of the unnecessary char type, which is not even good for holding characters. I suspect many people overlook this about Java, because it's a case where programmers are better off with a language that has fewer features.
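
A sketch of the Java-style rules, which Scala follows: storage can be low-precision, but the arithmetic itself promotes to Int:

    // Byte + Byte promotes to Int; there is no separate 8-bit addition
    // that could silently wrap at 127.
    val a: Byte = 100
    val b: Byte = 100
    val sum = a + b  // sum is an Int with value 200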

Saturday, January 18, 2014

Is Internet access a utility?

I forwarded a link about Network Neutrality to Google Plus, and it got a lot of comments about how Internet access should be treated like a utility. I think that's a reasonable perspective to start with. What we all want, I think, is for Internet access itself to be a baseline service, while the Internet services on top of it face fierce competition.

In addition to considering the commonalities between Internet access and utilities, we should also note the differences.

One difference is that utility treatment presumes a monopoly, but Internet access is not monopolized. You can only put one road in any given physical location, and I will presume for the sake of argument that you don't want to have multiple power grids in the same locale. Internet access is not like that, though! At least in Atlanta, we have cable, DSL, WiMax, and several cellular providers. We have more high-speed Internet providers than supermarket chains.

Another difference is that utilities lock down technology change to a snail's pace. With roads and power grids, the technology already provides near-maximum service for what is possible, so this doesn't matter. With telephony, progress has been locked down for decades, and I think we all lost out because of that; the telephone network could have been providing Skype-like services a long time ago, but as a utility they kept doing things the same way as always. Meanwhile, the Internet is changing rapidly. It would be really bad to stop progress on Internet access right now, the way we did with telephony several decades ago.

I believe a better model than utilities would be supermarkets. Like Internet providers, supermarkets carry a number of products that are mostly produced by other companies. I think it has gone well for everyone that supermarkets have tremendous freedom in their content selection, pricing, promotional activities, hours, floor layout, buggies, and checkout technology.

In contrast to what some commenters ask, I do not have any strong expectation about what Comcast will or won't try. I would, however, like them to be free to experiment. I've already switched away from Comcast and don't even use them right now. If Comcast is locked into their current behavior, then that does nothing for me, good or bad. If they can experiment, maybe they will come up with something better.

In principle, I know that smart people disagree on this, but I currently don't see anything fundamentally wrong with traffic shaping. If my neighbor is downloading erotica 24/7, then I think it is reasonable that Comcast give my Game of Thrones episode higher priority. The fact that Comcast has implemented this badly in the past is troubling, but that doesn't mean the next attempt won't work better. I'd like them to be free to try.

Monday, November 11, 2013

It's ad targeting, isn't it?

I see continued assumptions by people that the real names policies of Facebook and Google Plus have actual teeth.

I've posted before on whether real names are truly enforced on Facebook, and it looks like the answer there is no. My impression is that it's not working great on Plus, either, although there have been some famous botched efforts.

The rationale that it improves the level of discussion seems thin and inaccurate. There are too many legitimate reasons to participate in a forum without wanting it to pop up when your boss does a Google search on your name.

As far as I can tell, the main purpose of a real names policy is to appease advertisers. Advertisers feel, probably correctly, that more information about users will improve the accuracy of ad targeting. It's weird, though, because nobody seems to talk about it that way. It's analogous to the exhortations in a hotel room that it's good for the environment to avoid washing so many towels. Ummm, I'm pretty sure it's more about the money.

Sunday, June 16, 2013

When to best use type inference

Type inference can make code much better. It can save you from writing down something that is completely obvious, and thus a total waste of space to write down. For example, type inference is helpful in the following code:
    // Type inference
    val date = new Date

    // No type inference
    val date: Date = new Date
It's even better for generics, where the version without type inference is often absurd:
    // Type inference
    val lengths: List[Int] =
        names.map(n => n.length).filter(l => l >= 0)

    // No type inference
    val lengths: List[Int] =
        names.map[Int, List[Int]]((n: String) => n.length).
        filter((l: Int) => l >= 0)
When would a type not be "obvious"? Let me describe two scenarios.

First, there is obvious to the reader. If the reader cannot tell what a type is, then help them out and write it down. Good code is not an exercise in swapping puzzles with your coworkers.

    // Is it a string or a file name?
    val logFile = settings.logFile

    // Better
    val logFile: File = settings.logFile
Second, there is obvious to the writer. Consider the following example:
    val output =
        if (writable(settings.out))
            settings.out
        else
            "/dev/null"
To a reader, this code is obviously producing a string. How about to the writer? If you wrote this code, would you be sure that you wrote it correctly? I claim no. If you are honest, you aren't sure what settings.out is unless you go look it up. As such, you should write it this way, in which case you might discover an error in your code:
    val output: String =
        if (writable(settings.out))
            settings.out  // ERROR: expected String, got a File
        else
            "/dev/null"
Languages with subtyping all have this limitation. The compiler can tell you when an actual type fails to satisfy the requirements of an expected type. However, if you ask it whether two types can ever be used in the same context as each other, it will always say yes, they could be used as type Any. ML and Haskell programmers are cackling as they read this.

It's not just if expressions, either. Another place this issue crops up is in collection literals. Unless you tell the compiler what kind of collection you are trying to make, it will never fail to find a type for it. Consider this example:

    val path = List(
        "/etc/scpaths",
        "/usr/local/sc/etc/paths",
        settings.paths)
Are you sure that settings.paths is a string and not a file? Are you sure nobody will change that type in the future, and that if they do, they will get a type error rather than a quietly more general inferred type? If you aren't sure, you should write down the type you are trying for:
    val path = List[String](
        "/etc/scpaths",
        "/usr/local/sc/etc/paths",
        settings.paths)  // ERROR: expected String, got a File
Type inference is a wonderful thing, but it shouldn't be used to create mysteries and puzzles. In code, just like in prose, strive to say the interesting and to elide the obvious.

Sunday, April 21, 2013

Google Voice after several months

I've been using Google Voice exclusively for months now, and just for voice mail for more than a year. I feel like the plain old telephone service (POTS) is an unreasonably high toll to pay given how technology has improved. There is no reason to have non-trivial long-distance rates between Europe and the U.S. in a day when Skype does it for around a penny a minute. Google is doing a wonderful thing by promoting an Internet-based phone number.

Rather, Google is starting a wonderful thing. In the time I have used it, many of the most obvious problems haven't improved in the slightest.

Here's a quick run down of the good and the bad as I see it. Overall, I see it as comparable to my two-year stint using a Mac for software development. The promise is there, but when you actually try it, you realize why it's not yet the norm.

The Good

I love receiving calls and having all my devices ring. In 2013, it's the way things ought to work. If I'm in the car, my car stereo should ring. If I'm at my desk, I should get a notification on my desktop. If I'm watching TV, my physical phone should ring. Google Voice gets this just right.

I love the option to take calls at my desk. I already do a lot of voice chat sessions with coworkers around the world, and it just seems right that I should do the same thing with gmail addresses and numeric phone numbers.

I love the text transcription of voice mails, for those times I can't take a call immediately. The quality is iffy but is usually high enough that I can understand the gist of what the person was telling me.

Phone number porting works just fine, so you can keep your pre-existing number and not even tell people you are using Google Voice. Well, you have to tell them for a different reason: there is so much bad with Google Voice that you need to warn your potential calling partners about how gimped your phone service is.

The Bad

There's a lot of bad.

It doesn't work over data connections. I really don't understand why this capability is missing. Because of this problem, I have to buy minutes on the POTS to use Google Voice on my cell phone, and minutes are far more expensive than the associated data cost. More pragmatically, if I am travelling and don't yet have a local SIM card, it means I cannot use my phone to call over a wifi network.

You can't make or take calls directly from the Voice web page. You have to log into both Google Talk and Google Voice, configure Voice to talk to Talk, and then make your call from Voice. Yes, you can also make a call from Talk directly, but that's a separate feature of the Google suite, thus confusing matters even further. Google is normally excellent at building web user interfaces, but that seems to go down the tubes when an issue crosses multiple teams.

When you make a call at your desk, using Talk, the volume is extremely low. I originally thought that was just my configuration, but some web searching indicates that this has been a widespread problem for several years. I have to turn up my system volume to the max just to barely hear the other person, at which point every random system notification is an ear splitter.

It doesn't support phone usage from the UK. This is a very surprising restriction, because Talk can make calls to the UK just fine. Part of the benefit of Voice for me is the promise that I can travel around and call POTS numbers from wherever I am. However, even if I get a UK SIM card, it's just not supported by Voice.

There is no MMS, and there is no warning on either side when an attempted MMS does not go through. I have to tell people to use email, or to use my physical cell phone number rather than my Google Voice number. If Mom emails me a photo of one of my nieces, it quietly disappears. I am oblivious, and she is wondering what planet I am on that I didn't write back.

The Ugly

The ugly part is that Google is not doing anything to fix all of this. I'm willing to be a beta tester in this case. It's not beta testing, though, if they never fix the problems.

At this point, the POTS tax is substantially higher than the Microsoft tax of yore. It costs tens of dollars a month to participate, and you can't live without it.

Saturday, March 23, 2013

C compilers exploiting undefined behavior

It's getting out of hand the way C compilers exploit undefined behavior. I see via John Regehr's blog that there is a SPEC benchmark being turned into a noop via an undefined-behavior argument.

This isn't what the spec writers had in mind when they added undefined behavior. To fix it, Regehr's idea of having extra checkers to find such problems is a plausible one, though it will take a dedicated effort to get there.

An easier thing to do would be for gcc and Clang to stop the madness! If they see an undefined behavior bullet in their control-flow graphs, then they should leave it there, rather than assuming it won't happen and reasoning backward. This will cause some optimizations to stop working, but really, C compilers were already plenty good 10 years ago. The extra level of optimizations is not a net win for developers. Developers want speed, sure, but above all they want their programs to do what they look like they do.

It should also be possible to improve the spec around this, to pin down what undefined behavior means a little more specifically. For example, left-shifting into the sign bit of a signed integer is undefined behavior. That's way underspecified. The only real options are: shift into the sign bit as expected, turn the integer into unpredictable garbage, or throw an exception. As things stand, a C compiler is allowed to observe a bad left shift and then turn your whole program into a noop.