Monday, November 11, 2013

It's ad targeting, isn't it?

I see continued assumptions by people that the real names policies of Facebook and Google Plus have actual teeth.

I've posted before on whether real names are truly enforced on Facebook, and it looks like the answer there is no. My impression is that it's not working great on Plus, either, although there have been some famous botched efforts.

The rationale that it improves the level of discussion seems thin and inaccurate. There are too many legitimate reasons to participate in a forum but not to want it to pop up when your boss does a Google search on your name.

As far as I can tell, the main purpose of a real names policy is to appease advertisers. Advertisers feel, probably correctly, that more information about users will improve the accuracy of ad targeting. It's weird, though, because nobody seems to talk about it that way. It's analogous to the exhortations in a hotel room that it's good for the environment to avoid washing so many towels. Ummm, I'm pretty sure it's more about the money.

Sunday, June 16, 2013

When to best use type inference

Type inference can make code much better. It can save you from writing down something that is completely obvious, and thus a total waste of space to write down. For example, type inference is helpful in the following code:

    // Type inference
    val date = new Date

    // No type inference
    val date: Date = new Date

It's even better for generics, where the version without type inference is often absurd:

    // Type inference
    val lengths: List[Int] =
        names.map(n => n.length).filter(l => l >= 0)

    // No type inference
    val lengths: List[Int] =
        names.map[Int, List[Int]]((n: String) => n.length).
        filter((l: Int) => l >= 0)

When would a type not be "obvious"? Let me describe two scenarios.

First, there is obvious to the reader. If the reader cannot tell what a type is, then help them out and write it down. Good code is not an exercise in swapping puzzles with your coworkers.

    // Is it a File or a file name?
    val logFile = settings.logFile

    // Better
    val logFile: File = settings.logFile

Second, there is obvious to the writer. Consider the following example:

    val output =
        if (writable(settings.out))
            settings.out
        else
            "/dev/null"

To a reader, this code is obviously producing a string. How about to the writer? If you wrote this code, would you be sure that you wrote it correctly? I claim no. If you are honest, you aren't sure what settings.out is unless you go look it up. As such, you should write it this way, in which case you might discover an error in your code:

    val output: String =
        if (writable(settings.out))
            settings.out  // ERROR: expected String, got a File
        else
            "/dev/null"

Languages with subtyping all have this limitation. The compiler can tell you when an actual type fails to satisfy the requirements of an expected type. However, if you ask it whether two types can ever be used in the same context as each other, it will always say yes, they could be used as type Any. ML and Haskell programmers are cackling as they read this.
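
Here is a minimal sketch of that behavior that you can paste into a file and compile. The names `Settings`, `writable`, and `InferredSupertype` are hypothetical stand-ins, not anything from a real code base, and I am assuming `settings.out` turned out to be a `java.io.File`.

```scala
import java.io.File

object InferredSupertype {
  // Hypothetical stand-ins for the post's `settings` and `writable`.
  final case class Settings(out: File)
  def writable(f: File): Boolean = true

  val settings = Settings(new File("app.log"))

  // No annotation: both branches are accepted, and the compiler quietly
  // picks some common supertype of File and String for `output`.
  val output =
    if (writable(settings.out)) settings.out
    else "/dev/null"

  // With an annotation, the same mistake is rejected at compile time:
  //   val checked: String =
  //     if (writable(settings.out)) settings.out   // type mismatch
  //     else "/dev/null"

  // One possible fix: convert the File explicitly.
  val fixed: String =
    if (writable(settings.out)) settings.out.getPath
    else "/dev/null"
}
```

The unannotated version compiles and hands a File to whoever expected a string. The mistake only surfaces later, far from where it was made.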

It's not just if expressions, either. Another place this issue crops up is in collection literals. Unless you tell the compiler what kind of collection you are trying to make, it will never fail to find a type for it. Consider this example:

    val path = List(
        "/etc/scpaths",
        "/usr/local/sc/etc/paths",
        settings.paths)

Are you sure that settings.paths is a string and not a file? Are you sure nobody will change that type in the future and then rely on type check errors to find the code they need to update? If you aren't sure, you should write down the type you are trying for:

    val path = List[String](
        "/etc/scpaths",
        "/usr/local/sc/etc/paths",
        settings.paths)  // ERROR: expected String, got a File
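
The same experiment works for the collection literal. This is only a sketch: `Settings` and `PathList` are made-up names, and I am again assuming the setting is a `java.io.File`.

```scala
import java.io.File

object PathList {
  // Hypothetical stand-in for the post's `settings`.
  final case class Settings(paths: File)
  val settings = Settings(new File("sc-paths"))

  // Unannotated: compiles even though one element is a File.
  val mixed = List("/etc/scpaths", "/usr/local/sc/etc/paths", settings.paths)

  // Annotated: the File has to be converted before it is accepted.
  val path = List[String](
    "/etc/scpaths",
    "/usr/local/sc/etc/paths",
    settings.paths.getPath)
}
```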

Type inference is a wonderful thing, but it shouldn't be used to create mysteries and puzzles. In code, just like in prose, strive to say the interesting and to elide the obvious.

Sunday, April 21, 2013

Google Voice after several months

I've been exclusively using Google Voice for months now, and just for voice mail for more than a year. I feel like the plain-old telephone system (POTS) is an unreasonably high toll to pay given how technology has improved. There is no reason to have non-trivial long-distance rates between Europe and the U.S. in a day when Skype does it for around a penny a minute. Google is doing a wonderful thing by promoting an Internet-based phone number.

Rather, Google is starting a wonderful thing. In the time I have used it, many of the most obvious problems haven't improved in the slightest.

Here's a quick rundown of the good and the bad as I see it. Overall, I see it as comparable to my two-year stint using a Mac for software development. The promise is there, but when you actually try it, you realize why it's not yet the norm.

The Good

I love receiving calls and having all my devices ring. In 2013, it's the way things ought to work. If I'm in the car, my car stereo should ring. If I'm at my desk, I should get a notification on my desktop. If I'm watching TV, my physical phone should ring. Google Voice gets this just right.

I love the option to take calls at my desk. I already do a lot of voice chat sessions with coworkers around the world, and it just seems right that I should do the same thing with gmail addresses and numeric phone numbers.

I love the text transcription of voice mails, for those times I can't take a call immediately. The quality is iffy but is usually high enough that I can understand the gist of what the person was telling me.

Phone number porting works just fine, so you can keep your pre-existing number and not even tell people you are using Google Voice. Well, you have to tell them for a different reason: there is so much bad with Google Voice that you need to warn your potential calling partners about how gimped your phone service is.

The Bad

There's a lot of bad.

It doesn't work over data connections. I really don't understand why this feature is missing. Because of this problem, I have to buy minutes on the POTS to use it on my cell phone, and minutes are far more expensive than the associated data cost. More pragmatically, if I am travelling and don't yet have a local SIM card, it means I cannot use my phone to call over a wifi network.

You can't make or take calls directly from the Voice web page. You have to log into both Google Talk and Google Voice, configure Voice to talk to Talk, and then make your call from Voice. Yes, you can also make a call from Talk directly, but that's a separate feature of the Google suite, thus confusing matters even further. Google is normally excellent at building web user interfaces, but that seems to go down the tubes when an issue crosses multiple teams.

When you make a call at your desk, using Talk, the volume is extremely low. I originally thought that was just my configuration, but some web searching indicates that this has been a widespread problem for several years. I have to turn up my system volume to the max just to barely hear the other person, at which point every random system notification is an ear splitter.

It doesn't support phone usage from the UK. This is a very surprising restriction, because Talk can make calls to the UK just fine. Part of the benefit of Voice for me is the promise that I can travel around and call POTS numbers from wherever I am. However, even if I get a UK SIM card, it's just not supported by Voice.

There is no MMS, and there is no warning on either side when an attempted MMS does not go through. I have to tell people to use email, or to use my physical cell phone number rather than my Google Voice number. If Mom texts me a photo of one of my nieces, it quietly disappears. I am oblivious, and she is wondering what planet I am on that I didn't write back.

The Ugly

The ugly part is that Google is not doing anything to fix all of this. I'm willing to be a beta tester in this case. It's not beta testing, though, if they never fix the problems.

At this point, the POTS tax is substantially higher than the Microsoft tax of yore. It costs tens of dollars a month to participate, and you can't live without it.

Saturday, March 23, 2013

C compilers exploiting undefined behavior

It's getting out of hand the way C compilers exploit undefined behavior. I see via John Regehr's blog that there is a SPEC benchmark being turned into a noop via an undefined-behavior argument.

This isn't what the spec writers had in mind when they added undefined behavior. To fix it, Regehr's idea of having extra checkers to find such problems is a plausible one, though it will take a dedicated effort to get there.

An easier thing to do would be for gcc and Clang to stop the madness! If they see an undefined behavior bullet in their control-flow graphs, then they should leave it there, rather than assuming it won't happen and reasoning backward. This will cause some optimizations to stop working, but really, C compilers were already plenty good 10 years ago. The extra level of optimizations is not a net win for developers. Developers want speed, sure, but above all they want their programs to do what they look like they do.

It should also be possible to improve the spec around this, to pin down what undefined behavior means a little more specifically. For example, left-shifting into the sign bit of a signed integer is undefined behavior. That's way underspecified. The only real options are: shift into the sign bit as expected, turn the integer into unpredictable garbage, or throw an exception. As things stand, a C compiler is allowed to observe a bad left shift and then turn your whole program into a noop.
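
For contrast, here is what the first option looks like when a platform pins it down. This is Scala on the JVM, not C, so it says nothing about what any C compiler does. It is only a sketch of how a fully specified shift behaves. `ShiftSemantics` is a made-up name.

```scala
object ShiftSemantics {
  // On the JVM, shifting into the sign bit is fully defined: the bits move
  // left and the result is read back as a two's complement Int.
  val intoSignBit: Int = 1 << 31   // Int.MinValue

  // Oversized shift counts are defined too: for an Int, only the low five
  // bits of the count are used, so this is the same as 1 << 0.
  val oversized: Int = 1 << 32
}
```

Whether or not you like those particular answers, a program that does this still means something, and the compiler has no license to throw the rest of it away.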

Thursday, January 31, 2013

The "magic moment" for IPv6

The Internet has undergone many large changes in the protocols it uses. A few examples are: the use of MIME email, the replacement of Gopher by HTTP, and the use of gzip compression within HTTP. In all three of these examples, the designers of the protocol upgrades were careful to provide a transition plan. In two out of the three examples (sorry, Gopher), the old protocol is still practical to use today, if you can live with its limitations.

Things are going differently for IPv6. In thinking about why, I like Dan Bernstein's description of a "magic moment" for IPv6. It goes like this:

The magic moment for IPv6 will be the moment when people can start relying on public IPv6 addresses as replacements for public IPv4 addresses. That's the moment when the Internet will no longer be threatened by the IPv4 address crunch.

Note that Dan focuses on the address crunch. Despite claims to the contrary, I believe most people are interested in IPv6 for its very large address space. While there are other cool things in IPv6, such as built-in encryption and simplified fragmentation, they are not enough that people would continue to lobby for IPv6 after all these years. The address crunch is where it's at.

While I like Dan's concept of a magic moment, I think the above quote asks for too much. There are some easier magic moments for individual kinds of nodes on the Internet, and some might well happen before others. Let me focus on two particular kinds of Internet nodes: public web sites and home Internet users.

How close is the magic moment for web sites? Well, web servers can discard their IPv4 addresses just as soon as the bulk of the people connecting to them all have IPv6 connectivity. I do not know how to gather data on that, but as a single data point, I have good networking hardware but cannot personally connect to IPv6 sites. My reason is both mundane and common: I am behind a Linksys NATing router, and that router does not support IPv6. Even if it did, it does not support any sort of tunneling that would allow my local computer to connect to an IPv6-only web server. To the extent people are using plain old Linksys routers, we are a long way away from the magic moment for web servers.

How about for home users? Well, it's the other way around for home users: home users can switch once the majority of public web sites have an IPv6 address. This status is easier to gather data on. I just looked up the top ten web sites (according to Alexa's Top 500 Web Sites) and checked them with a publicly available IPv6 validation site (http://ipv6-test.com/validate.php). Of the top ten web sites, only four can be reached from an IPv6-only client: Google, Facebook, YouTube, and Wikipedia. The other six still require IPv4: Yahoo, Baidu, Live.com, Amazon, QQ.com, and Twitter. As things stand, we are also a long way from when home users can switch to IPv6-only.
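
If you would rather script a rough version of this check than use a validation site, something like the following sketch is a starting point. It uses only `java.net.InetAddress` from the standard library. `Ipv6Check`, `hasIpv6`, and `resolve` are names I made up. Note that it only asks whether a host name resolves to any IPv6 address. That is a weaker test than actually reaching the site from an IPv6-only client, and the answer can depend on how the local resolver and JVM are configured.

```scala
import java.net.{Inet6Address, InetAddress}
import scala.util.Try

object Ipv6Check {
  // True if at least one of the given addresses is an IPv6 address.
  def hasIpv6(addrs: Seq[InetAddress]): Boolean =
    addrs.exists(_.isInstanceOf[Inet6Address])

  // Resolves a host name with the system resolver; empty if lookup fails.
  def resolve(host: String): Seq[InetAddress] =
    Try(InetAddress.getAllByName(host).toSeq).getOrElse(Seq.empty)

  // Prints one line per host name given on the command line.
  def main(args: Array[String]): Unit =
    for (host <- args)
      println(s"$host: resolves to an IPv6 address = ${hasIpv6(resolve(host))}")
}
```

Run it with a list of host names as arguments and compare the results against whatever the validation site reports.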

Overall, this was a cursory analysis, but I think these "magic moments" are a helpful framework for thinking about the IPv6 changeover. Unfortunately, this framework currently indicates that we are nowhere close.

Tuesday, January 22, 2013

Virtual classes

Gilad Bracha has a great post up on virtual classes:
I wanted to share a nice example of class hierarchy inheritance....All we need then, is a slight change to ThreadSubject so it knows how to filter out the synthetic frames from the list of frames. One might be able to engineer this in a more conventional setting by subclassing ThreadSubject and relying on dependency injection to weave the new subclass into the existing framework - assuming we had the foresight and stamina to use a DI framework in the first place.

I looked into virtual classes in the past, as part of my work at Google to support web app developers. Bruce Johnson put out the call to support problems like Gilad describes above, and a lot of us thought hard on it. Just replace "ThreadSubject" by some bit of browser arcana such as "WorkerThread". You want it to work one way on App Engine, and a different way on Internet Explorer, and you want to allow people to subclass your base class on each platform.

Nowadays I'd call the problem one of "product lines", having had the benefit of talking it over with Kurt Stirewalt. It turns out that software engineering and programming languages have something to do with each other. In the PL world, thinking about "product lines" leads you to coloring, my vote for one of the most overlooked ideas in PL design.

Here is my reply on Gilad's blog:

I'd strengthen your comment about type checking, Gilad: if you try to type check virtual classes, you end up wanting to make the virtual classes very restrictive, thus losing much of the benefit. Virtual classes and type checking are in considerable tension.

Also agreed about the overemphasis on type checking in PL research. Conceptual analysis matters, but it's hard to do, and it's even harder for a paper committee to review it.

I last looked into virtual classes as part of GWT and JS' (the efforts tended to go in tandem). Allow me to add to the motivation you provide. A real problem faced by Google engineers is to develop code bases that run on multiple platforms (web browsers, App Engine, Windows machines) and share most of the code. The challenge is to figure out how to swap out the non-shared code on the appropriate platform. While you can use factories and interfaces in Java, it is conceptually cleaner if you can replace classes rather than subclass them. More prosaically, this comes up all the time in regression testing; how many times have we all written an interface and a factory just so that we could stub something out for unit testing?

I found type checking virtual classes to be problematic, despite having delved into a fair amount of prior work on the subject. From what I recall, you end up wanting to have *class override* as a distinct concept from *subclassing*, and for override to be much more restrictive. Unlike with subclassing, you can't refine the type signature of a method from the class being overridden. In fact, even *adding* a new method is tricky; you have to be very careful about method dispatch for it to work.

To see where the challenges come from, imagine class Node having both an override and a subclass. Let's call these classes Node, Node', and LocalizedNode, respectively. Think about what virtual classes mean: at run time, Node' should, in the right circumstances, completely replace class Node. That "replacement" includes replacing the base class of LocalizedNode!

That much is already unsettling. In OO type checking, you must verify that a subclass conforms to its superclass. How do you do this if you can't see the real superclass?

To complete the trap, imagine Node has a method "name" that returns a String. Node' overrides this and--against my rules--returns type AsciiString, because its names only have 7-bit characters in them. LocalizedNode, meanwhile, overrides the name method to look up names in a translation dictionary, so it's very much using Unicode strings. Now imagine calling "name" on a variable of static type Node'. Statically, you expect to get an AsciiString back. However, at run time, this variable might hold a LocalizedNode, in which case you'll get a String. Boom.

Given all this, if you want type checking, then virtual classes are on the research frontier. One reasonable response is to ditch type checking and write code the way you like. Another approach is to explore alternatives to virtual classes. One possible alternative is to look into "coloring", as in Colored FJ.
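
The Node example from the reply can be approximated in plain Scala, with some loud caveats. Scala has no class override, so `NodePrime` below is an ordinary subclass standing in for Node'. `Text` and `AsciiText` are hypothetical types standing in for String and AsciiString, and the commented-out class shows the hierarchy that would exist after the replacement. Treat it as a sketch of the problem, not as a model of any real virtual class system.

```scala
object VirtualClassTrap {
  // Hypothetical string types; AsciiText plays the role of AsciiString.
  class Text(val value: String)
  final class AsciiText(s: String) extends Text(s) {
    require(s.forall(_ <= 127), "AsciiText must be 7-bit")
  }

  class Node { def name: Text = new Text("node") }

  // Stand-in for the override Node'. It refines the return type of `name`,
  // which is exactly what the reply argues an override must not do.
  class NodePrime extends Node {
    override def name: AsciiText = new AsciiText("node")
  }

  // Type checks against Node, the only superclass it can see.
  class LocalizedNode extends Node {
    override def name: Text = new Text("n\u0153ud")
  }

  // What the hierarchy effectively becomes once Node' replaces Node.
  // Scala rejects this because it can see the real superclass:
  //   class LocalizedNodeAfterOverride extends NodePrime {
  //     override def name: Text = new Text("n\u0153ud")  // not an AsciiText
  //   }
}
```

A checker for virtual classes has to reject the commented-out class without ever seeing it written down, since the replacement of the base class only happens at run time.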

Monday, January 21, 2013

Andreas Raab

It seems we had a bad week for programmers. I learned via James Robinson that Andreas Raab has died. There is an outpouring of messages on the Squeak mailing list.

I worked with Andreas on the Squeak project several years ago, where I got to see first-hand his outstanding work. Among many other tasks, he played a leading role in actually implementing the Croquet system for eventual consistency. At the time I worked with him, he was developing Tweak, a self-initiated project for rapid GUI development.

On a selfish note, I learned a lot working with Andreas. You only get better in this industry by practicing with good people. Andreas was one of the best.

I'd be remiss not to say he was also a blast to work with. He was always laughing, yet always insightful. He always found ways to get the people around him on a better path. He always found ways to tell them so that they were happy to hear it.

Rest in peace, Andreas. We are lucky to have had you, for as long as you could stay.