Tuesday, December 14, 2010

Typing arithmetic in Datalog

Unlike in imperative languages, arithmetic in Datalog can execute in different orders. If you write a formula z=x+y in an imperative language, then the meaning is that you compute x, you compute y, and then you add them together to produce z. If you want to understand the typing of such an expression, you first find the types of x and y, and then look at the numeric conversion rules to see what the type of z must be. If x is a 32-bit integer and y is a 64-bit floating-point number, then z will be a 64-bit floating-point number.

In Datalog, there are additional options for using the same formula. One option is to do as in an imperative language, compute x and y, and then use the arithmetic formula to compute z. However, you could just as well compute x and z and then subtract to find y. In total there are four different ways to use the formula: you could use it to compute x, y, or z, or if all three are already computed, you could add them up and verify that they are in a proper relation with each other. How can we reason about the typing implications about an arithmetic formula matching?

Call such a constraint an arithmetic type constraint. Such a constraint must treat x, y, and z symmetrically, so think of them as being in a set. That is, an arithmetic type constraint is defined by a set of variables. If the variables are x, y, and z, then the arithmetic constraint for them could be written arith({x,y,z}).

It appears to work well to treat an arithmetic constraint as equivalent to the following set of individual constraints:

 
  // constraints for arith({x,y,z}) are:
  x <= lub(y, z)
  y <= lub(x, z)
  z <= lub(x, y)
In these constraints, "<=" means subtype-of, and "lub" means least upper bound. The lub of two types is the smallest type that is a supertype of both of them. For numeric types, humor me and think of subtyping as including valid numeric upcasts, so int32 is a subtype of float64. This post is already too long to get into subtyping versus numeric conversion.

Try these equations on a few examples, and it appears to work well. For example, if x and y are already known to be int32, then these constraints imply that z is no larger than lub(int32,int32), which is just int32. As another example, if x is an int32 but y and z are completely unknown, then these constraints give no new bound to anything. As a final example, if all three variables already have a known type, but one of them is a larger type than the other two, then the third one can be shrunk to the lub of the other two.

In general, these constraints will only allow the types of three variables to be in one of the following four combinations:

  • All three types are completely unknown.
  • One type is known but the other two are unknown.
  • All three types are the same.
  • Two of the types are the same, and the third type is a subtype of the other two.
The first two cases will eventually lead to a compile-time error, because the programmer has introduced variables but not provided a way to reliably bind those variables to values. The third option is a normal, happy case, where there are no numeric conversions being applied. The fourth option is most interesting, in my view, because there is an operation between two types and the result is the lub of those two types. Notably absent from the list is any combination where all three types are different.

As one final wrinkle, it's arguably more programmer friendly to always perform arithmetic on at least 32-bit if not 64-bit values. Otherwise, when a programmer writes 127+1, they might get 8-bit arithmetic and thus an overflow. To make arithmetic work that way, it's possible to add a minimum type to an arithmetic constraint. For example, arith({x,y,z},int64) would be an arithmetic constraint over x, y, and z, and a minimum result type of 64-bit integers. This would be equivalent to the following combination of constraints:

  
  // constraints for arith({x,y,z}, int64) are:
  x <= lub(y, z, int64)
  y <= lub(x, z, int64)
  z <= lub(x, y, int64)

No comments: