z=x+y
in an
imperative language, then the meaning is that you compute x, you
compute y, and then you add them together to produce z. If you want to
understand the typing of such an expression, you first find the types
of x and y, and then look at the numeric conversion rules to see what
the type of z must be. If x is a 32-bit integer and y is a 64-bit
floating-point number, then z will be a 64-bit floating-point number.
In Datalog, there are additional options for using the same formula. One option is to do as in an imperative language, compute x and y, and then use the arithmetic formula to compute z. However, you could just as well compute x and z and then subtract to find y. In total there are four different ways to use the formula: you could use it to compute x, y, or z, or if all three are already computed, you could add them up and verify that they are in a proper relation with each other. How can we reason about the typing implications about an arithmetic formula matching?
Call such a constraint an arithmetic type constraint. Such a
constraint must treat x, y, and z symmetrically, so think of them as
being in a set. That is, an arithmetic type constraint is defined by a set
of variables. If the variables are x, y, and z, then the arithmetic
constraint for them could be written arith({x,y,z})
.
It appears to work well to treat an arithmetic constraint as equivalent to the following set of individual constraints:
// constraints for arith({x,y,z}) are: x <= lub(y, z) y <= lub(x, z) z <= lub(x, y)In these constraints, "<=" means subtype-of, and "lub" means least upper bound. The lub of two types is the smallest type that is a supertype of both of them. For numeric types, humor me and think of subtyping as including valid numeric upcasts, so int32 is a subtype of float64. This post is already too long to get into subtyping versus numeric conversion.
Try these equations on a few examples, and it appears to work well. For example, if x and y are already known to be int32, then these constraints imply that z is no larger than lub(int32,int32), which is just int32. As another example, if x is an int32 but y and z are completely unknown, then these constraints give no new bound to anything. As a final example, if all three variables already have a known type, but one of them is a larger type than the other two, then the third one can be shrunk to the lub of the other two.
In general, these constraints will only allow the types of three variables to be in one of the following four combinations:
- All three types are completely unknown.
- One type is known but the other two are unknown.
- All three types are the same.
- Two of the types are the same, and the third type is a subtype of the other two.
As one final wrinkle, it's arguably more programmer friendly to always
perform arithmetic on at least 32-bit if not 64-bit values. Otherwise,
when a programmer writes 127+1, they might get 8-bit arithmetic and
thus an overflow. To make arithmetic work that way, it's possible to
add a minimum type to an arithmetic constraint. For example,
arith({x,y,z},int64)
would be an arithmetic constraint
over x, y, and z, and a minimum result type of 64-bit integers. This
would be equivalent to the following combination of constraints:
// constraints for arith({x,y,z}, int64) are: x <= lub(y, z, int64) y <= lub(x, z, int64) z <= lub(x, y, int64)
No comments:
Post a Comment