In a nutshell, generalized algebraic data types, or GADTs for short, are a first-class phantom type of sorts. The basic idea is to enrich the constructors of an ADT with a more expressive "return" type by allowing the instantiation of one or more type parameters. A simple motivating example is that of an abstract syntax in Peyton Jones's original paper:
data Term a where Lit :: Int -> Term Int Inc :: Term Int -> Term Int IsZ :: Term Int -> Term Bool If :: Term Bool -> Term a -> Term a -> Term a Pair :: Term a -> Term b -> Term (a,b) Fst :: Term (a,b) -> Term a Snd :: Term (a,b) -> Term b
For instance, the constructor for Inc
specifies the type of its argument as Term Int
by constraining its application to bind the parameter a to Int
, in effect specifying the "return type" of the constructor. Xavier Leroy's proposal for a syntax extension of OCaml would look like this:
type 'a term = | Lit of int constraint 'a = int | Inc of int term constraint 'a = int | IsZ of int term constraint 'a = bool | If of bool term * 'a term * 'a term | Pair of 'b term * 'c term constraint 'a = 'b * 'c | Fst of ('b * 'c) term constraint 'a = 'b | Snd of ('b * 'c) term constraint 'a = 'c
Each syntax emphasizes complementary views of what "is" the essence of this generalization: Haskell's makes clear the parameter's functional dependency on the constructor; OCaml's stresses the constraints placed on a match on the constructor. In any case, this type shows the three cases a GADT might present:
- No constraint, as in the constructor
If
- A concrete constraint, as in the constructors
Lit
,Inc
orIsZ
- An existential constraint, as in the constructors
Pair
,Fst
orSnd
The constraints a GADT places on the parameters must be met by the functions that do a pattern-matching on the type. In general, the result type is a function of the types of the parameter. This requires polymorphic recursion which forces the use of an explicit rank-2 type for it. For instance, a function evaluating a Term a
will have type a for each possible a defined in the GADT. In the following code the match for Lit
forces a to be instantiated to Int
, whereas the matching for IsZ
forces it to Bool
:
eval :: Term a -> a eval term = case term of Lit n -> n Inc t -> eval t + 1 IsZ t -> eval t == 0 If c x y -> if eval c then eval x else eval y Pair x y -> (eval x, eval y) Fst t -> fst (eval t) Snd t -> snd (eval t)
Unfortunately, the current version of OCaml (3.11) does not yet include either GADTs or rank-2 polymorphism. On July 11th, 2009, Oleg announced that OCaml's type system is expressive enough to encode GADTs. On his site he shows three applications of the technique. However, Oleg's examples do not offer much guidance in how to apply the encoding, especially since the more complete example, that of the typed printf
/scanf
, is too advanced an application to use as a template. I have shown in another post how to encode polymorphic recursion with rank-2 types. Here I'll explain how to treat each of the three cases in a GADT to systematically embed the type in OCaml.
The basic technique is to use a witness for the constraint placed on the parameter. Oleg uses a module with the required semantics:
module EQ = struct type ('a,'b) eq = Refl of 'a option ref * 'b option ref let refl () = let r = ref None in Refl (r, r) let symm (Refl (x, y)) = Refl (y, x) let apply_eq (Refl (x, y)) v = x := Some v; match !y with | Some w -> x := None; w | _ -> assert false end
A value eq
of type ('a, 'b) eq
witnesses the equality of types a and b. Given such a witness and a value x of type a, apply_eq eq x
is a safe type cast of x to type b. This is sufficient to force the constraints on the type parameter, as explained in the paper Oleg references. For each one of the above identified cases, Oleg's code encodes the constructor, defines a smart constructor to ease the use of the type as an embedded DSL and pattern-matches on the constructor in order to access its parameters. This is how:
- For unconstrained constructors, nothing needs to be done.
For instance, the type constructor for
If
does not require fixing the constructor's parameter a as it is by definition equal to the type parameter:| If of bool term * 'a term * 'a term
A smart constructor for this constructor is routine:
let if_ c t f = If (c, t, f)
(the underscore is to avoid a clash with the reserved keyword). Equally routine is pattern-matching on it:
| If (c,t,f) -> …
If desired for reasons of regularity, however, the
constraint 'a = 'b
for an existentially-qualified parameter b can be introduced explicitely using the last encoding. - For concretely-constrained constructors, it is necessary to witness the equality of the parameter and this concrete type.
It is not as simple, however, to augment the constructor with a value of type
('a, t) EQ.eq
: since matches on those values would force the type of a to be the same for every match, it would require that t be the same concrete type everywhere. What is needed instead is an existential type, say b such that it is constrained to be t and which in turn constrains a. Oleg encodes the existential with the double negation on a rank-2 universal, by way of a witness type. In order to maximize the compactness of the code, he uses a class type to introduce the required rank-2 type:| Lit of < m_lit : 'w . ((int, 'a) eq -> int -> 'w) -> 'w >
Here w witnesses the existence of an a such that, if successfully constrained to
int
, it would bear the required typeint term
. A smart constructor for this constructor must build the object instance which will provide the required witness:let lit n = Lit (object method m_lit : 'w . ((int, 'a) eq -> int -> 'w) -> 'w = fun k -> k (refl ()) n end)
Note first that, under the Curry-Howard isomorphism, continuations encode double negation. Note also that the typing must be explicit since it relies on structual and not nominal class types. Note finally that the encoding style is curried much as in Haskell definitions; it is entirely equivalent to encode Leroy-style tupled constructors.
Now a match on a constructor of this kind must invoke the method on the object parameter with a suitable continuation that will accept the witness as valid and coerce the argument to the constrained type:
| Lit o -> o#m_lit (fun eq n -> apply_eq eq n)
As will be apparent from the complete encoding of the Haskell example, the other cases are the same.
- For existentially-constrained constructors, it is necessary to encode the existential with a rank-2 type, and witness the equality of the parameter and this type.
This case is an extension of the previous one, with the twist that the existential must be encoded separately. In this case Oleg uses a record type instead of an object type, but the encoding could use the latter equally well. For instance, in the case of
Pair
s, there must exist types b and c such that a is constrained to be b×c. The constructor factors out this existential:| Pair of < m_pair : 'w . ('a, 'w) k_pair -> 'w >
Here,
k_pair
will bear the required witness w on a's constraint:type ('a, 'w) k_pair = { k_pair : 'b 'c . ('b*'c, 'a) eq -> 'b term -> 'c term -> 'w }
The equality witness expresses the required intent on the types of the constructor's arguments. The smart constructor is analogous to the previous one, except that in this case the continuation has type
k_pair
:let pair p q = Pair (object method m_pair : 'w . ('a, 'w) k_pair -> 'w = fun k -> k.k_pair (refl ()) p q end)
The pattern-match is analogous too:
| Pair o -> o#m_pair { k_pair = fun eq x y -> apply_eq eq … }
Here's the complete code for the type definitions and the smart constructors encoding the Haskell mini-expression language example:
type 'a term = | Lit of < m_lit : 'w . ((int , 'a) eq -> int -> 'w) -> 'w > | Inc of < m_inc : 'w . ((int , 'a) eq -> int term -> 'w) -> 'w > | IsZ of < m_isz : 'w . ((bool, 'a) eq -> int term -> 'w) -> 'w > | If of bool term * 'a term * 'a term | Pair of < m_pair : 'w . ('a, 'w) k_pair -> 'w > | Fst of < m_fst : 'w . ('a, 'w) k_fst -> 'w > | Snd of < m_snd : 'w . ('a, 'w) k_snd -> 'w > and ('a, 'w) k_pair = { k_pair : 'b 'c . ('b*'c, 'a) eq -> 'b term -> 'c term -> 'w } and ('a, 'w) k_fst = { k_fst : 'b 'c . ('b , 'a) eq -> ('b * 'c) term -> 'w } and ('a, 'w) k_snd = { k_snd : 'b 'c . ('c , 'a) eq -> ('b * 'c) term -> 'w } let lit n = Lit (object method m_lit : 'w . ((int , 'a) eq -> int -> 'w) -> 'w = fun k -> k (refl ()) n end) let inc t = Inc (object method m_inc : 'w . ((int , 'a) eq -> int term -> 'w) -> 'w = fun k -> k (refl ()) t end) let isz t = IsZ (object method m_isz : 'w . ((bool, 'a) eq -> int term -> 'w) -> 'w = fun k -> k (refl ()) t end) let if_ c t f = If (c, t, f) let pair p q = Pair (object method m_pair : 'w . ('a, 'w) k_pair -> 'w = fun k -> k.k_pair (refl ()) p q end) let first p = Fst (object method m_fst : 'w . ('a, 'w) k_fst -> 'w = fun k -> k.k_fst (refl ()) p end) let second p = Snd (object method m_snd : 'w . ('a, 'w) k_snd -> 'w = fun k -> k.k_snd (refl ()) p end)
It is important to remark on the types of the smart constructors, which are the most similar to the Haskell constructors:
val lit : int -> int term val inc : int term -> int term val isz : int term -> bool term val if_ : bool term -> 'a term -> 'a term -> 'a term val pair : 'a term -> 'b term -> ('a * 'b) term val first : ('a * 'b) term -> 'a term val second : ('a * 'b) term -> 'b term
Now, as I mentioned above, an evaluator has type 'a term -> 'a
for all a; this requires rank-2 polymorphism. In order to keep the namespace tidy, I use a local module definition hiding a record type encoding the higher-kinded function type:
let eval e = let module E = struct type eval = { eval : 'a . 'a term -> 'a } let rec eval = { eval = function | Lit o -> o#m_lit (fun eq n -> apply_eq eq n) | Inc o -> o#m_inc (fun eq t -> apply_eq eq (eval.eval t + 1)) | IsZ o -> o#m_isz (fun eq t -> apply_eq eq (eval.eval t = 0)) | If (c,t,f) -> if eval.eval c then eval.eval t else eval.eval f | Pair o -> o#m_pair { k_pair = fun eq x y -> apply_eq eq (eval.eval x, eval.eval y) } | Fst o -> o#m_fst { k_fst = fun eq p -> apply_eq eq (fst (eval.eval p)) } | Snd o -> o#m_snd { k_snd = fun eq p -> apply_eq eq (snd (eval.eval p)) } } end in E.eval.E.eval e
In every case the coercion must be applied at the end, to the result of recursively evaluating the subexpressions in a constructor.
This is all there is to Oleg's encoding. For me, this exegesis has two lessons: first, much as rank-2 polymorphism, GADTs are a conservative extension to Haskell's and OCaml's type system. In fact, version 3.12 will allow higher-ranked functions via explicit annotations. One can only hope that the latter would also find its way into the next release. Second, that Oleg's proof-of-concept examples bear detailed examination to find the general mechanism behind his insights.