Showing posts with label Haskell. Show all posts
Showing posts with label Haskell. Show all posts

2010-03-22

GADTs for the Rest of Us

In a nutshell, generalized algebraic data types, or GADTs for short, are a first-class phantom type of sorts. The basic idea is to enrich the constructors of an ADT with a more expressive "return" type by allowing the instantiation of one or more type parameters. A simple motivating example is that of an abstract syntax in Peyton Jones's original paper:

data Term a where
  Lit  :: Int        -> Term Int
  Inc  :: Term Int   -> Term Int
  IsZ  :: Term Int   -> Term Bool
  If   :: Term Bool  -> Term a    -> Term a     -> Term a
  Pair :: Term a     -> Term b    -> Term (a,b)
  Fst  :: Term (a,b) -> Term a
  Snd  :: Term (a,b) -> Term b

For instance, the constructor for Inc specifies the type of its argument as Term Int by constraining its application to bind the parameter a to Int, in effect specifying the "return type" of the constructor. Xavier Leroy's proposal for a syntax extension of OCaml would look like this:

type 'a term =
| Lit  of int                            constraint 'a = int
| Inc  of int term                       constraint 'a = int
| IsZ  of int term                       constraint 'a = bool
| If   of bool term * 'a term * 'a term
| Pair of  'b  term * 'c term            constraint 'a = 'b * 'c
| Fst  of ('b * 'c) term                 constraint 'a = 'b
| Snd  of ('b * 'c) term                 constraint 'a = 'c

Each syntax emphasizes complementary views of what "is" the essence of this generalization: Haskell's makes clear the parameter's functional dependency on the constructor; OCaml's stresses the constraints placed on a match on the constructor. In any case, this type shows the three cases a GADT might present:

  • No constraint, as in the constructor If
  • A concrete constraint, as in the constructors Lit, Inc or IsZ
  • An existential constraint, as in the constructors Pair, Fst or Snd

The constraints a GADT places on the parameters must be met by the functions that do a pattern-matching on the type. In general, the result type is a function of the types of the parameter. This requires polymorphic recursion which forces the use of an explicit rank-2 type for it. For instance, a function evaluating a Term a will have type a for each possible a defined in the GADT. In the following code the match for Lit forces a to be instantiated to Int, whereas the matching for IsZ forces it to Bool:

eval :: Term a -> a
eval term = case term of
  Lit n    -> n
  Inc t    -> eval t + 1
  IsZ t    -> eval t == 0
  If c x y -> if eval c then eval x else eval y
  Pair x y -> (eval x, eval y)
  Fst t    -> fst (eval t)
  Snd t    -> snd (eval t)

Unfortunately, the current version of OCaml (3.11) does not yet include either GADTs or rank-2 polymorphism. On July 11th, 2009, Oleg announced that OCaml's type system is expressive enough to encode GADTs. On his site he shows three applications of the technique. However, Oleg's examples do not offer much guidance in how to apply the encoding, especially since the more complete example, that of the typed printf/scanf, is too advanced an application to use as a template. I have shown in another post how to encode polymorphic recursion with rank-2 types. Here I'll explain how to treat each of the three cases in a GADT to systematically embed the type in OCaml.

The basic technique is to use a witness for the constraint placed on the parameter. Oleg uses a module with the required semantics:

module EQ = struct
  type ('a,'b) eq = Refl of 'a option ref * 'b option ref

  let refl () = let r = ref None in Refl (r, r)

  let symm (Refl (x, y)) = Refl (y, x)

  let apply_eq (Refl (x, y)) v =
    x := Some v;
    match !y with
  | Some w -> x := None; w
  | _      -> assert false
end

A value eq of type ('a, 'b) eq witnesses the equality of types a and b. Given such a witness and a value x of type a, apply_eq eq x is a safe type cast of x to type b. This is sufficient to force the constraints on the type parameter, as explained in the paper Oleg references. For each one of the above identified cases, Oleg's code encodes the constructor, defines a smart constructor to ease the use of the type as an embedded DSL and pattern-matches on the constructor in order to access its parameters. This is how:

  • For unconstrained constructors, nothing needs to be done.

    For instance, the type constructor for If does not require fixing the constructor's parameter a as it is by definition equal to the type parameter:

    | If of bool term * 'a term * 'a term
    

    A smart constructor for this constructor is routine:

    let if_ c t f = If (c, t, f)
    

    (the underscore is to avoid a clash with the reserved keyword). Equally routine is pattern-matching on it:

    | If (c,t,f) -> …
    

    If desired for reasons of regularity, however, the constraint 'a = 'b for an existentially-qualified parameter b can be introduced explicitely using the last encoding.

  • For concretely-constrained constructors, it is necessary to witness the equality of the parameter and this concrete type.

    It is not as simple, however, to augment the constructor with a value of type ('a, t) EQ.eq: since matches on those values would force the type of a to be the same for every match, it would require that t be the same concrete type everywhere. What is needed instead is an existential type, say b such that it is constrained to be t and which in turn constrains a. Oleg encodes the existential with the double negation on a rank-2 universal, by way of a witness type. In order to maximize the compactness of the code, he uses a class type to introduce the required rank-2 type:

    | Lit of < m_lit : 'w . ((int, 'a) eq -> int -> 'w) -> 'w >
    

    Here w witnesses the existence of an a such that, if successfully constrained to int, it would bear the required type int term. A smart constructor for this constructor must build the object instance which will provide the required witness:

    let lit n = Lit (object
      method m_lit : 'w . ((int, 'a) eq -> int -> 'w) -> 'w =
        fun k -> k (refl ()) n end)
    

    Note first that, under the Curry-Howard isomorphism, continuations encode double negation. Note also that the typing must be explicit since it relies on structual and not nominal class types. Note finally that the encoding style is curried much as in Haskell definitions; it is entirely equivalent to encode Leroy-style tupled constructors.

    Now a match on a constructor of this kind must invoke the method on the object parameter with a suitable continuation that will accept the witness as valid and coerce the argument to the constrained type:

    | Lit o -> o#m_lit (fun eq n -> apply_eq eq n)
    

    As will be apparent from the complete encoding of the Haskell example, the other cases are the same.

  • For existentially-constrained constructors, it is necessary to encode the existential with a rank-2 type, and witness the equality of the parameter and this type.

    This case is an extension of the previous one, with the twist that the existential must be encoded separately. In this case Oleg uses a record type instead of an object type, but the encoding could use the latter equally well. For instance, in the case of Pairs, there must exist types b and c such that a is constrained to be b×c. The constructor factors out this existential:

    | Pair of < m_pair : 'w . ('a, 'w) k_pair -> 'w >
    

    Here, k_pair will bear the required witness w on a's constraint:

    type ('a, 'w) k_pair = { k_pair : 'b 'c . ('b*'c, 'a) eq -> 'b term -> 'c term -> 'w }
    

    The equality witness expresses the required intent on the types of the constructor's arguments. The smart constructor is analogous to the previous one, except that in this case the continuation has type k_pair:

    let pair p q = Pair (object
      method m_pair : 'w . ('a, 'w) k_pair -> 'w =
        fun k -> k.k_pair (refl ()) p q end)
    

    The pattern-match is analogous too:

    | Pair o -> o#m_pair { k_pair = fun eq x y -> apply_eq eq … }
    

Here's the complete code for the type definitions and the smart constructors encoding the Haskell mini-expression language example:

type 'a term =
| Lit  of < m_lit  : 'w . ((int , 'a) eq -> int -> 'w) -> 'w >
| Inc  of < m_inc  : 'w . ((int , 'a) eq -> int term -> 'w) -> 'w >
| IsZ  of < m_isz  : 'w . ((bool, 'a) eq -> int term -> 'w) -> 'w >
| If   of bool term * 'a term * 'a term
| Pair of < m_pair : 'w . ('a, 'w) k_pair -> 'w >
| Fst  of < m_fst  : 'w . ('a, 'w) k_fst  -> 'w >
| Snd  of < m_snd  : 'w . ('a, 'w) k_snd  -> 'w >
and ('a, 'w) k_pair = { k_pair : 'b 'c . ('b*'c, 'a) eq -> 'b term -> 'c term -> 'w }
and ('a, 'w) k_fst  = { k_fst  : 'b 'c . ('b   , 'a) eq -> ('b * 'c) term -> 'w }
and ('a, 'w) k_snd  = { k_snd  : 'b 'c . ('c   , 'a) eq -> ('b * 'c) term -> 'w }

let lit  n     = Lit  (object
  method m_lit  : 'w . ((int , 'a) eq -> int -> 'w) -> 'w =
    fun k -> k (refl ()) n end)
let inc  t     = Inc  (object
  method m_inc  : 'w . ((int , 'a) eq -> int term -> 'w) -> 'w =
    fun k -> k (refl ()) t end)
let isz  t     = IsZ  (object
  method m_isz  : 'w . ((bool, 'a) eq -> int term -> 'w) -> 'w =
    fun k -> k (refl ()) t end)
let if_  c t f = If   (c, t, f)
let pair p q   = Pair (object
  method m_pair : 'w . ('a, 'w) k_pair -> 'w =
    fun k -> k.k_pair (refl ()) p q end)
let first  p   = Fst  (object
  method m_fst  : 'w . ('a, 'w) k_fst  -> 'w =
    fun k -> k.k_fst  (refl ()) p end)
let second p   = Snd  (object
  method m_snd  : 'w . ('a, 'w) k_snd  -> 'w =
    fun k -> k.k_snd  (refl ()) p end)

It is important to remark on the types of the smart constructors, which are the most similar to the Haskell constructors:

val lit    : int -> int term
val inc    : int term -> int term
val isz    : int term -> bool term
val if_    : bool term -> 'a term -> 'a term -> 'a term
val pair   : 'a term -> 'b term -> ('a * 'b) term
val first  : ('a * 'b) term -> 'a term
val second : ('a * 'b) term -> 'b term

Now, as I mentioned above, an evaluator has type 'a term -> 'a for all a; this requires rank-2 polymorphism. In order to keep the namespace tidy, I use a local module definition hiding a record type encoding the higher-kinded function type:

let eval e =
  let module E = struct
    type eval = { eval : 'a . 'a term -> 'a }
    let rec eval = { eval = function
    | Lit  o      -> o#m_lit  (fun eq n -> apply_eq eq n)
    | Inc  o      -> o#m_inc  (fun eq t -> apply_eq eq (eval.eval t + 1))
    | IsZ  o      -> o#m_isz  (fun eq t -> apply_eq eq (eval.eval t = 0))
    | If  (c,t,f) -> if eval.eval c then eval.eval t else eval.eval f
    | Pair o      ->
      o#m_pair { k_pair = fun eq x y -> apply_eq eq (eval.eval x, eval.eval y) }
    | Fst  o      ->
      o#m_fst  { k_fst  = fun eq p   -> apply_eq eq (fst (eval.eval p)) }
    | Snd  o      ->
      o#m_snd  { k_snd  = fun eq p   -> apply_eq eq (snd (eval.eval p)) }
    }
  end in E.eval.E.eval e

In every case the coercion must be applied at the end, to the result of recursively evaluating the subexpressions in a constructor.

This is all there is to Oleg's encoding. For me, this exegesis has two lessons: first, much as rank-2 polymorphism, GADTs are a conservative extension to Haskell's and OCaml's type system. In fact, version 3.12 will allow higher-ranked functions via explicit annotations. One can only hope that the latter would also find its way into the next release. Second, that Oleg's proof-of-concept examples bear detailed examination to find the general mechanism behind his insights.

2010-01-19

A Staggering Sequence

Happy new year! Inspired by N. J. A. Sloane's Seven Staggering Sequences, I started thinking about the Gijswijt sequence, A090822 in the OEIS. What appealed to me was the notion of "word logarithm" implicit in its definition:

We begin with b(1) = 1. The rule for computing the next term, b(n + 1), is again rather unusual. We write the sequence of numbers we have seen so far, b(1), b(2),… b(n) in the form of an initial string X, say (which can be the empty string ∅), followed by as many repetitions as possible of som nonempty string Y. That is, we write b(1), b(2),… b(n) = XYk, where k is as large as possible.

I wrote on paper a direct translation of the definition into an "implicit division" function in Haskell:

divide s y      = k
  where s       = x ++ power k y
        power k = concat . replicate k

This motivated me to try and plunge head-first into it! Given such a "division" operation of one string s by a non-empty suffix y, finding the logarithm is easy. First, we need some imports; then, the logarithm itself:


> import Data.List (tails)
> 
> logarithm s = maximum . map (divide s) . init . tails $ s

That is, find among all the suffixes of s the one that can be factored out from it the greatest number of times. Looking up the suffix repeatedly gives rise to a quadratic algorithm; but any problem on suffixes can be turned into a problem on the prefixes of the reversed strings. Because of this, I'll be sloppy and use prefixes and suffixes interchangeably. Counting the length of the longest common prefix of two lists is easy (edit simplified the original recursive function):


> prefixLength :: Eq a => [a] -> [a] -> Int
> prefixLength xs = length . takeWhile (uncurry (==)) . zip xs

In order to know how many times the string y is a prefix of s, it suffices to find the length of the common prefix between s and an infinity of repetitions of y:


> divide s y = n `div` length y
>   where n  = prefixLength (cycle . reverse $ y) (reverse s)

That's divide, now for the hard part. The sequence definition makes each element depend on all the preceding ones. In effect, it requires a function nest such that nest f e = [e, f [e], f [e, f [e]], f [e, f [e], f [e, f [e]]], …]. I thought of tying the knot on the tails of the result, like this (warning! broken code):

nest f e = xs where xs = e : (map f . tail . inits $ xs)

Unfortunately, inits is too strict, and goes off trying to match a corecursive stream against the empty list. The solution is to explicitly take longer and longer prefixes with a co-induction index:


> nest :: ([a] -> a) -> (a -> [a])
> nest f e       = xs
>   where xs     = e : from 1
>         from i = f (take i xs) : from (i + 1)

(note the funny type). With that in place, the sequence is:


> a090822 = nest logarithm 1

A quick check shows that everything works:

Main Data.Maybe Data.List> fromJust . elemIndex 4 $ a090822
219

(the quadratic complexity of next shows here). Don't try looking for the elemIndex 5 a090822. Read the paper for details.

Thanks to ddarius, opqdonut and Berengal on #haskell.

2009-08-30

Fun with (Type) Class

I'd like to motivate OCaml's object-oriented features (the "O" in "OCaml") by showing an encoding of (first-order) Haskell type classes. This will require making use of almost everything in OCaml's object buffet: class types, virtual classes, inheritance of class types and of classes, double dispatch and coercion.

Haskell could be said to have the best type system among all languages in wide use at the moment, striking a delicate balance between power and practicality without compromising purity (neither conceptual nor functional). In the words of Oleg Kiselyov, it sports "attractive types". A great feature is the presence of type-indexed values via the type-class mechanism. In a nutshell, type-indexed values are functions from types to values. A classic example is overloaded operators, where for instance the same symbol + denotes addition over the integers, floating-point and rational numbers, depending on the type of its parameters. Haskell provides "type magic" to automatically infer the intended function (the value) based on the types of the arguments to which it is applied, achieving a form of type-safe "ad-hoc" polymorphism.

The ML family of languages doesn't have the machinery to attain such expressive power. There are, however, a number of ways to encode type-indexed families with an explicit parameter encoding the selector type. The oldest, and most complicated technique is that of Danvy (see also this complete example) which is not really equivalent to Haskell's type classes but a sophisticated application of phantom types. Another technique is using an explicit dictionary (Yaron Minsky has a worked example) from which to select the appropriate operation. Normally these dictionaries are records; the labels provide type-safe instantiation of the generic value family. The problem with records is that they are not extensible, so it is not immediate how to encode the equivalent of inheritance of type classes. OCaml's object-oriented sublanguage is especially well-suited for the task, allowing for a natural encoding that shows that the "class" in Haskell's "type classes" is more object-oriented than it seems at first sight.

I'll start with a bit of Haskell's standard prelude, as shown in the Report in a nice graph. I'll focus on the two most common nontrivial classes, Eq and Ord. The first is defined as:

class  Eq a  where
    (==), (/=)  ::  a -> a -> Bool

    x /= y  = not (x == y)
    x == y  = not (x /= y)

I encode a Haskell type class in OCaml with… (surprise!) a class type and associated functions:

class type ['a] eq = object
  method equal     : 'a -> 'a -> bool
  method not_equal : 'a -> 'a -> bool
end

Since the type class has a type parameter a, my class type also has a type parameter 'a. Each member of the type class corresponds to a (named) method and to a regular function taking the class type as a dictionary parameter:

let (===) x y (eq : 'a #eq) = eq#equal x y
and (<=>) x y (eq : 'a #eq) = eq#not_equal x y

(I've opted to not shadow the regular operators in Pervasives). The weird order of the parameters (you'd expect the dictionary to come first) is such that I can write:

    if x === y $ eq_int then …

where ($) is the application operator. Haskell provides default implementations for both members; as the Report notes:

This declaration gives default method declarations for both /= and ==, each being defined in terms of the other. If an instance declaration for Eq defines neither == nor /=, then both will loop.

I find that unsettling, so I'll just define one via a virtual class (not class type):

class virtual ['a] eq_base = object (self)
  method virtual equal : 'a -> 'a -> bool
  method not_equal x y = not (self#equal x y)
end

In this case, equal is left virtual. It is important that the class be left virtual even if equal is defined, so that it cannot be instantiated. This means that it cannot be coerced to the class type eq, since it is illegal to shadow virtual members that were declared non-virtual. Now come the instances of the type class. In Haskell, a very simple instance is that for unit which is built-in (via deriving) but could be declared as:

instance  Eq ()  where
    () == () = true

To encode instances, I use an immediate object bound to an identifier:

let eq_unit : unit eq = object
  inherit [unit] eq_base
  method equal     () () = true
  method not_equal () () = false
end

Note the coercion to the corresponding monomorphic instance of the class type. I inherit from the default implementation in the class base, even if in this case I explicitly implement all methods for efficiency. The same encoding works for booleans too:

let eq_bool : bool eq = object
  inherit [bool] eq_base
  method equal     = (==)
  method not_equal = (!=)
end

OCaml has built-in ad-hoc polymorphic comparisons that I could use directly:

class ['a] eq_poly : ['a] eq = object
  inherit ['a] eq_base
  method equal     = Pervasives.(=)
  method not_equal = Pervasives.(<>)
end

With that I can have type classes for the primitive types:

let eq_int    = (new eq_poly :> int    eq)
and eq_float  = (new eq_poly :> float  eq)
and eq_char   = (new eq_poly :> char   eq)
and eq_string = (new eq_poly :> string eq)

Here I instantiate objects of the eq_poly class coerced to the correct monomorphic instance of eq (recall that in OCaml coercion is always explicit). The type class instance for pairs requires type classes for each of its components. In Haskell, we'd have:

instance  (Eq a, Eq b) => Eq ((,) a b)  where …

(not really, but let me gloss over that). Encoding that in OCaml also requires a witness for each type parameter as explicit arguments to the constructor:

let eq_pair (l : 'a eq) (r : 'b eq) : ('a * 'b) eq = object (self)
  inherit ['a * 'b] eq_base
  method equal (p, q) (p', q') = l#equal p p' && r#equal q q'
end

An inefficiency of this encoding is that it creates a new object with every invocation of the witness function. For instance, in general eq_pair eq_unit eq_bool != eq_pair eq_unit eq_bool and so on for every witness. The same is needed to encode an equality witness for option types:

let eq_option (o : 'a eq) : 'a option eq = object
  inherit ['a option] eq_base
  method equal x y = match x, y with
    None  , None   -> true
  | Some x, Some y -> o#equal x y
  | _              -> false
end

And for lists:

let eq_list (o : 'a eq) : 'a list eq = object
  inherit ['a list] eq_base
  method equal =
    let rec equal x y = match x, y with
    | [], [] -> true
    | _ , []
    | [], _  -> false
    | x :: xs, y :: ys when o#equal x y -> equal xs ys
    | _      -> false
    in equal
end

(I use explicit closed recursion for efficiency). In all these cases, I encoded a derived instance via an explicit witness passed as a parameter. Derived type classes require genuine inheritance of class types, as seen in the encoding of Haskell's Ord type class:

class  (Eq a) => Ord a  where
    compare              :: a -> a -> Ordering
    (<), (<=), (>), (>=) :: a -> a -> Bool
    max, min             :: a -> a -> a

(I omit showing the default implementations). I use the SML ordering type for fidelity:

type ordering = LT | EQ | GT

let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ
and to_comparison   = function LT -> -1 | EQ -> 0 | GT -> -1

The class type encoding the Ord type class is:

class type ['a] ord = object
  inherit ['a] eq
  method compare       : 'a -> 'a -> ordering
  method less_equal    : 'a -> 'a -> bool
  method less_than     : 'a -> 'a -> bool
  method greater_equal : 'a -> 'a -> bool
  method greater_than  : 'a -> 'a -> bool
  method max           : 'a -> 'a -> 'a
  method min           : 'a -> 'a -> 'a
end

Again, operators take an explicit witness as a dictionary:

let (<==) x y (ord : 'a #ord) = ord#less_equal    x y
and (<<<) x y (ord : 'a #ord) = ord#less_than     x y
and (>==) x y (ord : 'a #ord) = ord#greater_equal x y
and (>>>) x y (ord : 'a #ord) = ord#greater_than  x y

And I provide a virtual class with default implementations:

class virtual ['a] ord_base = object (self)
  inherit ['a] eq_base
  method virtual compare : 'a -> 'a -> ordering
  method equal         x y = match self#compare x y with EQ -> true  | _ -> false
  method less_equal    x y = match self#compare x y with GT -> false | _ -> true
  method less_than     x y = match self#compare x y with LT -> true  | _ -> false
  method greater_equal x y = match self#compare x y with LT -> false | _ -> true
  method greater_than  x y = match self#compare x y with GT -> true  | _ -> false
  method max           x y = match self#compare x y with LT -> y | _ -> x
  method min           x y = match self#compare x y with GT -> y | _ -> x
end

Note how the class type inheritance ord :> eq translates into class inheritance ord_base :> eq_base. The instances for unit and bool use inheritance and selective overriding for efficiency:

let ord_unit : unit ord = object
  inherit [unit] ord_base
  method equal     = eq_unit#equal
  method not_equal = eq_unit#not_equal
  method compare () () = EQ
  method max     () () = ()
  method min     () () = ()
end

let ord_bool : bool ord = object
  inherit [bool] ord_base
  method equal     = eq_bool#equal
  method not_equal = eq_bool#not_equal
  method compare b b' = match b, b' with
  | false, true -> LT
  | true, false -> GT
  | _           -> EQ
  method max = (||)
  method min = (&&)
end

In order to profit from the efficient implementation of equal and not_equal I explicitly delegate on the methods in the corresponding eq instances. Now as before I can use the ad-hoc polymorphic operators in Pervasives to speed writing the type classes for atomic types:

class ['a] ord_poly : ['a] ord = object
  inherit ['a] ord_base
  method compare x y   = of_comparison (Pervasives.compare x y)
  method less_equal    = Pervasives.(<=)
  method less_than     = Pervasives.(< )
  method greater_equal = Pervasives.(>=)
  method greater_than  = Pervasives.(> )
  method max           = Pervasives.max
  method min           = Pervasives.min
end

let ord_int    = (new ord_poly :> int    ord)
and ord_float  = (new ord_poly :> float  ord)
and ord_char   = (new ord_poly :> char   ord)
and ord_string = (new ord_poly :> string ord)

(Note the use of the injection into ordering). The case for pairs requires appropriate witnesses as before:

let ord_pair (l : 'a ord) (r : 'b ord) : ('a * 'b) ord = object
  inherit ['a * 'b] ord_base
  method equal = (eq_pair (l :> 'a eq) (r :> 'b eq))#equal
  method compare (p, q) (p', q') =
    match l#compare p p' with EQ -> r#compare q q' | c -> c
end

In this case, delegating equal requires passing the witnesses to eq_pair explicitly coerced to the eq class type. Finally, the classes for options and lists are straightforward applications of the encoding:

let ord_option (o : 'a ord) : 'a option ord = object
  inherit ['a option] ord_base
  method equal = (eq_option (o :> 'a eq))#equal
  method compare x y = match x with
    None   -> (match y with None -> EQ | Some _ -> LT)
  | Some x -> (match y with None -> GT | Some y -> o#compare x y)
end

let ord_list (o : 'a ord) : 'a list ord = object
  inherit ['a list] ord_base
  method equal = (eq_list (o :> 'a eq))#equal
  method compare =
    let rec compare x y = match x, y with
    | [], [] -> EQ
    | _ , [] -> GT
    | [], _  -> LT
    | x :: xs, y :: ys -> match o#compare x y with
    | EQ -> compare xs ys
    | c  -> c
    in compare
end

As it is clear, the encoding shows a direct correspondence between Haskell type classes and OCaml's classes and objects:

  • type classes into class types with top-level functions dispatching on a witness object of that class type
  • derived type classes into inheriting class types
  • default implementations into virtual classes
  • instances into objects inheriting from the default virtual class
  • derived instances into objects having witnesses of the appropriate class type

In every case, the witness objects are used purely as closed vtables without state. It would be interesting to investigate which use cases would open using row polymorphism or data members. Also, note that there is no need to abstract over the object types, as each witness carries with it the intended implementation in a type-safe manner.

This technique is analogous to the one using functors to encode type classes, but then why have both in the same language?, in the words of Jacques Garrigue. One reason is that functors are not first-class values in OCaml but objects are, which makes using them as type witnesses much more lightweight. On the other hand, one limitation that functors don't have is that this encoding is limited to first-order type classes. In other words, it is not clear what would be required to implement Haskell type classes like:

instance  (Monad m) => Functor m  where
    fmap f x = bind x (return . f)

where m is a type constructor with sort * → * using the encoding presented here, whereas it is straightforward to concoct a (ML) functor:

module MonadFunctor (M : MONAD) : FUNCTOR = struct
  type 'a t = 'a M.t
  let fmap f x = M.bind x (fun v -> M.return (f v))
end

Another limitation is the need to pass a witness object to the type-indexed functions. One would ideally want to write something like:

let (===) (x : eq) = x#equal

where the value is its own witness, but obviously this would require that every value be wrapped into an object, requiring, for instance, code like the following:

if (new eq_int 5) === (new eq_int x) then …

Moreover, in order to actually operate on the wrapped values, the classes would need an explicit accessor. OCaml supports double dispatch making possible forcing the types of x and y to be equal:

class type ['a] eq = object ('self)
  method value     : 'a
  method equal     : 'self -> bool
  method not_equal : 'self -> bool
end

class eq_int x : [int] eq = object (self : 'self)
  method value                    = x
  method equal     (that : 'self) = self#value == that#value
  method not_equal (that : 'self) = self#value != that#value
end

Canonicalizing the witnesses for finite types like bool is very tedious:

let eq_bool : bool -> bool eq =
  let _t : bool eq = object
    method value          = true
    method equal     that =      that#value
    method not_equal that = not (that#value)
  end and _f : bool eq = object
    method value          = false
    method equal     that = not (that#value)
    method not_equal that =      that#value
  end in fun b -> if b then _t else _f

(essentially, a Church-encoded boolean). More importantly, this encoding is much more dynamic, making it rather remote with respect to the original, fully static Haskell semantics (this, incidentally, is a definite advantage of the functorial encoding).

In closing, it would be interesting to see how far this technique can be carried, especially to implement a numeric tower analogous to Haskell's, or maybe a better type class hierarchy to encode would be that of the Numeric Prelude. Another necessary avenue of investigation is measuring the abstraction penalty incurred by this encoding. In any case, I believe this is a neat showcase of the often-forgotten object-oriented features in OCaml.