I'd like to motivate OCaml's object-oriented features (the "O" in "OCaml") by showing an encoding of (first-order) Haskell type classes. This will require making use of almost everything in OCaml's object buffet: class types, virtual classes, inheritance of class types and of classes, double dispatch and coercion.
Haskell could be said to have the best type system among all languages in wide use at the moment, striking a delicate balance between power and practicality without compromising purity (neither conceptual nor functional). In the words of Oleg Kiselyov, it sports "attractive types". A great feature is the presence of type-indexed values via the type-class mechanism. In a nutshell, type-indexed values are functions from types to values. A classic example is overloaded operators, where for instance the same symbol +
denotes addition over the integers, floating-point and rational numbers, depending on the type of its parameters. Haskell provides "type magic" to automatically infer the intended function (the value) based on the types of the arguments to which it is applied, achieving a form of type-safe "ad-hoc" polymorphism.
The ML family of languages doesn't have the machinery to attain such expressive power. There are, however, a number of ways to encode type-indexed families with an explicit parameter encoding the selector type. The oldest, and most complicated technique is that of Danvy (see also this complete example) which is not really equivalent to Haskell's type classes but a sophisticated application of phantom types. Another technique is using an explicit dictionary (Yaron Minsky has a worked example) from which to select the appropriate operation. Normally these dictionaries are records; the labels provide type-safe instantiation of the generic value family. The problem with records is that they are not extensible, so it is not immediate how to encode the equivalent of inheritance of type classes. OCaml's object-oriented sublanguage is especially well-suited for the task, allowing for a natural encoding that shows that the "class" in Haskell's "type classes" is more object-oriented than it seems at first sight.
I'll start with a bit of Haskell's standard prelude, as shown in the Report in a nice graph. I'll focus on the two most common nontrivial classes, Eq
and Ord
. The first is defined as:
class Eq a where (==), (/=) :: a -> a -> Bool x /= y = not (x == y) x == y = not (x /= y)
I encode a Haskell type class in OCaml with… (surprise!) a class type and associated functions:
class type ['a] eq = object method equal : 'a -> 'a -> bool method not_equal : 'a -> 'a -> bool end
Since the type class has a type parameter a, my class type also has a type parameter 'a
. Each member of the type class corresponds to a (named) method and to a regular function taking the class type as a dictionary parameter:
let (===) x y (eq : 'a #eq) = eq#equal x y and (<=>) x y (eq : 'a #eq) = eq#not_equal x y
(I've opted to not shadow the regular operators in Pervasives
). The weird order of the parameters (you'd expect the dictionary to come first) is such that I can write:
if x === y $ eq_int then …
where ($)
is the application operator. Haskell provides default implementations for both members; as the Report notes:
This declaration gives default method declarations for both /= and ==, each being defined in terms of the other. If an instance declaration for Eq defines neither == nor /=, then both will loop.
I find that unsettling, so I'll just define one via a virtual class (not class type):
class virtual ['a] eq_base = object (self) method virtual equal : 'a -> 'a -> bool method not_equal x y = not (self#equal x y) end
In this case, equal
is left virtual. It is important that the class be left virtual even if equal
is defined, so that it cannot be instantiated. This means that it cannot be coerced to the class type eq
, since it is illegal to shadow virtual members that were declared non-virtual. Now come the instances of the type class. In Haskell, a very simple instance is that for unit
which is built-in (via deriving
) but could be declared as:
instance Eq () where () == () = true
To encode instances, I use an immediate object bound to an identifier:
let eq_unit : unit eq = object inherit [unit] eq_base method equal () () = true method not_equal () () = false end
Note the coercion to the corresponding monomorphic instance of the class type. I inherit from the default implementation in the class base, even if in this case I explicitly implement all methods for efficiency. The same encoding works for booleans too:
let eq_bool : bool eq = object inherit [bool] eq_base method equal = (==) method not_equal = (!=) end
OCaml has built-in ad-hoc polymorphic comparisons that I could use directly:
class ['a] eq_poly : ['a] eq = object inherit ['a] eq_base method equal = Pervasives.(=) method not_equal = Pervasives.(<>) end
With that I can have type classes for the primitive types:
let eq_int = (new eq_poly :> int eq) and eq_float = (new eq_poly :> float eq) and eq_char = (new eq_poly :> char eq) and eq_string = (new eq_poly :> string eq)
Here I instantiate objects of the eq_poly
class coerced to the correct monomorphic instance of eq
(recall that in OCaml coercion is always explicit). The type class instance for pairs requires type classes for each of its components. In Haskell, we'd have:
instance (Eq a, Eq b) => Eq ((,) a b) where …
(not really, but let me gloss over that). Encoding that in OCaml also requires a witness for each type parameter as explicit arguments to the constructor:
let eq_pair (l : 'a eq) (r : 'b eq) : ('a * 'b) eq = object (self) inherit ['a * 'b] eq_base method equal (p, q) (p', q') = l#equal p p' && r#equal q q' end
An inefficiency of this encoding is that it creates a new object with every invocation of the witness function. For instance, in general eq_pair eq_unit eq_bool != eq_pair eq_unit eq_bool
and so on for every witness. The same is needed to encode an equality witness for option
types:
let eq_option (o : 'a eq) : 'a option eq = object inherit ['a option] eq_base method equal x y = match x, y with None , None -> true | Some x, Some y -> o#equal x y | _ -> false end
And for list
s:
let eq_list (o : 'a eq) : 'a list eq = object inherit ['a list] eq_base method equal = let rec equal x y = match x, y with | [], [] -> true | _ , [] | [], _ -> false | x :: xs, y :: ys when o#equal x y -> equal xs ys | _ -> false in equal end
(I use explicit closed recursion for efficiency). In all these cases, I encoded a derived instance via an explicit witness passed as a parameter. Derived type classes require genuine inheritance of class types, as seen in the encoding of Haskell's Ord
type class:
class (Eq a) => Ord a where compare :: a -> a -> Ordering (<), (<=), (>), (>=) :: a -> a -> Bool max, min :: a -> a -> a
(I omit showing the default implementations). I use the SML ordering
type for fidelity:
type ordering = LT | EQ | GT let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ and to_comparison = function LT -> -1 | EQ -> 0 | GT -> -1
The class type encoding the Ord
type class is:
class type ['a] ord = object inherit ['a] eq method compare : 'a -> 'a -> ordering method less_equal : 'a -> 'a -> bool method less_than : 'a -> 'a -> bool method greater_equal : 'a -> 'a -> bool method greater_than : 'a -> 'a -> bool method max : 'a -> 'a -> 'a method min : 'a -> 'a -> 'a end
Again, operators take an explicit witness as a dictionary:
let (<==) x y (ord : 'a #ord) = ord#less_equal x y and (<<<) x y (ord : 'a #ord) = ord#less_than x y and (>==) x y (ord : 'a #ord) = ord#greater_equal x y and (>>>) x y (ord : 'a #ord) = ord#greater_than x y
And I provide a virtual class with default implementations:
class virtual ['a] ord_base = object (self) inherit ['a] eq_base method virtual compare : 'a -> 'a -> ordering method equal x y = match self#compare x y with EQ -> true | _ -> false method less_equal x y = match self#compare x y with GT -> false | _ -> true method less_than x y = match self#compare x y with LT -> true | _ -> false method greater_equal x y = match self#compare x y with LT -> false | _ -> true method greater_than x y = match self#compare x y with GT -> true | _ -> false method max x y = match self#compare x y with LT -> y | _ -> x method min x y = match self#compare x y with GT -> y | _ -> x end
Note how the class type inheritance ord :> eq
translates into class inheritance ord_base :> eq_base
. The instances for unit
and bool
use inheritance and selective overriding for efficiency:
let ord_unit : unit ord = object inherit [unit] ord_base method equal = eq_unit#equal method not_equal = eq_unit#not_equal method compare () () = EQ method max () () = () method min () () = () end let ord_bool : bool ord = object inherit [bool] ord_base method equal = eq_bool#equal method not_equal = eq_bool#not_equal method compare b b' = match b, b' with | false, true -> LT | true, false -> GT | _ -> EQ method max = (||) method min = (&&) end
In order to profit from the efficient implementation of equal
and not_equal
I explicitly delegate on the methods in the corresponding eq
instances. Now as before I can use the ad-hoc polymorphic operators in Pervasives
to speed writing the type classes for atomic types:
class ['a] ord_poly : ['a] ord = object inherit ['a] ord_base method compare x y = of_comparison (Pervasives.compare x y) method less_equal = Pervasives.(<=) method less_than = Pervasives.(< ) method greater_equal = Pervasives.(>=) method greater_than = Pervasives.(> ) method max = Pervasives.max method min = Pervasives.min end let ord_int = (new ord_poly :> int ord) and ord_float = (new ord_poly :> float ord) and ord_char = (new ord_poly :> char ord) and ord_string = (new ord_poly :> string ord)
(Note the use of the injection into ordering
). The case for pairs requires appropriate witnesses as before:
let ord_pair (l : 'a ord) (r : 'b ord) : ('a * 'b) ord = object inherit ['a * 'b] ord_base method equal = (eq_pair (l :> 'a eq) (r :> 'b eq))#equal method compare (p, q) (p', q') = match l#compare p p' with EQ -> r#compare q q' | c -> c end
In this case, delegating equal
requires passing the witnesses to eq_pair
explicitly coerced to the eq
class type. Finally, the classes for option
s and list
s are straightforward applications of the encoding:
let ord_option (o : 'a ord) : 'a option ord = object inherit ['a option] ord_base method equal = (eq_option (o :> 'a eq))#equal method compare x y = match x with None -> (match y with None -> EQ | Some _ -> LT) | Some x -> (match y with None -> GT | Some y -> o#compare x y) end let ord_list (o : 'a ord) : 'a list ord = object inherit ['a list] ord_base method equal = (eq_list (o :> 'a eq))#equal method compare = let rec compare x y = match x, y with | [], [] -> EQ | _ , [] -> GT | [], _ -> LT | x :: xs, y :: ys -> match o#compare x y with | EQ -> compare xs ys | c -> c in compare end
As it is clear, the encoding shows a direct correspondence between Haskell type classes and OCaml's classes and objects:
- type classes into class types with top-level functions dispatching on a witness object of that class type
- derived type classes into inheriting class types
- default implementations into virtual classes
- instances into objects inheriting from the default virtual class
- derived instances into objects having witnesses of the appropriate class type
In every case, the witness objects are used purely as closed vtables without state. It would be interesting to investigate which use cases would open using row polymorphism or data members. Also, note that there is no need to abstract over the object types, as each witness carries with it the intended implementation in a type-safe manner.
This technique is analogous to the one using functors to encode type classes, but then why have both in the same language?
, in the words of Jacques Garrigue. One reason is that functors are not first-class values in OCaml but objects are, which makes using them as type witnesses much more lightweight. On the other hand, one limitation that functors don't have is that this encoding is limited to first-order type classes. In other words, it is not clear what would be required to implement Haskell type classes like:
instance (Monad m) => Functor m where fmap f x = bind x (return . f)
where m is a type constructor with sort * → *
using the encoding presented here, whereas it is straightforward to concoct a (ML) functor:
module MonadFunctor (M : MONAD) : FUNCTOR = struct type 'a t = 'a M.t let fmap f x = M.bind x (fun v -> M.return (f v)) end
Another limitation is the need to pass a witness object to the type-indexed functions. One would ideally want to write something like:
let (===) (x : eq) = x#equal
where the value is its own witness, but obviously this would require that every value be wrapped into an object, requiring, for instance, code like the following:
if (new eq_int 5) === (new eq_int x) then …
Moreover, in order to actually operate on the wrapped values, the classes would need an explicit accessor. OCaml supports double dispatch making possible forcing the types of x and y to be equal:
class type ['a] eq = object ('self) method value : 'a method equal : 'self -> bool method not_equal : 'self -> bool end class eq_int x : [int] eq = object (self : 'self) method value = x method equal (that : 'self) = self#value == that#value method not_equal (that : 'self) = self#value != that#value end
Canonicalizing the witnesses for finite types like bool
is very tedious:
let eq_bool : bool -> bool eq = let _t : bool eq = object method value = true method equal that = that#value method not_equal that = not (that#value) end and _f : bool eq = object method value = false method equal that = not (that#value) method not_equal that = that#value end in fun b -> if b then _t else _f
(essentially, a Church-encoded boolean). More importantly, this encoding is much more dynamic, making it rather remote with respect to the original, fully static Haskell semantics (this, incidentally, is a definite advantage of the functorial encoding).
In closing, it would be interesting to see how far this technique can be carried, especially to implement a numeric tower analogous to Haskell's, or maybe a better type class hierarchy to encode would be that of the Numeric Prelude. Another necessary avenue of investigation is measuring the abstraction penalty incurred by this encoding. In any case, I believe this is a neat showcase of the often-forgotten object-oriented features in OCaml.
2 comments:
"...in wide use"
Hmmm...stretching that a bit.
The oldest, and most complicated technique is that of Danvy (see also this complete example) which is not really equivalent to Haskell's type classes but a sophisticated application of phantom types.
No, neither of the referred techniques has phantom types as an essential ingredient. The complete example that you are referring to is about polytypic or datatype-generic programming and basically uses a dictionary passing approach. Perhaps you wanted to link to the printf page?
Post a Comment