I've committed a number of errors and inaccuracies in my last post that warrant clarification. The most egregious and urgent is that the definition I gave for length
is utterly wrong: the type encodes the length from the inside out, so that its "most significant bit" is the inner constructor. The recursion must take the form of a right fold, not of a left fold as I wrote. The correct function is then:
type length = { length : 'a . 'a vec -> int } let length v = let rec length = { length = function Nil -> 0 | One (_, ps) -> 1 + length.length (Zero ps) | Zero ps -> 2 * length.length ps } in length.length v
Second, the option -rectypes is not necessary in OCaml 3.11.0 in order to use non-uniform recursion with this technique. The language permits non-uniform type definitions just fine. But, as Okasaki remarks, "[a]lthough Standard ML allows the definition of non-uniform recursive datatypes, the type system disallows most of the interesting functions on such types." (Okasaki, Chris. Purely Functional Data Structures, p. 143). What -rectype does is to disable the so-called occurs check in the typechecking algorithm that forces an error if a type is found to be defined in terms of itself when reconstructing the type of an expression via unification. This is why the types resulting in the compilation of a function using non-uniform recursion have the form E[α] as α
, for a type expression E.
Third, I spoke of the type of a non-uniform parameter as being existential, which is technically true in this particular case but not the usual characterization of such a type. The way in which the type t→∀α.E[α]
is more commonly described is as rank-2:
A type is said to be of rank 2 if no path from its root to a ∀ quantifier passes to the left of 2 or more arrows, when the type is drawn as a tree. For example,
(∀X.X→X)→Nat
is of rank 2, as areNat→Nat
andNat→(∀X.X→X)→Nat→Nat
, but((∀X.X→X)→Nat)→Nat
is not.(Pierce, Benjamin C. Types and Programming Languages, p. 359)
A type definition type α t = …
is a function from α to the declared type, so it counts as an arrow. Furthermore, there is an encoding of existential types in terms of rank-2 types that for this application suffices. This is the source of my confusion. Abstract signatures implement existential types in OCaml, which is why I said that this technique can be applied by means of a module. Also, OCaml allows rank-2 polymorphism in exactly two places: in record type declarations and in class declarations. So I missed a third way to encode non-uniform recursion in OCaml, namely via objects.
In a comment to the last post, bluestorm shows examples of the three encodings, using local module definitions to tidy up the global namespace of types and record labels. I want to compare the efficiency of the three approaches. The following function serves as a simple timing loop:
let test lap f e = let now = Sys.time () in let rec go cnt = let dt = Sys.time () -. now in if dt > lap then dt /. float cnt else (ignore (f e); go (succ cnt)) in go 0
(the toplevel in my machine has an overhead of 4.9µs to execute the empty loop). Note that the result of the function is thrown away; fortunately, OCaml is a strict language. The test would measure the time it takes for different versions of index
to find an element. This will require some test data. The simplest way to build a long list is with range
:
let range m n = let rec go l i = if i < m then l else go (cons i l) (pred i) in go Nil n
Now everything works as expected:
# length (range 1 500_000) ;; - : int = 500000
The first version of index
uses records to encode the rank-2 type:
let index1 n v = let module Index = struct type index = { index : 'a . int -> 'a vec -> 'a } let rec index = { index = fun n l -> match l with Nil -> failwith "index" | One (x, ps) when n = 0 -> x | One (x, ps) -> index.index (n - 1) (Zero ps) | Zero ps -> let (l, r) = index.index (n / 2) ps in if n mod 2 = 0 then l else r } end in Index.index.Index.index n v
(Malkovich Malkovich Malkovich). This is what I wrote in the last post, but now using a local module to encapsulate the type definitions, as per bluestorm's suggestion. The second version uses an object to encode the rank-2 type:
let index2 n v = let index = object (self) method index : 'a . int -> 'a vec -> 'a = fun n l -> match l with Nil -> failwith "index" | One (x, ps) when n = 0 -> x | One (x, ps) -> self#index (n - 1) (Zero ps) | Zero ps -> let (l, r) = self#index (n / 2) ps in if n mod 2 = 0 then l else r end in index#index n v
Objects in OCaml are always self-recursive. The third version uses a recursive module to encode the non-uniform recursion by forcing the type of the second argument to remain generic:
let index3 n v = let module M = struct module rec Index : sig val index : int -> 'a vec -> 'a end = struct let index n l = match l with Nil -> failwith "index" | One (x, ps) when n = 0 -> x | One (x, ps) -> Index.index (n - 1) (Zero ps) | Zero ps -> let (l, r) = Index.index (n / 2) ps in if n mod 2 = 0 then l else r end end in M.Index.index n v
(alas, local modules cannot be recursive). Now, in order to eliminate interference by the garbage collector, I predeclare my test data:
let data = range 1 1_000_000 ;;
I test each function for two seconds:
# test 2. (index1 999_999) data ;; - : float = 1.45106870782848256e-05 # test 2. (index2 999_999) data ;; - : float = 1.60240840297084448e-05 # test 2. (index3 999_999) data ;; - : float = 1.91896990107750189e-05
The encoding using an object is 10% slower than the one using a record, which is to be expected as the open recursion is performed through a method table. On the other hand, the encoding using a recursive module is almost 25% slower than the one using a record, which is completely unexpected as I would believe the call to be statically resolved. Note also that there is no need to use a local module if the code is in a module bound by an abstract signature. All in all, this shows that these lists are extremely fast: It takes less than 15µs to find the last element in the list.
5 comments:
Thanks for these posts. Rank-2/existential types in OCaml are sorely under-documented and I've been curious about them for a long time.
Maybe I'm being dense, but could you explain why "type index = { index : 'a . int -> 'a vec -> 'a }" has rank-2 but the plain function type "int -> 'a vec -> 'a" doesn't?
Thanks. That was really interesting posts about these not so much documented nested datatypes.
@Chris: the snippet I quoted from TaPL is not very clear, or is it? Wikipedia doesn't help much either, but it's a starting point for defining terminology.
It is an abuse of language to speak of records as "rank-2 polymorphic". What is polymorphic (with or without rank restriction) are the types of expressions and values. For instance, a function taking or returning an index would have a type where one of its components is quantified. For instance, a function f : int -> index would violate the prenex restriction in that it would have a universally quantified variable in a subterm (the one corresponding to index if you were to expand the definition), but would be rank-2 polymorphic.
Properly speaking, records and objects in OCaml are polymorphic, period, and can appear anywhere in a type. The resulting type doesn't necessarily follow the restrictions for rank-2 polymorphism, so they could be used to encode unbounded rank polymorphism.
As to why polymorphic types can be used to encode existentials, this post is a good overview of the logic behind the equivalence.
Matias: Note that I started that thread on the OCaml list! And here we are, two years later, and I'm still confused! :S
I think I understand Pierce's definition and the examples he gives. What I'm missing is what the tree representation for a record type should be, and how the quantifier inside a record type is different from the (implicit) quantifier on regular polymorphic types. I.e., what is the difference between "'a . int -> 'a vec -> 'a" and just "int -> 'a vec -> 'a" (which would be the type of a naive, uniformly polymorphic implementation of "index")?
@Chris: I can't believe how moronic the comment filter is. I'll post my reply to you as a new entry, as it broke when posted as a comment.
Post a Comment