2009-12-24

Functional Macramé

What is the meaning of an expression like

```let rec e = f … e …
```

known in Haskell as "knot-tying"? As Lloyd Allison explains:

A circular program creates a data structure whose computation depends upon itself or refers to itself. The technique is used to implement the classic data structures circular and doubly-linked lists, threaded trees and queues, in a functional programming language. These structures are normally thought to require updatable variables found in imperative languages…

But first, a bit of motivation. OCaml allows for the construction of cyclic values going through a data constructor. For instance, the following is legal:

```let rec ones = 1 :: ones in ones
```

since the right-hand side is a `cons` cell, that is, a constructor application. This extends naturally to mutually recursive values. For instance, a grammar can be built with:

```type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol list list

let arith_grammar =
let rec s  = Nonterminal [[add]]
and add    = Nonterminal [[mul]; [add; Terminal "+"; mul]]
and mul    = Nonterminal [[term]; [mul; Terminal "*"; term]]
and term   = Nonterminal [[number]; [Terminal "("; s; Terminal ")"]]
and number = Nonterminal [[digit]; [digit; number]]
and digit  = Nonterminal (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
in s
```
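To see that this really builds a cyclic value, here is a standalone sketch; the local `range` is an assumption of mine (an inclusive integer interval), since the original leaves it unspecified:

```ocaml
type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol list list

(* assumed helper: enumerate the inclusive interval i..j *)
let range i j = let rec go k = if k > j then [] else k :: go (k + 1) in go i

let arith_grammar =
  let rec s  = Nonterminal [[add]]
  and add    = Nonterminal [[mul]; [add; Terminal "+"; mul]]
  and mul    = Nonterminal [[term]; [mul; Terminal "*"; term]]
  and term   = Nonterminal [[number]; [Terminal "("; s; Terminal ")"]]
  and number = Nonterminal [[digit]; [digit; number]]
  and digit  = Nonterminal (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
  in s

(* Walk s -> add -> mul -> term; the s inside term's parenthesized production
   is physically the whole grammar: the value is cyclic, not infinitely unfolded. *)
let () = match arith_grammar with
  | Nonterminal [[Nonterminal ([mul] :: _)]] ->
    (match mul with
     | Nonterminal ([term] :: _) ->
       (match term with
        | Nonterminal [_; [_; s; _]] -> assert (s == arith_grammar)
        | _ -> assert false)
     | _ -> assert false)
  | _ -> assert false
```

The physical-equality check succeeds precisely because `let rec` shares the nodes instead of unfolding them.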

This is a perfectly valid recursive value in OCaml and it is a direct translation from the code in this article. The problem is that an implementation of the `Omega` monad for breadth-first enumeration of a list of lists requires laziness in an essential way. I'll use a stub (mock?) sequence type:

```module Seq = struct
type 'a cons = Nil | Cons of 'a * 'a t
and 'a t  = 'a cons Lazy.t
let rec map f q = lazy (match Lazy.force q with
| Nil -> Nil
| Cons (x, q) -> Cons (f x, map f q))
let rec of_list l = lazy (match l with
| [] -> Nil
| x :: xs -> Cons (x, of_list xs))
end
```
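To check that the stub behaves lazily, a small `take` helper (my addition) forces only as many cells as requested; the `Seq` stub is repeated verbatim so the snippet runs standalone:

```ocaml
module Seq = struct
  type 'a cons = Nil | Cons of 'a * 'a t
  and 'a t = 'a cons Lazy.t
  let rec map f q = lazy (match Lazy.force q with
    | Nil -> Nil
    | Cons (x, q) -> Cons (f x, map f q))
  let rec of_list l = lazy (match l with
    | [] -> Nil
    | x :: xs -> Cons (x, of_list xs))
end

(* Force at most n cells of a sequence back into a list. *)
let rec take n q =
  if n = 0 then [] else match Lazy.force q with
  | Seq.Nil -> []
  | Seq.Cons (x, q) -> x :: take (n - 1) q

(* Mapping computes nothing until take forces the cells. *)
let () = assert (take 2 (Seq.map succ (Seq.of_list [1; 2; 3])) = [2; 3])
```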

Unfortunately this doesn't work:

```type 'a symbol = Terminal of 'a | Nonterminal of 'a symbol Seq.t Seq.t

let rule ss = Nonterminal (Seq.map Seq.of_list (Seq.of_list ss))

let arith_grammar =
let rec s  = rule [[add]]
and add    = rule [[mul]; [add; Terminal "+"; mul]]
and mul    = rule [[term]; [mul; Terminal "*"; term]]
and term   = rule [[number]; [Terminal "("; s; Terminal ")"]]
and number = rule [[digit]; [digit; number]]
and digit  = rule (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
in s
```

The dreaded `"This kind of expression is not allowed as right-hand side of `let rec'"` error raises its ugly head: `rule` is a function, not a constructor, and OCaml rightly complains that it cannot build a recursive value out of strict definitions involving arbitrary computation. In a lazy language, in contrast, the right-hand-side expression is not evaluated until it is needed. So, again, what is the meaning of an expression like

```let rec e = f … e …
```

When the value of `e` is required, the function `f` is called with an unevaluated `e`. If `f` doesn't use it, the result is well-defined; if it does, the use results in a recursive call to `f`. In a strict language we must be explicit in the delaying and forcing of thunks:

```let f' … e … =
…
if e_is_needed then … Lazy.force e …
…
in
let rec e = lazy (f' … e …)
```
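Here is a concrete instance of this scheme (my example, not from the original): an infinite stream of a repeated value. `f'` touches `e` only under `lazy`, so the recursive binding is accepted, and forcing the tail returns the very cell we started from:

```ocaml
type 'a stream = Cons of 'a * 'a stream Lazy.t

(* f' uses its argument only inside a constructor, never forcing it *)
let f' x e = Cons (x, e)

let repeat x =
  let rec e = lazy (f' x e) in   (* legal: lazy acts syntactically as a constructor *)
  Lazy.force e

let () =
  let s = repeat 7 in
  match s with Cons (a, t) ->
    assert (a = 7);
    assert (Lazy.force t == s)   (* the tail is, physically, the stream itself *)
```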

Alas, if `f'` itself is lazy, the expected code won't work:

```let f' … e … = lazy (
…
if e_is_needed then … Lazy.force e …
…
)
in
let rec e = f' … e …
```

because `lazy` works syntactically as a constructor in OCaml, again we're told that `"This kind of expression is not allowed as right-hand side of `let rec'"`. This means that we cannot use knot-tying with lazy abstract data types like infinite lists and streams without going explicitly through `lazy`.

Stepping back from the problem at hand, let's revisit Allison's example of circular lists. He gives essentially this:

```let circ p f g x =
let rec c = build x
and build y = f y :: if p y then c else build (g y)
in c
```

(which is equivalent to an `unfold` followed by a knot-tying). This unsurprisingly doesn't work, but using `lazy` as outlined above does:

```let circ p f g x =
let rec c = lazy (build x)
and build y = Seq.Cons (f y, if p y then c else lazy (build (g y)))
in c
```

and the knot is tied by actually making a reference to the value as desired:

```let x = circ (fun _ -> true) id id 0 in let Seq.Cons (_, y) = Lazy.force x in x == y ;;
- : bool = true
```

A bit of lambda-lifting to separate the binding recursion from the value recursion gives:

```let circ p f g x =
let rec build y c = Seq.Cons (f y, if p y then c else lazy (build (g y) c)) in
let rec c = lazy (build x c) in c
```

This hints at what appears to be a limitation of strict languages, namely that circular computations seem to require explicit binding management in an essential way, either by hand as in this code or by using a method like Dan Piponi's Löb functor. Applying this technique to our grammar makes for tedious work: all the mutually recursive references must be lambda-lifted, and the knot tied simultaneously through `lazy`:

```let rule ss = Lazy.force (Seq.map Seq.of_list (Seq.of_list ss))

let arith_grammar =
let make_expr   exp add mul trm num dig = rule [[add]]
and make_add    exp add mul trm num dig = rule [[mul]; [add; Terminal "+"; mul]]
and make_mul    exp add mul trm num dig = rule [[trm]; [mul; Terminal "*"; trm]]
and make_term   exp add mul trm num dig = rule [[num]; [Terminal "("; exp; Terminal ")"]]
and make_number exp add mul trm num dig = rule [[dig]; [dig; num]]
and make_digit  exp add mul trm num dig = rule (List.map (fun d -> [Terminal (string_of_int d)]) (range 0 9))
in
let rec exp = Nonterminal (lazy (make_expr   exp add mul trm num dig))
and add = Nonterminal (lazy (make_add    exp add mul trm num dig))
and mul = Nonterminal (lazy (make_mul    exp add mul trm num dig))
and trm = Nonterminal (lazy (make_term   exp add mul trm num dig))
and num = Nonterminal (lazy (make_number exp add mul trm num dig))
and dig = Nonterminal (lazy (make_digit  exp add mul trm num dig)) in
exp
```

This can become unworkable pretty quickly, but it is a solution! Note that the type of sequences forces me to use an explicit evaluation discipline: `rule` must return an evaluated expression, but the evaluation itself is delayed inside the `Nonterminal` constructor.

Allison's paper ends with an alternative for strict imperative languages like Pascal: using an explicit reference for the circular structure, something like this:

```let f' … e … =
…
if e_is_needed then … Ref.get e …
…
in
let e_ref = Ref.make () in
let e = f' … e_ref … in
Ref.set e_ref e
```

where `Ref.make` has type `unit -> 'a`, that is, it is magic. Unfortunately, Xavier Leroy himself stated that `Obj.magic` is not part of the OCaml language :-) (although a quick look shows many a would-be apprentice at work). And in this case it is true that no amount of magic would make this work in the general case, since `e_ref` must refer to an otherwise dummy value of the appropriate type which gets overwritten with the final result, in effect reserving memory of the necessary size. In specific cases, however, this can be made to work with a bit of care.
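For the record, here is one such specific case in plain, magic-free OCaml (my sketch, not Allison's Pascal): a circular list through a record with a mutable field, where the first node serves as its own dummy tail until the knot is tied imperatively:

```ocaml
type 'a node = { data : 'a; mutable next : 'a node }

let circularize = function
  | [] -> invalid_arg "circularize: empty"
  | x :: xs ->
    (* the first node points to itself: memory of the right type, no magic *)
    let rec first = { data = x; next = first } in
    let last =
      List.fold_left
        (fun prev y ->
           let n = { data = y; next = first } in
           prev.next <- n; n)
        first xs
    in
    last.next <- first;   (* tie the knot by overwriting the dummy tail *)
    first

let () =
  let c = circularize [1; 2; 3] in
  assert (c.data = 1 && c.next.data = 2 && c.next.next.data = 3);
  assert (c.next.next.next == c)
```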

Merry Christmas!

2009-12-23

Gained in Translation

First of all, I'd like to apologize for the infrequent updates and the lightness of the last few entries. I seldom have time of late for anything but the quickest of finger exercises, but I wanted to put something in writing before the year is over. What better inspiration than one of Remco Niemeijer's terse solutions to the daily Programming Praxis. This week's problem asks for an implementation of Parnas's permuted indices, and Remco's solution is minimal enough. I translated his Haskell code to almost-verbatim OCaml, interjecting the necessary definitions to make the code read essentially the same way. For example, I needed to translate:

```rot xs = [(unwords a, unwords b) | (a, b) <- init $
zip (inits xs) (tails xs), notElem (head b) stopList]
```

(n.b: this is Haskell). The function `inits` returns all the initial segments of a list, so that `inits "abc" = ["", "a", "ab", "abc"]`. Conversely, `tails` returns all the tails of a list, so that `tails "abc" = ["abc", "bc", "c", ""]`. The `zip` of both lists is the list of all the ways in which you can split a list, so that with our example:

```zip (inits "abc") (tails "abc") = [
("", "abc"),
("a", "bc"),
("ab", "c"),
("abc", "")
]
```

and the `init` of that is every element on that list except for the last one. After writing the necessary infrastructure, the equivalent solution was simple to write, but then I noticed that I could refactor it into something a bit terser. The first opportunity for compression I found was to use an ad-hoc function for splitting a list in every way possible except the last, in effect subsuming `init $ zip (inits xs) (tails xs)` into a single recursive function:

```let rec split_all l = match l with
| []      -> []
| x :: xs -> ([], l) :: List.map (fun (hs, ts) -> x :: hs, ts) (split_all xs)
```
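For instance (my check), `split_all` on a three-element list produces every split except the final one:

```ocaml
(* split_all, as above, repeated so the check runs standalone *)
let rec split_all l = match l with
| []      -> []
| x :: xs -> ([], l) :: List.map (fun (hs, ts) -> x :: hs, ts) (split_all xs)

let () =
  assert (split_all [1; 2; 3]
          = [([], [1; 2; 3]); ([1], [2; 3]); ([1; 2], [3])])
```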

A classic feature of text-processing tasks in Haskell is the use of functions converting text into lists and back; this required writing some simple helpers (using the `Str` regexp library):

```let words = Str.split (Str.regexp " ")
and lines = Str.split (Str.regexp "\n")
and unwords = String.concat " "
```

The permuted index construction must filter a number of stop words:

```let stop_list = words "a an and by for if in is of on the to"
```

As Remco explains, the core function for generating a permuted index finds all the splittings of a given sentence, and uses the head as the context for the tail. The function he gives is a typical generation-filtering-reduction pipeline expressed as a list comprehension. I initially wrote the comprehension as a right fold (this is always possible), and in a second phase I rewrote that into a point-free function more directly expressing the reduction. For that I needed a number of combinators:

```let ( % ) f g x = f (g x)

let cross f (x, y) = (f x, f y)

let distrib f (x, y) (z, t) = (f x z, f y t)

let flip f x y = f y x
```

The composition operator `%` is an old friend of this blog. The combinator `cross` lifts a function over a pair, and `distrib` distributes a binary function over two pairs. The combinator `flip` swaps the arguments to a curried function. All of this is standard and it allowed me to write:

```let rot = List.map (cross unwords) % List.filter (not % flip List.mem stop_list % List.hd % snd) % split_all
```

This function takes a list of words, splits it every which way, throws away those pairs whose second component (the tail) begins with a stop word, and joins each component of the resulting pairs of words into sentence fragments. The function pretty-printing an index underwent a similar compression: instead of finding the longest fragment separately on each component, I did it in one pass over the list of pairs:

```let pp_index xs =
let l1, l2 = List.fold_right (distrib (max % String.length)) xs (0, 0) in
List.iter (fun (a, b) -> Printf.printf "%*s   %-*s\n" l1 a l2 b) xs
```

The function `max % String.length` composes on the first argument of the curried `max`, and thus has type `string → int → int`; in order to distribute over pairs and make the types come out right I needed to use a `List.fold_right` instead of the more natural (in OCaml) `List.fold_left`, but this is otherwise straightforward. The pretty printing of the index is exactly like in Remco's code, as both OCaml's and Haskell's `printf` implement the same formats. Putting everything together needs a sort on the second components; I use a (very inefficient) helper implementing case-insensitive comparison:

```let ci_compare a b = compare (String.lowercase a) (String.lowercase b)

let permute_index =
pp_index % List.sort (fun (_, a) (_, b) -> ci_compare a b) % List.concat % List.map (rot % words) % lines
```

The text is decomposed into lines, each line is further decomposed into words and an index is built for it, the fragmentary indexes are collated and sorted into the final result which is finally printed out. A test gives out the expected result:

```let () = permute_index "All's well that ends well.\nNature abhors a vacuum.\nEvery man has a price.\n"
Nature   abhors a vacuum.
All's well that ends well.
All's well that   ends well.
Every man has a price.
Every man   has a price.
Every   man has a price.
Nature abhors a vacuum.
Every man has a   price.
All's well   that ends well.
Nature abhors a   vacuum.
All's   well that ends well.
All's well that ends   well.
```

A point to keep in mind is that `permute_index` and especially `rot` would probably have been clearer written in a monadic style, as it emphasizes an "element-at-a-time" view of list processing as I've written before. The downside would have been the need to name every intermediate value being transformed:

```let rot xs =
split_all xs >>= fun (hs, ts) ->
guard (not (List.mem (List.hd ts) stop_list)) >>
return (unwords hs, unwords ts)
```

It seems that, in this sense, monadic beats recursion but point-free beats monadic for conciseness. As it is, the 30 lines comprising this code fit in one short page. Not bad.

2009-11-06

Reflecting on One-Liners

The delightful Futility Closet posted a simple puzzler about the next year after 1961 that reads the same upside down. It is quite easy to see that it will be 6009, and to convince oneself that indeed no smaller number exists, a dozen lines of code suffice.

The key to concision is to lift the syntax and semantics of monads into the problem. Since not every digit is itself a digit upon rotation by 180° (I admit to taking poetic license with the title), it is natural to work with failing computations, otherwise known as the option monad:

```let return x = Some x
let (>>=) m f = match m with None -> None | Some x -> f x
```

Of all the natural operations on monads known and loved by Haskell practitioners, I'll need just a bit of support:

```let fmap f m = m >>= fun x -> return (f x)

let lift2m f m n = m >>= fun x -> n >>= fun y -> return (f x y)

let sequence ms = List.fold_right (lift2m (fun x xs -> x :: xs)) ms (return [])
```

As the last bit of scaffolding, I need a monadized lookup function:

```let find l e = try Some (List.assoc e l) with Not_found -> None
```

These lines are a very well-known part of the standard library of Haskell, so I'm already six over the par. I'm looking for digits that read as digits upon a rotation of 180°. An association list maps those digits to their rotations:

```let rotations = [0, 0; 1, 1; 6, 9; 8, 8; 9, 6]
```

If these numbers were intended to be read on seven-segment displays, I could have added the `(2, 5)` and `(5, 2)` pairs. By repeatedly dividing by 10 I can get the list of decimal digits of a number:

```let to_digits = let rec go l n = if n = 0 then l else go (n mod 10 :: l) (n / 10) in go []
```

The opposite operation, building a number from the list of its decimal digits is an application of Horner's Rule:

```let of_digits l = List.fold_left (fun x y -> 10 * x + y) 0 l
```

Reflection is a compact and dense point-free one-liner:

```let reflect = fmap (of_digits % List.rev) % sequence % List.map (find rotations) % to_digits
```

For each digit I find its rotation, if it exists. The result is a list of digit computations, some of those possibly failed. I turn that into a list computation that succeeds only if all its components succeed with `sequence`. The desired result is built out of the reversed list of rotated digits. Now a number is symmetric if it can be read upside down as itself:

```let is_symmetric n = match reflect n with Some m -> n = m | _ -> false
```
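Assembled into a standalone program (repeating the definitions above, with `%` as composition spelled out), the predicate checks out at both ends: 1961 itself is symmetric, which is the point of the puzzle, while a year containing an unrotatable digit is not:

```ocaml
let ( % ) f g x = f (g x)

let return x = Some x
let ( >>= ) m f = match m with None -> None | Some x -> f x
let fmap f m = m >>= fun x -> return (f x)
let lift2m f m n = m >>= fun x -> n >>= fun y -> return (f x y)
let sequence ms = List.fold_right (lift2m (fun x xs -> x :: xs)) ms (return [])
let find l e = try Some (List.assoc e l) with Not_found -> None

let rotations = [0, 0; 1, 1; 6, 9; 8, 8; 9, 6]
let to_digits = let rec go l n = if n = 0 then l else go (n mod 10 :: l) (n / 10) in go []
let of_digits l = List.fold_left (fun x y -> 10 * x + y) 0 l
let reflect = fmap (of_digits % List.rev) % sequence % List.map (find rotations) % to_digits
let is_symmetric n = match reflect n with Some m -> n = m | _ -> false

let () = assert (is_symmetric 1961 && is_symmetric 6009)
let () = assert (not (is_symmetric 1984))   (* 4 has no rotation *)
```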

The first year after 1961 that is symmetric is 6009, as expected:

```let year = List.hd $ List.filter is_symmetric (range 1962 10_000)
```

That's it, twelve lines. Composition and `range` are left as an exercise for the reader.

2009-08-30

Fun with (Type) Class

I'd like to motivate OCaml's object-oriented features (the "O" in "OCaml") by showing an encoding of (first-order) Haskell type classes. This will require making use of almost everything in OCaml's object buffet: class types, virtual classes, inheritance of class types and of classes, double dispatch and coercion.

Haskell could be said to have the best type system among all languages in wide use at the moment, striking a delicate balance between power and practicality without compromising purity (neither conceptual nor functional). In the words of Oleg Kiselyov, it sports "attractive types". A great feature is the presence of type-indexed values via the type-class mechanism. In a nutshell, type-indexed values are functions from types to values. A classic example is overloaded operators, where for instance the same symbol `+` denotes addition over the integers, floating-point and rational numbers, depending on the type of its parameters. Haskell provides "type magic" to automatically infer the intended function (the value) based on the types of the arguments to which it is applied, achieving a form of type-safe "ad-hoc" polymorphism.

The ML family of languages doesn't have the machinery to attain such expressive power. There are, however, a number of ways to encode type-indexed families with an explicit parameter encoding the selector type. The oldest and most complicated technique is Danvy's (see also this complete example), which is not really equivalent to Haskell's type classes but a sophisticated application of phantom types. Another technique is using an explicit dictionary (Yaron Minsky has a worked example) from which to select the appropriate operation. Normally these dictionaries are records; the labels provide type-safe instantiation of the generic value family. The problem with records is that they are not extensible, so it is not immediate how to encode the equivalent of inheritance of type classes. OCaml's object-oriented sublanguage is especially well-suited for the task, allowing for a natural encoding that shows that the "class" in Haskell's "type classes" is more object-oriented than it seems at first sight.

I'll start with a bit of Haskell's standard prelude, as shown in the Report in a nice graph. I'll focus on the two most common nontrivial classes, `Eq` and `Ord`. The first is defined as:

```class  Eq a  where
(==), (/=)  ::  a -> a -> Bool

x /= y  = not (x == y)
x == y  = not (x /= y)
```

I encode a Haskell type class in OCaml with… (surprise!) a class type and associated functions:

```class type ['a] eq = object
method equal     : 'a -> 'a -> bool
method not_equal : 'a -> 'a -> bool
end
```

Since the type class has a type parameter `a`, my class type also has a type parameter `'a`. Each member of the type class corresponds to a (named) method and to a regular function taking the class type as a dictionary parameter:

```let (===) x y (eq : 'a #eq) = eq#equal x y
and (<=>) x y (eq : 'a #eq) = eq#not_equal x y
```

(I've opted to not shadow the regular operators in `Pervasives`). The weird order of the parameters (you'd expect the dictionary to come first) is such that I can write:

    if x === y $ eq_int then …
```

where `(\$)` is the application operator. Haskell provides default implementations for both members; as the Report notes:

This declaration gives default method declarations for both /= and ==, each being defined in terms of the other. If an instance declaration for Eq defines neither == nor /=, then both will loop.

I find that unsettling, so I'll just define one via a virtual class (not class type):

```class virtual ['a] eq_base = object (self)
method virtual equal : 'a -> 'a -> bool
method not_equal x y = not (self#equal x y)
end
```

In this case, `equal` is left virtual. It is important that the class be left virtual even if `equal` is defined, so that it cannot be instantiated. This means that it cannot be coerced to the class type `eq`, since it is illegal to shadow virtual members that were declared non-virtual. Now come the instances of the type class. In Haskell, a very simple instance is that for `unit` which is built-in (via `deriving`) but could be declared as:

```instance  Eq ()  where
() == () = true
```

To encode instances, I use an immediate object bound to an identifier:

```let eq_unit : unit eq = object
inherit [unit] eq_base
method equal     () () = true
method not_equal () () = false
end
```

Note the coercion to the corresponding monomorphic instance of the class type. I inherit from the default implementation in the class base, even if in this case I explicitly implement all methods for efficiency. The same encoding works for booleans too:

```let eq_bool : bool eq = object
inherit [bool] eq_base
method equal     = (==)
method not_equal = (!=)
end
```
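Collected into a standalone program, the encoding so far already works end to end. The `($)` application operator is my assumption here, defined as plain application; since operators beginning with `$` share the precedence level of the `=`-like operators and associate to the left, `x === y $ w` parses as `(x === y) $ w`, which is exactly what is wanted:

```ocaml
class type ['a] eq = object
  method equal     : 'a -> 'a -> bool
  method not_equal : 'a -> 'a -> bool
end

class virtual ['a] eq_base = object (self)
  method virtual equal : 'a -> 'a -> bool
  method not_equal x y = not (self#equal x y)
end

let ( === ) x y (eq : 'a #eq) = eq#equal x y
and ( <=> ) x y (eq : 'a #eq) = eq#not_equal x y

let ( $ ) f x = f x   (* assumed application operator *)

let eq_bool : bool eq = object
  inherit [bool] eq_base
  method equal = (=)   (* not_equal comes from the default in eq_base *)
end

let () = assert (true === true $ eq_bool)
let () = assert (false <=> true $ eq_bool)
```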

OCaml has built-in ad-hoc polymorphic comparisons that I could use directly:

```class ['a] eq_poly : ['a] eq = object
inherit ['a] eq_base
method equal     = Pervasives.(=)
method not_equal = Pervasives.(<>)
end
```

With that I can have type classes for the primitive types:

```let eq_int    = (new eq_poly :> int    eq)
and eq_float  = (new eq_poly :> float  eq)
and eq_char   = (new eq_poly :> char   eq)
and eq_string = (new eq_poly :> string eq)
```

Here I instantiate objects of the `eq_poly` class coerced to the correct monomorphic instance of `eq` (recall that in OCaml coercion is always explicit). The type class instance for pairs requires type classes for each of its components. In Haskell, we'd have:

```instance  (Eq a, Eq b) => Eq ((,) a b)  where …
```

(not really, but let me gloss over that). Encoding that in OCaml also requires a witness for each type parameter as explicit arguments to the constructor:

```let eq_pair (l : 'a eq) (r : 'b eq) : ('a * 'b) eq = object (self)
inherit ['a * 'b] eq_base
method equal (p, q) (p', q') = l#equal p p' && r#equal q q'
end
```

An inefficiency of this encoding is that it creates a new object with every invocation of the witness function. For instance, in general `eq_pair eq_unit eq_bool != eq_pair eq_unit eq_bool` and so on for every witness. The same is needed to encode an equality witness for `option` types:

```let eq_option (o : 'a eq) : 'a option eq = object
inherit ['a option] eq_base
method equal x y = match x, y with
None  , None   -> true
| Some x, Some y -> o#equal x y
| _              -> false
end
```

And for `list`s:

```let eq_list (o : 'a eq) : 'a list eq = object
inherit ['a list] eq_base
method equal =
let rec equal x y = match x, y with
| [], [] -> true
| _ , []
| [], _  -> false
| x :: xs, y :: ys when o#equal x y -> equal xs ys
| _      -> false
in equal
end
```

(I use explicit closed recursion for efficiency). In all these cases, I encoded a derived instance via an explicit witness passed as a parameter. Derived type classes require genuine inheritance of class types, as seen in the encoding of Haskell's `Ord` type class:

```class  (Eq a) => Ord a  where
compare              :: a -> a -> Ordering
(<), (<=), (>), (>=) :: a -> a -> Bool
max, min             :: a -> a -> a
```

(I omit showing the default implementations). I use the SML `ordering` type for fidelity:

```type ordering = LT | EQ | GT

let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ
and to_comparison   = function LT -> -1 | EQ -> 0 | GT -> 1
```
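A quick sanity check of the two conversions (my addition; note that `to_comparison GT` must yield `1`):

```ocaml
type ordering = LT | EQ | GT

let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ
and to_comparison   = function LT -> -1 | EQ -> 0 | GT -> 1

let () = assert (of_comparison (compare 1 2) = LT)
let () = assert (of_comparison (compare 2 2) = EQ)
let () = assert (to_comparison (of_comparison (compare 3 2)) = 1)
```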

The class type encoding the `Ord` type class is:

```class type ['a] ord = object
inherit ['a] eq
method compare       : 'a -> 'a -> ordering
method less_equal    : 'a -> 'a -> bool
method less_than     : 'a -> 'a -> bool
method greater_equal : 'a -> 'a -> bool
method greater_than  : 'a -> 'a -> bool
method max           : 'a -> 'a -> 'a
method min           : 'a -> 'a -> 'a
end
```

Again, operators take an explicit witness as a dictionary:

```let (<==) x y (ord : 'a #ord) = ord#less_equal    x y
and (<<<) x y (ord : 'a #ord) = ord#less_than     x y
and (>==) x y (ord : 'a #ord) = ord#greater_equal x y
and (>>>) x y (ord : 'a #ord) = ord#greater_than  x y
```

And I provide a virtual class with default implementations:

```class virtual ['a] ord_base = object (self)
inherit ['a] eq_base
method virtual compare : 'a -> 'a -> ordering
method equal         x y = match self#compare x y with EQ -> true  | _ -> false
method less_equal    x y = match self#compare x y with GT -> false | _ -> true
method less_than     x y = match self#compare x y with LT -> true  | _ -> false
method greater_equal x y = match self#compare x y with LT -> false | _ -> true
method greater_than  x y = match self#compare x y with GT -> true  | _ -> false
method max           x y = match self#compare x y with LT -> y | _ -> x
method min           x y = match self#compare x y with GT -> y | _ -> x
end
```

Note how the class type inheritance `ord :> eq` translates into class inheritance `ord_base :> eq_base`. The instances for `unit` and `bool` use inheritance and selective overriding for efficiency:

```let ord_unit : unit ord = object
inherit [unit] ord_base
method equal     = eq_unit#equal
method not_equal = eq_unit#not_equal
method compare () () = EQ
method max     () () = ()
method min     () () = ()
end

let ord_bool : bool ord = object
inherit [bool] ord_base
method equal     = eq_bool#equal
method not_equal = eq_bool#not_equal
method compare b b' = match b, b' with
| false, true -> LT
| true, false -> GT
| _           -> EQ
method max = (||)
method min = (&&)
end
```

In order to profit from the efficient implementation of `equal` and `not_equal` I explicitly delegate to the methods in the corresponding `eq` instances. Now, as before, I can use the ad-hoc polymorphic operators in `Pervasives` to speed up writing the type classes for the atomic types:

```class ['a] ord_poly : ['a] ord = object
inherit ['a] ord_base
method compare x y   = of_comparison (Pervasives.compare x y)
method less_equal    = Pervasives.(<=)
method less_than     = Pervasives.(< )
method greater_equal = Pervasives.(>=)
method greater_than  = Pervasives.(> )
method max           = Pervasives.max
method min           = Pervasives.min
end

let ord_int    = (new ord_poly :> int    ord)
and ord_float  = (new ord_poly :> float  ord)
and ord_char   = (new ord_poly :> char   ord)
and ord_string = (new ord_poly :> string ord)
```

(Note the use of the injection into `ordering`). The case for pairs requires appropriate witnesses as before:

```let ord_pair (l : 'a ord) (r : 'b ord) : ('a * 'b) ord = object
inherit ['a * 'b] ord_base
method equal = (eq_pair (l :> 'a eq) (r :> 'b eq))#equal
method compare (p, q) (p', q') =
match l#compare p p' with EQ -> r#compare q q' | c -> c
end
```

In this case, delegating `equal` requires passing the witnesses to `eq_pair` explicitly coerced to the `eq` class type. Finally, the classes for `option`s and `list`s are straightforward applications of the encoding:

```let ord_option (o : 'a ord) : 'a option ord = object
inherit ['a option] ord_base
method equal = (eq_option (o :> 'a eq))#equal
method compare x y = match x with
None   -> (match y with None -> EQ | Some _ -> LT)
| Some x -> (match y with None -> GT | Some y -> o#compare x y)
end

let ord_list (o : 'a ord) : 'a list ord = object
inherit ['a list] ord_base
method equal = (eq_list (o :> 'a eq))#equal
method compare =
let rec compare x y = match x, y with
| [], [] -> EQ
| _ , [] -> GT
| [], _  -> LT
| x :: xs, y :: ys -> match o#compare x y with
| EQ -> compare xs ys
| c  -> c
in compare
end
```
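As a standalone exercise of the `ord` machinery, here is a condensed sketch that reimplements just enough of the classes above, with the `($)` application operator assumed as before:

```ocaml
type ordering = LT | EQ | GT
let of_comparison c = if c < 0 then LT else if c > 0 then GT else EQ

(* just the default-implementation core of ord_base *)
class virtual ['a] ord_base = object (self)
  method virtual compare : 'a -> 'a -> ordering
  method less_than x y = match self#compare x y with LT -> true | _ -> false
  method max x y = match self#compare x y with LT -> y | _ -> x
end

class ['a] ord_poly = object
  inherit ['a] ord_base
  method compare x y = of_comparison (compare x y)   (* ad-hoc polymorphic compare *)
end

let ( <<< ) x y (ord : 'a #ord_base) = ord#less_than x y
let ( $ ) f x = f x   (* assumed application operator *)

let ord_int    = (new ord_poly : int ord_poly)
let ord_string = (new ord_poly : string ord_poly)

let () = assert (1 <<< 2 $ ord_int)
let () = assert ("abc" <<< "abd" $ ord_string)
let () = assert (ord_int#max 3 5 = 5)
```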

As it is clear, the encoding shows a direct correspondence between Haskell type classes and OCaml's classes and objects:

• type classes into class types with top-level functions dispatching on a witness object of that class type
• derived type classes into inheriting class types
• default implementations into virtual classes
• instances into objects inheriting from the default virtual class
• derived instances into objects having witnesses of the appropriate class type

In every case, the witness objects are used purely as closed vtables without state. It would be interesting to investigate which use cases would open up with row polymorphism or data members. Also, note that there is no need to abstract over the object types, as each witness carries with it the intended implementation in a type-safe manner.

This technique is analogous to the one using functors to encode type classes; but then, in the words of Jacques Garrigue, why have both in the same language? One reason is that functors are not first-class values in OCaml but objects are, which makes using them as type witnesses much more lightweight. On the other hand, one limitation that functors don't have is that this encoding is restricted to first-order type classes. In other words, it is not clear what would be required to implement Haskell type classes like:

```instance  (Monad m) => Functor m  where
fmap f x = bind x (return . f)
```

where `m` is a type constructor of kind `* → *`, using the encoding presented here, whereas it is straightforward to concoct an (ML) functor:

```module MonadFunctor (M : MONAD) : FUNCTOR = struct
type 'a t = 'a M.t
let fmap f x = M.bind x (fun v -> M.return (f v))
end
```

Another limitation is the need to pass a witness object to the type-indexed functions. One would ideally want to write something like:

```let ( === ) (x : 'a eq) = x#equal
```

where the value is its own witness, but obviously this would require that every value be wrapped into an object, requiring, for instance, code like the following:

```if (new eq_int 5) === (new eq_int x) then …
```

Moreover, in order to actually operate on the wrapped values, the classes would need an explicit accessor. OCaml supports double dispatch, which makes it possible to force the types of `x` and `y` to be equal:

```class type ['a] eq = object ('self)
method value     : 'a
method equal     : 'self -> bool
method not_equal : 'self -> bool
end

class eq_int x : [int] eq = object (self : 'self)
method value                    = x
method equal     (that : 'self) = self#value == that#value
method not_equal (that : 'self) = self#value != that#value
end
```
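This double-dispatch version runs as given; a quick check (my addition) shows both operands wrapped, with their types forced to agree:

```ocaml
class type ['a] eq = object ('self)
  method value     : 'a
  method equal     : 'self -> bool
  method not_equal : 'self -> bool
end

class eq_int x : [int] eq = object (self : 'self)
  method value                    = x
  method equal     (that : 'self) = self#value == that#value
  method not_equal (that : 'self) = self#value != that#value
end

(* physical equality coincides with structural equality on immediate ints *)
let () = assert ((new eq_int 5)#equal (new eq_int 5))
let () = assert ((new eq_int 5)#not_equal (new eq_int 6))
```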

Canonicalizing the witnesses for finite types like `bool` is very tedious:

```let eq_bool : bool -> bool eq =
let _t : bool eq = object
method value          = true
method equal     that =      that#value
method not_equal that = not (that#value)
end and _f : bool eq = object
method value          = false
method equal     that = not (that#value)
method not_equal that =      that#value
end in fun b -> if b then _t else _f
```

(essentially, a Church-encoded boolean). More importantly, this encoding is much more dynamic, making it rather remote from the original, fully static Haskell semantics (this, incidentally, is a definite advantage of the functorial encoding).

In closing, it would be interesting to see how far this technique can be carried, especially to implement a numeric tower analogous to Haskell's, or maybe a better type class hierarchy to encode would be that of the Numeric Prelude. Another necessary avenue of investigation is measuring the abstraction penalty incurred by this encoding. In any case, I believe this is a neat showcase of the often-forgotten object-oriented features in OCaml.

2009-07-21

(Dis)Functional Bowling

It is saddening to see people invested so heavily in cargo-cult programming practices that they cannot see natural solutions to simple problems, their thoughts twisted by object-orientation as an end and TDD as a fighting technique. Uncle Bob is struggling (and failing) to approach functional programming in the natural way, held back by what seem to be his biases and commercial interests. His Bowling Game Kata (warning, PPT; right-click to download) asks to compute the score of a 10-frame bowling game:

The game consists of 10 frames as shown above. In each frame the player has two opportunities to knock down 10 pins. The score for the frame is the total number of pins knocked down, plus bonuses for strikes and spares.

A spare is when the player knocks down all 10 pins in two tries. The bonus for that frame is the number of pins knocked down by the next roll. So in frame 3 above, the score is 10 (the total number knocked down) plus a bonus of 5 (the number of pins knocked down on the next roll.)

A strike is when the player knocks down all 10 pins on his first try. The bonus for that frame is the value of the next two balls rolled.

In the tenth frame a player who rolls a spare or strike is allowed to roll the extra balls to complete the frame. However no more than three balls can be rolled in the tenth frame.

In order to sidestep the problem of validating games for length, I'll consider a bowling game as an infinite number of balls, a finite prefix of which is non-zero:

```let repeat x = let rec l = x :: l in l

let game strikes = strikes @ repeat 0
```

Scoring a frame according to the rules is a matter of pattern matching on the first few balls. According to the result, a complete frame is removed and the rest of the game returned:

```let score_frame = function
| 10 :: (n :: m :: _ as g) -> 10 + n + m, g
| n :: m :: (r :: _ as g) when n + m = 10 -> 10 + r, g
| n :: m :: g -> n + m, g
```

To score a game, I keep track of how many frames I'm into the game, and stop at the tenth:

```let rec score frame g =
if frame = 10 then 0 else
let n, g = score_frame g in
n + score (succ frame) g
```

The game begins at the zeroth frame, obviously:

```# score 0 (game [1; 4; 4; 5; 6; 4; 5; 5; 10; 0; 1; 7; 3; 6; 4; 10; 2; 8; 6]) ;;
- : int = 133
```

10 minutes of thinking, and some looking around to realize that I had mistyped the scorecard. What TDD do you need, Uncle Bob?
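As an extra sanity check (my addition, not the article's), the edge cases behave as the rules demand; the only change to the article's code is an explicit, unreachable catch-all in `score_frame` to silence the exhaustiveness warning:

```ocaml
let repeat x = let rec l = x :: l in l
let game strikes = strikes @ repeat 0

let score_frame = function
| 10 :: (n :: m :: _ as g) -> 10 + n + m, g
| n :: m :: (r :: _ as g) when n + m = 10 -> 10 + r, g
| n :: m :: g -> n + m, g
| [] | [_] -> assert false  (* unreachable: games are infinite *)

let rec score frame g =
  if frame = 10 then 0 else
  let n, g = score_frame g in
  n + score (succ frame) g

let () =
  assert (score 0 (game []) = 0);      (* all gutter balls *)
  assert (score 0 (game [10;10;10;10;10;10;10;10;10;10;10;10]) = 300)  (* a perfect game *)
```

Twelve strikes score the canonical 300; a gutter game scores 0.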

2009-07-19

Reading Bonsai Code's solutions to Programming Praxis's weekly puzzles (which I can't recommend highly enough) makes me acutely aware of how verbose OCaml is, and how inadequate its standard library is when compared to Haskell's. However, I've found that the latest puzzles yield concisely to a monadic style over lists.

Breaking with my usual literate top-down presentation, I'll concentrate on the code required to solve the puzzles and leave the obvious scaffolding to the reader. I'll still putt over par, especially if scored against Remco's brutally terse solutions, but I hope that what is missing is straightforward to fill in. I've written before about point-free style and its limitations in the face of OCaml's value restriction. In this case I'll use monadic style for the solutions as a way to show that procedural expression too has its place in functional programming.

This week's problems are drawn from the International Mathematical Olympiads and are very much in the spirit of Project Euler's problems, yielding nicely to brute-force search. The first:

Determine all three-digit numbers N having the property that N is divisible by 11, and N/11 is equal to the sum of the squares of the digits of N.

can be solved simply by:

```let imo_1960_01 =
range 1 99 >>= fun i ->
let n = 11 * i in
guard (sum % fmap square % digits $ n = i) >>
return n
```
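The scaffolding left to the reader might look like this (a sketch of mine; the names follow the article's usage, the implementations are assumptions). The article's solution is repeated at the end so the snippet runs standalone:

```ocaml
(* List-monad scaffolding (my sketch, inferred from the article's usage) *)
let return x = [x]
let ( >>= ) l f = List.concat (List.map f l)
let ( >> ) l m = l >>= fun _ -> m
let guard p = if p then [()] else []
let fmap = List.map
let ( % ) f g x = f (g x)       (* composition *)
let ( $ ) f x = f x             (* low-precedence application *)
let rec range i j = if i > j then [] else i :: range (succ i) j
let sum = List.fold_left ( + ) 0
let square x = x * x
let digits n =
  let rec go acc n = if n = 0 then acc else go (n mod 10 :: acc) (n / 10) in
  if n = 0 then [0] else go [] n

(* The article's solution, verbatim *)
let imo_1960_01 =
  range 1 99 >>= fun i ->
  let n = 11 * i in
  guard (sum % fmap square % digits $ n = i) >>
  return n
```

With these definitions `imo_1960_01` evaluates to `[550; 803]`: 550/11 = 50 = 5² + 5² + 0², and 803/11 = 73 = 8² + 0² + 3².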

In this solution and the ones that follow I express the problem in monadic terms via `fmap`, `return`, `bind` (and the two synonyms `>>=` and `>>`) and `guard`. Here `%` is composition, and `range`, `sum`, `square` and `digits` are obvious. Equally pithy is the solution to the second problem:

Find the smallest natural number n which has the following properties:

1. Its decimal representation has 6 as the last digit.
2. If the last digit 6 is erased and placed in front of the remaining digits, the resulting number is four times as large as the original number n.

Since n = 10*k + 6, the condition is equivalent to asking that 4*(10*k + 6) = 6*10^b + k, where b = ν(k) is the number of decimal digits in k. Simplifying, the problem is equivalent to finding the smallest integer k = 2/13*(10^b - 4) with exactly b digits. In code:

```let imo_1962_01 =
range 1 9 >>= fun b ->
let e = pow 10 b in
guard (e mod 13 = 4) >>
let k = (e - 4) / 13 * 2 in
guard (List.length % digits $ k = b) >>
return (10 * k + 6)
```
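The `pow` used above is left to the reader; a plausible native-int version (my sketch, not the article's) suffices, since 10⁹ still fits in 31-bit integers:

```ocaml
(* Naive integer exponentiation; adequate for exponents up to 10^9 on 31-bit ints *)
let rec pow b n = if n = 0 then 1 else b * pow b (pred n)
```

With this in scope, `imo_1962_01` evaluates to `[153846]`, the accepted answer: moving the trailing 6 to the front gives 615384 = 4 × 153846.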

The number is not too large, and a 31-bit version of `pow` is sufficient. The third problem will require more scaffolding:

Five students, A, B, C, D, E, took part in a contest. One prediction was that the contestants would finish in the order ABCDE. This prediction was very poor. In fact no contestant finished in the position predicted, and no two contestants predicted to finish consecutively actually did so. A second prediction had the contestants finishing in the order DAECB. This prediction was better. Exactly two of the contestants finished in the places predicted, and two disjoint pairs of students predicted to finish consecutively actually did so. Determine the order in which the contestants finished.

This is more of a word problem than a combinatorial one; the conditions are not very straightforward, and to brute-force them I'll need a number of auxiliary functions. A way to list all permutations comes first:

```let rec selections = function
| [] -> []
| x :: xs ->
(x, xs) :: List.fold_right (fun (y, ys) l ->
(y, x :: ys) :: l) (selections xs) []

let rec permutations = function
| ([] | [_]) as l -> [l]
| l ->
List.fold_right (fun (y, ys) ->
List.fold_right (fun zs l -> (y :: zs) :: l)
(permutations ys)) (selections l) []
```

The first condition asks for permutations having no fixed points, or derangements. I need a way to single derangements out:

```let count_fixpoints l p =
List.fold_left (fun s (x, y) ->
if x = y then succ s else s) 0 (List.combine l p)

let is_derangement l p = count_fixpoints l p = 0
```

Lastly, in order to filter consecutive positions, I need a way to generate them and filter them out:

```let intersect l m = List.filter (fun x -> List.mem x l) m

let rec pairup = function
| [] | [_] -> []
| x :: (y :: _ as xs) -> (x, y) :: pairup xs
```

The solution to the problem is a word-for-word translation of the problem's conditions:

```let imo_1963_01 =
let prediction  = ['D'; 'A'; 'E'; 'C'; 'B'] in
let contestants = List.sort compare prediction in
let all_pairs   = pairup contestants
and pred_pairs  = pairup prediction in
permutations contestants >>= fun p ->
guard (is_derangement  contestants p
&& count_fixpoints prediction  p = 2) >>
let pp = pairup p in
guard (List.length (intersect all_pairs pp) = 0) >>
guard (match intersect pred_pairs pp with
| [(x, y); (z, t)] -> y <> z && t <> x
| _ -> false) >>
return p
```

that is, the solution is to be found among all the `permutations` of the `contestants` which are derangements and have exactly two positions in common with the `prediction`. Of these candidates, the solution must have no pair in common with the pairs in the sorted list of `contestants`, and must have two disjoint pairs in common with the `prediction`.
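As a standalone check (my addition; the well-known answer to this Olympiad problem is the order EDACB), the candidate satisfies every condition word for word, using only the helpers defined above:

```ocaml
(* The article's helpers, repeated so this check runs standalone *)
let rec pairup = function
| [] | [_] -> []
| x :: (y :: _ as xs) -> (x, y) :: pairup xs

let count_fixpoints l p =
  List.fold_left (fun s (x, y) ->
    if x = y then succ s else s) 0 (List.combine l p)

let intersect l m = List.filter (fun x -> List.mem x l) m

let () =
  let prediction  = ['D'; 'A'; 'E'; 'C'; 'B'] in
  let contestants = List.sort compare prediction in
  let answer      = ['E'; 'D'; 'A'; 'C'; 'B'] in
  assert (count_fixpoints contestants answer = 0);   (* a derangement of ABCDE *)
  assert (count_fixpoints prediction  answer = 2);   (* exactly two exact places *)
  assert (intersect (pairup contestants) (pairup answer) = []);
  assert (intersect (pairup prediction)  (pairup answer) = [('D','A'); ('C','B')])
```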

Some would argue, I'm sure, that monadic code is not purely functional, or that it is too rigidly laid out by the sequential nature of monadic binding. I think it is ideal for solving these word problems, since I find that the solution closely follows the conditions laid out in the statement. All in all I've left out fewer than 40 lines of perfectly obvious support code, and gave solutions of 5, 7 and 30-odd lines. It was a fun exercise.

2009-06-17

The Tree Nursery

In my last post, Benedikt Grundmann asked whether changing the representation of trie nodes, in particular eliminating `'a option`-boxed references, would represent a speed-up. I started tinkering with a direct representation:

```type 'a t =
{ value : 'a option;
split : char;
lokid : 'a t;
eqkid : 'a t;
hikid : 'a t; }
```

(call this representation `trie1`). This forces me to use a sentinel node to represent empty trees:

```let rec empty = { value = None; split = '\255'; lokid = empty; eqkid = empty; hikid = empty; }

let is_empty t = t == empty
```

(the `split` value is entirely arbitrary and never relied upon). This recursive structure means that I cannot use pattern matching to discriminate and deconstruct nodes, and all emptiness tests must be strictly by reference equality against the carefully preserved unique sentinel node. This represented a rough 10% speedup over my original code.
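The reference-equality point is easy to get wrong, so here it is in miniature (my sketch, not the article's code): a structurally identical record is not the sentinel, which is why `is_empty` must use `( == )`:

```ocaml
type 'a t =
  { value : 'a option;
    split : char;
    lokid : 'a t;
    eqkid : 'a t;
    hikid : 'a t; }

(* The unique sentinel: a cyclic value going through a record *)
let rec empty = { value = None; split = '\255'; lokid = empty; eqkid = empty; hikid = empty; }

let is_empty t = t == empty

let () =
  assert (is_empty empty);
  (* a field-for-field copy is a fresh allocation, hence not empty *)
  let copy = { empty with split = '\255' } in
  assert (not (is_empty copy))
```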

Meanwhile, Mauricio Fernández tried his hand at it, and came up with a tantalizing result:

in the lapse between the present post and the preceding one, I implemented ternary trees with different constructors for the nodes with and without a value. In my tests, construction is 50% faster, and lookups are between 20% and 80% faster. The code is much less elegant than the one presented here, because it moreover includes some special-casing to handle the `""` (empty string) key, which the above code chokes at.

(It's true that the empty string cannot be a key in either the original or the above implementation. I thought it obvious from the tradeoffs I listed in selecting the representation for the key, but the truth is I never guarded the operations against it. It's a bug, not a feature.) The numbers are too exciting not to try this approach. By carefully teasing apart each case, I arrived at the following data type:

```type 'a t = E
| L of 'a
| B of      char * 'a t * 'a t * 'a t
| K of 'a * char * 'a t * 'a t * 'a t
```

(call this representation `trie2`). The first constructor `E` represents an empty tree. The second constructor `L` represents a leaf with value `α`. The third constructor `B` represents a branching node without value. The fourth constructor `K` represents a marked, valued branching node. Functions over this type require eight matches in general, four each for the middle and four for the end of the key.

I'll present the benchmark results first, then the code. In every case I've run the benchmark three times in sequence from the command line and retained the last measurement. My test machine is an MSI U100 with a 1.6 GHz Intel Atom and 2 GB RAM running Mac OS X 10.5.7, and the code is compiled with `ocamlopt -g -inline 10 -w Aelz -pp camlp4o` version 3.11.0. Each run comprises three tests using the standard `Map` balanced-height tree and the `trie1` and `trie2` ternary trees, with a `Gc.full_major` collection at the start of each. The test data sets are the 234936-word `/usr/share/dict/words` file in Mac OS X, and an 89546-word Spanish lexicon, with 3.91% overlap between the two. The tests measure:

1. Insertion of all the English words into an empty tree via a `fold_left`
2. Retrieval of the list of all inserted words via a `S.fold` (with validation of length and sortedness outside the timing)
3. Retrieval of all the English words
4. Retrieval of all the Spanish words
5. Deletion of all the English words from the tree via a `fold_left` (with validation that the resulting tree is empty)

These are the results for the list of words in dictionary order:

Test of 89546-word set in dictionary order. In ops/s.

| Data Structure | Insert | List All | Find All | Find Sp. | Delete All |
|---|---|---|---|---|---|
| `Map` | 167,105.77 | 2,056,784.27 | 254,101.92 | 225,067.88 | 802,020.62 |
| `trie1` | 184,619.90 | 373,794.66 | 981,049.90 | 1,201,493.12 | 273,188.70 |
| `trie2` | 232,617.76 | 468,132.42 | 791,158.14 | 984,052.05 | 371,658.90 |

These are the results for the list of words in median order, that is the list with the median first followed by the left and right sublists both in median order:

Test of 89546-word set in median order. In ops/s.

| Data Structure | Insert | List All | Find All | Find Sp. | Delete All |
|---|---|---|---|---|---|
| `Map` | 168,873.28 | 2,100,585.38 | 251,426.31 | 224,190.06 | 318,687.33 |
| `trie1` | 270,556.35 | 352,670.96 | 1,669,707.66 | 2,139,124.18 | 707,988.31 |
| `trie2` | 329,939.38 | 462,198.78 | 1,385,399.36 | 1,692,227.60 | 804,071.63 |

These are the combined results, normalized so that the corresponding operations on `Map` are 1:

Test of 89546-word set both in dictionary and in median order. Relative to `Map`.

| Data Structure | Insert | List All | Find All | Find Sp. | Delete All |
|---|---|---|---|---|---|
| `trie1` (dict) | 1.10481 | 0.181737 | 3.86085 | 5.33836 | 0.340626 |
| `trie2` (dict) | 1.39204 | 0.227604 | 3.11355 | 4.37225 | 0.463403 |
| `trie1` (median) | 1.60213 | 0.167892 | 6.64094 | 9.54157 | 2.22158 |
| `trie2` (median) | 1.95377 | 0.220033 | 5.51016 | 7.54818 | 2.52307 |

Some things are easy to see and to explain:

• The insertion code is faster for ternary trees than for the height-balanced tree, as there is no need to maintain expensive shape invariants
• The ternary trees highly favor lookups, especially by non-present keys
• Insertion in sorted order is pessimal for ternary trees. Even so, they remain very fast, as claimed by Bentley and Sedgewick
• The `fold` reconstructs the keys one character at a time by appending to the current prefix, which is quadratic in the prefix length. This accounts for the speed being 20% of the corresponding `Map.fold`
• The shape of the ternary trees depends dramatically on the order of key insertion
• The shape of height-balanced trees also depends heavily on the order of key insertion

Some other things are not so easy to understand. In particular, it is not easy to see why `trie2` is 20%–30% faster than `trie1` in everything except lookups, where it is 20% slower.

For comparison, here's `trie1`'s `lookup`:

```let lookup k t =
let n = String.length k in
if n == 0 then failwith "lookup" else
let rec go i t =
if is_empty t then None else
let c = k.[i] in
if t.split > c then go  i    t.lokid else
if t.split < c then go  i    t.hikid else
if i + 1   < n then go (i+1) t.eqkid else
t.value
in go 0 t
```

And here's `trie2`'s:

```let lookup k t =
let n = String.length k in
let rec go i = function
| E                             -> None
| L  v              when i == n -> Some v
| L  _                          -> None
| B (   _, _, _, _) when i == n -> None
| B (   c, l, q, h)             ->
let c' = k.[i] in
if  c' < c then go  i    l else
if  c' > c then go  i    h else
go (i+1) q
| K (v, _, _, _, _) when i == n -> Some v
| K (_, c, l, q, h)             ->
let c' = k.[i] in
if  c' < c then go  i    l else
if  c' > c then go  i    h else
go (i+1) q
in go 0 t
```

(It's not that ugly). Twice the number of lines, a slew of cases, it should be obvious why the latter is slower, right? Then compare and contrast: here's `trie1`'s `insert`:

```let add k v t =
let n = String.length k in
if n == 0 then failwith "add" else
let rec go i t =
if is_empty t then
if i+1 < n then { empty with split = k.[i]; eqkid = go (i+1) t }
else { empty with split = k.[i]; value = Some v     }
else
let c = k.[i] in
if t.split > c then { t with lokid = go  i    t.lokid } else
if t.split < c then { t with hikid = go  i    t.hikid } else
if i + 1   < n then { t with eqkid = go (i+1) t.eqkid } else
{ t with value = Some v           }
in go 0 t
```

Here's `trie2`'s:

```let add k v t =
let n = String.length k in
let rec go i = function
| E                 when i == n -> L  v
| E                             -> B (   k.[i], E, go (i+1) E, E)
| L  _              when i == n -> L  v
| L  v                          -> K (v, k.[i], E, go (i+1) E, E)
| B (   c, l, q, h) when i == n -> K (v, c    , l, q         , h)
| B (   c, l, q, h)             ->
let c' = k.[i] in
if  c' < c then B (   c, go i l,          q,      h) else
if  c' > c then B (   c,      l,          q, go i h) else
B (   c,      l, go (i+1) q,      h)
| K (_, c, l, q, h) when i == n -> K (v, c    , l, q         , h)
| K (v, c, l, q, h)             ->
let c' = k.[i] in
if  c' < c then K (v, c, go i l,          q,      h) else
if  c' > c then K (v, c,      l,          q, go i h) else
K (v, c,      l, go (i+1) q,      h)
in go 0 t
```

(Now this is not as pretty). The two are in exactly the same shape, and stand in the same relation to each other. Some things I've observed:

• OCaml is not as clever as it should be at compiling or-patterns. Many of the branches in `trie2`'s functions could be coalesced with them, but the speed hit is very consistently noticeable
• Refining the constructors for each code path pays off in algorithmic ways: the normalization required in `remove` can be restricted to those branches statically known to have modified children, and the sharing can be made explicit:
```let remove k t =
let prune = function
| B (   _, E, E, E) -> E
| K (v, _, E, E, E) -> L v
| t                 -> t
in
let n = String.length k in
let rec go i t = match t with
| E                             -> t
| L  _              when i == n -> E
| L  _                          -> t
| B (   _, _, _, _) when i == n -> t
| B (   c, l, q, h)             ->
let c' = k.[i] in
if  c' < c then prune (B (   c, go i l,          q,      h)) else
if  c' > c then prune (B (   c,      l,          q, go i h)) else
prune (B (   c,      l, go (i+1) q,      h))
| K (_, c, l, q, h) when i == n -> B (c, l, q, h)
| K (v, c, l, q, h) ->
let c' = k.[i] in
if  c' < c then prune (K (v, c, go i l,          q,      h)) else
if  c' > c then prune (K (v, c,      l,          q, go i h)) else
prune (K (v, c,      l, go (i+1) q,      h))
in go 0 t
```

Furthermore, the duality between `add` and `remove` is easy to see: the former's base cases introduce `L` and `K` nodes, the latter turns them into `E` and `B` nodes, respectively.

In any case, I look forward to reading Mauricio's analysis and code. All this goes to show that these ternary trees are excellent data structures for representing string maps, as claimed by Bentley and Sedgewick.

2009-06-15

Simple, Efficient Trie Maps

I became interested in implementing Bentley and Sedgewick's Ternary Trees by reading the challenge proposed on Programming Praxis (a great blog, by the way). I thought the algorithms would be a good fit for a purely functional data structure, and I am happy with the results. Here is my implementation.

I follow the `Map.S` signature so that this can be a drop-in replacement for `string`-keyed maps; the interface (.mli file) is simply:

```include Map.S with type key = string
```

For the implementation I restrict the key space to `string`s:

```type key = string
```

Trees are optional references to nodes, and these are records containing the mapped value whenever this node represents a full key, the character suffix for the tree and the three branches:

```type 'a t = 'a node option
and 'a node =
{ value : 'a option;
split : char;
lokid : 'a t;
eqkid : 'a t;
hikid : 'a t; }
```

In this I follow closely the paper, except for a crucial detail: in it, the authors make use of the terminator inherent to every C string, the null byte `'\0'`. This forces a design decision:

• emulate the termination character. This has the distinct disadvantage of having to prohibit it from occurring in string keys
• use a `char option` as the key. This forces boxing the key into a constructed type and complicates the logic in all the comparisons, which has a negative impact in the inner loops
• use a `char list` or string as the key. This has the advantage that it can be adapted to compress the search paths, but it requires generating substrings of everything
• guard on the string length. This is the choice I took. It complicates the comparison logic somewhat, but it has the advantage of making the node representation more regular and compact, and allowing the full range of characters in string keys

An empty tree is represented by a `None` reference:

```let empty = None
```

Consequently, the test for an empty tree is a match on the `option`:

```let is_empty = function None -> true | Some _ -> false
```

This means that I'll be forced to normalize trees on deletion, of which more later. The workhorse function is `lookup`:

```let lookup k t =
let n = String.length k in
let rec go i = function
| None   -> None
| Some t ->
let cmp = Char.compare k.[i] t.split in
if cmp < 0 then go  i    t.lokid else
if cmp > 0 then go  i    t.hikid else
if i+1 < n then go (i+1) t.eqkid else
t.value
in go 0 t
```

It is as simple as it can be given the choices I outlined above. The crucial bit is that the presence or absence of the value is the flag that indicates if this node represents a present or absent key in the tree. The signature's retrieval and membership functions are simply wrappers around this:

```let find k t = match lookup k t with
| Some x -> x
| None   -> raise Not_found

let mem k t = match lookup k t with
| Some _ -> true
| None   -> false
```

Inserting a key-value pair in the tree is a bit more involved:

```let add k v t =
let n = String.length k in
let rec go i = function
| None when i+1 < n ->
Some { value = None  ; split = k.[i]; lokid = None; eqkid = go (i+1) None; hikid = None }
| None              ->
Some { value = Some v; split = k.[i]; lokid = None; eqkid =          None; hikid = None }
| Some t            ->
let cmp = Char.compare k.[i] t.split in
if cmp < 0 then Some { t with lokid = go  i    t.lokid } else
if cmp > 0 then Some { t with hikid = go  i    t.hikid } else
if i+1 < n then Some { t with eqkid = go (i+1) t.eqkid } else
Some { t with value = Some v }
in go 0 t
```

The case when the end of a branch is reached is inlined, so that a descending branch keyed on the key's suffix is built until reaching its last character, which stores the keyed value. Otherwise, the suffix is compared to the node key, and the appropriate child is built. Note that the recursively built subtree replaces the corresponding child in the node; this is what makes this data structure purely functional and persistent. Removing a mapping is similar, but complicated by the need to normalize the tree so that an empty tree is represented by `None`:

```let remove k t =
let prune = function
| { value = None; lokid = None; eqkid = None; hikid = None } -> None
| t -> Some t
in
let n = String.length k in
let rec go i = function
| None   -> None
| Some t ->
let cmp = Char.compare k.[i] t.split in
if cmp < 0 then prune { t with lokid = go  i    t.lokid } else
if cmp > 0 then prune { t with hikid = go  i    t.hikid } else
if i+1 < n then prune { t with eqkid = go (i+1) t.eqkid } else
prune { t with value = None             }
in go 0 t
```

In this case, `prune` works somewhat as a smart constructor for node references.

The big advantage of ternary search trees is that, since they are modeled on three-way quicksort, they provide ordered key storage with relative insensitivity to the order of insertions. The big disadvantage is that they don't have a canonical form for a given key set, that is, their shape depends on the order in which the keys are inserted. This means that they provide a cheap `fold` but no easy way to `compare` trees:

```let fold f t e =
let rec go prefix e = function
| None   -> e
| Some t ->
let e   = go prefix e t.lokid in
let key = prefix ^ String.make 1 t.split in
let e   = match t.value with None -> e | Some v -> f key v e in
let e   = go key    e t.eqkid in
go prefix e t.hikid
in go "" e t
```

Note that the key is built character by character but only on the equal kids. Note also that the resulting fold is guaranteed to be ordered on the keys by the sequence in which the recursion is carried out. The rest of the iteration functions in the signature are trivial wrappers:

```let iter f t = fold (fun k v () ->        f k v ) t ()
and map  f t = fold (fun k v    -> add k (f v  )) t empty
and mapi f t = fold (fun k v    -> add k (f k v)) t empty
```
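To see the pieces working together, here is a self-contained assembly of the type, `empty`, `lookup`, `add` and `fold` above (nothing new, just the article's code gathered into one runnable block), with a short smoke test of my own at the end:

```ocaml
type 'a t = 'a node option
and 'a node =
  { value : 'a option;
    split : char;
    lokid : 'a t;
    eqkid : 'a t;
    hikid : 'a t; }

let empty = None

let lookup k t =
  let n = String.length k in
  let rec go i = function
  | None   -> None
  | Some t ->
    let cmp = Char.compare k.[i] t.split in
    if cmp < 0 then go  i    t.lokid else
    if cmp > 0 then go  i    t.hikid else
    if i+1 < n then go (i+1) t.eqkid else
    t.value
  in go 0 t

let add k v t =
  let n = String.length k in
  let rec go i = function
  | None when i+1 < n ->
    Some { value = None  ; split = k.[i]; lokid = None; eqkid = go (i+1) None; hikid = None }
  | None              ->
    Some { value = Some v; split = k.[i]; lokid = None; eqkid = None; hikid = None }
  | Some t            ->
    let cmp = Char.compare k.[i] t.split in
    if cmp < 0 then Some { t with lokid = go  i    t.lokid } else
    if cmp > 0 then Some { t with hikid = go  i    t.hikid } else
    if i+1 < n then Some { t with eqkid = go (i+1) t.eqkid } else
    Some { t with value = Some v }
  in go 0 t

let fold f t e =
  let rec go prefix e = function
  | None   -> e
  | Some t ->
    let e   = go prefix e t.lokid in
    let key = prefix ^ String.make 1 t.split in
    let e   = match t.value with None -> e | Some v -> f key v e in
    let e   = go key e t.eqkid in
    go prefix e t.hikid
  in go "" e t

let () =
  let t = List.fold_left (fun t k -> add k (String.length k) t) empty
    ["cat"; "car"; "cart"; "dog"] in
  assert (lookup "car" t = Some 3);
  assert (lookup "ca"  t = None);                (* a prefix is not a key *)
  assert (List.rev (fold (fun k _ l -> k :: l) t [])
          = ["car"; "cart"; "cat"; "dog"])       (* fold visits keys in order *)
```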

As I indicated above, ternary search trees have no canonical form, so to compare two trees you have to somehow traverse them in synchronization. One easy but expensive way would be to turn both into key-value lists and compare those; this would require O(n min m) time and O(n + m) space. The best way would be to turn the fold into a continuation-passing iterator that lazily streams keys and traverse both in tandem. I leave it as an exercise to the reader:

```let compare _ _ _ = failwith "Not comparable"

let equal eq t t' =
0 == compare (fun v w -> if eq v w then 0 else 1) t t'
```

2009-06-14

Algorithm-conscious, cache-oblivious

No matter how credentialed a writer is, he is bound to make a mistake in print sooner or later. Raymond Chen's latest entry was for me oddly synchronous, and oddly at variance with my own investigations into Bentley and Sedgewick's ternary trees. His apologia for the hash table is technically sound, but only on the surface: there is not a single measurement to back up his assertions. Here is my data.

I pitted four different data structures intended as finite maps on strings: the standard `Hashtbl`, my implementation of Cuckoo Hash, the standard height-balanced `Map` and my own implementation of ternary trees as Trie Maps. I used Mac OS X 10.5's /usr/share/dict/words file against a Spanish lexicon of 90,000 entries. I tested both successful and unsuccessful searches by looking up every entry in the original file first, and every entry in the alternate file second. In both cases the files were read into memory and then processed with the algorithm benchmarks. I used a bit of generic programming to have the same benchmark code operate on the four data structures, and in each case the mapped domain was `unit`, treating the maps effectively as sets.

Here are the results for reading the big file and testing against the small file:

Test of 234936-word set against 89546 queries (3.91 % success rate). In ops/s.

| Data Structure | Insert | Find (same file) | Find (other file) |
|---|---|---|---|
| `Hashtbl` | 333,993.49 | 798,238.10 | 621,221.64 |
| Cuckoo | 133,952.08 | 904,978.43 | 952,688.49 |
| `Map` | 167,080.10 | 256,780.34 | 226,653.87 |
| Trie Map | 118,932.34 | 770,795.15 | 991,782.69 |

As you can see, `Hashtbl` is competitive inserting data, and Cuckoo Hash is smoking fast looking up data. This is not a very fair comparison, as these are imperative data structures, whereas both the standard `Map` and the Trie Map are purely functional data structures. Even so, the Trie Map is slightly (3.5%) slower in successful searches and substantially (60%) faster in unsuccessful searches than `Hashtbl`.

Unfortunately the champion apparent, the Cuckoo Hash, has a rather fragile disposition. The Spanish lexicon thoroughly broke its insertion routine.

Aside: I used to view with suspicion the lack of termination analysis for the insertion routine, and the very scarce code floating around implementing Cuckoo Hashes. After seeing here a practical case of non-termination for insertion with a dataset of less than 4,000 keys I consider this algorithm completely unsuitable for use. I have confidence that the universal hash I used is essentially random, as it passes standard tests for this data set, and that I implemented the published algorithm faithfully.

Even so, the Ternary Trie remains the victor in both successful and unsuccessful searches, being 60% faster than `Hashtbl`s (and almost 4 times as fast as height-balanced trees) in the latter:

Test of 89546-word set against 234936 queries (1.49 % success rate). In ops/s.

| Data Structure | Insert | Find (same file) | Find (other file) |
|---|---|---|---|
| `Hashtbl` | 283,416.41 | 882,628.51 | 808,735.29 |
| Cuckoo | — | — | — |
| `Map` | 191,458.29 | 280,610.94 | 263,735.97 |
| Trie Map | 108,718.90 | 889,076.24 | 1,300,517.50 |

What is the take-away lesson in this? First, do not let expert advice lull you into complacency. Take the time to experiment and analyze for yourself: it is not only intellectually challenging and rewarding, it might be a store of surprises (and of blogging material) for you.

2009-06-12

Hash 'n Mix

How good is the hash function in your language of choice, particularly for strings? Chances are, not very. I applied the classical statistical tests popularized by Knuth and I was very surprised to see how bad OCaml's hash function on strings is.

I applied the chi-square test on the distribution of hashes over /usr/share/dict/words, for almost 235,000 words. I binned the integers from the most significant bit down, taking successively more and more bits into account, and tested the resulting distribution against a uniform distribution. Here are the results:

χ² test of 234936 hashes with `Hashtbl.hash`

| Bins (ν) | X² | Pr[χ² ≤ X²] |
|---|---|---|
| 2 | 5250.0017026 | 1.0000000 |
| 4 | 6491.6332618 | 1.0000000 |
| 8 | 9310.9851194 | 1.0000000 |
| 16 | 20682.9648245 | 1.0000000 |
| 32 | 42563.4542173 | 1.0000000 |
| 64 | 88265.1637552 | 1.0000000 |
| 128 | 94121.1397827 | 1.0000000 |
| 256 | 155277.5725134 | 1.0000000 |
| 512 | 209363.2757857 | 1.0000000 |
| 1024 | 317909.0541084 | 1.0000000 |
| 2048 | 389422.3450812 | 1.0000000 |
| 4096 | 448253.5283822 | 1.0000000 |
| 8192 | 663614.9166071 | 1.0000000 |
| 16384 | 887812.2319610 | 1.0000000 |
| 32768 | 1037118.9427589 | 1.0000000 |

The first column indicates how many bins were used, which gives the degrees of freedom of the probability distribution (ν). The second column shows the computed statistic X². The third column shows the probability that a χ² deviate with ν degrees of freedom would be at most this value. Failed entries, by Knuth's criterion, are those with probability in the first or last percentile; suspect entries are those in the first five or last five percentiles. Here, every single entry fails.

Pretty dismal, isn't it? The Kolmogorov-Smirnov test is equally appalling: the probability that the entries deviate this far above a uniform distribution is p+ = 0.0016593, and the probability that they deviate this far below it is p- = 1.0000000; both are extreme.
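For concreteness, the binning and statistic described above can be sketched like this (my code, not the article's): take the top `bits` bits of each 30-bit hash as the bin index, then compare observed counts against the uniform expectation n/k:

```ocaml
(* χ² statistic for the top [bits] bits of a list of 30-bit hashes *)
let chi_square bits hashes =
  let k = 1 lsl bits in
  let count = Array.make k 0 in
  List.iter (fun h ->
    let bin = (h land 0x3fffffff) lsr (30 - bits) in
    count.(bin) <- count.(bin) + 1) hashes;
  let n = float_of_int (List.length hashes) in
  let expected = n /. float_of_int k in
  Array.fold_left (fun s o ->
    let d = float_of_int o -. expected in
    s +. d *. d /. expected) 0.0 count
```

A perfectly uniform sample yields a statistic of 0; a good hash should produce values near the number of bins.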

Postprocessing the hashes with a simple mixing function helps matters considerably. I use the Murmur hash specialized to a single 32-bit value:

```let murmur_hash_32 k h =
let m = 0x5bd1e995l in
let k = Int32.mul k m in
let k = Int32.logxor k (Int32.shift_right_logical k 24) in
let k = Int32.mul k m in
let h = Int32.mul h m in
let h = Int32.logxor h k in
let h = Int32.logxor h (Int32.shift_right_logical h 13) in
let h = Int32.mul h m in
let h = Int32.logxor h (Int32.shift_right_logical h 15) in
h

let universal_hash i h =
let r = murmur_hash_32 (Int32.of_int h) (Int32.of_int (succ i)) in
Int32.to_int (Int32.logand r 0x3fffffffl)
```
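As a usage sketch (my naming, not the article's): member i of the family is `universal_hash i` composed with `Hashtbl.hash`. The mixer is repeated here so the snippet runs standalone:

```ocaml
(* The article's mixer, repeated verbatim *)
let murmur_hash_32 k h =
  let m = 0x5bd1e995l in
  let k = Int32.mul k m in
  let k = Int32.logxor k (Int32.shift_right_logical k 24) in
  let k = Int32.mul k m in
  let h = Int32.mul h m in
  let h = Int32.logxor h k in
  let h = Int32.logxor h (Int32.shift_right_logical h 13) in
  let h = Int32.mul h m in
  Int32.logxor h (Int32.shift_right_logical h 15)

let universal_hash i h =
  let r = murmur_hash_32 (Int32.of_int h) (Int32.of_int (succ i)) in
  Int32.to_int (Int32.logand r 0x3fffffffl)

(* Member i of the family, applied to strings (hypothetical helper) *)
let hash i s = universal_hash i (Hashtbl.hash s)

let () =
  let h = hash 0 "hello" in
  assert (h >= 0 && h < 0x40000000);   (* masked to 30 nonnegative bits *)
  assert (hash 0 "hello" = h)          (* deterministic per family member *)
```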

Since Murmur hash achieves full 32-bit avalanche, I can use it to construct a universal hash function. The results of applying this mixing function to `Hashtbl.hash` are much more encouraging:

χ² test of 234936 hashes with mixing function

| Bins (ν) | X² | Pr[χ² ≤ X²] |
|---|---|---|
| 2 | 0.0360268 | 0.1505399 |
| 4 | 4.9407498 | 0.8238125 |
| 8 | 11.4643648 | 0.8803928 |
| 16 | 17.1057309 | 0.6874168 |
| 32 | 25.6769844 | 0.2633473 |
| 64 | 55.6071781 | 0.2655752 |
| 128 | 121.7178466 | 0.3843221 |
| 256 | 250.5078489 | 0.4322981 |
| 512 | 480.1555487 | 0.1675241 |
| 1024 | 934.7566997 | 0.0230148 |
| 2048 | 1983.7491061 | 0.1614639 |
| 4096 | 3951.4456363 | 0.0550442 |
| 8192 | 8032.6611503 | 0.1074982 |
| 16384 | 16249.4879286 | 0.2308936 |
| 32768 | 32526.7885722 | 0.1741126 |

Much better, except maybe for the distribution over 1024 bins (the 10 most significant bits). The Kolmogorov-Smirnov test now shows that the probability that the entries deviate this far above a uniform distribution is p+ = 0.7688490, and the probability that they deviate this far below it is p- = 0.4623243; both are much more balanced.

Know your hash function. It should be random.

2009-06-05

Euler's Power Riffs

Ed Sandifer is an exegete of Euler's methods. In his latest "How Euler Did It", he shows the method by which Euler arrived at closed polynomials for the sums of n-th powers of the first x integers:

In general, Euler has us multiply the terms of Σx^n by (n+1)/(n+2)·x, (n+1)/(n+1)·x, (n+1)/n·x, …, (n+1)/2·x and (n+1)/1·x, respectively, and add the appropriate linear term to get the formula for Σx^(n+1).

where Σx^n is Euler's notation for the sum in question. The algorithm is very easily translated into a short OCaml function. First, I'll need a way to generate an integer range:

```let range i j =
let rec go l j =
if j < i then l else
go (j :: l) (pred j)
in go [] j
```

Simple, eager, tail-call; almost imperative, definitely dirty. With that, Sandifer's rendition of Euler's method is simply:

```open Num

let rec eusum n =
if n = 0 then [Int 0; Int 1] else
let l = eusum (pred n)
(* n/1 n/2 n/3 ... n/(n+1) *)
and k = List.map (fun d -> Int n // Int d) (range 1 (n + 1)) in
(* the lowest coefficient is that of x^1 and always 0 *)
let _ :: xs = List.map2 ( */ ) k l in
(* normalize the sum of coefficients to 1 *)
let x = List.fold_left ( -/ ) (Int 1) xs in
Int 0 :: x :: xs
```
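To sanity-check the construction without the `Num` dependency, here is a float-based paraphrase of mine (the article's version uses exact rationals), together with a Horner-rule evaluator:

```ocaml
(* Float paraphrase of eusum, for checking only: exactness is lost,
   so the comparisons below use a small tolerance. *)
let range i j =
  let rec go l j = if j < i then l else go (j :: l) (pred j) in
  go [] j

let rec eusum n =
  if n = 0 then [0.; 1.] else
  let l = eusum (pred n)
  and k = List.map (fun d -> float_of_int n /. float_of_int d) (range 1 (n + 1)) in
  let xs = List.tl (List.map2 ( *. ) k l) in   (* drop the lowest, always-zero coefficient *)
  let x = List.fold_left ( -. ) 1. xs in       (* coefficients must sum to 1 *)
  0. :: x :: xs

(* Evaluate a dense, least-order-first polynomial by Horner's rule *)
let eval p x = List.fold_right (fun c acc -> c +. x *. acc) p 0.

let () =
  assert (eval (eusum 1) 100. = 5050.);                  (* 1 + 2 + ... + 100 *)
  assert (abs_float (eval (eusum 2) 4. -. 30.) < 1e-9);  (* 1 + 4 + 9 + 16 *)
  assert (abs_float (eval (eusum 3) 3. -. 36.) < 1e-9)   (* 1 + 8 + 27 *)
```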

I want to clarify a couple of subtleties in the implementation. First, I use lists of `Num`s representing dense polynomials, least-order coefficient first. The base case is that for the sum of x ones (with exponent n = 0), namely x itself. As the article points out, the constant term is always zero, because a sum of zero powers is zero.

Then, the general case is built out of Σxn (called `l`) and the terms with which to multiply it (called `k`, in reverse order to that reported by Sandifer, which follows Euler in putting the highest power first). Both polynomials are multiplied term by term with `map2`; however, the degree is not yet raised at this stage. Also, the lowest term is always zero by the previous observation and I disregard it with an irrefutable pattern match (I could've used `List.tl`, but I prefer not to, in general). A simple `fold_left` computes the coefficient α for the linear term, and the polynomial is completed with it and the zero linear term, thus raising the degree by one (one term discarded, two added).

That's it, seven lines of code. Regrettably, it is not of much use as it is:

```# eusum 4 ;;
- : Num.num list =
[Int 0; Ratio <abstr>; Int 0; Ratio <abstr>; Ratio <abstr>; Ratio <abstr>]
```

Bummer, abstract data types. It turns out, pretty-printing polynomials is surprisingly tedious to get right:

```let pp_print_poly pp l =
  let pow = ref 0
  and any = ref false in
  Format.pp_open_box pp 1;
  List.iter (fun k ->
    if k <>/ Int 0 then begin
      if !any then Format.pp_print_space pp ();
      Format.pp_open_hbox pp ();
      (* Print coefficient *)
      if k </ Int 0 then Format.pp_print_char pp '-' else
      if !any then Format.pp_print_char pp '+';
      if !any then Format.pp_print_space pp ();
      let n = Num.abs_num k in
      if n <>/ Int 1 || !pow = 0 then Format.pp_print_string pp (Num.string_of_num n);
      (* Print formal power *)
      if n <>/ Int 1 && !pow > 0 then Format.pp_print_char pp '*';
      if !pow > 0 then Format.pp_print_char pp 'x';
      if !pow > 1 then Format.fprintf pp "^%d" !pow;
      Format.pp_close_box pp ();
      any := true
    end;
    incr pow
  ) l;
  Format.pp_close_box pp ()
```

I follow some very general rules for writing pretty printers. First, always accept the pretty printing channel `pp` as a parameter for maximum generality (you can then use it with `#install_printer` in the top-level). Second, always box everything. In this case I use a vertical-or-horizontal box around the entire polynomial, with one space for continuation lines, and a horizontal-only box around monomials.

As it is evident here, the usual rules for polynomials make the function logic-heavy: the first term uses the sign of the coefficient, but subsequent monomials are added or subtracted with the operation sign offset with spaces, and the coefficient is taken in absolute value. If the coefficient or the variable are 1 they are not shown, unless it is the constant term. If both the coefficient and the power are shown, they are separated by the multiplication sign.

With this it is easy to recreate the table on page 2 of the paper:

```# let () =
    Format.open_box 1;
    List.iter (fun i ->
      pp_print_poly Format.std_formatter (eusum i);
      Format.print_newline ()) (range 0 8);
    Format.close_box ()
  ;;
x
1/2*x + 1/2*x^2
1/6*x + 1/2*x^2 + 1/3*x^3
1/4*x^2 + 1/2*x^3 + 1/4*x^4
-1/30*x + 1/3*x^3 + 1/2*x^4 + 1/5*x^5
-1/12*x^2 + 5/12*x^4 + 1/2*x^5 + 1/6*x^6
1/42*x - 1/6*x^3 + 1/2*x^5 + 1/2*x^6 + 1/7*x^7
1/12*x^2 - 7/24*x^4 + 7/12*x^6 + 1/2*x^7 + 1/8*x^8
-1/30*x + 2/9*x^3 - 7/15*x^5 + 2/3*x^7 + 1/2*x^8 + 1/9*x^9
```

As noted by Euler, the coefficient of the linear term is a Bernoulli number. There are more efficient ways to compute them, but in a pinch this serves:

```let bernoulli n = let _ :: b :: _ = eusum n in b
```

Immutability Redux

My last post generated a ton of very interesting discussion. I'd like to summarize it here to bring the topic to closure.

In view of the excellent analysis performed by Mauricio Fernández, Ketan's comment:

At this point, you don't have the data to say what you said. You could just as easily say OCaml penalizes mutable data, or, less judgmentally, that it is less well-optimized for that use case.

is entirely justified. His first observation that "[I have] more data to support that the performance disparity is due to the idiom rather than the programming language" was as misunderstood by me as it was insightful.

Mauricio did an in-depth analysis of the disparity and reported about it first on reddit, then on his own blog (which I cannot recommend highly enough). His follow-up is fuller than his first analysis, but I'll paraphrase it anyway for completeness's sake. OCaml's garbage collector is of the generational variety. This means that it works under the generational hypothesis: the most recently created objects are also those most likely to become unreachable quickly. In order to exploit this fact, it divides the heap into at least two regions: the young space, with newly created objects that mostly point to themselves and to older objects, and the old space, comprising objects that survived a number of collection phases. Any time an old object points to a young object, it must be added to the remembered set, a list of roots that allows the collector to quickly decide what is reachable without scanning the entire old space.

Mauricio shows that my "synthetic business entity update benchmark" violates the generational hypothesis: since `float`s are in general boxed, updating a `mutable float` field generates a boxed cell with the new value. After enough allocations, the fixed, mutable record is promoted to the old space, from which it points to the young space containing the box. Every mutation then adds to the remembered set via the `caml_modify` write barrier. His post shows ways to ameliorate this penalty by exploiting the conditions under which OCaml unboxes floats, in particular by using records comprised entirely of `float`s, mutable or immutable (in this case, just one suffices). He followed up with an in-depth analysis of the write barrier and a simple optimization that gains up to 50% in speed on synthetic benchmarks, and an excellent 32% on this particular benchmark.
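To make the failure mode concrete, here is a minimal sketch (the type and field names are mine, purely illustrative) contrasting a mixed record, whose `mutable float` field holds a pointer to a boxed float, with an all-float record, which OCaml stores flat and mutates in place:

```ocaml
(* 'mixed' boxes its float field: every mutation allocates a fresh box,
   and once the record is promoted to the old space the store may go
   through the caml_modify write barrier.  'flat' is a record comprised
   entirely of floats, stored unboxed, so mutation overwrites the raw
   float with no allocation at all. *)
type mixed = { name : string; mutable mbalance : float }
type flat  = { mutable fbalance : float }

let bump_mixed p = p.mbalance <- p.mbalance +. 1.   (* allocates a box *)
let bump_flat  p = p.fbalance <- p.fbalance +. 1.   (* no allocation *)

let () =
  let m = { name = "John Doe"; mbalance = 100. } in
  let f = { fbalance = 100. } in
  for _i = 1 to 1_000 do bump_mixed m; bump_flat f done;
  assert (m.mbalance = 1100. && f.fbalance = 1100.)
```

Both loops compute the same result; the difference is entirely in allocation and write-barrier traffic.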

It is hoped that the OCaml team continues improving the performance of the garbage collector (indeed, some people are asking very vocally for different kinds of overhaul to it). Meanwhile, instead of recommending workarounds or speed-ups, I'd say "go with the idioms best supported by the language". In this case, indeed, it pays off to use immutable records in OCaml. This shouldn't be extrapolated to a general methodology applicable to other programming languages.

2009-05-26

Is Immutable Data Slow?

Is immutable data always slower than mutable state? I was challenged in a discussion over at Reddit about this issue, and I decided to perform a little experiment to back up a rough estimate with some numbers. I was surprised and delighted to see that I could make my case very strong indeed by showing that, no, immutable data can be twice as fast as mutable data in a modern functional language.

I concocted a trivial synthetic benchmark using simple records intended to emulate typical business entities:

```module P0 = struct
  type person = { name : string; age : int; balance : float }
end

module P1 = struct
  type person = { name : string; mutable age : int; mutable balance : float }
end

let test0 =
  let module E = struct
    open P0
    let rec test p iters =
      if iters == 0 then p.balance else
      test { p with balance = p.balance +. 1. } (iters - 1)
  end in E.test { P0.name = "John Doe"; P0.age = 19; P0.balance = 100. }

let test1 =
  let module E = struct
    open P1
    let test p iters =
      for i = 1 to iters do
        p.balance <- p.balance +. 1.
      done
  end in E.test { P1.name = "John Doe"; P1.age = 19; P1.balance = 100. }
```

The module `P0` encapsulates an immutable business record; `test0` is the attendant test that uses functional record updating and recursion in a classically pure idiom. The module `P1`, on the other hand, with its corresponding test `test1`, shows the typical imperative use of a mutable record. The test and timing harness for both functions is simple enough:

```let test times f iters =
  let best = ref infinity in
  for i = 1 to times do
    let t = Sys.time () in
    let _ = f iters in
    let t = Sys.time () -. t in
    if !best > t then
      best := t
  done;
  float iters /. !best

let () =
  let iters = 1_000_000 in
  Printf.printf "Immutable record: %f updates/s\n" (test 5 test0 iters);
  Printf.printf "Mutable record:   %f updates/s\n" (test 5 test1 iters)
```

I tested this code with the OCaml bytecode compiler on a 1.6/2.0 GHz Intel Atom computer with Windows XP, and compiled with both the bytecode and native compilers, without optimization, on a 2.2 GHz Core2 Duo MacBook Pro computer with Mac OS X 10.5.7. The results are as follows:

| System | Immutable (upd/s) | Mutable (upd/s) | Speedup (Δ%) |
|---|---|---|---|
| 1.6 GHz Atom/Win XP/Bytecode | 3,048,780 | 3,558,719 | 16.73 |
| 2.0 GHz Atom/Win XP/Bytecode | 4,000,000 | 4,566,210 | 14.16 |
| 2.2 GHz Core2 Duo/Mac OS/Bytecode | 17,750,324 | 20,418,581 | 15.03 |
| 2.2 GHz Core2 Duo/Mac OS/Native | 128,882,588 | 58,400,981 | -54.69 |

The speedup afforded by imperative mutation in bytecode is modest, in all cases around 15%. On the other hand, under the native compiler mutation incurs a penalty: it runs at less than half the speed of functional update.

Of course, my correspondent was right in that the situation with current object-oriented languages is very different. I translated the benchmark straightforwardly into Java, and the result is what "common sense" would let us expect: Java is heavily oriented to mutation, or rather, it penalizes immutability quite heavily. I tried both client and server VMs, under the Windows environment detailed above:

| System | Immutable (upd/s) | Mutable (upd/s) | Speedup (×) |
|---|---|---|---|
| 1.6 GHz Atom/Win XP/Client | 22,831,050 | 107,526,882 | 4.71 |
| 1.6 GHz Atom/Win XP/Server | 37,735,849 | 322,580,645 | 8.55 |

The difference is dramatic: mutation is between 4.7 and 8.5 times faster than functional update in Java, and even more than twice as fast as native OCaml code running on a faster machine. Moral of the story: test your assumptions with experiments. The benchmark code is:

```public final class MutTest {
    private static final class Person0 {
        public final String name;
        public final int age;
        public final double balance;

        public Person0(String name, int age, double balance) {
            this.name = name;
            this.age = age;
            this.balance = balance;
        }

        public Person0 deposit(double amount) {
            return new Person0(this.name, this.age, this.balance + amount);
        }
    }

    private static final class Person1 {
        public String name;
        public int age;
        public double balance;

        public Person1(String name, int age, double balance) {
            this.name = name;
            this.age = age;
            this.balance = balance;
        }

        public void deposit(double amount) {
            this.balance += amount;
        }
    }

    private interface Test {
        double test(int iters);
    }

    private static final Test TEST0 = new Test() {
        public double test(int iters) {
            Person0 person = new Person0("John Doe", 19, 100.0);
            for (int i = 0; i != iters; i++) {
                person = person.deposit(1.0);
            }
            return person.balance;
        }
    };

    private static final Test TEST1 = new Test() {
        public double test(int iters) {
            Person1 person = new Person1("John Doe", 19, 100.0);
            for (int i = 0; i != iters; i++) {
                person.deposit(1.0);
            }
            return person.balance;
        }
    };

    private static double test(int times, Test test, int iters) {
        long best = Long.MAX_VALUE;
        double balance;
        for (int i = 0; i != times; i++) {
            long now = System.currentTimeMillis();
            balance = test.test(iters);
            now = System.currentTimeMillis() - now;
            if (best > now)
                best = now;
        }
        return (double) iters / ((double) best / 1000.0);
    }

    public static void main(String[] args) {
        final int iters = 10000000;
        System.out.printf("Immutable record: %f updates/s\n", test(5, TEST0, iters));
        System.out.printf("Mutable record:   %f updates/s\n", test(5, TEST1, iters));
    }
}
```

2009-05-14

A Polymorphic Question

Chris Conway asked a question to which I replied, unwittingly, with a reference to a thread he himself started on the OCaml mailing list exactly two years ago: what is the difference between the types `'a . int -> 'a vec -> 'a` and `int -> 'a vec -> 'a`?

Strictly speaking, the latter is, in isolation, ill-formed: the type variable `'a` is unbound! The convention is to implicitly universally quantify all unbound type variables in the type. This is the so-called prenex restriction:

The most popular of these is the let-polymorphism of ML […] which is sometimes called prenex polymorphism because it can be viewed as a fragment of System F in which type variables range only over quantifier-free types (monotypes) and in which quantified types (polytypes, or type schemes) are not allowed to appear on the left-hand side of arrows.

Pierce, Benjamin C. Types and Programming Languages, p. 356.

Suppose the latter type were turned into a type abbreviation, to avoid dealing with the label:

```type 'a t = int -> 'a vec -> 'a
```

which as a TaPL-style type (a "tree" type) would be equivalent to a definition

```∀X.(T = int→Vec(X)→X)
```

You cannot directly use such a definition as is; the only thing you can do is to instantiate it with ("apply it" to) some type before expanding the definition, for instance:

```   (∀X.(T = int→Vec(X)→X))[X:=U]
→β
T = int→Vec(U)→U
```

In other words, there is no way to expand the right-hand side of T while keeping the variable X uninstantiated.

The first type, on the other hand, written as a type abbreviation would be:

```type t = 'a . int -> 'a vec -> 'a
```

which as a tree type would be equivalent to the definition:

```T = ∀X.int→Vec(X)→X
```

Now with this equation you can immediately replace T by its right-hand side, which would introduce a universal quantification as a subterm of the type being substituted into.

Seen in another way, and to bring it closer to the actual examples being discussed, labels are functions from records to the corresponding type. For instance:

```type 'a t = { index : int -> 'a vec -> 'a }
```

introduces the function `index`: ∀X.T→int→Vec(X)→X (note the position of the quantification). This is exactly the first definition I wrote, but with definitional equality replaced by the functional arrow. Now a polymorphic record type

```type t = { index: 'a . int -> 'a vec -> 'a }
```

introduces a different function `index`: T→∀X.(int→Vec(X)→X), which corresponds term by term to the second "tree" definition.
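The practical consequence is easy to demonstrate. In the following sketch `'a vec` is stubbed as `'a array` and the value names are mine, for illustration only: a single value of the polymorphic record type can be used at several element types, something a value of the prenex type cannot do.

```ocaml
type 'a vec = 'a array                           (* illustrative stub *)

type 'a t1 = { index1 : int -> 'a vec -> 'a }    (* prenex: 'a fixed per value *)
type t2 = { index2 : 'a . int -> 'a vec -> 'a }  (* 'a chosen anew at each use *)

let poly = { index2 = fun i v -> v.(i) }

(* one t2 value serves vectors of different element types *)
let () =
  assert (poly.index2 0 [| 1; 2; 3 |] = 1);
  assert (poly.index2 1 [| "a"; "b" |] = "b")
```

Had `poly` been a `'a t1`, its first use at `int vec` would have fixed `'a` to `int`, and the second application would be a type error.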

To see it in a whole different way, game semantics help in understanding the differences between linear formulas. In game theoretic terms, the type ∀X.T→int→Vec(X)→X corresponds to the game: "You get to choose and fix any X whatsoever. I give you a function. You give me a T, and I'll give you another function. You give me an `int` and I'll give you yet another function. You give me a `Vec` of the Xs you yourself chose, and I'll give you a certain X."

The second type T→∀X.(int→Vec(X)→X) corresponds to a different game: "You give me a T, and I give you back a thing, with the proviso that I reserve the right to choose and fix an X, which if given to you would result in a function." Only then can you give me back an `int`, and so on. This is what makes X opaque to you.

I hope this helps.

2009-05-13

Okasaki's Random Access Lists

To close the chapter on Okasaki's Random Access Lists, I must record that there is an erratum at the bottom of page 145 of his Purely Functional Data Structures. The argument to the recursive call to `update` must read `Zero (update (i div 2, p, ps)) end`. It took me a while to figure it out, thinking that I had made a mistake.

For completeness, here's the translation of the chapter's pseudo-SML into functional, typesafe OCaml:

```module Vec : sig
  type 'a t
  val nil : 'a t
  val cons : 'a -> 'a t -> 'a t
  val head : 'a t -> 'a
  val tail : 'a t -> 'a t
  val length : 'a t -> int
  val index : int -> 'a t -> 'a
  val update : int -> 'a -> 'a t -> 'a t
end = struct
  type 'a t = Nil | Zero of ('a * 'a) t | One of 'a * ('a * 'a) t

  let nil = Nil

  type cons = { cons : 'a . 'a -> 'a t -> 'a t }
  let cons =
    let rec cons = { cons = fun x l -> match l with
      | Nil         -> One (x, Nil)
      | Zero    ps  -> One (x, ps)
      | One (y, ps) -> Zero (cons.cons (x, y) ps)
    } in cons.cons

  type uncons = { uncons : 'a . 'a t -> 'a * 'a t }
  let uncons =
    let rec uncons = { uncons = function
      | Nil          -> failwith "uncons"
      | One (x, Nil) -> (x, Nil)
      | One (x, ps ) -> (x, Zero ps)
      | Zero    ps   ->
        let ((x, y), ps) = uncons.uncons ps in (x, One (y, ps))
    } in uncons.uncons

  let head l = fst (uncons l)
  and tail l = snd (uncons l)

  type length = { length : 'a . 'a t -> int }
  let length v =
    let rec length = { length = function
      | Nil         -> 0
      | One (_, ps) -> 1 + length.length (Zero ps)
      | Zero    ps  -> 2 * length.length ps
    } in length.length v

  type index = { index : 'a . int -> 'a t -> 'a }
  let index n =
    let rec index = { index = fun n l -> match l with
      | Nil                    -> failwith "index"
      | One (x, ps) when n = 0 -> x
      | One (x, ps)            -> index.index (n - 1) (Zero ps)
      | Zero    ps             ->
        let (l, r) = index.index (n / 2) ps in
        if n mod 2 = 0 then l else r
    } in index.index n

  type update = { update : 'a . int -> ('a -> 'a) -> 'a t -> 'a t }
  let update n e =
    let rec update = { update = fun n f l -> match l with
      | Nil                    -> failwith "update"
      | One (x, ps) when n = 0 -> One (f x, ps)
      | One (x, ps)            -> cons x (update.update (pred n) f (Zero ps))
      | Zero ps                ->
        let g (x, y) = if n mod 2 = 0 then (f x, y) else (x, f y) in
        Zero (update.update (n / 2) g ps)
    } in update.update n (fun _ -> e)
end
```
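A quick sanity check of the structure (a hypothetical example of mine, assuming the `Vec` module above is in scope):

```ocaml
(* build the vector [1; 2; 3] and exercise each operation *)
let v = Vec.cons 1 (Vec.cons 2 (Vec.cons 3 Vec.nil))

let () =
  assert (Vec.length v = 3);
  assert (Vec.head v = 1);
  assert (Vec.index 1 v = 2);
  assert (Vec.head (Vec.tail v) = 2);
  assert (Vec.index 0 (Vec.update 0 9 v) = 9)
```

Note how `cons 1 (cons 2 (cons 3 nil))` ends up represented as `One (1, One ((2, 3), Nil))`: each `Zero`/`One` level pairs up elements, which is what forces the polymorphic recursion encoded by the record types above.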

Enjoy!