2008-10-31

Group-By Redux

I sometimes like to defunctionalize a recursive function to see what shadow it projects against the wall of the cave. OCaml being strict, the tail recursive, eager, "imperative" version of the function is dual to the "pure", functional one. I've written before about the chain from direct style to CPS to defunctionalization, so I'll not make this a tutorial but a worked-out example. Starting from the last version of group ~by:

let rec group ~by =
  let rec split e = function
  | x :: xs when by e x ->
    let g, ys = split e xs in
    x :: g, ys
  | l       -> [], l
  in function
  | []      -> []
  | x :: xs ->
    let g, ys = split x xs in
    (x :: g) :: group ~by ys

Defunctionalization is a somewhat error-prone technique to apply by hand, but it is methodical and practice makes things relatively smooth. The first thing I like to do is to convert the function to A-normal form:

let rec group ~by l =
  let rec split e l = match l with
  | x :: xs when by e x ->
    let g, ys = split e xs in
    let g' = x :: g in
    g', ys
  | l       -> [], l
  in match l with
  | []      -> []
  | x :: xs ->
    let g, ys = split x xs in
    let g' = x :: g in
    let gs = group ~by ys in
    g' :: gs

Note how each function call is explicitely let-bound. Also, I've moved the parameter l so that it is explicitely named and it doesn't interfere with the next steps. Then, every recursive call receives the rest of the computation as an explicit continuation, using the introduced bindings:

let rec group ~by l k =
  let rec split e l k = match l with
  | x :: xs when by e x ->
    split e xs (fun (g, ys) -> k (x :: g, ys))
  | l       -> k ([], l)
  in match l with
  | []      -> k []
  | x :: xs ->
    split x xs (fun (g, ys) -> group ~by ys (fun gs -> k ((x :: g) :: gs)))

The CPS conversion of split is correct, but that of group went too far since the recursive call is buried in the initial continuation for split. This is why I mean by error-prone; I don't always get it right on the first attempt. Indeed, the call to split is irrelevant to the recursive call to group, and should be kept in its place. Also, I'll make group a helper so that its continuation is kept encapsulated:

let group ~by l =
  let rec split e l k = match l with
  | x :: xs when by e x ->
    split e xs (fun (g, ys) -> k (x :: g, ys))
  | l       -> k ([], l)
  in
  let rec group l k = match l with
  | []      -> k []
  | x :: xs ->
    let g, ys = split x xs (fun x -> x) in
    group ys (fun gs -> k ((x :: g) :: gs))
  in group l (fun x -> x)

Now defunctionalization involves reifying every continuation as an explicit data structure. I have two functions, so I'll need two types. Each has an initial identity continuation. Then split invokes its continuation with x of polymorphic type α free. Also, group invokes its continuation with x :: g free of polymorphic type α list. This dictates my types:

type 'a split_cont =
| SInit
| SSplit of 'a * 'a split_cont
and 'a group_cont =
| GInit
| GGroup of 'a list * 'a group_cont

Each continuation is now changed into a worker function that simulates invoking it with the supplied free values; a pair (g, ys) for split's, a list gs for that of group:

let rec split_apply (g, ys) = function
| SInit         -> (g, ys)
| SSplit (x, k) -> split_apply (x :: g, ys) k
and group_apply gs = function
| GInit         -> gs
| GGroup (g, k) -> group_apply (g :: gs) k

Now the continuations are replaced in the bodies of the functions by explicit constructors:

let group ~by l =
  let rec split e l k = match l with
  | x :: xs when by e x ->
    split e xs (SSplit (x, k))
  | l       -> split_apply ([], l) k
  in
  let rec group l k = match l with
  | []      -> group_apply [] k
  | x :: xs ->
    let g, ys = split x xs SInit in
    group ys (GGroup (x :: g, k))
  in group l GInit

Everything works, as it's easy to verify:

# group ~by:(fun x y -> y - x < 3) (iota 13);;
- : int list list = [[0; 1; 2]; [3; 4; 5]; [6; 7; 8]; [9; 10; 11]; [12]]

Here's where the magic occurs: note that in split_apply, the second member ys of the pair is passed around unchanged; this means that I can pull it out of the continuation argument and return it directly from split:

let rec split_apply g = function
| SInit         -> g
| SSplit (x, k) -> split_apply (x :: g) k
and group_apply gs = function
| GInit         -> gs
| GGroup (g, k) -> group_apply (g :: gs) k

Now both functions are identical modulo renaming, with α group_cont = α list split_cont. And α split_cont is isomorphic to α list, with SInit[], and SSplit:: or cons. Under this isomorphism, both functions split_apply and group_apply are List.rev_append in disguise! Applying the isomorphism to split and group, with the proviso that List.rev_append l [] = List.rev l:

let group ~by l =
  let rec split e l g = match l with
  | x :: xs when by e x ->
    split e xs (x :: g)
  | l       -> List.rev g, l
  in
  let rec group l gs = match l with
  | []      -> List.rev gs
  | x :: xs ->
    let g, ys = split x xs [] in
    group ys ((x :: g) :: gs)
  in group l []

And this is essentially the first version of group ~by I've written before.

No comments: