## 2008-10-31

### Group-By Redux

I sometimes like to defunctionalize a recursive function to see what shadow it projects against the wall of the cave. OCaml being strict, the tail recursive, eager, "imperative" version of the function is dual to the "pure", functional one. I've written before about the chain from direct style to CPS to defunctionalization, so I'll not make this a tutorial but a worked-out example. Starting from the last version of `group ~by`:

```let rec group ~by =
let rec split e = function
| x :: xs when by e x ->
let g, ys = split e xs in
x :: g, ys
| l       -> [], l
in function
| []      -> []
| x :: xs ->
let g, ys = split x xs in
(x :: g) :: group ~by ys
```

Defunctionalization is a somewhat error-prone technique to apply by hand, but it is methodical and practice makes things relatively smooth. The first thing I like to do is to convert the function to A-normal form:

```let rec group ~by l =
let rec split e l = match l with
| x :: xs when by e x ->
let g, ys = split e xs in
let g' = x :: g in
g', ys
| l       -> [], l
in match l with
| []      -> []
| x :: xs ->
let g, ys = split x xs in
let g' = x :: g in
let gs = group ~by ys in
g' :: gs
```

Note how each function call is explicitely `let`-bound. Also, I've moved the parameter l so that it is explicitely named and it doesn't interfere with the next steps. Then, every recursive call receives the rest of the computation as an explicit continuation, using the introduced bindings:

```let rec group ~by l k =
let rec split e l k = match l with
| x :: xs when by e x ->
split e xs (fun (g, ys) -> k (x :: g, ys))
| l       -> k ([], l)
in match l with
| []      -> k []
| x :: xs ->
split x xs (fun (g, ys) -> group ~by ys (fun gs -> k ((x :: g) :: gs)))
```

The CPS conversion of `split` is correct, but that of `group` went too far since the recursive call is buried in the initial continuation for `split`. This is why I mean by error-prone; I don't always get it right on the first attempt. Indeed, the call to `split` is irrelevant to the recursive call to `group`, and should be kept in its place. Also, I'll make `group` a helper so that its continuation is kept encapsulated:

```let group ~by l =
let rec split e l k = match l with
| x :: xs when by e x ->
split e xs (fun (g, ys) -> k (x :: g, ys))
| l       -> k ([], l)
in
let rec group l k = match l with
| []      -> k []
| x :: xs ->
let g, ys = split x xs (fun x -> x) in
group ys (fun gs -> k ((x :: g) :: gs))
in group l (fun x -> x)
```

Now defunctionalization involves reifying every continuation as an explicit data structure. I have two functions, so I'll need two types. Each has an initial identity continuation. Then `split` invokes its continuation with x of polymorphic type α free. Also, `group` invokes its continuation with `x :: g` free of polymorphic type `α list`. This dictates my types:

```type 'a split_cont =
| SInit
| SSplit of 'a * 'a split_cont
and 'a group_cont =
| GInit
| GGroup of 'a list * 'a group_cont
```

Each continuation is now changed into a worker function that simulates invoking it with the supplied free values; a pair `(g, ys)` for `split`'s, a list gs for that of `group`:

```let rec split_apply (g, ys) = function
| SInit         -> (g, ys)
| SSplit (x, k) -> split_apply (x :: g, ys) k
and group_apply gs = function
| GInit         -> gs
| GGroup (g, k) -> group_apply (g :: gs) k
```

Now the continuations are replaced in the bodies of the functions by explicit constructors:

```let group ~by l =
let rec split e l k = match l with
| x :: xs when by e x ->
split e xs (SSplit (x, k))
| l       -> split_apply ([], l) k
in
let rec group l k = match l with
| []      -> group_apply [] k
| x :: xs ->
let g, ys = split x xs SInit in
group ys (GGroup (x :: g, k))
in group l GInit
```

Everything works, as it's easy to verify:

```# group ~by:(fun x y -> y - x < 3) (iota 13);;
- : int list list = [[0; 1; 2]; [3; 4; 5]; [6; 7; 8]; [9; 10; 11]; [12]]
```

Here's where the magic occurs: note that in `split_apply`, the second member ys of the pair is passed around unchanged; this means that I can pull it out of the continuation argument and return it directly from `split`:

```let rec split_apply g = function
| SInit         -> g
| SSplit (x, k) -> split_apply (x :: g) k
and group_apply gs = function
| GInit         -> gs
| GGroup (g, k) -> group_apply (g :: gs) k
```

Now both functions are identical modulo renaming, with `α group_cont` = `α list split_cont`. And `α split_cont` is isomorphic to `α list`, with `SInit``[]`, and `SSplit``::` or `cons`. Under this isomorphism, both functions `split_apply` and `group_apply` are `List.rev_append` in disguise! Applying the isomorphism to `split` and `group`, with the proviso that `List.rev_append l []` = `List.rev l`:

```let group ~by l =
let rec split e l g = match l with
| x :: xs when by e x ->
split e xs (x :: g)
| l       -> List.rev g, l
in
let rec group l gs = match l with
| []      -> List.rev gs
| x :: xs ->
let g, ys = split x xs [] in
group ys ((x :: g) :: gs)
in group l []
```

And this is essentially the first version of `group ~by` I've written before.