2008-10-31

Group-By Redux

I sometimes like to defunctionalize a recursive function to see what shadow it projects against the wall of the cave. OCaml being strict, the tail recursive, eager, "imperative" version of the function is dual to the "pure", functional one. I've written before about the chain from direct style to CPS to defunctionalization, so I'll not make this a tutorial but a worked-out example. Starting from the last version of `group ~by`:

```let rec group ~by =
let rec split e = function
| x :: xs when by e x ->
let g, ys = split e xs in
x :: g, ys
| l       -> [], l
in function
| []      -> []
| x :: xs ->
let g, ys = split x xs in
(x :: g) :: group ~by ys
```

Defunctionalization is a somewhat error-prone technique to apply by hand, but it is methodical and practice makes things relatively smooth. The first thing I like to do is to convert the function to A-normal form:

```let rec group ~by l =
let rec split e l = match l with
| x :: xs when by e x ->
let g, ys = split e xs in
let g' = x :: g in
g', ys
| l       -> [], l
in match l with
| []      -> []
| x :: xs ->
let g, ys = split x xs in
let g' = x :: g in
let gs = group ~by ys in
g' :: gs
```

Note how each function call is explicitely `let`-bound. Also, I've moved the parameter l so that it is explicitely named and it doesn't interfere with the next steps. Then, every recursive call receives the rest of the computation as an explicit continuation, using the introduced bindings:

```let rec group ~by l k =
let rec split e l k = match l with
| x :: xs when by e x ->
split e xs (fun (g, ys) -> k (x :: g, ys))
| l       -> k ([], l)
in match l with
| []      -> k []
| x :: xs ->
split x xs (fun (g, ys) -> group ~by ys (fun gs -> k ((x :: g) :: gs)))
```

The CPS conversion of `split` is correct, but that of `group` went too far since the recursive call is buried in the initial continuation for `split`. This is why I mean by error-prone; I don't always get it right on the first attempt. Indeed, the call to `split` is irrelevant to the recursive call to `group`, and should be kept in its place. Also, I'll make `group` a helper so that its continuation is kept encapsulated:

```let group ~by l =
let rec split e l k = match l with
| x :: xs when by e x ->
split e xs (fun (g, ys) -> k (x :: g, ys))
| l       -> k ([], l)
in
let rec group l k = match l with
| []      -> k []
| x :: xs ->
let g, ys = split x xs (fun x -> x) in
group ys (fun gs -> k ((x :: g) :: gs))
in group l (fun x -> x)
```

Now defunctionalization involves reifying every continuation as an explicit data structure. I have two functions, so I'll need two types. Each has an initial identity continuation. Then `split` invokes its continuation with x of polymorphic type α free. Also, `group` invokes its continuation with `x :: g` free of polymorphic type `α list`. This dictates my types:

```type 'a split_cont =
| SInit
| SSplit of 'a * 'a split_cont
and 'a group_cont =
| GInit
| GGroup of 'a list * 'a group_cont
```

Each continuation is now changed into a worker function that simulates invoking it with the supplied free values; a pair `(g, ys)` for `split`'s, a list gs for that of `group`:

```let rec split_apply (g, ys) = function
| SInit         -> (g, ys)
| SSplit (x, k) -> split_apply (x :: g, ys) k
and group_apply gs = function
| GInit         -> gs
| GGroup (g, k) -> group_apply (g :: gs) k
```

Now the continuations are replaced in the bodies of the functions by explicit constructors:

```let group ~by l =
let rec split e l k = match l with
| x :: xs when by e x ->
split e xs (SSplit (x, k))
| l       -> split_apply ([], l) k
in
let rec group l k = match l with
| []      -> group_apply [] k
| x :: xs ->
let g, ys = split x xs SInit in
group ys (GGroup (x :: g, k))
in group l GInit
```

Everything works, as it's easy to verify:

```# group ~by:(fun x y -> y - x < 3) (iota 13);;
- : int list list = [[0; 1; 2]; [3; 4; 5]; [6; 7; 8]; [9; 10; 11]; [12]]
```

Here's where the magic occurs: note that in `split_apply`, the second member ys of the pair is passed around unchanged; this means that I can pull it out of the continuation argument and return it directly from `split`:

```let rec split_apply g = function
| SInit         -> g
| SSplit (x, k) -> split_apply (x :: g) k
and group_apply gs = function
| GInit         -> gs
| GGroup (g, k) -> group_apply (g :: gs) k
```

Now both functions are identical modulo renaming, with `α group_cont` = `α list split_cont`. And `α split_cont` is isomorphic to `α list`, with `SInit``[]`, and `SSplit``::` or `cons`. Under this isomorphism, both functions `split_apply` and `group_apply` are `List.rev_append` in disguise! Applying the isomorphism to `split` and `group`, with the proviso that `List.rev_append l []` = `List.rev l`:

```let group ~by l =
let rec split e l g = match l with
| x :: xs when by e x ->
split e xs (x :: g)
| l       -> List.rev g, l
in
let rec group l gs = match l with
| []      -> List.rev gs
| x :: xs ->
let g, ys = split x xs [] in
group ys ((x :: g) :: gs)
in group l []
```

And this is essentially the first version of `group ~by` I've written before.

2008-10-29

Extreme Recursion

For a data processing application, I needed a way to group data records together according to some criteria. This is the "reduce" phase of the map-reduce transformation, or the group-by phase of a select-project-group query.

My specific problem called for a way to group records in an input list into output sublists by testing them against the first such record considered. More specifically, the records were timestamped, and I had to group them into one-hour segments. If the leading record was timestamped on, say, 12:23:15PM October 29th, 2008, it would be the first in a group including records timestamped up to 1:23:14PM October 29th, 2008.

There's a venerable technique, the control break with advanced reading to solve this problem. I reasoned that I needed to keep track of the current group, and of the list of as-yet-unexamined records. Using accumulating parameters, the following wrote itself:

```let group ~by = function
| []      -> []
| x :: xs ->
let rec scan g gs e = function
| []      -> List.rev ((e :: List.rev g)  ::  gs)
| x :: xs ->
if by e x
then scan (x :: g) gs e xs
else scan [] ((e :: List.rev g)  ::  gs) x xs
in scan [] [] x xs
```

The grouping function is by, and an empty list contains no groups. Otherwise, the first element x in the list would be the group leader e in the "advanced read", used to find how far that group would extend with a call to `scan`. This function keeps track of the current group g and the list of found groups gs, and examines the input list so far. If it's empty, it puts the group leader e at the head of the current group, and outputs the result as the last group of the output list. Since the accumulating parameters are built head-first, this requires a couple of reversals. Otherwise, the current record x is tested to see if it belongs in the same group as the leader e; if so, it is added to the current group; otherwise, the current group is closed and a new one is started with that record as the new group leader. In any case, recursion is in tail position, which betrays that this algorithm is imperative and sequential in nature.

All in all, correct by design:

```# group ~by:(fun e x -> x - e < 3) [1;2;3;4;5;6;7;8;9;10;11;12];;
- : int list list = [[1; 2; 3]; [4; 5; 6]; [7; 8; 9]; [10; 11; 12]]
```

and deeply unsatisfactory. What would a Haskeller do? That is, what would be a non-operational, divide-and-conquer truly recursive solution look like? Well, I reasoned, given a record e, split the remaining records into the group led by it and the remainder, and output the group and the result of grouping the remainder by its group leader. After some back-and-forth refactoring and a couple of tries, I came up with:

```let group ~by =
let rec split e l = match l with
| []      -> [], []
| x :: xs ->
if not (by e x) then [], l else
let g, ys = split e xs in
x :: g, ys
in
let rec scan e l =
let g, ys = split e l in
(e :: g) :: match ys with
| []      -> []
| x :: xs -> scan x xs
in function [] -> [] | x :: xs -> scan x xs
```

I actually sketched `scan` first: I supposed the function `split` was given as I described above, recurred on the remainder, and tacked the found group on the result. Refactoring the common `e :: g` that I originally wrote on both arms of the `match ys with…` resulted in the rather odd-looking function above. Then, writing `split` was relatively easy: an empty list gives an empty group and remainder, but a list with a record must be tested for membership on the current group. If not a member, the group is closed and the remainder includes that record; otherwise, split the rest and tack the record at the beginning of the found group.

In truth, I reversed that conditional on a later rewrite, because the arms of the `if` looked really unbalanced. I didn't think I had improved matters, however: first, OCaml, not being lazy, doesn't really benefit from using right recursion; second, this code is both longer and —to my old-fashioned, structured-programming brain— more roundabout.

But a bit of refactoring never hurts: note that the top-level match (the very last one) and that inside `scan` are exactly the same. In this case, factoring out that code involved a recursive call made directly to `group`; this made `scan` a non-recursive helper that I manually inlined, thus:

```let rec group ~by =
let rec split e l = match l with
| []      -> [], []
| x :: xs ->
if not (by e x) then [], l else
let g, ys = split e xs in
x :: g, ys
in function
| []      -> []
| x :: xs ->
let g, ys = split x xs in
(x :: g) :: group ~by ys
```

A guard pattern makes things a bit more functional, even to the point where I can merge two patterns into one:

```let rec group ~by =
let rec split e = function
| x :: xs when by e x ->
let g, ys = split e xs in
x :: g, ys
| l       -> [], l
in function
| []      -> []
| x :: xs ->
let g, ys = split x xs in
(x :: g) :: group ~by ys
```

Which is exactly right, and what I should've come up with in the first place! For comparison, here's again the first function, with a bit of clean-up done to it:

```let group ~by =
let snoc x xs = x :: List.rev xs in
let rec scan g gs e = function
| []      -> snoc e g  ::  gs
| x :: xs ->
if by e x
then scan (x :: g) gs e xs
else scan [] (snoc e g  ::  gs) x xs
in function
| []      -> []
| x :: xs -> List.rev (scan [] [] x xs)
```

A clear mongrel of a function. Refactoring functional code is not complete, I think, until the code is "just right", which sometimes is never. Hence the need to step back and rethink the whole point to the function in question; there shouldn't be a single ugly function in a program. The problem with this is that whereas I couldn't explain what I mean by "ugly" or "pretty" function, I know when I've reached aesthetic closure; and yet I'd like to know, what is the "process" by which great Haskell programmers write their code?

2008-10-15

Top Drawing

I wrote the last time about simple ways to "just get something onto the screen", and I ended by exploring a simple Top Draw program. I hinted at the ease with which Top Draw allows compositing a complex "painting", taking more or less by fiat the examples included with the program. I since took advantage of the basic rosette drawing subroutine to make a wallpaper that I could actually install:

In my opinion there are two advantages to using Top Draw:

1. It abstracts the compositing paradigm with a Photoshop-like `Layer` object,
2. It allows to interleave procedural drawing with arbitrary Core Image filters,

thus leveraging the built-in capabilities of Mac OS X Quartz subsystem.

I first refactored my script so that the drawing code is independent of the global transform being applied:

```
function wheel(m, n) {
if (m < 1 || n < 5) return;
var color = new Color();
color.alpha = 0.3;
var a = Math.PI / n;
var k = Math.cos(a), s = Math.sin(a);
var t = (k + s) / (k - s);
var d = k + s * t;
var r = Math.pow(k - s, m);
for (var e = 0; e != m; e++) {
color.saturation = 0.5*(1 + e/m); // 0.5 -- 1.0
color.brightness = 0.5*(2 - e/m); // 1.0 -- 0.5
for (var i = 0; i != n; i++) {
var b = (2*i + e) * a;
var x = r * Math.cos(b), y = r * Math.sin(b);
color.hue = i / n;
desktop.fillStyle = color;
desktop.beginPath();
desktop.moveTo(x, y);
desktop.lineTo(d * (k * x - s * y), d * (k * y + s * x));
desktop.lineTo(t * x, t * y);
desktop.lineTo(d * (k * x + s * y), d * (k * y - s * x));
desktop.lineTo(x, y);
desktop.fill();
}
r /= k - s;
}
}
```

Gone are centers and offsets in each parameter to the drawing commands, as gone is the global scaling of the drawing. Then, I use random number generators to place the rosettes all over the screen:

```desktop.fillLayer(new Color(0));
var db = desktop.bounds;

var xrnd = new Randomizer(0, db.width );
var yrnd = new Randomizer(0, db.height);
```

In order to give some depth to the final result, I'll blur each drawing a bit so that newer rosettes are sharper than older ones:

```var f = new Filter("CIGaussianBlur");
```

I'll draw a number of rosettes, four in this case, in random places:

```for (var i = 1; i <= 4; i++) {
var cx = xrnd.intValue;
var cy = yrnd.intValue;
```

I make sure that the rosette fits the screen with the given center:

```  var sz = Math.max(cx, cy, db.width - cx, db.height - cy);
```

In order to isolate each drawing from the modifications to the global transform, I save the drawing context prior to changing scale and center:

```  desktop.save();
desktop.translate(db.x + cx, db.y + cy);
desktop.scale(sz, sz);
```

I then draw the rosette and restore the context. I could easily give random parameters to the drawing routine, but I find that by having uniform shapes the rosettes nicely stack one over another:

```  wheel(8, 10);
desktop.restore();
```

Finally, I apply the prepared filter to the resulting image:

```  desktop.applyFilter(f);
}
```

As you can see, it's really easy to get good results with remarkably little code. Keep hitting Command-R until you find a result you like, and install it on your desktop.

2008-10-13

Procedural Drawing

Pretty pictures, again. There are now a number of "toys" with which to "recreate" the experience of turning on the old computer (C-64, Spectrum, Apple II or what had you) and being greeted with a BASIC prompt. I've written before about how SDL can give you a simple, direct (!) buffer to which to write pixels; the problem it has is that it does not know anything about drawing any object more complicated than a pixel. I'd like to explore some alternatives.

I was inspired by this image and wanted to replicate it. I happen to have Mathematica at hand, and sometimes it's my fall-back choice. After some fudging, I came up with:

```wheel[m_Integer?Positive, n_Integer?Positive] :=
With[{a = Pi/n},
With[{ka = Cos[a], sa = Sin[a]},
With[{ta = (ka + sa)/(ka - sa)},
With[{d = ka + sa*ta},
Graphics[
Table[
With[{r = 1/(ka - sa)^e},
Table[
With[{k = Cos[a (2 i + e)], s = Sin[a (2 i + e)]},
{Hue[i/n, 0.5*(1 + e/m), 0.5*(2 - e/m)],
Polygon[
r {{k, s},
d {k*ka - s*sa, s*ka + k*sa},
ta {k, s},
d {k*ka + s*sa, s*ka - k*sa},
{k, s}}]}],
{i, 0,  n - 1}]],
{e, 0, m - 1}]]]]]]
```

I won't delve into the semantics of this function, except to say that it is pure (it returns a `Graphics` object), and that I tend to use immutable bindings introduced by `With` as much as possible, but this not always affords me readable code. An invocation of `wheel[12, 16]` generates the following:

Pretty! I was satisfied with this for maybe twelve hours. I thought that it could be nice if it could be animated, or something. Doing this in OCaml was out of the question; I have no intention of installing Cairo and a host of dependencies and bindings just to be able to. Using Processing would have been an alternative; I find it, however, crude and unresponsive. Flash would have been ideal, as I'm impressed with Krazydad's wizardry. This would have the unfortunate prerequisite of me having to learn to program Flash in the first place, which is something I have not in my plans for the foreseeable future. I could have used Java, I could…

I remembered I had downloaded Top Draw. I don't think it has animation capabilities; it sits, I think, squarely in the category of "toys". It has, however, a strong graphics model, with filters and compositing inherited from Quartz. Its documentation is succint but informative, and it's Javascript, so I would feel at home. It turns out that Mathematica's retained model means that I could forget about scaling and layout; with Top Draw, those considerations must be taken into account:

```var bounds = desktop.bounds;
var cx = bounds.x + bounds.width  / 2;
var cy = bounds.y + bounds.height / 2;
```

I had a choice between showing most of the graphic or filling most of the screen; it's a matter of a `min` over a `max`:

```var sz = Math.max(bounds.width, bounds.height) / 2;
```

Again, the drawing should be generated with a simple call:

```desktop.fillLayer(new Color(0));
wheel(12, 16);
```

The function `wheel` is where the fun begins (pun not intended). It takes two parameters, the number m of layers and the number n of rhombs to place around the circle. It starts by validating the inputs:

```function wheel(m, n) {
if (m < 1 || n < 5) return;
```

Each rhomb has a color determined by its position on the graphic. Each would be placed two n-ths of the way around the circle, or α = π / n. Some trigonometry shows that the diagonal of the rhomb is t = (1 + tan α)/(1 - tan α), and d is the distance to the center of the rhomb. As you can see, the rhombs grow exponentially with a factor of 1/(cos α - sin α); in order to make the entire drawing fit the prescribed size I must calculate the starting radius r accordingly:

```  var color = new Color();
var a = Math.PI / n;
var k = Math.cos(a), s = Math.sin(a);
var t = (k + s) / (k - s);
var d = k + s * t;
var r = sz * Math.pow(k - s, m);
```

Each tier of rhombs e has a particular saturation and brightness computed as a fraction of the total:

```  for (var e = 0; e != m; e++) {
color.saturation = 0.5*(1 + e/m); // 0.5 -- 1.0
color.brightness = 0.5*(2 - e/m); // 1.0 -- 0.5
```

The rhombs spiral out of the center, since each tier is started at an angle of α from the previous one. This makes the new rhombs interlock between a pair of preexisting rhombs at the current point x, y:

```    for (var i = 0; i != n; i++) {
var b = (2*i + e) * a;
var x = r * Math.cos(b), y = r * Math.sin(b);
```

(As you might have noticed, all angles are relative to the semicircle; that is, they are actually half-angles). The hue is relative to the position around the circle:

```      color.hue = i / n;
desktop.fillStyle = color;
desktop.beginPath();
```

Two of the rhomb's four corners are aligned with the current point; the two others are at an angle of α at either side. Instead of computing six sines and cosines, I use the formulas for the sum and difference of angles:

```      desktop.moveTo(cx + x, cy + y);
desktop.lineTo(cx + d * (k * x - s * y), cy + d * (k * y + s * x));
desktop.lineTo(cx + t * x, cy + t * y);
desktop.lineTo(cx + d * (k * x + s * y), cy + d * (k * y - s * x));
desktop.lineTo(cx + x, cy + y);
desktop.fill();
}
```

As I've indicated above, the drawing is done relative to the center of the desktop. And that's it. It only remains to scale the radius for the next tier:

```    r /= k - s;
}
}
```

Now it is clear that is important that n > 4 so that 0 < cos α - sin α < 1. The result of this program is essentially identical to the previous one:

With the added bonus that now I can start playing with filters and tweaks in real-time until I'm satisfied, or tired, or both. Let the games begin!