## 2009-03-07

### Simple Future

A future is a computation that encapsulates the notion of asynchronously computing a value. Let me deconstruct the buzzwords:

• By "computation" I mean a mechanism that, eventually, reduces to or contains a value
• By "value" I mean a primitive object that cannot be reduced further, or rather a computation that is trivial in the sense that it returns itself when run
• By "asynchronous" I mean that the context of the computation doesn't have to wait for it to complete before continuing

By applying Moggi's insight, you can view futures as a monad `'a future` encapsulating values of type `'a`. There are two important operations distinguishing futures from any other monad: the first attempts to "redeem" or unwrap the encapsulated value, blocking if its computation didn't yet finish:

```val redeem : 'a future -> 'a
```

The second constructs a future from a function by running it asynchronously in a separate thread:

```val delay : ('a -> 'b) -> ('a -> 'b future)
```

(I use extra parentheses to emphasize the functorial aspects of computation). It is important that futures be composable, in the sense that there must exist an operation that combines a list of futures into a single future:

```val select : 'a future list -> 'a t
```

with the strong condition that the returned future is the first in the list to complete, so that it can be redeemed without waiting. This operation is sufficiently powerful to construct an efficient polling operation on a future (that is, testing in constant time whether a future is immediately redeemable or not) as a derived operation. Of course, a very inefficient selection operation can also be built based on polling, so that the two primitives are semantically inter-definable.

Given than futures form a monad, they come equipped with the usual operations:

```val unit : 'a -> 'a future
val bind : ('a -> 'b future) -> ('a future -> 'b future)
val fmap : ('a -> 'b) -> ('a future -> 'b future)
```

(If the type of `bind` seems puzzling, it's that I prefer working with Kleisli triples. Again, I've parenthesized the declarations in order to highlight the categorial aspects of the operations on futures). The key point is that these operations must "run in the future" and not block the calling thread. Then, `poll` can be written as:

```let poll f = redeem (select [
fmap (fun _ -> true) f;
unit false
])
```

The first element is a future that replaces the result of running `f` with `true`. The second future immediately returns `false`. If the first is already completed, `select` would return its `true` value. Since futures are immutable, `f` is still available for redeeming. If, on the other hand, the first is not yet completed, `select` would return the second future, by the strong guarantee of deterministic (left-to-right) parallelism, and would return `false` on the spot.

```module type MONAD = sig
type 'a t
val unit : 'a -> 'a t
val bind : ('a -> 'b t) -> ('a t -> 'b t)
val fmap : ('a -> 'b) -> ('a t -> 'b t)
end
```

Of course `fmap` is derivable in a monad because every monad is a (categorial) functor, but it is simple to include it in the implementation, and it's often more efficient to use a specialized version of it. Now, the futures themselves:

```module type FUTURE = sig
val redeem : 'a t -> 'a
val delay  : ('a -> 'b) -> ('a -> 'b t)
val select : 'a t list -> 'a t
end
```

To derive `poll` I can use a functor:

```module Future (F : FUTURE) = struct
include F
let poll f = redeem (select [
fmap (fun _ -> true) f;
unit false
])
end
```

With this specification, the minimal semantics for futures is to perform completely synchronous computations; a valid, although not very interesting starting point.

```module NullFuture = Future(struct
type 'a t = 'a
let unit x = x
let bind f x = f x
let fmap f x = f x
let redeem x = x
let delay f x = f x
let select = function [] -> failwith "select" | x :: _ -> x
end)
```

For a truly multi-threaded experience, I must call in the services of the `Thread` module. The problem with OCaml's threads is twofold. First, threads don't return a value, but just run a function unattended until completion, discarding the result. This is remedied easily enough by providing the thread with a reference in which to deliver its result. The other, more serious problem is that there's no built-in way to wait on more than one thread.

Both drawbacks can be overcome by associating the references with a condition variable and waiting for the first thread that signals it, in effect creating a synchronizing slot. I follow mostly Bartosz Milewsky's description of `MVar`s, but I depart of his implementation in that slots are more restricted. First, the interface is simple enough:

```module Slot : sig
type 'a t
val create : unit -> 'a t
val put : 'a -> 'a t -> unit
val get : 'a t -> 'a
end = struct
```

The idea is that `get` waits for a value to be present, but `put` has an effect only if the slot is empty. To wait on a value, I use a `Condition.t` that must always be paired with a `Mutex.t`:

```  type 'a t = { m : Mutex.t; c : Condition.t; mutable ref : 'a option }
```

Creating a slot is simple enough:

```  let create () = { m = Mutex.create (); c = Condition.create (); ref = None }
```

In order to ensure that all accesses to a slot are synchronous, and that the lock is properly released even in the event of failure, I use a wrapper:

```  let locked f s =
Mutex.lock s.m;
try let x = f s in Mutex.unlock s.m; x
with e -> Mutex.unlock s.m; raise e
```

Putting a value in the slot succeeds only if the slot is empty; otherwise, it has no effect:

```  let put x =
locked (fun s ->
if s.ref = None then begin
s.ref <- Some x; Condition.signal s.c
end)
```

Getting from an empty slot blocks until another thread puts a value into it; there is no wait involved when it is full:

```  let get s =
let rec go s = match s.ref with
| None   -> Condition.wait s.c s.m; go s
| Some x -> x
in locked go s
end
```

The net effect is that a slot becomes immutable once set. This is key to ensure that `select` has fair semantics and to avoid race conditions.

Now futures must wait on the slot to be filled with the result of the computation, which must proceed in a separate thread if it is to be truly asynchronous:

```module AsyncFuture = Future(struct
type 'a t = { t : Thread.t; s : 'a Slot.t }
```

Redeeming a future is simply waiting on its slot to become full:

```  let redeem m = Slot.get m.s
```

In order to `delay` a computation by turning it into a future, spawn a thread that runs it and deposits its value in the corresponding slot:

```  let delay f x =
let s = Slot.create () in
{ t = Thread.create (fun x -> Slot.put (f x) s) x; s = s }
```

That's it. Now `select`ing from a set of futures is a bit more involved. Of course, it makes no sense to treat an empty list of futures as anything else but an error. Otherwise, I turn every future into a delayed computation that runs it to completion and signals termination by setting a slot with a reference to itself. The semantics for slots ensure that the first future to complete will be the one that will be present as the slot's value:

```  let select = function
| [] -> failwith "select"
| l  ->
let s = Slot.create () in
List.iter (fun m ->
ignore (Thread.create (fun m -> ignore (redeem m); Slot.put m s) m))
l;
Slot.get s
```

As a slight optimization I directly create `Thread`s instead of `delay`ing the thunks, because the results are completely uninteresting.

Finally, the monadic semantics is very simple to provide with these primitives:

```  let unit x =
let s = Slot.create () in Slot.put x s;
{ t = Thread.self; s = s }

let bind f m = delay (fun m -> redeem (f (redeem m))) m
let fmap f m = delay (fun m -> f (redeem m)) m
end)
```

I had originally written `unit` as `delay id`; but since the value is already present and it barely needs a slot, much less a thread, I placehold the future with the current thread. If the slot were to be waited upon, this could lead to deadlock; as it is already full, there is no danger of the current thread becoming blocked because of it.

This implementation is low-level enough that can be used as a guide for implementing composable futures in other languages. In fact, Apocalisp has a series of articles presenting this very design applied to Java. Of course, this is far more basic than that, in that there is no provision for a strategy for running threads, which are in fact paired one-to-one with thunks. Building a work queue, a thread pool to run it and layering a future library on that is left as an exercise for the reader.

## 2009-03-05

### Small Sorts

I have seen in more than one occasion sorting problems of small sizes being solved with general-purpose sorting routines. The case should be analogous to FFTs of small size, where the cases for 1-point, 2-point and 4-point transforms are special, straight-line code segments; but I haven't found the equivalent special-casing of small-size sorts. One reason for that could be that the structure of FFTs is very regular, and specializing the recursion for small base cases is relatively straightforward. Another reason might be the fact that FFT code tends to be rather specialized, and written once, while practically everybody with a formal education in CS is expected to know how to write a general-purpose sort routine.

If this is the case, let this post serve as a container for cut-and-paste-ready code.

## The Case for Three

There are 3! = 6 permutation of three elements, and 2² ≤ 3! < 2³, which means that an exhaustive comparison tree for three elements must make between two and three comparisons to completely sort them. The easiest way to see that this is the case is by inspecting the comparison tree of order three:

Note that sorting three elements immediately gives us the minimum, maximum and median of the set, as the latter is uniquely determined by the former. The corresponding code is:

```let sort3 (a, b, c) =
if a <= b then
if b <= c then (a, b, c) else
if a <= c then (a, c, b) else
(c, a, b) else
if a <= c then (b, a, c) else
if b <= c then (b, c, a) else
(c, b, a)
```

As the sets are fixed in size, it is sufficient to verify every permutation to prove (in the strictest sense of the word) that the code is correct. There are several algorithms for generating permutations, but for these sizes, the following is sufficient:

```let rec permute =
let remove e = List.filter ((<>) e) in
let perms_with a = List.map (fun y -> a :: y) % permute % remove a
in function
| [] -> [[]]
| l  -> List.concat (List.map (fun a -> perms_with a l) l)
```

With that, verification of the code is straightforward:

```let p3 = List.map (function [a;b;c] -> a,b,c) (permute [1;2;3]) in
List.for_all ((=) (1, 2, 3)) (List.map sort3 p3)
```

## The Case for Four

There are 4! = 24 permutations of four elements, and 24 < 4! < 25. The decision tree has become decidedly bushier. Now the tasks of sorting four elements and finding the extrema among these four are different, in which the first requires between four and five comparisons to sort, but the second can do the job with four comparisons in every case, as you can see by collapsing terminal nodes with equal extrema in the following comparison tree:

Whereas anybody with five minutes to spare could derive the decision tree to sort three elements, the code corresponding to this tree is a prime candidate for cut-and-paste:

```let sort4 (a, b, c, d) =
if a <= b then
if c <= d then
if a <= c then
if b <= d then
if b <= c then (a, b, c, d) else
(a, c, b, d) else
(a, c, d, b) else
if b <= d then   (c, a, b, d) else
if a <= d then (c, a, d, b) else
(c, d, a, b) else
if a <= d then
if b <= c then
if b <= d then (a, b, d, c) else
(a, d, b, c) else
(a, d, c, b) else
if b <= c then   (d, a, b, c) else
if a <= c then (d, a, c, b) else
(d, c, a, b) else
if c <= d then
if b <= c then
if a <= d then
if a <= c then (b, a, c, d) else
(b, c, a, d) else
(b, c, d, a) else
if a <= d then   (c, b, a, d) else
if b <= d then (c, b, d, a) else
(c, d, b, a) else
if b <= d then
if a <= c then
if a <= d then (b, a, d, c) else
(b, d, a, c) else
(b, d, c, a) else
if a <= c then   (d, b, a, c) else
if b <= c then (d, b, c, a) else
(d, c, b, a)
```

Verification can proceed as before, by checking exhaustively correctness over all 4! permutations:

```let p4 = List.map (function [a;b;c;d] -> a,b,c) (permute [1;2;3;4]) in
List.for_all ((=) (1, 2, 3, 4)) (List.map sort4 p4)
```

## The Case for Five

The fact that five comparisons suffice to sort four elements is not entirely straightforward to derive, as poring over 5.3.1 of The Art of Computer Programming shows. Knuth shows that it is still less clear that seven comparisons suffice to sort five elements, as 26 < 5! = 120 < 27, as the upper bound is rather tight. The recipe given is very high level, and although I'm confident it can be used to build a code generator, it is not immediately obvious to me how to convert that intuition into actual code.

This is a good cut-off point to resort to a general-purpose sort; Jon Bentley argues that such small cases are best served by straightforward insertion sort. I went a step further and unrolled a 5-element insertion sort, which has a worst-case complexity of 10 comparisons. The downside to this code is that it is imperative, and destructive: it sorts by swapping the elements in an array:

```let inssort5 v =
let a = v.(1) in
if v.(0) > a then begin
v.(1) <- v.(0);
v.(0) <- a
end;
let a = v.(2) in
if v.(1) > a then begin
v.(2) <- v.(1);
if v.(0) > a then begin
v.(1) <- v.(0);
v.(0) <- a
end else
v.(1) <- a
end;
let a = v.(3) in
if v.(2) > a then begin
v.(3) <- v.(2);
if v.(1) > a then begin
v.(2) <- v.(1);
if v.(0) > a then begin
v.(1) <- v.(0);
v.(0) <- a
end else
v.(1) <- a
end else
v.(2) <- a
end;
let a = v.(4) in
if v.(3) > a then begin
v.(4) <- v.(3);
if v.(2) > a then begin
v.(3) <- v.(2);
if v.(1) > a then begin
v.(2) <- v.(1);
if v.(0) > a then begin
v.(1) <- v.(0);
v.(0) <- a
end else
v.(1) <- a
end else
v.(2) <- a
end else
v.(3) <- a
end
```

Sorting a 5-tuple requires an adapter:

```let sort5 (a, b, c, d, e) =
let v = [| a; b; c; d; e |] in inssort5 v;
v.(0), v.(1), v.(2), v.(3), v.(4)
```

Which can, again, be verified by exhaustively testing it against all 120 permutations of 5 elements:

```let p5 = List.map (function [a;b;c;d;e] -> a,b,c,d,e) (permute [1;2;3;4;5])
in List.for_all ((=) (1, 2, 3, 4, 5)) (List.map sort5 p5)
```

Edit: Indeed, it is possible to directly translate Knuth's instructions for merge-inserting 5 elements into a 7-comparison straight-line sort. As in the text, it is convenient to first rank a, b, c and d so that, say, a < b < d and c < d, then insert e among a, b, d, and finally inserting c. After the ranking is found, merging e is the same in all cases modulo renaming, so it can be considered a sub-routine. This is the code:

```let sort5 (a, b, c, d, e) =
let merge5 a b c d e =
(* abd cd e *)
if e <= b then
if a <= e then (* aebd cd *)
if c <= e then
if c <= a then
(c, a, e, b, d)
else
(a, c, e, b, d)
else
if c <= b then
(a, e, c, b, d)
else
(a, e, b, c, d)
else (* eabd cd *)
if c <= a then
if c <= e then
(c, e, a, b, d)
else
(e, c, a, b, d)
else
if c <= b then
(e, a, c, b, d)
else
(e, a, b, c, d)
else
if e <= d then (* abed cd *)
if c <= b then
if c <= a then
(c, a, b, e, d)
else
(a, c, b, e, d)
else
if c <= e then
(a, b, c, e, d)
else
(a, b, e, c, d)
else (* abde cd *)
if c <= b then
if c <= a then
(c, a, b, d, e)
else
(a, c, b, d, e)
else
(a, b, c, d, e)
in
if a <= b then
if c <= d then
if b <= d then (* abd cd *)
merge5 a b c d e
else (* cdb ab *)
merge5 c d a b e
else
if b <= c then (* abc dc *)
merge5 a b d c e
else (* dcb ab *)
merge5 d c a b e
else
if c <= d then
if a <= d then (* bad cd *)
merge5 b a c d e
else (* cda ba *)
merge5 c d b a e
else
if a <= c then (* bac dc *)
merge5 b a d c e
else (* dca ba *)
merge5 d c b a e
```

The comments show the ranking implied by the result of the comparisons, so that `bac dc` means b < a < cd < c, for instance. Note also that the very last case of `merge5` makes use of transitivity to coalesce two cases into one, for a total of (7⋅14 + 6)/15 = 6.9333 comparisons on average.