## 2008-06-27

### Bitonic Sorting

The riffle shuffle I've written about is not a mere card trick. It is a basic building block in the art of devising parallel networks for efficient use of SIMD computers. A typical application is the so-called bitonic sort. An elementary introduction to the algorithm with animations might make its working clear, but I find Knuth's exposition transparent:

Let us say that a sequence ⟨z1, … zp⟩ of p numbers is bitonic if z1 ≥ … ≥ zk ≤ … ≤ zp for some k, 1 ≤ kp […] A bitonic sorter of order p is a comparator network capable of sorting any bitonic sequence of length p into non-decreasing order […] [W]e can construct a bitonic sorter of order p […] by first sorting the bitonic subsequences ⟨z1, z3, z5, …⟩ and ⟨z2, z4, z6, …⟩ independently, then comparing and interchanging z1:z2, z3:z4, …

(Knuth, Sorting and Searching, p 232)

In other words, Knuth describes the shape of a bitonic sequence as that of an inverted vee, but in most general terms it is the concatenation of two monotonic (that is, ascending or descending) sequences, any of which can be empty. This means that a sequence with the shape of a vee is also monotonic bitonic, and this is the definition I'll be using, for reasons that will be apparent later.

A comparator is a box with two inputs and two (ordered) outputs such that the upper, or first one is the lesser of the two inputs, and the lower, second one is the greater of said inputs. In other words, given a pair (x, y), a comparator outputs (xy, xy). Knuth denotes above the comparator by a colon; note that, strictly speaking, the inputs to it are unordered since x:y = y:x. A comparator can be straightforwardly defined as:

```let swap (x, y) = if x <= y then x, y else y, x
```

The problem with sorting networks is that they are most naturally applied to inputs that are powers of two in length; any other than that, the definitions lose their pretty symmetry. Consider two lists l = ⟨l1, l2, …⟩ and r = ⟨r1, r2, …⟩ of possibly unequal length. Given a binary operation ⊕ its pointwise extension is ⟨l1r1, l2r2, …⟩. What of left-over elements? Well, in that case ⊕ better have a unit ε to pad the remainder. If it is a left and a right unit of ⊕, that is, ε ⊕ x = x = x ⊕ ε, then ⊕ must have type α×α→α. In practical terms, the pointwise extension of ⊕ must operate between lists of the same type and must result in a list of that very type, with left-over stragglers passing unchanged into the result. This is not as general as it could be.

Fortunately `sort` is regular enough. Unfortunately, it results in a pair, and while a list of pairs can be flattened easily enough, it wouldn't mix with the leftover singleton. A specially written function would do, but I'd prefer something more composable. Monads to the rescue! I'll use just the necessary machinery to work in the list monad, without all the generic scaffolding. First the unit:

```let unit x = [x]
```

Then the join:

```let join = List.concat
```

With this, pairing is straightforward:

```let rec pairup = function
| [], l | l, [] -> List.map unit l
| a :: l, b :: r -> [a; b] :: pairup (l, r)
```

Note that the extra elements appear as singletons at the end of the resulting list; that is, they are left-aligned. Zipping two lists together is as easy now as:

```let zip p = join % pairup \$ p
```

With a little lifting:

```let lift2 f = function [x; y] -> f (x, y) | l -> l
```

and some injections:

```let inj2 (x, y) = [x; y]
```

(note that `lift2 inj2``id`) `exchange` is a point-free one-liner:

```let exchange p = join % List.map (lift2 \$ inj2 % swap) % pairup \$ p
```

Note how `lift2` and `inj2` take care of the argument and the return type of `swap` separately, so that either in isolation makes sense. Note also that, since the unpaired elements are last, the net result is as if the first sequence was padded with negative infinities (so that the maxima pass unscathed to the right), and the right with positive infinities (symmetrically, so that the minima remain on the left). The `unzip` function I wrote the last time isn't quite correct, as it will put the odd last element of a list in the even sublist! Yes, it is buggy. Here's the correct version:

```let rec unzip = function
| []           -> [], []
| [x]          -> [x], []
| x :: y :: xs ->
let l, r = unzip xs in
x :: l, y :: r
```

To facilitate the recursive application of the procedure to both halves of a sequence, let me introduce a combinator:

```let (>>>) f (x, y) = (f x, f y)
```

Now to the point of this post. By Knuth's definition, the bitonic sort of a bitonic sequence is two half-length bitonic sorts sandwiched between an even-odd shuffle and a sorting network. This translates straightforwardly into:

```let rec bitonic = function
| []  -> []
| [x] -> [x]
| l   -> exchange (bitonic >>> unzip l)
```

It is clear that if `exchange` is to be feed the result of `unzip` (directly or indirectly), the right half will always be the shorter one, and the padding infinities will go there. This presents me with a problem, since Knuth's procedure to sort a bitonic sequence in the shape of an inverted vee will make the descending right-hand side padded with an infinity, and thus not bitonic at all. It is necessary to ensure that all bitonic sequences fed to the network are first descending and then ascending:

```let merge (p, q) = bitonic (List.rev_append p q)
```

Assuming that p and q are sorted, `List.rev_append p q` does exactly that. Now, to sort an arbitrary list it only suffices to split it, sort both halves and merge them as a bitonic sequence:

```let rec sort = function
| []  -> []
| [x] -> [x]
| l   -> merge (sort >>> unzip l)
```

This is not a very efficient serial sorting algorithm, as it makes more comparisons than are strictly necessary. However, the `unzip` is not a parallel operation but the connection topology of the network and so "free"; the same can be said about `rev_append`. The recursive calls to `sort` on both halves can be done in parallel, as can be the recursive calls to `bitonic`. Finally, `exchange` can be performed simultaneously on pairs of elements; this makes Batcher's bitonic sort extremely attractive for parallel sorting networks.

In closing, let me do a quick check to see if I didn't made a mistake. Knuth proves (in an exercise!) that a sorting procedure is correct if it sorts correctly every binary sequence. Recursively generating all binary sequences is easy:

```let rec bin_sequences n =
if n = 0 then [[]] else
let s = bin_sequences (n - 1) in
List.map (fun l -> 0 :: l) s @ List.map (fun l -> 1 :: l) s
```

As is checking if a sequence is ascending:

```let rec is_ascending = function
| [] | [_] -> true
| x :: y :: l -> x <= y && is_ascending (y :: l)
```

With that, a check to see if, for instance, all binary sequences of length 10 or less are correctly sorted is a one liner:

```# List.for_all is_ascending % List.map sort % bin_sequences \$ 10 ;;
- : bool = true
```