2008-08-04

Sorting out Evolution

As a belated end to my exploration of sorting networks (1, 2, 3, 4, 5), I experimented with genetic algorithms to find minimal sorting networks for n elements. And when I write experimented, I mean a couple of weeks of tweaking parameters and code exploring ways for my program to run faster, and to find better and larger networks. Before you continue reading, let me warn you that I wasn't entirely successful: this program works fine for small networks (n ≤ 8), but I couldn't extend this approach to larger, more interesting ones. As it is, though, I had plenty of fun.

The problem of finding minimal sorting networks of a given width is both old and hard; Knuth's examples show uncertainty about minimality for sorters as small as for 16 elements. Hugues Juillé shows that Genetic Algorithms can find minimal networks, but he regrettably keeps the how of it to himself. That said, the table he presents is a valuable starting point as it constitutes an objective target for my program to aim towards:

```let known_best_bounds = [|
(* size, depth L.B., depth U.B.) *)
(*  0 *)  0, 0, 0;
(*  1 *)  0, 0, 0;
(*  2 *)  1, 1, 1;
(*  3 *)  3, 3, 3;
(*  4 *)  5, 3, 3;
(*  5 *)  9, 5, 5;
(*  6 *) 12, 5, 5;
(*  7 *) 16, 6, 6;
(*  8 *) 19, 6, 6;
(*  9 *) 25, 7, 7;
(* 10 *) 29, 7, 7;
(* 11 *) 35, 7, 8;
(* 12 *) 39, 7, 8;
(* 13 *) 45, 7, 9;
(* 14 *) 51, 7, 9;
(* 15 *) 56, 7, 9;
(* 16 *) 60, 7, 9;
|]
```

The network size is the über-parameter of the simulation that determines all other tweaks and knobs; since I want to run the program to look for networks of various widths without having to recompile, I make it a reference:

```let network_width = ref 7
```

Part of the difficulty of effectively applying GAs to solving hard combinatorial problems is deciding on a good genomic encoding of the problem's parameters. Such an encoding must be readily amenable to the two basic operations in a GA search, namely crossover and mutation. I settled on representing directly a sorting network as an array of pairs (i, j) with 0 ≤ ij < network_width, each representing an exchange box between wires i and j. If i = j, the "exchange" box is the null operation; this will allow for encoding sorting networks of actual varying size on a fixed-length genome.

```type genome = (int * int) array
```

I use an array not only for easy picking of the individual "genes" (the exchange boxes), but so that each genome is actually mutable; this simplifies the bookkeeping needed to complete a tournament (of which more, later). Even so, many operations on a genome as a whole are mostly traversals, with some operation applied to each gene; I found it convenient to abstract the traversal as a generic fold:

```let fold_genome f e (g : genome) =
let bounds = Array.make !network_width 0 in
fst (Array.fold_left (fun (e, depth) (i, j) ->
if i == j then (e, depth) else
let d = max depth (1 + max bounds.(i) bounds.(j)) in
bounds.(i) <- d; bounds.(j) <- d;
(f (d > depth) e (i, j), d)) (e, 0) g)
```

I had presented this algorithm before in "Picturing Networks"; the gist of it is that `bounds` keeps tab of the "stage" or depth a given sorter must be scheduled at. For instance, the basic measure of a genome is the number of productive genes it has; in other words, how many exchange boxes are actually attached to a different pair of wires. The other basic metric of a network is its depth. This fold handily lets me find both quantities at once:

```let count_genes =
fold_genome (fun is_sequential (size, depth) _ ->
succ size, if is_sequential then succ depth else depth)
(0, 0)
```

Another rather trivial application of this fold is to print a genome in a way that highlights each sequential stage:

```let string_of_genome (g : genome) =
let buffer = Buffer.create 16 in
ignore (fold_genome (fun is_sequential any (i, j) ->
if any then
Buffer.add_char buffer (if is_sequential then ';' else ',');
true) false g);
Buffer.contents buffer
```

Genomes must randomly sample the space of possibilities; a truly (that is, uniformly) random genome must draw from a space of random genes. To construct a random pair 0 ≤ i < j < n with uniform probabilities, the most expedient way is to use rejection:

```let rec random_ordered_pair n =
let i, j = Random.int n, Random.int n in
if i <= j then i, j else random_ordered_pair n
```

There is an exact method using the inverse of the cumulative distribution function, but it involves taking square roots, and this method is actually faster (this was an investigation of its own). With that, a random gene is an appropriately parameterized random exchange box:

```let new_gene () = random_ordered_pair !network_width
```

Once a random exchange box is in place in a network, the mutation operator must controllably change it into another box. One alternative is to replace it with a completely different one; a more parsimonious approach is to detach a wire and reattach it at random. Two things must be ensured, though: first, the change must preserve the basic property that the exchange box be a nondecreasing pair of numbers; second, it must be uniform in the space of possible changes so defined. Each wire can be chosen with equal probabilities, but then which of the two terminals gets reattached depends on the placement of the exchange box relative to the selected wire:

```let mutate_gene (i, j) =
let k = Random.int !network_width in
if k < j then (k, j) else (j, k)
```

My preoccupation with ensuring that all choices are uniform is not just theoretic; any skew in the search has readily observable consequences. I had an error in the choice of random genes which prevented the program to explore further than networks of width 6; once I corrected it, I could test wider networks. With this, there's the problem of selecting a random genome, which is trivial except for the choice of genome length. How many exchange boxes should a genome have? Too small a number, and the network would never sort; too large and it will take forever to minimize and, what's worse, the probability that a crossover would improve the offspring would plummet (in other words, the search space would be just too large). I found a number both reasonable and completely arbitrary:

```let genome_length = !network_width * !network_width
```

Reasonable because it's easy to sort in quadratic time (witness Bubble Sort); arbitrary since it only depends on the network width. The next completely arbitrary but vital choice is in deciding how to compare networks. The fitness of a genome is the sole measure that determines in which direction the search proceeds; even more, the choice of a fitness function is what makes a Genetic Algorithm go boom or bust. In this case, the complication of selecting a good fitness evaluation resides in that I'm trying to minimize two quantities at once: size and depth of the sorting network. There's much literature on multiobjective optimization that I should have read; what's really important is that the combination of individual objectives must be "monotone", or Pareto dominant if it is to succeed; this is a complicated way to say that the fitness must be a total order on objective tuples, in this case pairs of (size, depth):

```let target_fitness (size, depth) =
let min_size, _, min_depth = known_best_bounds.(!network_width) in
if size == min_size
then float (genome_length - size + 1 - (depth - min_depth))
else float (genome_length - size)
```

I aim for known_best_bounds for the given network_width. The nearer the metrics of the genome are to those numbers, the fitter it will be deemed. Note that, for n ≥ 2, `target_fitness (n, 1)` - `target_fitness (n, n)` < `target_fitness (n-1, n-1)` - `target_fitness (n, n)`; hence the fitness function is monotone in size and in depth, in that order, as required. Now, how to determine if a network is fit or not? The least I can expect of a sorting network is for it to actually sort its input. Trying all network_width! (that's a factorial) possible inputs is unfeasible; fortunately, Knuth showed that it suffices to test all 2network_width binary sequences of length network_width. The Zero-One Principle says that a network sorting the latter is a sufficient condition for it sorting the former. What's more, the 0-1 principle implies that a sorting network can be modeled by a binary circuit. Each exchange box is then a binary function with table:

in0in1out0out1
0000
0101
1001
1111

That is, f(in0, in1) = (in0in1, in0in1).

Aside: I find endlessly confusing the fact that the maximum ↑ and minimum ↓ operators are respectively the join ∨ and meet ∧ of the integer lattice, but the arrows go backwards. Who to blame? Who came with the logical connectives?

An exchange box, hence, needs to select bits and apply the corresponding functions according to a mask. To sort a binary word, it suffices to build the masks for each exchange box, and with that apply the sorting function f defined above:

```let sort a word =
Array.fold_left (fun word (i, j) ->
(* i ≤ j *)
word
lor  (word lsr (j - i) land     (1 lsl i))
land (word lsl (j - i) lor lnot (1 lsl j))
) word a
```

Note that I actually sort in reverse order, that is, according to f′(in0, in1) = (in0in1, in0in1). This is so that testing for sortedness is now as simple as checking that the result is a block of contiguous ones, from the least significant bit upwards, using the old trick that relies on the fact that, in two's complement, a block of k contiguous 1's represents exactly 2k - 1:

```let is_sorted word = word land (word + 1) == 0
```

But obviously the fitness of a sorting network is not just determined by its ability to sort, but by the number of exchange boxes it employs to do so. To evaluate the first criterion, generate all 2network_width binary words and test that all are correctly sorted. If they aren't, the computed fitness is, rather arbitrarily, a fraction based on the first sequence the network didn't sort. This is so that the fitness is a total order on the space of genomes:

```exception Unsorted of int

let evaluate_fitness (g : genome) =
let tests = 1 lsl !network_width in
try
for i = 0 to pred tests do
if not (is_sorted (sort g i)) then raise (Unsorted i)
done;
target_fitness (count_genes g)
with Unsorted i -> float i /. float tests
```

So much for genes and genomes; an individual is a genome and a measure of its fitness:

```type individual = { mutable score : float; genes : genome; }
```

Its score is as mutable as its genome is. A new individual is born with a menagerie of random genes:

```let new_individual () =
let g = Array.init genome_length (fun _ -> new_gene ())
in { score = evaluate_fitness g; genes = g; }
```

Analogously, a population is a collection of individuals of which there is a unique best champion:

```type population = { mutable best : individual; pool : individual array; }
```

And now for another arbitrary parameter, namely the size of the population, or rather gene pool. As with the genome size, it is a function of the network_width to be minimized:

```let pool_size = ref (100 * !network_width * !network_width)
```

I've experimented with a pool size cubic in the network_width, but I settled on a quadratic dependence, appropriately scaled. This is too big for small widths, but appropriate for larger ones. In any case, I found it convenient to try different values for it from run to run, so it is a reference. A a population is an array of genomes of this size:

```let find_best pool =
Array.fold_left (fun w g -> if w.score > g.score then w else g) pool.(0) pool

let new_population () =
let p = Array.init !pool_size (fun _ -> new_individual ())
in { best = find_best p; pool = p; }
```

Now would the knobs to tweak end? I'm almost there; but these are fundamental to the performance of the program, and can be the focus of endless tweaking. The GA probabilities:

```let crossover_rate = 0.6
and mutation_rate = 0.01
```

determine the rate of application of the GA operators to the genomes in what's called a tournament, the Darwinian show-down that makes the core of GAs. I used the so-called microbial tournament as it is really simple and easy to implement. The losing genome will have some of its genes replaced with the corresponding ones from the winner, and sometimes mutated according to these numbers:

```let microbial_tournament p (i, j) =
let g1, g2 = p.pool.(i), p.pool.(j) in
let winner, loser =
if g1.score < g2.score || g1.score = g2.score && g2 == p.best
then g2, g1
else g1, g2 in
assert (p.best != loser);
(* Recombination *)
for i = 0 to genome_length - 1 do
if Random.float 1. < crossover_rate then
loser.genes.(i) <- winner.genes.(i);
done;
(* Mutation *)
for i = 0 to genome_length - 1 do
if Random.float 1. < mutation_rate then
loser.genes.(i) <- mutate_gene loser.genes.(i)
done;
loser.score <- evaluate_fitness loser.genes;
if p.best.score <= loser.score then p.best <- loser
```

In a microbial tournament between two individuals i and j in the population p, both are evaluated and deemed a winner and loser. The latter is never the best individual in the population, by definition. Genes from the winner are copied to the loser's genome, according to crossover_rate; then the loser genes are mutated according to mutation_rate. The key in this type of tournaments is that the loser individual is replaced by the offspring resulting from this process, which is evaluated, and if found better than the best so far, promoted.

Now this process is repeated for pairs of individuals drawn at random from the population. It is equivalent, but faster, to partition the pool into halves, perform a random pairing between both, and do a tournament between each pair:

```let shuffle a =
for j = 1 to Array.length a - 1 do
let k = Random.int (j + 1) in
let t = a.(j) in
a.(j) <- a.(k);
a.(k) <- t
done

let full_microbial_tournament p =
shuffle p.pool;
let i = ref 0 in
while !i < !pool_size - 1 do
microbial_tournament p (!i, !i + 1);
i := !i + 2
done
```

Now I apply the `full_microbial_tournament` iteratively until the best individual in the population reaches the target fitness. Whenever there's a fitness bump the individual promoted is printed out, together with some statistics. In the event of a console break, the loop is terminated cleanly. In any case, the result of the function is the number of iterations spent in the loop:

```let tournament fitness p =
let iter = ref 0
and last = ref 0. in
Sys.catch_break true;
begin try while p.best.score < fitness do
full_microbial_tournament p;
incr iter;
if 1. <= p.best.score && !last < p.best.score then begin
let (s, d) = count_genes p.best.genes in
last := p.best.score;
let sum, act = Array.fold_left (fun (s, n as p) g ->
if g.score < 1. then p else (s +. g.score, succ n))
(0., 0) p.pool in
Printf.fprintf stderr "%7d act = %5.2f%%, avg = %7.5f, max = %7.5f (%2d/%2d)\n"
!iter
(100. *. float act /. float !pool_size)
(sum /. float act)
!last
s d;
flush stderr
end
done with Sys.Break -> prerr_endline "Interrupted.\n" end;
!iter
```

A driver function selects the target fitness given by the best known bounds for the chosen network_width, prints out some starting information and builds an initial population and runs the iterative `tournament` with it:

```let test () =
let (size, _, depth) = known_best_bounds.(!network_width) in
let fitness = target_fitness (size, depth) in
Printf.fprintf stderr "Searching for a %d-network of size %d and depth %d (fitness = %7.5f)\n"
!network_width size depth fitness;
flush stderr;
let pool = new_population () in
let iter = tournament fitness pool in
let n, d = count_genes pool.best.genes in
Printf.fprintf stderr "Solution in %d rounds with fitness %f\n"
iter pool.best.score;
Printf.fprintf stderr "Found a %d-network of size %d and depth %d\n"
!network_width n d;
print_endline (string_of_genome pool.best.genes)
```

The results are printed out, and the best individual found is displayed. The `main` function makes running the GA from the command line convenient:

```let main () =
let usage = "usage - ganet [-seed <n>] [-pool <n>] -width <n>" in
let spec = Arg.align [
"-seed",  Arg.Int (fun n -> Random.init n), " random seed";
"-pool",  Arg.Set_int pool_size,            " population pool";
"-width", Arg.Set_int network_width,        " network width";
] in
Arg.parse spec (fun _ -> raise (Arg.Bad "extra argument")) usage;
if !network_width < 2 || !network_width > 16 then begin
prerr_endline "network width must be between 2 and 16";
exit 2
end;
if !pool_size < !network_width then begin
prerr_endline "population pool must be as large as the width";
exit 2
end;
test ()

let () = main ()
```

As with any experiment, repeatability must be designed in; programs that use random numbers should take the random seed as a parameter. You should get the same results from this run:

```% ./ganet -seed 127 -pool 100 -width 5
Searching for a 5-network of size 9 and depth 5 (fitness = 41.00000)
1 act = 84.00%, avg = 16.03571, max = 24.00000 (25/21)
4 act = 93.00%, avg = 18.39785, max = 25.00000 (24/18)
5 act = 88.00%, avg = 19.15909, max = 26.00000 (23/18)
6 act = 86.00%, avg = 19.54651, max = 27.00000 (22/18)
12 act = 79.00%, avg = 23.13924, max = 28.00000 (21/18)
14 act = 82.00%, avg = 23.80488, max = 29.00000 (20/16)
15 act = 85.00%, avg = 24.15294, max = 30.00000 (19/17)
17 act = 86.00%, avg = 25.38372, max = 33.00000 (16/15)
29 act = 73.00%, avg = 29.27397, max = 35.00000 (14/10)
33 act = 80.00%, avg = 30.32500, max = 36.00000 (13/ 9)
41 act = 70.00%, avg = 32.14286, max = 38.00000 (11/ 9)
45 act = 67.00%, avg = 33.16418, max = 39.00000 (10/ 6)
67 act = 69.00%, avg = 36.24638, max = 40.00000 ( 9/ 6)
70 act = 76.00%, avg = 36.64474, max = 41.00000 ( 9/ 5)
0:4;0:2,1:3;2:4,0:3;3:4,1:2;2:3,0:1
Solution in 70 rounds with fitness 41.000000
Found a 5-network of size 9 and depth 5
```

As I disclaimed at the beginning, this program finds minimal sorting networks for 8 elements, but gets stuck searching for networks for 9 elements. I'd love to hear any improvements you find.

bluestorm said...

It's a very uninteresting thing to say, but you could probably improve string_of_genome readability by using Printf.bprintf (printf into a buffer).

Matías Giovannini said...

Not at all uninteresting, thank you for the feedback. Maybe it's a bit clearer:

let string_of_genome (g : genome) =
let buffer = Buffer.create 16 in
ignore (fold_genome (fun is_sequential any (i, j) ->
if any then
Buffer.add_char buffer (if is_sequential then ';' else ',');
Printf.bprintf buffer "%d:%d" i j;
true) false g);
Buffer.contents buffer

but the gnarly doubly-conditional separator is still there.

Wouter said...

The logical connective for "or" is "v" for the Latin "velle". At least that's what I was told in school.

Matías Giovannini said...

It's "vel", actually, and the disjunction was often paired "vel… aut…" like "either… or…"; "velle" is the infinitive form of "volo", or "to want" (volo, vis, velle, volui).

The things I shouldn't remember… Anyway, your explanation is plausible.