Alaska Ataca a Kamtchatka: 2007

2007-12-13

Naturally Sorted

Well, Jeff Atwood has weighed in the discussion, with a clear explanation for "the rest of them". Seeing Batchelder's Python code, I realized I didn't do as well as I could with my OCaml version. Specifically, I didn't look hard enough for a simple way to split the string between decimal and non-decimal parts. The idea is not to delimit parts by looking for transitions between one kind and the other, but to use one of the kinds as the delimiter itself:

let split =
  let delim = Str.regexp "[0-9]+" in
    List.map (function Str.Delim s -> s | Str.Text s -> s)
  % Str.full_split delim

As is to be expected with OCaml, the result of full_split is rather odd, in that it uses an ADT to distinguish between text and delimiters. I use a forgetful mapping to coalesce all the parts into a plain string list.

Another point where the comparison could be more discriminating is when comparing numbers with leading zeros. While it is obvious that "010" > "08", it is also to be expected that "010" > "10". The fix for this is to compare first by value, then by length:

let cmp_alnum x y =
  try
    let m, n = int_of_string x, int_of_string y in
    let c = Pervasives.compare m n in
    if c != 0 then c else
    Pervasives.compare (String.length x) (String.length y)
  with Failure _ -> Pervasives.compare x y

There's always opportunity for polish…

2007-12-11

The Alphanumeric Sort

A discussion over reddit about Dave Koelle's "Alphanum Algorithm" to sort "naturally" strings that include numerals. The problem to solve is, in Koelle's words:

For example, in a sorted list of files, "z100.html" is sorted before "z2.html". But obviously, 2 comes before 100!

First, it's clear that it's a comparison function between (rather, a total order on) strings what's at issue here; second, as pointed out in reddit, it's what OS X does in the Finder (I wouldn't swear it, but I think iTunes sorts it like this, too). Martin Pool's Natural String Order comparison is another implementation of this idea, one that is special in that (in his words) [p]erformance is linear: each character of the string is scanned at most once, and only as many characters as necessary to decide are considered. His page also lists a number of alternative implementations, giving priority to Stuart Cheshire.

I'll not aim that high; my implementation is also linear but I won't bother trying to visit each character at most once. The idea is to separate numbers from non-numbers, and compare corresponding numbers according to numeric value and not as strings. In other words, 100 > 20 even when "100" < "20" (as the string comparison stops as soon as the first character '1' on the left compares less than the first character '2' on the right). The first thing needed would then be a way to split a string between numeric and non-numeric components. OCaml regexps are somewhat crude, but a multipass (cue Milla Jovovich's Leeloo Dallas multipass!) approach works: I'll insert a delimiter between a sequence of digits followed by a sequence of non-digits; then I'll insert the same delimiter between a sequence of non-digits followed by a sequence of digits. Finally, I'll split the string on the delimiter. The only problem is that any character is potentially valid in a string; I cop out and use a null byte:

let (%) f g x = f (g x)

let split =
  let re_d_D = Str.regexp "\\([0-9]+\\)\\([^0-9]+\\)"
  and re_D_d = Str.regexp "\\([^0-9]+\\)\\([0-9]+\\)"
  and re_NUL = Str.regexp_string "\000"
  and tp_spl = "\\1\000\\2" in
    Str.split re_NUL
  % Str.global_replace re_D_d tp_spl
  % Str.global_replace re_d_D tp_spl

Next, I need to lift a comparison on elements to a comparison on a list of those elements. There is no standard lexicographical comparison on lists, but the implementation is trivial. As long as elements compare as equal, recur on the rest of the lists until a difference is found or the end of the shortest list is reached, whichever comes first:

let rec cmp_list cmp l m = match l, m with
| [], [] ->  0
| [], _  -> -1
| _, []  ->  1
| a :: l, b :: m ->
  let c = cmp a b in
  if c == 0 then cmp_list cmp l m else c

Finally comes the alphanumeric comparison. The idea is to try first to compare both strings as numbers, and if that isn't possible to fall back on plain string comparison. For that, I take advantage of the fact that int_of_string fails with exception on non-numeric input:

let cmp_alnum x y =
  try Pervasives.compare (int_of_string x) (int_of_string y)
  with Failure _ -> Pervasives.compare x y

Putting all together, I perform a Schwartzian Transform on the list of strings to be sorted, splitting on numbers to give a list of lists. Then, I sort using the standard sort on lists with a comparison function built by lifting the alphanumeric comparison to a lexicographic ordering. Lastly, I concatenate the pieces back into strings:

let sort_alnum =
    List.map (String.concat "")
  % List.sort (cmp_list cmp_alnum)
  % List.map split

To prove my claim of linearity, note that split is (likely—I haven't studied Str's code) linear since the regexes are deterministic, that cmp_list is obviously linear on the size of the lists hence linear on the size of the original string, and that cmp_alnum is linear since conversion of string to numbers is linear just as is lexicographical comparison. Simple, huh?

2007-12-05

Functional Refactoring

A common and simple enough task: count the words present in a text file. For our purposes (and as it is generally agreed), a word is a contiguous sequence of alphabetic characters, delimited by non-alphabetic characters. We should be mindful of edge cases: both the beginning and the end of the file are treated as non-alphabetic characters; in other words, the first and last words of the file, if any, must not be lost.

I found lying around in my code folder (you'd be surprised) the following OCaml code:

let iter_words f inch =
  let ibf = String.create buflen
  and obf = Buffer.create 80
  and any = ref true
  and inw = ref false in
  while !any do
    let nread = unsafe_input inch ibf 0 buflen in
    for i = 0 to pred nread do
      let c = String.unsafe_get ibf i in
      if !inw then begin
        if isalpha c then
          Buffer.add_char obf c
        else begin
          f (Buffer.contents obf);
          Buffer.clear obf;
          inw := false
        end
      end else if isalpha c then begin
        Buffer.add_char obf c;
        inw := true
      end
    done;
    any := nread != 0
  done;
  if !inw then f (Buffer.contents obf)

The first thing to notice is that this is an iterator over the file's words: for every one found, the parameter f is called with its text. What f does with the word is its problem; the advantage of this approach is that you can process files without having to load the entire result on memory; the main disadvantage is that you are forced to process words in sequence, and handle the intermediate state by yourself.

Inside the function, the code is unabashedly imperative: a pair of nested loops process the file, the outer reading a bufferful at a time over ibf (buflen is implied, and defined somewhere else with a suitable value; so is unsafe_input), the inner scanning the buffer a character at a time and accumulating eventual words in obf. Inside the inner loop, a state machine uses inw to track whether it is scanning a word or a non-word, changing according to predicate isalpha (also suitably defined). This state must survive entire buffers, as a word might stride one. A further state variable, any, controls the outer loop termination.

All very straightforward (this code has modification date October 1st, 2006, and I could read it after all this time without problem), but also very, very ugly: this style may be appropriate for C, but it is not especially defensible in OCaml. So, I set out first to replace the condition variable by a functional state machine. The idea is to have a number of recursive functions, each for every state, which, instead of transitioning, returns the next function to call. This is very similar to the object-oriented GoF pattern "State Machine".

The only complication is that in order to express this pattern we need recursive types: a state is the type of functions of a parameter of type, say, α returning a state! In other words, the type is α → β as β. For OCaml to accept this, it needs the -rectypes option; the problem with this extension is that the type-checker accepts a lot of junk along with the valid definitions, making development more fragile. The alternative is to wrap the recursion so that it goes through a type constructor: either an algebraic data-type, or a record type. I choose the latter:

type 'a state = { fn : 'a -> 'a state }

The function signature is the same, and I'll keep for now the input and output buffers:

let iter_words f inch =
  let ibf = String.create buflen
  and obf = Buffer.create 80 in

The first state is "scanning outside a word"; if the next character is alphabetic, it must start a new word and transition, otherwise it keeps looking:

  let
  rec no_word = { fn =
    fun c ->
      if isalpha c then begin
        Buffer.add_char obf c;
        in_word
      end else
        no_word
  }

The second state is "scanning inside a word"; if the next character is alphabetic, it accumulates it and keeps scanning, otherwise it ends the word, sends it to the accumulating function f and transitions:

  and in_word = { fn =
    fun c ->
      if isalpha c then begin
        Buffer.add_char obf c;
        in_word
      end else begin
        let () = f (Buffer.contents obf) in
        Buffer.clear obf;
        no_word
      end
  }
  in

A buffer is scanned character by character (I've renamed the variables formerly known as nread and i). It must, however, conserve the last known state, so that words spanning buffers are not incorrectly split:

  let rec scan pos len state =
    if pos == len then state else
    let c = String.unsafe_get ibf pos in
    scan (succ pos) len (state.fn c)
  in

Note how the state handler returns its successor. The file must then be read a buffer at a time. Again, it must conserve the state across calls to the buffer scanning loop, terminating when there is no more data to read:

  let rec ioread state =
    let nread = unsafe_input inch ibf 0 buflen in
    if nread == 0 then state else
    ioread (scan 0 nread state)
  in

Finally, the entire device is set into motion by reading outside a word (remember, the beginning-of-file is taken as an implied non-word character). If the ending state is in_word, the last word must be taken into account and processed (again, the end-of-file is taken as an implied non-word character):

  if ioread no_word == in_word then f (Buffer.contents obf)

From 26 lines to 38 lines, a modest increase in code length for a great improvement in modularity and (I'd argue) understandability: higher-order functions are a powerful abstraction device. I could, however do better: there are many explicitly recursive functions that beg to be generalized as recursion schemes, or patterns.

The first recursion scheme that should be pulled out is the reading loop. It uses as a free variable the input buffer that it shares with the rest of the code. Abstracting out the work done on each buffer, and accumulating on an explicit parameter:

let fold_in f e inch =
  let ibf = String.create buflen in
  let rec fold e =
    let nread = unsafe_input inch ibf 0 buflen in
    if nread == 0 then e else
    fold (f e ibf nread)
  in fold e

The parameter f will take the the last accumulated value e, and the newly read data in the form of the buffer and the count of valid characters in it; it must return the updated result for the next iteration. This means that I must manage to supply scan with the relevant state. But I will also abstract out the iteration over a string. Unfortunately, the String module doesn't define a fold on strings, much less one on a substring:

let fold_string f e str pos len =
  let rec fold pos e =
    if pos == len then e else
    let c = String.unsafe_get str pos in
    fold (succ pos) (f e c)
  in fold pos e

I submit that both functions are independently useful, especially the last one. The input iteration is just:

  let ioread state =
    fold_in scan state inch

In turn, the string iteration (with parameters suitably rearranged to allow for η-conversion) is:

  let scan state buf len =
    fold_string (fun state -> state.fn) state buf 0 len

Now there is not much point in naming the simple folds; η-converting and inlining:

  let ioread state =
    fold_in (fun state buf -> fold_string (fun state -> state.fn) state buf 0)
      state inch

The state machine doesn't change, and the output buffer can still be kept free ("global", so to speak) in it. The entire code is:

let iter_words f inch =
  let obf = Buffer.create 80 in
  let
  rec no_word = { fn =
    fun c ->
      if isalpha c then begin
        Buffer.add_char obf c;
        in_word
      end else
        no_word
  }
  and in_word = { fn =
    fun c ->
      if isalpha c then begin
        Buffer.add_char obf c;
        in_word
      end else begin
        let () = f (Buffer.contents obf) in
        Buffer.clear obf;
        no_word
      end
  }
  in
  let ioread state =
    fold_in (fun state str -> fold_string (fun state -> state.fn) state str 0)
      state inch
  in
  if ioread no_word == in_word then f (Buffer.contents obf)

The code is, now, 44 lines long, counting the abstracted folds. There is nothing, however, that prevents the state to include both the callback function to the overall word iteration and the output buffer. The result is not as pretty, because all those lifted parameters clutter the function signatures a bit; it has, however the advantage that each function is top-level:

let
rec no_word = { fn =
  fun (f, buf, c) ->
    if isalpha c then begin
      Buffer.add_char buf c;
      in_word
    end else
      no_word
}

and in_word = { fn =
  fun (f, buf, c) ->
    if isalpha c then begin
      Buffer.add_char buf c;
      in_word
    end else begin
      let () = f (Buffer.contents buf) in
      Buffer.clear buf;
      no_word
    end
}

let ioread =
  fold_in (fun st str ->
    fold_string (fun (f, buf, state) c -> (f, buf, state.fn (f, buf, c)))
      st str 0)

let iter_words f inch =
  let (_, buf, state) = ioread (f, Buffer.create 80, no_word) inch in
  if state == in_word then f (Buffer.contents buf)

The code is more compact at 31 lines, thanks to reusing the folds. A tweak that could make the code a bit tidier is to change the type of the states to be transducing instead of just accepting. The result of a transducing state includes the information passed to it updated by the state as needed; this means that, whereas the former code worked because the state (as represented by the Buffer) was mutable, the following is general:

type ('a, 'b) trans = { st : 'a -> 'b -> 'a * ('a, 'b) trans }

let
rec no_word = { st =
  fun (f, buf as s) c ->
    if isalpha c then begin
      Buffer.add_char buf c;
      s, in_word
    end else
      s, no_word
}

and in_word = { st =
  fun (f, buf as s) c ->
    if isalpha c then begin
      Buffer.add_char buf c;
      s, in_word
    end else begin
      let () = f (Buffer.contents buf) in
      Buffer.clear buf;
      s, no_word
    end
}

let ioread =
  fold_in (fun e str ->
    fold_string (fun (t, state) -> state.st t)
      e str 0)

let iter_words f inch =
  let buf = Buffer.create 80 in
  let (_, state) = ioread ((f, buf), no_word) inch in
  if state == in_word then f (Buffer.contents buf)

And this is essentially the 34 lines of code that you would write in Haskell, except that the folds would be provided to you in the corresponding monads. The only imperative bits are the file input and the use of an extensible buffer for performance.

Speaking of which, what is the cost of abstracting out the structure? I tested all five versions with the Project Gutember e-text for Beowulf. I ran the tests seven times, took their mean time, and normalized it to that of the first version. First, bytecode:

Abstraction Cost, bytecode (`ocamlopt.opt -inline 10 -unsafe -o ioread.exe unix.cmxa ioread.ml`)
Version	Speed (Rel.)	Speed-down
	1.00	00.0 %
	1.27	21.3 %
	1.38	27.5 %
	1.81	44.8 %
	2.24	55.4 %

Then, naïve native code:

Abstraction Cost, native code, non-inlined (`ocamlopt.opt -o ioread.exe unix.cmxa ioread.ml`)
Version	Speed (Rel.)	Speed-down
	1.00	00.0 %
	1.24	19.1 %
	1.38	27.8 %
	1.79	44.3 %
	2.30	56.4 %

Lastly, optimized native code:

Abstraction Cost, native code, inlined (`ocamlopt.opt -inline 10 -unsafe -o ioread.exe unix.cmxa ioread.ml`)
Version	Speed (Rel.)	Speed-down
	1.00	00.0 %
	1.27	21.3 %
	1.38	27.5 %
	1.78	43.9 %
	2.24	55.4 %

As you can see, the differences are pretty consistent and independent of the target architecture. Furthermore, the cost of a fully functional, monadic read is more than half more that of the imperative double-loop. The good thing about OCaml is not that it is an efficient functional language, but that it lets you move seamlessly between functional and imperative style of coding, according to your preferences and your needs.

2007-12-01

The Genuine Sieve of Eratosthenes

The Sieve of Eratosthenes is a staple demo of the expressive power of lazy functional languages. It is usually encountered in its tersest Haskell form:

sieve [] = []
sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

primes = sieve [1..]

However, in a feisty paper, Melissa O'Neill clearly shows that this is not the implementation of a sieving process, much less of Eratosthenes's algorithm (see also her rather forceful Haskell Café message). As is well-known, Eratosthenes' Sieve begins with an unmarked "grid" (or rather, a list) of integer numbers, starting from 2. Then, taking in turn the least unmarked number d, it "crosses out", or marks every multiple 2⋅d, 3⋅d…, of d. The unmarked numbers are not multiple of anything else and so, by definition, prime.

As O'Neill explains, in order to replicate this process with a computer program, there must be somehow a record of all "crossed-out" numbers. Her idea is to use a priority queue to readily access the next number to eliminate from the list. The key aspect of using a priority queue is that we can access the least element (according to some total order between them) in constant time. Then, for each prime p found, the queue is updated to record its multiples k⋅p needing crossing out. I'll show an implementation of this idea, using this time object-oriented imperative queues, in OCaml.

Suppose an object q of class queue possessing, at least, an operation q#peek to look at the least element, an operation q#drop to eliminate it, and an operation q#put to add an element to it. In order to test if a number n is prime, we need to see if the queue holds some prime multiple equal to it or not. The queue will store every known prime p together with its least current multiple d as a pair (p, d). When n is tested against d and either found prime or marked, the code eliminates d from the queue and updates it with p's next multiple:

let is_prime q n =
  let rec test pass =
    let (p, d) = q#peek in
    if d > n then begin
      if pass then q#put (n, n + n);
      pass
    end else begin
      q#drop;
      q#put (p, d + p);
      test (d < n)
    end
  in test true

Either n passes the test, that is, is prime, or not. In any case, the queue's elements are inspected in order to update them with their next multiple; if any is equal to n we know from that mark that it is composite; otherwise, it is added as a formerly unknown prime. The type of is_prime is:

val is_prime : < drop : 'a; peek : int * int; put : int * int -> unit; .. > -> int -> bool

that is, as explained above, the only requirements on the queue class is that it has the methods drop, peek, put of the right type. A naïve but suitable implementation using lists is:

class ['a] queue (cmp : 'a -> 'a -> int) = object
  val mutable elts : 'a list = []
  method elements = elts
  method clear = elts <- []
  method is_empty = elts == []
  method peek = match elts with
  | [] -> failwith "peek"
  | x :: _ -> x
  method drop = match elts with
  | [] -> failwith "drop"
  | _ :: xs -> elts <- xs
  method put (a : 'a) =
    elts <- List.merge cmp [a] elts
end

The queue is generic on the type of the arguments, and the constructor takes the comparison function needed to make the elements totally ordered. It keeps them in a list, and its core operation is put, which simply inserts the new element into the existing list of elements while keeping it sorted. The extra methods will come handy later. With this, it is simple to find out all the primes up to a given limit lim:

let sieve q lim =
  let rec loop i =
    if i > lim then [] else
    if is_prime q i
      then i :: loop (succ i)
      else loop (succ i)
  in
  q#clear;
  q#put (2, 4);
  2 :: loop 3

The downside of using an imperative queue is that we must be mindful of state, and make sure that we start from a known point: we have to prime (!) the sieve with a queue containing just the first known candidate, namely two; from there the process is straightforward. So much so that we can easily do better. First of all, note that there is no need whatsoever to keep track of the found primes in a list, as the queue itself already does that for us. Also, O'Neill's program uses a flywheel, a recurring list of increments to skip over known composite numbers. The paper uses this large flywheel to eliminate multiples of 2, 3, 5 and 7; I'll use an infinite (circular) list:

let sieve q lim =
  let rec loop i (h :: t) =
    if i > lim then () else
    let _ = is_prime q i in loop (i + h) t
  in
  let rec flywheel =
    2 :: 4 :: 2 :: 4 :: 6 :: 2 :: 6 :: 4 ::
    2 :: 4 :: 6 :: 6 :: 2 :: 6 :: 4 :: 2 ::
    6 :: 4 :: 6 :: 8 :: 4 :: 2 :: 4 :: 2 ::
    4 :: 8 :: 6 :: 4 :: 6 :: 2 :: 4 :: 6 ::
    2 :: 6 :: 6 :: 4 :: 2 :: 4 :: 6 :: 2 ::
    6 :: 4 :: 2 :: 4 :: 2 ::10 :: 2 ::10 :: flywheel
  in
  q#clear;
  q#put (11, 22);
  loop 13 (List.tl flywheel);
  2 :: 3 :: 5 :: 7 :: List.sort compare (List.map fst q#elements)

This is faster, but only by a constant amount; the time is dominated by the queue operations. Even though removing the minimum is done in constant time (that is, is O(1)), the cost C_list of inserting a new element is linear in the size of the heap, or O(π(n)). This π(n) is the number of primes less than n, which is O(n/log n), and must be done once for every element in the queue, for every integer up to n, for a total cost of n⋅π(n)⋅C_list = O(n³/log² n).

By lowering the cost of an insertion, even by offsetting it with an increased cost per removal we can do asymptotically better. The best imperative implementation of a priority queue is a heap: a totally balanced binary tree with the property that every non-leaf node's children are greater than the parent. I'll use an array to store the nodes, so that the node stored at index i has its children stored at indexes 2⋅i and 2⋅i + 1. Since the binary tree is totally balanced, it has at most ⌊lg n⌋ + 1 levels.

At any instant, the heap condition must be kept; that is, in a heap every parent is always less than or equal to its left child, which in turn is always less than or equal to its sibling. This, given the layout of nodes in the array, means that the minimum element is the root, which is found at the very beginning of the array:

I will use a class with a member array that will be dynamically resized to make room for new elements. Initially, it holds no elements; however, the initial array is constructed by "magic"-ing dummy values of the correct type. The elements of the heap are returned as a list in an arbitrary order:

class ['a] heap (cmp : 'a -> 'a -> int) = object (self)
  val mutable elts : 'a array = Array.make 8 (Obj.magic 0 : 'a)
  val mutable count = 0

  method elements =
    let res = ref [] in
    for i = 0 to count - 1 do res := elts.(i) :: !res done;
    !res

Now clear, is_empty and peek are trivial:

  method clear = count <- 0
  method is_empty = count == 0

  method peek =
    if self#is_empty then failwith "peek";
    elts.(0)

In order to remove the least element, we move the last element in the heap to the root, replacing it. After that, it must "sift down" the tree to its proper resting place; the method siftdown restores the heap condition:

  method drop =
    if self#is_empty then failwith "drop";
    count <- count - 1;
    elts.(0) <- elts.(count);
    self#siftdown

A simple variant avoids duplicating the effort needed to keep the heap condition between a removal of the minimum and the insertion of a new element:

  method replace (a : 'a) =
    if self#is_empty then failwith "replace";
    elts.(0) <- a;
    self#siftdown

In order to insert a new element, the code first checks that there is enough space for it and inserts it as the last node. Then siftup is called to bubble it up the tree until the heap condition is restored:

  method put (a : 'a) =
    self#check;
    elts.(count) <- a;
    count <- count + 1;
    self#siftup

Space is made for a further element by doubling the array and copying its items to the new array:

  method private check =
    if count == Array.length elts then begin
      let sz = 2 * count in
      let ar = Array.make sz elts.(0) in
      Array.blit elts 0 ar 0 count;
      elts <- ar
    end

siftup is the simpler of the two restoring procedures. It repeatedly compares the unsorted child with its parent, exchanging both if they are out of order and recurring:

  method private siftup =
    let rec sift i =
      if i > 1 then begin
        let p = i lsr 1 in
        if cmp elts.(p - 1) elts.(i - 1) > 0 then begin
          let t = elts.(i - 1) in
          elts.(i - 1) <- elts.(p - 1);
          elts.(p - 1) <- t;
          sift p
        end
      end
    in sift count

(this is essentially the iteration for an insertion sort). siftdown is complicated by the fact that it must compare and exchange each node with the greatest of its two children:

  method private siftdown =
    let rec sift i =
      let c = ref (i lsl 1) in
      if !c <= count then begin
        if !c + 1 <= count
        && cmp elts.(!c - 1) elts.(!c) > 0 then incr c;
        if cmp elts.(!c - 1) elts.(i - 1) < 0 then begin
          let t = elts.(i - 1) in
          elts.(i - 1) <- elts.(!c - 1);
          elts.(!c - 1) <- t;
          sift !c
        end
      end
    in sift 1
end

Both siftup and siftdown dominate the complexity of drop and put, respectively. Both must traverse every level of the tree, so they have a cost of C_heap = ⌊lg n⌋ + 1 = O(log n). By using this data structure, the sieving cost drops from O(n³/log² n) to n⋅π(n)⋅C_heap = O(n²).

Given this class heap, the code for sieve remains unchanged; however, I will take advantage of replace in is_prime to avoid a siftup after a drop and a put:

let is_prime q n =
  let rec test pass =
    let (p, d) = q#peek in
    if d > n then begin
      if pass then q#put (n, n + n);
      pass
    end else begin
      q#replace (p, d + p);
      test (d < n)
    end
  in test true

All this, of course, is very far from O'Neill's functional implementation, as the present code is decidedly imperative. In fact, I'm not sure that this performs better than the naïve imperative sieve using an array of booleans (or a bit vector) to record which number is crossed. It has, however, the advantage of using memory proportional to the number π(n) of primes found, not to the range n, and since the heap grows dynamically, can be used to find primes less than an arbitrary limit without the need to have to hard-code it.

2007-11-28

Expressiveness or Performance?

Don't pick just one!

Over reddit, a lively discussion on which language, Python or Ruby, is faster. In the comments, the usual suspects and the expected opinions. Except for Slava Pestov's (of Factor and jEdit fame), whose pointedness is on par with his stature as a programmer. I concur wholeheartedly with his opinion; in particular, his observation that

Slow language implementations are fundamentally less expressive because they give you less freedom in how you implement your code. You have to optimize […] a slow language is by definition not very expressive because it constrains your coding style […]

is right on the money. To which I'd like to add: when out shopping for a programming language, caveat emptor and choose a compiled, native one. The usual argument about turnaround time and programmer productivity is bogus: after all, machines were getting equally faster both for dumb interpreters and multi-pass compilers. And if what bogs you down are the compile-time errors, an interpreted language will not make you any more productive: as much as you'd like to, you cannot sink the costs of debugging and failing at execution time.

A More Elegant Necklace

In my previous post, I showed how to use rplacd to create a circular list with the elements of a given list. I used a "roving pointer", a reference to the last cell added to keep track of the next element to insert. There is a solution a little more elegant using a fold. The idea is to use the result of the fold as an accumulating parameter for the last element. This forces the fold to be a fold_left, and the resulting code is:

let necklace = function
| [] -> invalid_arg "necklace"
| h :: t ->
  let rec r = h :: r in
  List.tl (List.fold_left (fun p h ->
    let s = h :: r in
    ignore (rplacd p s);
    s) r t)

On return from the fold we get the last element of the list. Since it is circular, the element past the last one is the first, so we only need to take the tl. Of course, the alternative is to ignore the result of the fold_left, and simply return r.

2007-11-27

The Unsafe Clasp

Over the OCaml Group, Till Varoquaux asks if an arbitrary list can be tied over itself strictly, like you can do with recursive value definitions:

let rec arpeggio = 1 :: 2 :: 3 :: 2 :: arpeggio

In other words, a function necklace is sought such that:

⟨∀ l : |l| > 0 : ⟨∀ k ≥ 0 :: take k (necklace l) = take k (drop |l| (necklace l))⟩⟩

As Alain Frisch points out, it cannot be done safely, as only values of fixed shape (i.e., statically known) can be built recursively. However, with a parsimonious amount of unsafeness it can be done quite simply. I've written before of reversing trees in-place. The only operation needed is rplacd, that replaces the tail of a list with another one. With that, the only care needed is to copy the original list before closing the loop:

let necklace = function
| [] -> invalid_arg "necklace"
| h :: t ->
  let rec r = h :: r in 
  let p = ref r in
  List.iter (fun h ->
    let s = h :: r in
    ignore (rplacd !p s);
    p := s) t;
  r

Given a nonempty list, the code begins by building a singleton circular list on r. At each point, the code keeps a reference to the "tail" (i.e., the last node) of the circular list on p. With each further element of the original list, a new node pointing back to the beginning of the circular list is built on s, so that it both replaces and becomes the current tail.

Since at each step the cons cell is fresh, the original list is not modified and the circular list is built from scratch, cell by cell, always pointing back to itself. Hence the unsafe operation is completely hidden, and the code is safe.

2007-11-18

Counting Working Days (mostly for Profit)

The week seems such an orderly thing. Calendrical calculations are littered with conditionals and corner cases, and the little humble week with its staid repetition of seven days might appear deceptively manageable. Pun intended, since it hides a surprising difficulty when trying to count a number of working (or business) days from a given date in either direction. In the Western hemisphere, Saturdays and Sundays are almost everywhere days of rest. In any case, since every definition holds a kernel of arbitrariness in itself, in what follows I will assume a function weekday that, given a date t, returns the corresponding week day. Also, and as a sort of oracle, I have a predicate is_wday that returns true for a given date.

Aside: this might seem overcomplicated; the reason I abstract those out is that there are varying definitions for the starting day of the week. Business weeks start at Monday; with that proviso, let is_wday treat every multiple of seven as a Monday, so that
is_wday t ≡ 0 ≤ weekday t mod 7 < 5
And I say that is_wday is a sort of oracle since, by treating it as a black box, it can be made to return false not only for weekends but also for non-working holidays.

In what follows, I'll start with the naive way to skip a number of weekdays forwards or backwards, and I'll progressively refine the algorithm. The starting point is simple; I traverse the calendar from the given date, checking which of the days are work days, and accounting for them appropriately:

let wday_add tm delta =
  let i = if delta < 0 then -1 else 1 in
  let s = ref 0
  and d = ref delta in
  while !d != 0 do
    s := !s + i;
    if is_wday (dt_add tm !s) then d := !d - i
  done;
  dt_add tm !s

(The highlighted portion is the core of the algorithm; I'll massage it while keeping the rest invariant. In what follows, replace the block of code for the highlighted original.) The variable i is bound to the sign of the shift delta, so that the same loop traverses the calendar forwards or backwards. Then s is the accumulated total shift, and d is the number of weekdays yet to be accounted for. This quantity is brought to zero (accounting for direction) whenever s weekdays starting from the initial date are, in fact, working days. The last line finally applies this difference to the initial date and returns it.

This code works but is quite slow, since it carefully checks day by day if it is working or not. The first big insight is that every seven days we can account for exactly five workdays. In the limit, the ratio of workdays to weekdays approaches exactly 5/7; however, for small (that is, finite) shifts this ratio is imprecise. But with a little care we can skip ahead whole weeks:

while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i;
  while is_wday (dt_add tm !s) && i * !d >= 5 do
    s := !s + i;
    if is_wday (dt_add tm !s) then d := !d - i
  done;
done;

I kept the body of the original loop, and added another loop that, whenever there is a workday week or more left to count starting from a workday, counts whole weeks. This seems like a step backwards. It is a stepping-stone, a small rewrite that allows assuming in the inner loop that workdays are counted by fives and weekdays by seven:

while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i;
  while is_wday (dt_add tm !s) && i * !d >= 5 do
    s := !s + 7 * i;
    d := !d - 5 * i
  done
done;

This is actually faster, and the inner loop is evidently a division in disguise: d is decremented, say q times by 5q until the result is less than 5; s is the shift correspondingly incremented by 7q.

Note: This speedup is bought at a cost, however: the code would have stepped over non-working holidays if is_wday accounted for them; now, by leaping up by fives and seven, it decidedly assumes that every working week is five days long without exception. All is not lost, though; it is possible to tweak it to account for non-working holidays later.

Now the greatest q such that d - 5q < 5 is precisely, q = ⌊d/5⌋:

while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i;
  if is_wday (dt_add tm !s) && i * !d >= 5 then begin
    let q = i * !d / 5 in
    s := !s + 7 * i * q;
    d := !d - 5 * i * q
  end
done;

The inner loop is gone, but the loop guard must be kept to ensure that the skipping ahead by whole weeks is actually possible and valid. Note that the division involves i×d where i is d's sign; in other words the absolute value |d|. This is necessary to ensure that the division rounds to zero and the quotient is positive. Otherwise, negative (that is, towards past dates) deltas would overshoot the last week, leaving d negative and decreasing, and the loop would never end. Had the integer division built-in round-to-zero semantics all this would be, however, unnecessary. Fortunately, this is the case in virtually all modern computer architectures and languages, and the code simplifies a little bit more:

while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i;
  if is_wday (dt_add tm !s) && i * !d >= 5 then begin
    let q = !d / 5 in
    s := !s + 7 * q;
    d := !d - 5 * q
  end
done;

I could stop here. It might not be evident, but this loop will never execute more than seven times, since the remainder after the weekly jump does not exceed five, and the worst case for that is hitting a weekend either before or after it. There is, however, place for further simplification, and that comes from the fact that the loop condition and the inner guard overlap in a non-quite-obvious way. The standard technique to simplify the conditions is to repeat the loop, maintaining the guards:

while !d != 0 && not (is_wday (dt_add tm !s)) do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i
done;
if i * !d >= 5 && is_wday (dt_add tm !s) then begin
  let q = !d / 5 in
  s := !s + 7 * q;
  d := !d - 5 * q
end;
while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i
done;

The first copy of the loop prepares for the jump by skipping over a possible weekend. Then comes the jump and finally the last copy of the loop tidies up the remaining days to account for. Now, it is obvious that the first loop finishes by having shifted to a weekday, hence the second conjunct of the conditional is superfluous. Also, if |d| < 5, q = 0 (with the usual, round-to-zero semantics mentioned above), and the code can avoid a conditional at the price of two multiplications by zero.

while !d != 0 && not (is_wday (dt_add tm !s)) do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i
done;
let q = !d / 5 in
s := !s + 7 * q;
d := !d - 5 * q;
while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i
done;

There is not much more simplification to be done, except in the first loop, whose condition again overlaps with that of the inner conditional. Again, by splitting it in two and simplifying:

if !d != 0 && not (is_wday tm) then begin
  while not (is_wday (dt_add tm !s)) do
    s := !s + i
  done;
  d := !d - i
end;
let q = !d / 5 in
s := !s + 7 * q;
d := !d - 5 * q;
while !d != 0 do
  s := !s + i;
  if is_wday (dt_add tm !s) then d := !d - i
done;

From now on we are in the Land of Diminishing Returns. The first loop can execute at most twice, the only possibilities being a Saturday or a Sunday. Those two days can be special-cased with conditionals. However, the deltas must also be conditional on the direction in which we are counting days; the cascade is unwieldy. This code is satisfactory enough for me.

There is the added complication of non-working holidays, though. We need to account for the missed holidays after the division. It is not enough to adjust the shift s by the number of holidays missed by the jump, and let the last loop tidy things up: it could be that the extra shift landed us in a non-working day even if there are no remaining days left to account for. In this case, we could simply return the shifted date as is, or adjust it to the next working day; in this case, we must be careful not to enter the loop with zero days. The trouble is that, if the division was exact, the date overshoots by one day:

if !d != 0 && not (is_wday tm) then begin
  while not (is_wday (dt_add tm !s)) do
    s := !s + i
  done;
  d := !d - i
end;
let t0 = dt_add tm !s in
let q = !d / 5 in
s := !s + 7 * q;
d := !d - 5 * q;
s := !s + i * holidays_between t0 (dt_add tm !s);
while not (is_wday (dt_add tm !s)) do
  s := !s + i
done;
if !d != 0 then begin
  d := !d - i;
  while !d != 0 do
    s := !s + i;
    if is_wday (dt_add tm !s) then d := !d - i
  done
end;

(note that d might be off by one.) The alternative is to check whether the division was exact or not, and back up enough weeks to allow for the comprised holidays. By carrying this far enough, we'd land exactly where we started: there is no alternative but to check each calendar date individually to see if it is a working day or not.

2007-11-13

Gödelizing Turing

Down at O'Reilly, Nitesh Dhanjani makes an extremely well-reasoned point that you cannot outright dismiss static verification tools by an unthinking appeal to Turing completeness, even if in principle (or shall I say, in the limit) you are right. He argues that every static analysis is an approximation, and you should take at face value the analytic power of that approximation even though it never ceases to be one, and in truth it never fails to advertise that it is, in fact, an approximation in the first place.

The halting problem is a direct consequence of Gödel's Incompleteness Theorem. Turing titled his paper "On computable numbers, with an application to the Entscheidungsproblem", that is, he started from Hilbert's Decision Problem much as Gödel did. The upshot of this relationship is that decidable = halting, or incomplete = not provably halting (note, not provably non-halting). Thus, one way to make a meaningful halting analysis (or other kinds of static verification) possible is to use a restricted enough computing semantics to weaken the formal system to the point it is rendered decidable.

This is precisely what static typing systems do. A typing system is a logic (witness the Curry-Howard correspondence) that is powerful enough to be useful, but not so powerful that it becomes undecidable. The typechecker then is a theorem verifier in the logic induced by the typing rules. A typechecker for an undecidable type system would hang, that is, not halt, for some programs and their types, which is not quite desirable. Note that a typing system could be sound and complete, that is, decidable for checking but not for inferring; that is, even though all programs could be verified type-correct or not, some (or most) might not be proven so, without manual intervention from the programmer in the form of typing annotations. This is the case with Haskell's extended typing rules (overlapping instances of types), I believe; that's why Haskell programs tend to be more heavily annotated with types than ML programs, even though both nominally use the Hindley-Milner type system, which is decidably inferrable.

There are many kinds of approximate static verifications that are possible, useful and even in current use: aliasing (and the special case, linearity, that analyzes and removes allocation and garbage generation whenever possible), bounds-preserving in indexed accesses, side-effecting and purity, security, and of course, type soundness. Inasmuch these analysis treat data and control flow statically, in exactly the same way that a theorem verifier checks every step of the derivation tree, the end result is the proof that the verified properties are upheld, or, in the case of the approximations, an "I don't know" result that could be refined either with more information supplied by the programmer to the tool, or with further checks that can be automatically introduced to corroborate that the stated properties are met in runtime.

This, in other words, is the gist of the raging debate between static- and dynamic-typing proponents. The point being denounced by Dhanjani's post, and that I think is more often missed, is that this is not an "either-or" proposition, but only the extreme points in a continuum in which language designers have a conscious choice to make their language lie in.

2007-11-12

Tallying Grass

Or rather, how to become a vegetarian by discounting beings with a stomach.

In the last post, I showed how to count a kind of flat critters living on the integer-tiled plane (the two-dimensional integer lattice) called ominos. The ominos are sets of cells with integer coordinates that are connected horizontally or vertically; in other words, they are formed by square tiles that share an edge. This is the so-called F pentomino:

For every integer n ≥ 1, I showed how to generate (and count) all the n-celled ominos by extending the (n-1)-ominos, one cell at a time. Some of these ominos have holes, or stomachs in them; that is, they completely surround a region of the plane such that it remains disconnected from the outside region, or exterior. For instance, this 9-omino (or enneomino) has a stomach:

This isn't the simplest animal with a cavity. The obvious mutilation that results in a square ring isn't either, surprisingly. The simplest non-simply-connected omino (that's the medical, I mean technical term) is the following heptomino:

I want to wean my program off meat; that is, I want it to count only simple ominos. These I could call trees, but trees are a special kind of branching ominos that lack "blocks"; I can then call them vegetables, or veggies for short, as trees are a special case of vegetables. The problem is how to determine the realm an omino belongs to; in other words, I need to come up with a predicate that distinguishes the ominos that are simply-connected from those that aren't.

Inspecting cells one by one might seem like a plausible way to do that; however, simplicity is a global property of the omino, and no matter how clever the cell inspection is, it can be fooled by some configuration. The simplicity condition is geometrical in nature, and the test must somehow take this into account. Enter the Jordan Curve Theorem: a simply-connected region has a boundary that neatly divides the plane in just two regions, namely the interior and the exterior. By taking into account the simple geometry of the integer lattice and the connectivity properties of an omino's cells, a consequence of the theorem is that an omino is simple if and only if the exterior is comprised of one connected region.

I don't need to take into account the entire integer lattice; it is enough to consider a representative sample of the exterior cells. These are the cells that completely surround the omino; since our representation includes the omino's extents, including a further border of one cell is sufficient.

This rectangular region consists of all cells whose coordinates are, each independently, in range, so I first need to generate the list of all integers comprised by said range:

let rec range i j =
  if i > j then [] else
  i :: range (succ i) j

Then, I need to combine two ranges, each for the independent x- and y-coordinates into pairs, giving the list of cells. This is the familiar Cartesian product:

let cart l m =
  List.fold_right (fun i ->
    List.fold_right (fun j r -> (i, j) :: r) m) l []

Aside: This is perhaps more recognizable when expressed as a list comprehension:
cart l m = r
 where r = [(i, j) | i <- l, j <- m]

With that, a region is simply the set of comprised cells between the given coordinates:

let region (im, jm) (ix, jx) : C.t =
  C.of_list (cart (range im ix) (range jm jx))

Given an omino, its exterior is the set of cells in the including region not belonging to it:

let exterior o : C.t =
  C.diff (region (-1, -1) (o.wid, o.hgt)) o.cel

(Note how the exterior completely surrounds the omino by extending its bounds.) Now the hard part: determining if the resulting set contains one connected component, or more than one. If the former, the exterior is connected; if the latter, some cells are separated from the exterior by the omino's cells and it is non-simple. A graph traversal that "marks" or remembers visited cells is what I need:

let find_component s =
  let neighbors = C.inter s % flower in
  let rec visit visited pending =
    if C.is_empty pending then visited else
    let c = C.choose pending in
    let pending = C.remove c pending in
    let unvisited = C.diff (neighbors c) visited in
    visit (C.add c visited) (C.union unvisited pending)
  in
  visit (C.empty) (C.singleton (C.choose s))

The algorithm keeps a set of visited cells in parallell with a set of cells pending of visit. If there are no cells of the latter kind, we are done; if not, we choose arbitrarily one, discount it, find its yet-unvisited neighbors in the region and recur, accounting for the cell as visited and remembering its neighbors for further consideration. We jumpstart the process by beginning with an arbitrary cell in the set, and without cells yet visited. Since it starts with an arbitrary cell, the returned connected component is arbitrary. This is the standard algorithm, except that the choice of cells, instead of following a first-in, first-out discipline (breadth-first), or a last-in, first-out discipline (depth-first) is completely non-deterministic.

Now, our predicate is, quite simply:

let is_connected o =
  let t = exterior o in
  C.equal t (find_component t)

That is, if any of the connected components of the exterior comprise the entire exterior, there are no holes and the omino is simple. I'll make a modification to the extensions predicate to filter candidates by an arbitrary predicate:

let extensions p o =
  let candidates c : C.t = C.diff (flower c) o.cel
  and extend c : omino = canon (normalize (C.add c o.cel)) in
  let extend_at c : O.t -> O.t = C.fold (fun c s ->
    let o = extend c in
    if p o then O.add o s else s) (candidates c) in
  C.fold extend_at o.cel O.empty

(For efficiency, the predicate is fused with the fold.) Now I must modify extend_set accordingly:

let extend_set p s = O.fold (O.union % extensions p) s O.empty

To test everything so far, I'll check that, given the true predicate (that is, that which is true everywhere) I get the same results as the last time. For that, I will use the constant function which returns the same value for all its arguments:

let const k _ = k

Then, exactly as before:

# List.map O.cardinal (nest 10 (extend_set (const true)) o1) ;;
- : int list = [1; 1; 2; 5; 12; 35; 108; 369; 1285; 4655]

That is, A000105. But now, by using is_connected:

# List.map O.cardinal (nest 10 (extend_set is_connected) o1) ;;
- : int list = [1; 1; 2; 5; 12; 35; 107; 363; 1248; 4460]

Which agrees with A000104, the number of n-ominos without holes. This takes a while; an obvious optimization would be to use sets with O(1) membership test; for instance, hash sets.

Thanks to Alejandro and Boris for the insightful discussion that led me to this solution.

2007-11-06

Counting Cattle

Well, not exactly cattle, but a different kind of animals, the polyominos (or "ominos", for short). These are the kind of tiles you can form by gluing squares by their edges. The article "Generating Polyominoes" in The Monad Reader 5 presents a direct implementation of the naïve algorithm for generating the set of polyominos comprised of n squares. I decided to try my hand at doing the same in OCaml, while trying to write the simplest code I could come up with that solved the problem.

I try to use, whenever possible (and the monomorphic restriction allows!) composition of higher order functions. To make it more pleasant, I use the following operator:

let (%) f g x = f (g x)

Whenever introducing an operator, I find it sound practice to add to it enough structure to make it as generally applicable as possible. The identity of the composition is, unsurprisingly, the identity function:

let id x = x

I will be using sets extensively for computing ominos and their (ha!) sets. Unfortunately, the standard Set module in OCaml lacks a couple of useful functions. Fortunately, OCaml's module system is extremely flexible, and allows a kind of "extension by subclassing" that models pretty well the object-oriented mechanism of extension by inheritance:

module FSet(O : Set.OrderedType) = struct
  include Set.Make(O)
  let map f s = fold (add % f) s empty
  let of_list l = List.fold_right add l empty
end

I simply include the original module, and add the functions I need. The first addition to this module FSet (short for "functorial set") is a monomorphic map that applies a function f to each element of the set s, returning the set of results. The second function is the fix for a plain oversight of the module authors, that is, the lack of a function that returns the list of elements of a set, which is the inverse of Set.elements.

A cell is simply a pair of integer coordinates:

type cell = int * int

Cells will be treated individually, or as cell sets:

module C = FSet(struct
  type t = cell
  let compare (i, j) (i', j') =
    let c = compare j j' in
    if c <> 0 then c else
    compare i i'
end)

Cells are sorted so that those in the first row come before those in the rows above; in other words, the one with the least y-coordinate is lesser. An omino is a cell set together with its integer extents:

type omino = { wid : int; hgt : int; cel : C.t; }

Knowing how many cells high and wide the region occupied by the cells is will make some of the operations on ominos more efficient. Now, an omino set is, plainly, an FSet of ominos:

module O = FSet(struct
  type t = omino
  let compare o o' =
    let c = compare o.hgt o'.hgt in
    if c <> 0 then c else
    let c = compare o.wid o'.wid in
    if c <> 0 then c else
    C.compare o.cel o'.cel
end)

Again, the lesser omino is the shorter one, or the skinnier one, or the one whose cell set comes first. But, how do we know what the region occupied by a set of cells is? By traversing the cell set and recording the minimum and maximum coordinates seen:

let extents s =
  C.fold (fun (i, j) (mi, mj, xi, xj) ->
    (min mi i, min mj j, max xi i, max xj j))
    s
    (max_int, max_int, min_int, min_int)

Since I am not storing the origin of the region but just its extents, I need a way to normalize the cell set by translating it so that it lies in the first quadrant:

let normalize s =
  let (mi, mj, xi, xj) = extents s in
  { wid = xi - mi + 1; hgt = xj - mj + 1;
    cel = C.map (fun (i, j) -> (i - mi, j - mj)) s; }

(Here's where the extended map in FSet comes handy. Note that the extents are one more than the maximum x- and y-coordinates in the set.) Now, given a list of otherwise unconstrained cells, I build an omino from it simply by normalizing the resulting cell set:

let of_list = normalize % C.of_list

I want to enumerate ominos disregarding plane symmetries. That is, if two ominos can be transformed one into the other by a sequence of rotations and reflections, they are treated as one and the same omino. The first symmetry we have already seen, it is the identity transformation, as implemented by id. A rotation by a quarter turn transforms coordinates (x, y) into (-y, x). By taking into account the omino's extents, I don't need to normalize the result:

let rot o =
  { wid = o.hgt; hgt = o.wid;
    cel = C.map (fun (i, j) -> (o.hgt - 1 - j, i)) o.cel; }

Of course, the old width is the transformed height and vice versa. The third basic transformation is a flip that swaps cells around the horizontal axis by changing coordinates (x, y) into (x, -y):

let flip o =
  { o with
    cel = C.map (fun (i, j) -> (i, o.hgt - 1 - j)) o.cel; }

Again, I take advantage of the extents to perform the transformation in such a way that avoids the need for normalization. These basic transformations are sufficient to carry all the plane symmetries of a rectangular region. In technical terms they finitely generate, or give rise to the so-called dihedral group of order 4, or D₄. The remaining transformations are built up from these by composition:

let transforms = [
  id;
  rot;
  rot % rot;
  rot % rot % rot;
  flip;
  rot % flip;
  rot % rot % flip;
  rot % rot % rot % flip
]

The symmetries of a polyomino is the set resulting from applying to it each of these transforms in turn:

let symmetries o = O.of_list (List.map (fun x -> x o) transforms)

Aside: I could have fused the map and the fold implied in O.of_list, but doing this is mechanical and would have obscured this rather simple expression.

Given that I want to treat ominos related by a symmetry as identical, I need a way to pick a canonical representation of an omino. This is an arbitrary but fixed element in the set of transforms of the omino. Since ominos are totally ordered by O.compare, this can mean either the least or the greatest element of said set. I opted for the former:

let canon = O.min_elt % symmetries

I am now ready to start growing legs on our animals. I mean, start adding cells to our ominos. Since cells must be connected by an edge, the flower of a cell is the set of all four such neighbors:

let flower (i, j) =
  (C.add (i + 1, j) %
   C.add (i - 1, j) %
   C.add (i, j + 1) %
   C.add (i, j - 1)) C.empty

To extend an omino, we first need to find the candidate set of neighbors of any of its cells that actually fall outside the omino. Then, given a valid neighbor, I extend the omino by normalizing its cell set extended with the new cell. Then, focusing on a particular cell in the set, the set of extensions at that cell is generated by the flower of candidate neighbors. Note that the result of extend is a function from omino sets to omino sets; we will use it to accumulate into the final result the ominos generated by considering in turn all the extensions possible at each of the original omino's cells (I annotate the range types of the functions for documentation):

let extensions o =
  let candidates c : C.t = C.diff (flower c) o.cel
  and extend c : omino = canon (normalize (C.add c o.cel)) in
  let extend_at c : O.t -> O.t = C.fold (O.add % extend) (candidates c) in
  C.fold extend_at o.cel O.empty

The lifting of the extension of an omino to the extension of an omino set is straightforward:

let extend_set s = O.fold (O.union % extensions) s O.empty

(This is what is usually called a flat map, or a monadic join.) We are now ready to enumerate!

I start with the simplest possible omino set, the set of 1-celled ominos. Its only member is the (unique) one-celled omino:

let o1 = O.singleton (of_list [0,0])

To make things easy, instead of manually constructing the extension, the extension of the extension and so on, I'll use the iterated composition function:

let rec nest n f x =
 if n == 0 then [] else
 if n == 1 then [x] else
 x :: nest (pred n) f (f x)

This repeatedly applies the function f to its argument, up to n times. Instead of just returning the n-th iterate, fⁿ(x), it returns the list of partial iterates. This makes simple to compute the enumeration:

let enumerate_to n = List.map O.cardinal (nest n extend_set o1)

For instance, let's enumerate the ominos up to the decominos (the critters with exactly ten tiles in their bodies):

# enumerate_to 10 ;;
- : int list = [1; 1; 2; 5; 12; 35; 108; 369; 1285; 4655]

This takes a while, but the result does indeed coincide with the beginning of the On-Line Encyclopedia of Integer Sequences' A000105!

I'm usually surprised at the conciseness and elegance most Haskell programs achieve, through a combination of laziness, a great standard library and higher-order combinators. OCaml programs, in my experience, are much more low-level, and they tend to have a "procedural" flavor to them caused by too much explicit scaffolding being visible. Here, a judicious combination of the proper data structure (Sets) and some general-purpose recursive higher order functions make this implementation short, simple and manifestly correct.

Plane Geometry in Java (III)

See also Part I and Part II.

Now that we have vectors and points, I can show how a line on the plane can be implemented:

final class Line {
  public final Pt p;
  public final Vec v;

  public Line(Pt p, Vec v) {
    if (p == null || v == null)
      throw new IllegalArgumentException("Arguments cannot be null");
    this.p = p;
    this.v = v;
  }

  public boolean equals(Object o) {
    if (o == null || !(o instanceof Line)) return false;
    else if (this == o) return true;
    final Line that = (Line) o;
    return this.p.equals(that.p) && this.v.equals(that.v);
  }

  public int hashCode() {
    return 127 * p.hashCode() + v.hashCode();
  }

  public String toString() {
    return String.format("%s-->%s", p, v);
  }

A difference with previous value classes is that we must make sure that a Line is properly constructed with actual values; other than that, the implementation is unsurprising. A basic property of a line is its slope:

  public double slope()            { return v.arg(); }

Given two lines, we might want to know whether they are parallel or perpendicular:

  private static final double EPSILON = 1e-11;

  public boolean isPerpendicular(Line l) {
    return Math.abs(v.dot(l.v)) < EPSILON;
  }

  public boolean isParallel(Line l) {
    return isPerpendicular(l.perp(l.p));
  }

This is necessarily an approximation, since it involves a subtraction. In any case, the angle between two lines is the angle between its respective direction vectors:

  public double angle(Line l)      { return v.angle(l.v); }

Our first construction on lines gives the perpendicular through a given point:

  public Line perp(Pt q)           { return new Line(q, v.perp()); }

In particular, the median between to points is the perpendicular to the line comprising both and passing through their midpoint:

  public static Line median(Pt p, Pt q) {
    return p.through(q).perp(p.mid(q));
  }

Note how the definition formalizes the English description. The second, fundamental construction is the intersection point between two lines:

  public Pt meet(Line l) {
    final Vec u = l.v.perp();
    return p.at(v.scale(u.dot(p.to(l.p)) / u.dot(v)));
  }

This is a direct translation of Property 20. If the lines are parallel, u.dot(v) is zero and the division gives the point at infinity. Another usual construction is to find the foot of a point on a given line; that is, the nearest point to it on the line:

  public Pt foot(Pt q) {
    final Vec u = v.unit();
    return p.at(u.scale(u.dot(p.to(q))));
  }

To test whether two segments (that is, the subset of the line comprised between the base point and its translate by the direction vector) intersect, check if the quadrilateral formed by said points is convex or not:

  public boolean meets(Line l) {
    final Vec u = l.v.perp();
    final Vec w = p.to(l.p);
    final double r = u.dot(w);
    final double s = v.perp().dot(w);
    final double t = u.dot(v);
    return 0 <= r && r <= t && 0 <= s && s <= t;
  }

Sometimes is useful to specify a line in implicit form, as a⋅x + b⋅y + c = 0. This constructor does so:

  public static Line implicit(double a, double b, double c) {
    final Pt p = b != 0 ? new Pt(0, c / b) : new Pt(c / a, 0);
    final Vec v = b > 0 ? new Vec(b, -a) : new Vec(-b, a);
    return new Line(p, v);
  }

Conversely, the implicit coefficients of a line can be retrieved with:

  public double[] coeffs() {
    final Vec u = v.perp();
    return new double[] { u.i, u.j, -u.dot(p.to(Pt.ORIGIN)) };
  }
}

Now we have everything in place to program simple Plane Geometry constructions. For instance, we can check that the altitudes of a triangle meet at a point. First, it is convenient to have a way to produce triangles that checks for us that we passed sensible values:

public class Geom {
  static Pt[] triangle(double a, double b, double c) {
    double t;
    if (a < b) { t = a; a = b; b = t; }
    if (b < c) { t = b; b = c; c = t; }
    if (a < b) { t = a; a = b; b = t; }
    // Now a >= b >= c
    if (a >= b + c)
      throw new IllegalArgumentException("Unsatisfied triangle inequality");

Our triangle will have the origin as its first vertex, its second vertex on the positive x-axis, and the third on the first quadrant. To find the latter, a simple system of equations involving the Pythagorean Theorem suffices:

    final double x = 0.5 * (a * a + (b + c) * (b - c))/a;
    final double y = Math.sqrt(b * b - x * x);
    return new Pt[] { Pt.ORIGIN, new Pt(a, 0), new Pt(x, y) };
  }

Since we have to find the three altitudes, another utility function will save repetition. This one-liner reads exactly as a mathematical description:

  static Line altitude(Pt a, Pt b, Pt c) {
    return a.through(b).foot(c).through(c);
  }

(albeit backwards!) Maybe this justifies writing a Tri class? Finally, the test:

  public static void main(String[] args) {
    final Pt[] tri = triangle(3, 6, 8);
    final Line u = altitude(tri[0], tri[1], tri[2]);
    final Line v = altitude(tri[1], tri[2], tri[0]);
    final Line w = altitude(tri[2], tri[0], tri[1]);
    final Pt r = u.meet(v);
    final Pt s = v.meet(w);
    System.out.printf("r = %s, s = %s\nDistance between points: %f\n", r, s, r.dist(s));
  }

The result shows, as expected, the coincidence of both points of intersection:

r = (5.68750 , 6.88204), s = (5.68750 , 6.88204)
Distance between points: 0.000000

2007-11-05

Plane Geometry in Java (II)

See also Part I and Part III.

We have our vectors ready. Now, points are entirely analogous, insofar they are value objects:

final class Pt {
  public final double x;
  public final double y;

  public Pt(double x, double y)    { this.x = x; this.y = y; }

  public static final Pt ORIGIN = new Pt(0, 0);

  public boolean equals(Object o) {
    if (o == null || !(o instanceof Pt)) return false;
    else if (this == o) return true;
    final Pt that = (Pt) o;
    return this.x == that.x && this.y == that.y;
  }

  public int hashCode() {
    return (int) ((Double.doubleToLongBits(x) >> 32) & 0xffffffffL)
         ^ (int) ((Double.doubleToLongBits(y) >> 32) & 0xffffffffL);
  }

  public String toString() {
    return String.format("(%g , %g)", x, y);
  }

There is a distinguished point, the origin.

It is important to understand the need to duplicate code that is essentially the same as in Vec; in other words, it is not appropriate to refactor common code in classes Vec and Pt, for two reasons:

First, a vector is not a point, nor vice-versa. There is not even a meaningful common supertype between them. The whole point of using the typed algebra is to manipulate formulas with the added security afforded by the typing rules
Second, even if a private base class could abstract some code, the bulk of the code needed to correctly implement a base type is in equals. This method needs to cast to the static class of the object in order for the comparison to be meaningful (else it would be comparing a derived class to the base class), so the duplicate equals are essential anyway

The Algebra has two "constructors" linking points and vectors: the point constructor and the vector constructor. Both operate on points, so naturally they are members of the Pt class. Also, as Axiom 1 states, both operations are some kind of additive inverses one of the other:

  public Pt at(Vec v)              { return new  Pt(x + v.i, y + v.j); }
  public Vec to(Pt p)              { return new Vec(p.x - x, p.y - y); }

A natural operation on points is to find the distance between them. It is, quite simply, p.to(q).norm(). However for convenience I include it directly in Pt:

  public double dist(Pt p)         { return to(p).norm(); }

Linear interpolation, or lerp mixes two points to give a third:

  public Pt lerp(Pt p, double k)   { return at(to(p).scale(k)); }

In particular, the midpoint between two given points is:

  public Pt mid(Pt p)              { return lerp(p, 0.5); }

The set of all linear interpolants is obviously a line. The simplest way to determine one is however to give a point through which it passes and the direction it takes:

  public Line towards(Vec v)       { return new Line(this, v); }
  public Line through(Pt p)        { return new Line(this, to(p)); }
}

I will show next how to use lines.

2007-11-03

On the Integer Spiral

The problem: find a procedure to map the integers 0, 1, 2 … uniquely into points (i, j) ∈ Z² with integer coordinates, starting with 0 ↦ (0, 0), and continuing in a counterclockwise spiral pattern. The first few points are (0, 0), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (2, -1), (2, 0), (2, 1), (2, 2), (2, 1), (2, 0) … The idea is to find a bijective mapping between N and Z². Of course there are simpler ones, but the integer spiral is particularly pretty:

An instant's doodling on graphing paper shows that every complete turn of the spiral completely surrounds the origin with a square, each having 1, 8, 16, 24 … new points. The first, or "zero-th" such "square" is the origin itself, having "side" 1. Each new square's boundary points are at a distance 1 from those of the older one (except for the corner), so each new square has side two plus the older square's side, or l = 2⋅n + 1. An appeal to the inclusion-exclusion principle proves that the perimeter of a square is four times the side minus the four corners counted twice; hence, the perimeter is comprised of p = 8⋅n points. The total area of this square is, obviously a = (2⋅n + 1)² points.

The i-th point lands on the n-th square's perimeter if and only if it is not inside it, that is, if it doesn't land on the (n-1)-th square's area. The index starts at zero, so this condition is equivalent to a.(n-1) ≤ i < a.n. Massaging the inequality to extract n:

   a.(n-1) ≤ i < a.n
≡ { Definition a }
   (2⋅(n-1) + 1)² ≤ i < (2⋅n + 1)²
≡
   (2⋅n - 1)² ≤ i < (2⋅n + 1)²
≡ { i is nonnegative }
   2⋅n - 1 ≤ √i < 2⋅n + 1
≡ { Algebra }
   n ≤ (√i + 1)/2 < n + 1
≡ { Definition floor }
   n = ⌊(√i + 1)/2⌋

Let d = i - (2⋅n - 1)² be how far around the n-th square's perimeter the index falls. There are four sides to it, each thus having 2⋅n points. If d is less than 2⋅n it is mapped to the first (i.e., left) side; if it is less than 4⋅n it's mapped to the second (top) one, and so on. Let this side index be k = ⌊d/(2⋅n)⌋.

As the spiral goes, each side has start- and end-points that depend on k like this:

k	Start	End
0	(`n`, 1-`n`)	(`n`, `n`)
1	(`n`-1, `n`)	(-`n`, `n`)
2	(-`n`, `n`-1)	(-`n`, -`n`)
3	(1-`n`, -`n`)	(`n`, `n`)

So, if sx, sy, dx, dy are the signs and offsets applied to each starting point (sx⋅(n - dx), sy⋅(n - dy)) as a function of k, we have:

k	`dx`	`dy`	`sx`	`sy`
0	0	1	+1	-1
1	1	0	+1	+1
2	0	1	-1	+1
3	1	0	-1	-1

Note that dy.k = dx.(k - 1) and sy.k = sx.(k - 1), modulo 4. It is not difficult to see that dx.k = k % 2, and sx.k = 1 - k % 4 + k % 2.

Aside: To explain it in full: first, sx.k must have period four, hence the reduction of the argument modulo 4. Second, it takes two consecutive values in pairs; this means that 0 and 1, and 2 and 3 must be treated the same, hence the reduction modulo 2. Finally, the result is shifted so that the range is ±1.

Some algebraic manipulation gives the alternative sx.k = 1 - 2⋅⌊(k % 4)/2⌋, which when implemented with bitwise operators is the very compact 1 - (k & 2).

Similarly, we need to find two characteristic functions, or multipliers, to apply to the corner the side offset, call it o = d % (2⋅n); so that points are traversed first up, then left, then down and finally down:

k	`ox`	`oy`
0	0	+1
1	-1	0
2	0	-1
3	+1	0

But this is simply -sx⋅dx (resp. -sy⋅dy)!. Expanding and collecting equal terms, we have that the i-th point has coordinates (sx⋅(n - dx⋅(o + 1)), sy⋅(n - dy⋅(o + 1))).

Aside: Had I chosen instead to subtract the offset from the side's endpoint, I would have arrived to this directly, without finding the need to simplify the expression.

Hence, it all translates directly to OCaml as:

let spiral i =
  let n  = truncate (0.5 *. (sqrt (float i) +. 1.)) in
  let d  = i - (2*n - 1) * (2*n - 1) in
  let k  = d / (2*n)
  and o  = d mod (2*n) in
  let sx = 1 -  k    land 2
  and sy = 1 - (k+3) land 2
  and dx =      k    land 1
  and dy =     (k+3) land 1 in
  (sx * (n - dx * (o + 1)), sy * (n - dy * (o + 1)))

2007-11-02

Plane Geometry in Java (I)

See also Part II and Part III.

Algebras have computational content, and the Plane Geometry Algebra I presented is no exception. It's time to put it to test by implementing a DSL embodying it. I'll use value objects to represent geometric objects in the plane. The reason for this decision, if it is not obvious enough, is that points, vectors and lines, as geometric entities, have no meaningful state.

As pointed out in C2, value objects should be immutable; this implies that there is no point in hiding the instance variables behind setters, as the values are determined at the moment of construction. On the other hand, it is necessary to ensure proper value semantics by correctly implementing equals and hashCode.

I find it easiest to start with the class Vec, as vectors have a very well-known, rich structure that you will probably find familiar. But first, the basic Object contract:

final class Vec {
  public final double i;
  public final double j;

  public Vec(double i, double j)   { this.i = i; this.j = j; }

  public static final Vec ZERO = new Vec(0, 0);

  public boolean equals(Object o) {
    if (o == null || !(o instanceof Vec)) return false;
    else if (this == o) return true;
    final Vec that = (Vec) o;
    return this.i == that.i && this.j == that.j;
  }

  public int hashCode() {
    return (int) ((Double.doubleToLongBits(i) >> 32) & 0xffffffffL)
         ^ (int) ((Double.doubleToLongBits(j) >> 32) & 0xffffffffL);
  }

  public String toString() {
    return String.format("[%g , %g]", i, j);
  }

There is a distinguished vector, the zero vector.

In order for equals to be an equivalence relation, it must be closed on its domain (that is, it must compare meaningfully only Vecs, line 1), reflexive (line 2), symmetric and transitive. The equality operator in Java fulfills these properties, and so (line 4), using it to compare by value the coordinates ensures that the overall result conforms.

With respect to hashCode, there are many proposed methods to hash IEEE 754 doubles, but most aim at "chunking" buckets of nearby values by chopping off the least significant bits of the mantissa. To avoid clustering more in one coordinate than over the other, I mix symmetrically bits from both values using an exclusive-or.

Now, let's flesh out Vec's structure, not forgetting the perp operation in the Algebra:

  public Vec add(Vec v)            { return new Vec(i + v.i, j + v.j); }
  public Vec neg()                 { return new Vec(-i, -j); }
  public Vec scale(double k)       { return new Vec(k * i, k * j); }
  public Vec perp()                { return new Vec(-j, i); }

and endow it with a norm and a scalar product:

  public double dot(Vec v)         { return i * v.i + j * v.j; }
  public double norm()             { return Math.hypot(i, j); }
  public double arg()              { return Math.atan2(j, i); }
  public Vec unit()                { return scale(1/norm()); }

A natural operation on vectors is to find the enclosed (convex) angle between them. The dual operation is to rotate a vector by a given angle. It is well-known that the dot product between two plane vectors is proportional to the cosine of the comprised angle. By Theorem 16, the perp-dot product is proportional to the sine of the same angle. The cross product captures both proportionalities in vectorial form:

  public Vec cross(Vec v)          { return new Vec(dot(v), perp().dot(v)); }

This operation satisfies the following properties:

(0.0)  u × (v + w) = u × v + u × w
(0.1)  (u + v) × w = u × w + v × w
(0.2)  (k ⋅ u) × v = k ⋅ (u × v)
(0.3)  u × (k ⋅ v) = k ⋅ (u × v)
(0.4)  (-u) × v = u × (-v) = -(u × v)
(0.5)  (u × v)^⊥ = u × v^⊥
(0.6)  u × v = u^⊥ × v^⊥
(0.7)  u × v^⊥ = -u^⊥ × v

Properties (0.0) to (0.3) show that cross is linear on both arguments. This is a consequence of both the linearity of dot product and that of perp. From this, Property (0.4) is an easy consequence. Properties (0.4) to (0.7) capture the rotational invariance of the cross product, at least for quarter rotations and reflections. With this, we can define:

  public static Vec rotor(double a) {
    return new Vec(Math.cos(a), Math.sin(a));
  }

  public double angle(Vec v)       { return cross(v).arg(); }
  public Vec rotate(double a)      { return rotor(-a).cross(this); }
}

That angle correctly captures the intended functionality is clear from the definitions:

   u.angle(v)
= { Definition }
   u.cross(v).arg()
= { Definition }
   (new Vec(u.dot(v), u.perp().dot(v))).arg()
= { Properties of dot, perp-dot }
   (new Vec(|u|⋅|v|⋅cos.(θ.u.v), |u|⋅|v|⋅sin.(θ.u.v)).arg()
= { Definition arg, atan2 }
   atan.((|u|⋅|v|⋅sin.(θ.u.v)) / (|u|⋅|v|⋅cos.(θ.u.v)))
= { Algebra, definition tan }
   θ.u.v

Other properties of angle are:

(1.0)  θ.u.v = -θ.v.u
(1.1)  θ.u.v = θ.(k⋅u).v = θ.u.(k⋅v)
(1.2)  |v|⋅u.rotate(θ.u.v) = |u|⋅v
(1.3)  u × (u.rotate(a)) = |u|²⋅rotor(a)
(1.4)  θ.u.(u.rotate(a)) = a

Next, I will present points.

2007-10-30

Plane Affine Algebra (III)

See also Part I and Part II.

Finally, a theorem, in the style of the Elements:

To trisect a segment. Let the endpoints of the segment be P, Q. Let R be any point not on PQ. Let M be the midpoint of PR, N be the midpoint of QR, O the midpoint of PQ. Then let S be the midpoint of MO, and T be the midpoint of NO. Produce the line RS to PQ in X; similarly, produce the line RT to PQ in Y. Then PX, XY, YZ are equal to one-third of PQ.

Proof: We begin by calculating S and T:

   S
= { definition with M, O := ray.P.R.½, ray.P.Q.½ }
   ray.(ray.P.R.½).(ray.P.Q.½).½
= { (18), twice }
   ray.(P → ½·‹R−P›).(P → ½·‹Q−P›).½
= { (18) }
   P → ½·‹R−P› → ½·‹(P → ½·‹Q−P›)−(P → ½·‹R−P›)›
= { (3); linearity }
   P → ½·(‹R−P› + ‹(P → ½·‹Q−P›)−(P → ½·‹R−P›)›)
= { (9) }
   P → ½·(‹R−P› + ‹P−P› + ½·‹Q−P› − ½·‹R−P›)
= { (6); linearity }
   P → ¼·(‹R−P› + ‹Q−P›)

From considerations of symmetry, T = S[P, Q := Q, P]. We have:

(21)  S = P → ¼·(‹R−P› + ‹Q−P›)
(22)  T = Q → ¼·(‹R−Q› + ‹P−Q›)

Finally, we calculate X and Y:

 X
= { definition }
 meet.P.Q.R.S
= { (20); (18) }
 P → (‹S−R›^⊥⋅‹R−P›)/(‹S−R›^⊥⋅‹Q−P›)·‹Q−P›
= { (21) }
 P → (‹(P → ¼·(‹R−P› + ‹Q−P›))−R›^⊥⋅‹R−P›)/(‹(P → ¼·(‹R−P› + ‹Q−P›))−R›^⊥⋅‹Q−P›)·‹Q−P›
= { (2), twice }
 P → ((‹P−R› + ¼·(‹R−P› + ‹Q−P›))^⊥⋅‹R−P›)/((‹P−R› + ¼·(‹R−P› + ‹Q−P›))^⊥⋅‹Q−P›)·‹Q−P›
= { (5), twice; linearity throughout }
 P → ((¼·‹Q−P› − ¾·‹R−P›)^⊥⋅‹R−P›)/((¼·‹Q−P› − ¾·‹R−P›)^⊥⋅‹Q−P›)·‹Q−P›
= { (10) and (11), twice }
 P → ((¼·‹Q−P›^⊥ − ¾·‹R−P›^⊥)⋅‹R−P›)/((¼·‹Q−P›^⊥ − ¾·‹R−P›^⊥)⋅‹Q−P›)·‹Q−P›
= { linearity of inner product, twice }
 P → (¼·‹Q−P›^⊥⋅‹R−P› − ¾·‹R−P›^⊥⋅‹R−P›)/(¼·‹Q−P›^⊥·‹Q−P› − ¾·‹R−P›^⊥⋅‹Q−P›)·‹Q−P›
= { (14), twice }
 P → (¼·‹Q−P›^⊥⋅‹R−P›)/(−¾·‹R−P›^⊥⋅‹Q−P›)·‹Q−P›
= { (12) }
 P → (¼·‹Q−P›^⊥⋅‹R−P›)/(¾·‹Q−P›^⊥⋅‹R−P›)·‹Q−P›
= { algebra }
 P → ⅓·‹Q−P›

To finalize, from considerations of symmetry, Y = meet.Q.P.R.T = (meet.P.Q.R.S)[P, Q := Q, P]. Hence:

(23)  X = P → ⅓·‹Q−P›
(24)  Y = Q → ⅓·‹P−Q›

Alaska Ataca a Kamtchatka

2007-12-13

Naturally Sorted

2007-12-11

The Alphanumeric Sort

2007-12-05

Functional Refactoring

2007-12-01

The Genuine Sieve of Eratosthenes

2007-11-28

Expressiveness or Performance?

A More Elegant Necklace

2007-11-27

The Unsafe Clasp

2007-11-18

Counting Working Days (mostly for Profit)

2007-11-13

Gödelizing Turing

2007-11-12

Tallying Grass

2007-11-06

Counting Cattle

Plane Geometry in Java (III)

2007-11-05

Plane Geometry in Java (II)

2007-11-03

On the Integer Spiral

2007-11-02

Plane Geometry in Java (I)

2007-10-30

Plane Affine Algebra (III)

About Me

Statement of Rights

Labels

Blog Archive