2010-02-21

Braun Trees

I had a couple of interesting comments on my post about heaps and priority queues. The first is by Jérôme and is an indictment against my wrong, naïve use of Obj.magic to fake initializing extensible Vecs in the presence of double-word unboxed float arrays. There are a number of avenues to repair the defect:

  1. Use a functorial interface and monomorphize the type of vectors
  2. Use an exemplar of the intended type to initialize the backing array
  3. Specialize the code for float arrays
  4. Use an explicit box to store values, most probably an option

A second reader, Damien Guichard suggested sidestepping the problem entirely and using a dynamic data structure, Braun trees as a flexible array. There aren't many online references about the data structure, the most notable being Okasaki's "Three Algorithms on Braun Trees" (Functional Pearl), but Damien provided code that served as a starting point. A Braun tree is a complete binary tree with the property that for any given node the left subchild has exactly the same size of the right subchild or one more element than it. In other words, a Braun tree of size n + 1 is a node and two children l and r, both Braun trees, such that |l| = ⌈n/2⌉ and |r| = ⌊n/2⌋, which makes |r| ≤ |l| ≤ |r| + 1. As Okasaki remarks, Braun trees have always minimum height, and their shape is determined only by the number of nodes in them.

There is a choice between an imperative implementation and a purely functional one. Damien's code in the comment is imperative, and I found that rewriting it to introduce tail-recursion was tedious and error-prone; Okasaki's "exceptionally simple and elegant" algorithms devolve into a tangle of special cases. I opted by a purely functional implementation upon which to build an imperative, mutable interface compatible with the previous version of the priority queue. The important thing to consider is to maintain in every case the invariant for the trees. I'll use a different signature than before, one specific to this data structure:

module Braun : sig
  type 'a t
  val empty     : 'a t
  val singleton : 'a -> 'a t
  val is_empty  : 'a t -> bool
  val min       : 'a t -> 'a
  val get       : 'a t -> int -> 'a
  val size      : 'a t -> int
  val rep       : ('a -> 'a -> int) -> 'a -> 'a t -> 'a t
  val ins       : ('a -> 'a -> int) -> 'a -> 'a t -> 'a t
  val del       : ('a -> 'a -> int) -> 'a t -> 'a t
end = struct (* … *)

Note that rep, ins and del take a comparison function to maintain the heap property, namely, that the root of the tree is less than the elements in either children. This is unsafe as nothing precludes passing different comparison functions to manipulate the tree; this is why I will wrap it in the final Heap signature. That said, the constructor for trees is a straightforward binary tree ADT:

type 'a t = E | N of 'a * 'a t * 'a t

Construction, testing for emptiness and returning the minimum are trivial:

let empty       = E
and singleton x = N (x, E, E)

let is_empty = function E -> true | N _ -> false

let min = function
| N (e, _, _) -> e
| E           -> failwith "empty heap"

To calculate the size of a Braun tree I follow Okasaki verbatim:

let rec diff h n = match h with
| E           when n = 0 -> 0
| E                      -> assert false
| N (_, _, _) when n = 0 -> 1
| N (_, l, r)            ->
  if n mod 2 = 1 then diff l ((n - 1) / 2) else diff r ((n - 2) / 2)

let rec size = function
| E           -> 0
| N (_, l, r) -> let m = size r in 2 * m + 1 + diff l m

For the details I defer to the explanation in the paper; the idea is to avoid traversing the entire tree and exploiting the Braun property by calculating the size from the shape of each left child of the right spine; this reduces the complexity to O(log² n). For completeness, Braun trees support random access in logarithmic time:

let rec get h i = match h with
| E                      -> failwith "index out of bounds"
| N (e, _, _) when i = 0 -> e
| N (_, l, r)            ->
  if i mod 2 = 1 then get l ((i - 1) / 2) else get r ((i - 2) / 2)

The meat of the stew is the heap operations: insertion, and deletion or replacement of the minimum. All three operations must simultaneously keep two invariants: the Braun property and the heap property. Replacing the minimum doesn't change the shape of the tree, only the values stored in it, so it only has to maintain the heap property. The algorithm is the functional analog to the binary heap's siftdown:

let rec rep compare e = function
| E           -> failwith "empty heap"
| N (_, l, r) -> siftdown compare (N (e, l, r))

and siftdown compare n = match n with
| N (e, (N (el, _, _) as l), E) ->
  if compare e  el < 0 then n
    else N (el, rep compare e l, E)
| N (e, (N (el, _, _) as l), (N (er, _, _) as r)) ->
  if compare e  el < 0
  && compare e  er < 0 then n else
  if compare el er < 0
    then N (el, rep compare e l, r)
    else N (er, l, rep compare e r)
| _ -> n

It performs a case analysis to determine the least among the left and the right nodes (if the latter exists) to swap the greater root value downwards the tree. In contrast, insertion does change the shape of the tree. From |r| ≤ |l| ≤ |r| + 1 we can conclude that |l| ≤ |r| + 1 ≤ |l| + 1; this means that inserting on the right child and exchanging it with the left one maintains the Braun property. The value in the root bubbles down to its proper place to maintain the heap property:

let rec ins compare x = function
| E           -> singleton x
| N (e, l, r) ->
  if compare x e < 0
    then N (x, ins compare e r, l)
    else N (e, ins compare x r, l)

This definitely is elegant. Deletion is almost as streamlined, but with a twist:

let rec del compare = function
| E           -> failwith "empty heap"
| N (_, t, E) -> t
| N (_, E, _) -> assert false
| N (_, (N (e, _, _) as l), (N (e', _, _) as r)) ->
  if compare e e' < 0
    then N (e ,               r, del compare l)
    else N (e', rep compare e r, del compare l)

The first and second case are the trivial base cases. By the Braun property, if the right child is empty the left child can at most hold one item, and the third case exhausts the pattern asserting the impossibility. The fourth, last case is the recursive step. To eliminate a node while maintaining the Braun property we must essentially undo the swap in the insertion by deleting on the left child and exchanging it with the right one. To maintain the heap property, however, we must select the lesser of both children. If it happens to be the left one we can do just that. If it is the right one we can't blindly delete its minimum as we could end up in violation of the Braun invariant by widening the size difference to 2. This makes necessary to replace the minimum on the right with the minimum on the left, which in effect amounts to a tree rotation. With this the module is complete:

end

To build an imperative heap on Braun trees is a simple matter of wrapping the module, as I've remarked above:

module Heap : sig
  type 'a t
  val make       : ('a -> 'a -> int) -> 'a t
  val is_empty   : 'a t -> bool
  val length     : 'a t -> int
  val min        : 'a t -> 'a
  val replacemin : 'a t -> 'a -> unit
  val removemin  : 'a t -> unit
  val insert     : 'a t -> 'a -> unit
end = struct
  type 'a t = { compare : 'a -> 'a -> int; mutable root : 'a Braun.t; }
  let make compare = { compare = compare; root = Braun.empty }
  let is_empty   h   = Braun.is_empty h.root
  let length     h   = Braun.size h.root
  let min        h   = Braun.min h.root
  let replacemin h x = h.root <- Braun.rep h.compare x h.root
  let removemin  h   = h.root <- Braun.del h.compare   h.root
  let insert     h x = h.root <- Braun.ins h.compare x h.root
end

This is exactly the same interface as before. That said, I'm not very clear on the advantages of Braun trees compared to, say, binomial heaps or pairing heaps supporting merge operations in constant time. In any case, if you need the fastest priority queues and you don't mind using an imperative implementation, array-based binary heaps are the simplest, most understood data structure around. You just need a good implementation of extensible arrays.

As an aside, I have a confession to make. To reach this result I've refactored the code several times and at each step I checked that I didn't break the Braun heap invariants by running a unit test fixture against every change, with the help of a modified version of this simple test harness. I'm not a fan of TDD, as I prefer the logic and invariants to guide me in coding. The tests were a good sanity check to assert quickly if I was careful all along or if I made a mistake. In this sense it is a good safety net for exploratory programming, after the type-checker and the compiler verify that everything is right. I stress the fact that I still think that rigorous reasoning about the code comes first, types come second and exploratory programming using test is a follows from that.

3 comments:

bluestorm said...

"That said, I'm not very clear on the advantages of Braun trees compared to, say, binomial heaps or pairing heaps supporting merge operations in constant time. In any case, if you need the fastest priority queues and you don't mind using an imperative implementation, array-based binary heaps are the simplest, most understood data structure around. You just need a good implementation of extensible arrays."

This is calling for some benchmarks.

Matías Giovannini said...

@bluestorm: I know, but I'm trying to shirk my way out of actually doing that ;-) Next week I'll be on vacation so I'll have plenty of time to tinker. We'll see.

Damien Guichard said...

You have improved on my half-hearted code that preserved only the Heap property to a thoughtful Braun Heap datatype that maintain both Heap and Braun properties all the way.

Thanks you for the effort.