Proofs and Rabbits

(This is a translation of a 2002 article)

A colleague asked me, "How can I tell how many bits a number needs without dividing repeatedly by two?", to which I quickly replied with:

ν.N = ⌊log.N / log.2⌋ + 1

and, after staring at it for a while I added, "It's pretty easy to see where this comes from."

Mentally trying to organize the proof I met a first hurdle: how to justify the term "+ 1". Since I was committed to a proof outline, I resorted to pulling a rabbit out of the hat.

For starters, we know that the definition of the floor function is:

(0)  ⌊x⌋ = N  ≡  N ≤ x < N + 1

with N an integer; as a consequence, for N an integer:

(1)  ⌊x + N⌋ = ⌊x⌋ + N

I'm aiming at putting in symbols the following definition: if the number N is greater than or equal to a power of two 2ᵏ, then it needs ν.N = k + 1 bits for storing it; so (and this is the rabbit) I premultiply to compensate for the extra bit:

  2ᵏ ≤ 2·N < 2ᵏ⁺¹
≡ { taking logarithms on both sides }
  k·log.2 ≤ log.2 + log.N < (k+1)·log.2
≡ { dividing on both sides by log.2 }
  k ≤ 1 + log.N / log.2 < k + 1
≡ { (0) }
  k = ⌊1 + log.N / log.2⌋
≡ { (1), commutativity }
  k = ⌊log.N / log.2⌋ + 1

The justification for starting from 2·N is more heuristic than anything else (meaning, since I know where I want to arrive, I can define where I will start from). The key fact, that 2ᵏ needs k + 1 bits in its binary representation, one "1" bit and k "0" bits, doesn't change. Hence, the following alternative proof:

  2ᵏ⁻¹ ≤ N < 2ᵏ
≡ { taking logarithms on both sides }
  (k-1)·log.2 ≤ log.N < k·log.2
≡ { dividing both sides by log.2 }
  k - 1 ≤ log.N / log.2 < k
≡ { (0) }
  k - 1 = ⌊log.N / log.2⌋
≡ { algebra }
  k = ⌊log.N / log.2⌋ + 1

This proof is essentially the same as the former, but it not only avoids the rabbit, it also avoids the appeal to (1).
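For the record, the closed form can be checked mechanically (a C++ sketch; the function names are mine). Computing the quotient log.N / log.2 in floating point can land just below an integer at exact powers of two, so the sketch extracts the binary exponent ⌊log.N / log.2⌋ exactly with std::ilogb:

```cpp
#include <cassert>
#include <cmath>

// nu.N by the colleague's method: repeated division by two.
int nu_by_division(unsigned n) {
    int bits = 0;
    do { ++bits; n /= 2; } while (n > 0);
    return bits;
}

// nu.N = floor(log.N / log.2) + 1; std::ilogb yields the floor of the
// base-2 logarithm exactly, avoiding rounding trouble at powers of two.
int nu_by_formula(unsigned n) {
    return std::ilogb((double)n) + 1;
}
```

Both agree on, for instance, ν.255 = 8 and ν.256 = 9.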


Borges' "The Golem"

Prompted by this nice but rather free translation of Borges' "El Golem", here's my attempt. I tried to be as literal as possible; I've changed some words that were obviously chosen by Borges not for their meaning but to preserve the rhyme ("soga" in the eleventh stanza, for instance, is rather gratuitous).

If (as one Greek states in the Cratyle)
the name is archetype for the thing,
in the letters for rose is the rose
and all of the Nile in the word Nile.

So, made of consonants and vowels,
there'd be a terrible Name, the essence
of God its cipher, that Omnipotence
guards in letters and syllables full.

Adam and the stars knew it
in the Garden. Sin's stain
(so the kabbalists say) erased it
and the many generations lost it.

The cunning and candor of man
have no end. We know that in their day
God's own people searched for the Name
in the small hours of the Jewry.

Unlike that of some other vague
shadow betrayed in vague history,
there is still fresh and living memory
of Judah Loew, a rabbi in Prague.

Thirsty to see what God would see,
Judah Loew gave in to permutations
with letters in such complex variations
that he at last uttered the Name that is Key.

Portal, Echo, Host and Palace,
upon a doll with clumsy hands
he engraved, and taught it the strands
of Word, of Time and Space.

Through dreamy lids was this likeness
confounded by forms and colors,
utterly mixed in subtle rumors
and made its first timid movements.

By small degrees, like us it was
imprisoned in this resounding net
of Before, After, Yesterday, While, Now,
Left, Right, I, You, Them, Others.

(The kabbalist that gave it home
this vast creature nicknamed Golem;
these truths are told by Scholem
in a learned passage of his tome.)

The rabbi taught to it the universe
"My foot, and yours; here is a clog."
After some years this thing perverse
could sweep, well or not, the Synagogue.

It could have been a miswriting,
or an error uttering the Holy Name;
despite so high a spell, it did not
learn to speak, this apprentice of man.

Its eyes, less a man's than a dog's
and so much less of dog than of thing,
tracked the rabbi through the trembling
shadows of their closed quarters.

Something odd and crude was in the Golem,
since out of its way the rabbi's cat
scurried. (This cat is not in Scholem
but, across time, I can glimpse that.)

Raising its pious hands to God
it mimed its God's devotions
or, dull and smiling, it sank
in hollow oriental genuflections.

The rabbi looked upon it with pride
and with some horror. How (he mused)
could I give birth to a pitiful son
and lose the sanity of inaction?

Why did I add yet another symbol
to the infinite Series? Why bring
to the vain skein spun by eternity
another cause, another effect and pain?

In that hour of dread and blurred light,
his eyes lingered on his Golem.
Who will tell us, what did God feel,
looking upon His rabbi in Prague?

I'm extremely thankful to psykotic, whose input (over on reddit) was invaluable. Without it this would be much worse than it is now.


On the Boolean type

(This is a reprint of a 2003 article, with corrections)

Poring over the code of an open-source program, I felt uneasy upon stumbling over this fragment:

inline bool
Tuner::FftCalc::isPeak(int i, double *buf) throw()
{
    if (buf[i] > buf[i - 1] && buf[i] > buf[i + 1])
        return true;
    return false;
}

I do not think it is pertinent to denounce the author: when in doubt, abstain; I will only reflect on the why and not on the who, nor on issues of programmer quality in open-source projects.

C++ is one of the languages in the Algol genus. As such, it shares with its relatives the delineating trait of having block-structured syntactic (otherwise called static) scope: variables declared local to a scope don't survive its ending. This is the main characteristic that defines block-structured, or Algol-like languages.

There is another characteristic inherited from Algol by these languages, one designed into it purposefully and from the start: the presence of a fully functional, bona fide, Boolean data type (see EWD1298, "Under the spell of Leibniz's Dream", p. 6, for a discussion of the genesis of the Boolean type). Not only does it sport operators giving rise to the full two-valued Boolean algebra, but Boolean expressions are first-class citizens that can be assigned to variables, passed to procedures and functions and returned from those.

Aside: Minimally, modern programming languages embody three algebraic structures: the Boolean algebra B, the ring Z/2ᴺZ of integers modulo some power of two (N = 64 is now fashionable) and the free monoid Σ* over some alphabet (Σ = Unicode is now fashionable). The floating-point numbers, also usually present, don't seem to fit into any recognizable algebra, since they are not associative, not Archimedean and not even closed (cf. IEEE 754 denormalized numbers).
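The non-associativity is easy to exhibit (a C++ sketch of mine; the values are chosen so that double precision rounds one grouping but not the other):

```cpp
#include <cassert>

// Returns whether (a + b) + c == a + (b + c) in double precision.
bool addition_associates(double a, double b, double c) {
    return (a + b) + c == a + (b + c);
}
```

At 1e16 the spacing between consecutive doubles is 2, so 1e16 + 1 rounds back to 1e16 while 1 + 1 = 2 survives: addition_associates(1e16, 1.0, 1.0) is false, although addition_associates(1.0, 2.0, 3.0) is true.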

Bearing this in mind, I will start by rewriting the program fragment (simplifying it in the process to eliminate details extraneous to this exercise) to make explicit the fact that the condition being tested is not a predicate, but a Boolean value:

bool isPeak(int i, double *buf)
{
    bool condition;
    condition = buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
    if (condition)
        return true;
    return false;
}

The formal justification for the next transformation is somewhat convoluted, but essentially reduces to the fact that wp."return; S".P = wp."return".P, for any statement S and predicate P (for the notation wp, see the Wikipedia entry on predicate transformers); that is, instructions after a return are unreachable. Hence, after the conditional, ¬condition holds, and I can now restore its alternative branch:

bool isPeak(int i, double *buf)
{
    bool condition;
    condition = buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
    if (condition)
        return true;
    else
        return false;
}

I can then reify the return value the same way I did with the condition, by introducing a variable, say, result:

bool isPeak(int i, double *buf)
{
    bool condition, result;
    condition = buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
    if (condition)
        result = true;
    else
        result = false;
    return result;
}

Note that in the process I have moved the repeated return result statements outside the conditional.

The program now has a regular structure that allows me to reconstruct the predicates satisfied after the execution of every statement:

bool isPeak(int i, double *buf)
{
    bool condition, result;
    condition = buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
    if (condition) {
        // condition
        result = true;
        // condition ∧ result
    } else {
        // ¬condition
        result = false;
        // ¬condition ∧ ¬result
    }
    // (condition ∧ result) ∨ (¬condition ∧ ¬result)
    return result;
}

I will now massage the last predicate to reveal the, admittedly obvious, relationship between condition and result:

  (condition ∧ result) ∨ (¬condition ∧ ¬result)
≡ { Golden Rule }
  condition ∧ result ≡ ¬condition ∧ ¬result ≡ condition ∧ result ∧ ¬condition ∧ ¬result
≡ { Contradiction }
  condition ∧ result ≡ ¬condition ∧ ¬result ≡ false
≡ { Definition of ¬ }
  condition ∧ result ≡ ¬(¬condition ∧ ¬result)
≡ { De Morgan }
  condition ∧ result ≡ ¬¬(condition ∨ result)
≡ { Involution }
  condition ∧ result ≡ condition ∨ result
≡ { Golden Rule }
  condition ≡ result

Thus the conditional reduces to the assignment statement, the minimal program that ensures that the two variables have the same value:

bool isPeak(int i, double *buf)
{
    bool condition, result;
    condition = buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
    // condition ≡ condition
    result = condition;
    // condition ≡ result
    return result;
}

We see that the introduced variables served far more than as mere pedagogical devices: they permitted the restructuring of the program into a shape amenable to formal reasoning, according to the motto "name thy unknowns". That rôle fulfilled, I can eliminate them:

bool isPeak(int i, double *buf)
{
    return buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
}

This is the function that the programmer should have written in the first place. Why didn't he?

I must by necessity speculate on the reasons for this widespread mental block. One plausible cause is that, even if programmers are somewhat familiar with predicate logic, they are not accustomed to manipulating Boolean objects in an algebraic, calculational fashion. Another could be exposure to premature (and irrelevant!) operational considerations: thinking that the "computer" must "conditionally jump" to evaluate a "Boolean condition", believing that the conditional statement is "compiled to conditional jumps", et cœtera.

Aside: I shouldn't have cared, but out of curiosity I've compiled all seven versions of the program using GCC 3.1 (a mediocre compiler in terms of generated code, since it doesn't recognize that if (P) { S; return; } Q; can always be transformed to if (P) { S; return; } else Q;). With a suitable optimization level, every version but the first, longer by two instructions, compiled to exactly the same PPC code. The lesson one can derive from this is that broken symmetry is always costly, not only in terms of reduced expressive and reasoning power, but also in terms of lost efficiency.
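One can also confirm the extensional equality directly (a throwaway harness of mine, not part of the original program): the first and last versions compute the same predicate. Note in passing that both read buf[i - 1] and buf[i + 1], so i must stay clear of the array's edges:

```cpp
#include <cassert>

// The original shape, conditional and all.
bool isPeakOriginal(int i, double *buf) {
    if (buf[i] > buf[i - 1] && buf[i] > buf[i + 1])
        return true;
    return false;
}

// The reduced shape: return the Boolean expression itself.
bool isPeakReduced(int i, double *buf) {
    return buf[i] > buf[i - 1] && buf[i] > buf[i + 1];
}
```

On, say, {0, 2, 1, 3, 1, 0.5} both report a peak at index 1 and a trough at index 2.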

On Quicksort

An array is a data-type complying with the following interface:

module type ARRAY =
sig
  type 'a t
  val create : int -> 'a -> 'a t
  val length : 'a t -> int
  val get : 'a t -> int -> 'a
  val set : 'a t -> int -> 'a -> unit
end

together with a number of predicates that characterize the array as a data structure that, in fact, stores data. The OCaml built-in Array module satisfies the definition:

module Arr : ARRAY with type 'a t = 'a array =
struct
  type 'a t = 'a array
  include Array
end

Also, Bigarrays or dynamic vectors can be fitted easily to this interface.

It is well-known that to sort the elements a.(l ≤ i < h) of an array a between indexes l and h we can use the in-place Quicksort:

let rec quicksort a l h =
  if h - l > 1 then
    let m = partition a l h in begin
      quicksort a l m;
      quicksort a m h
    end

for a suitable definition of partition ("suitable", here, means that after the call, a.i ≤ a.m < a.j for l ≤ i < m < j < h). With this in mind, we would like to write a generic (on the actual data structure with array semantics), in-place Quicksort, with the following declaration:

module Quicksort : functor (A : ARRAY) ->
sig
  val qsort : 'a A.t -> unit
  val quicksort : 'a A.t -> int -> int -> unit
end

with the intention of using qsort a as a shortcut for quicksort a 0 (A.length a).

There are many ways to find the pivot a.m that allows partitioning the array a in disjoint pieces. The ultimate goal is to make those pieces as close as "halves" of a as possible; for heavily skewed partitions, Quicksort devolves into a very poorly performing algorithm.

Aside: This is why it is a better a priori engineering decision to use Heapsort, and not Quicksort, as the standard library sort routine. Even if the best-case cost of the former is more than twice that of the latter, Heapsort has guaranteed cost Θ(n⋅log.n).

An often-recommended choice of pivot is the median of elements a.l, a.h and a.m, for some index l < m < h. Any choice of m does the trick equally well (or rather, there is no objective choice of one over any other); for symmetry, we settle for m = ⌊ (l + h) / 2 ⌋.

The first task is to actually select the median element. Since at all times partition must satisfy the ordering condition, there is no penalty to sort the elements used to find the pivot. The usual imperative algorithm is a sequence of three compare-and-swap operations among the three array elements:

let sort3 a l h =
  let m = (l + h) lsr 1 in
  if A.get a l > A.get a m then swap a l m;
  if A.get a m > A.get a h then swap a m h;
  if A.get a l > A.get a m then swap a l m;
  A.get a m

where swap is:

let swap a i j =
  let t = A.get a i in
  A.set a i (A.get a j);
  A.set a j t

In the best case, 6 reads to the array are needed. In the worst case, 12 reads and 6 writes are necessary to sort the three-element subarray. In any case, exactly 3 comparisons are performed every time. No more than three comparisons are necessary; this is because 2² < 3! < 2³. However, we can sometimes do better than that; transitivity of the order relation allows us to save one comparison that could otherwise be deduced from the other two.

A comparison tree is an ordered (plane) binary tree that has comparisons between elements as (inner) nodes and permutations as leaves:

Comparison tree of order three

A path from the root to a leaf indicates the sequence of comparisons necessary to identify that particular permutation. Note that the comparisons are conceptually done in the same order as the previous code fragment (low with middle, then middle with high, and finally low with middle), but implicitly reflect the required swaps to put the permutation in order. Once determined, the permutation requires between 0 and 3 writes to be put in order. Storing the array elements in temporaries gives the following code:

let sort3 a l h =
  let m = (l + h) lsr 1 in
  let al, am, ah = A.get a l, A.get a m, A.get a h in
  if al > am then begin
    if al > ah then begin
      if am > ah then begin
        A.set a l ah;
        A.set a h al
      end else begin
        A.set a l am;
        A.set a m ah;
        A.set a h al
      end
    end else begin
      A.set a l am;
      A.set a m al
    end
  end else if am > ah then begin
    if al > ah then begin
      A.set a l ah;
      A.set a m al;
      A.set a h am
    end else begin
      A.set a m ah;
      A.set a h am
    end
  end;
  A.get a m

No path sets more than 3 variables, and there is a path that doesn't set any, as I claimed before. Now, the invariant for partition must be established by putting elements less than a.m to the left of m, and those greater than it to its right. We have a choice about what to do about those elements that are equal to a.m; Bentley moves them to the left partition using the traditional "pointer crossing" algorithm. A more equitable alternative would be to use Dijkstra's Dutch National Flag algorithm to move them to their own block:

let swap3 a x l h =
  let i = ref (h+1)  (* x < a.(i ≤ p ≤ h) *)
  and j = ref (l-1)  (* x > a.(l ≤ p ≤ j) *)
  and k = ref l in   (* x = a.(j < p < k) *)
  while !k != !i do
    let y = A.get a !k in
    (* x ? a.(k ≤ p < i) *)
    if x > y then begin
      incr j;
      A.set a !k (A.get a !j);
      A.set a !j y;
      incr k
    end else if y > x then begin
      decr i;
      A.set a !k (A.get a !i);
      A.set a !i y
    end else
      incr k
  done;
  !i, !j

The result of the algorithm, as the invariants show, is the pair of indices delimiting each partition. The advantage of this slightly more complicated program is that, if the array contains repeated elements, they need not be considered for comparison purposes in the recursive invocations. So we see that partition is completely specified by swap3 a (sort3 a l h) l h. Everything is in place to recursively sort the partitioned elements. It is standard to recur on the smaller partition and iterate on the larger to guarantee a maximum O(log.n) stack depth. Hence, the following:

let rec qsort a l h =
  if h - l > 0 then begin
    let i, j = swap3 a (sort3 a l h) l h in
    if j - l <= h - i
      then begin qsort a l j; qsort a i h end
      else begin qsort a i h; qsort a l j end
  end

Putting all the functions together in a module Quicksort is left as an exercise for the reader.
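For the skeptical reader, the same scheme transliterates directly into C++ (a sketch of mine over plain double arrays, with inclusive bounds as above; it is not the module the exercise asks for):

```cpp
#include <cassert>
#include <utility>

// Sort a[l], a[m], a[h] (m the middle index) and return the median value.
double sort3(double *a, int l, int h) {
    int m = (l + h) >> 1;
    if (a[l] > a[m]) std::swap(a[l], a[m]);
    if (a[m] > a[h]) std::swap(a[m], a[h]);
    if (a[l] > a[m]) std::swap(a[l], a[m]);
    return a[m];
}

// Dutch National Flag: permute a[l..h] so that a[l..j] < x, a[j+1..i-1] = x
// and a[i..h] > x; return the pair (i, j).
std::pair<int, int> swap3(double *a, double x, int l, int h) {
    int i = h + 1, j = l - 1, k = l;
    while (k != i) {
        double y = a[k];
        if (x > y)      { ++j; a[k] = a[j]; a[j] = y; ++k; }
        else if (y > x) { --i; a[k] = a[i]; a[i] = y; }
        else            ++k;
    }
    return std::make_pair(i, j);
}

// In-place Quicksort on the inclusive range a[l..h].
void qsort3(double *a, int l, int h) {
    if (h - l > 0) {
        std::pair<int, int> ij = swap3(a, sort3(a, l, h), l, h);
        int i = ij.first, j = ij.second;
        if (j - l <= h - i) { qsort3(a, l, j); qsort3(a, i, h); }
        else                { qsort3(a, i, h); qsort3(a, l, j); }
    }
}
```

The middle block of elements equal to the pivot is never revisited, exactly as in the OCaml version.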


Negabonacci Numbers

Reading an old Brian Hayes column got me thinking: what happens to the Fibonacci recurrence

F.0 = F.1 = 1
F.n = F.(n-1) + F.(n-2)

when some of the terms are subtracted instead of added? If every term is subtracted we get something not very exciting, namely, the negative Fibonacci numbers. If, however, some terms are added while some others are subtracted, I thought, maybe the result would be more interesting. I settled on two possibilities:

T.0 = T.1 = 1
T.n = (-1)ⁿ·T.(n-1) + T.(n-2)


U.0 = U.1 = 1
U.n = U.(n-1) + (-1)ⁿ·U.(n-2)

Generating some of the terms readily revealed some patterns:

 n  F.n  T.n  U.n  (-1)ⁿ
 0    1    1    1    +
 1    1    1    1    -
 2    2    2    2    +
 3    3   -1    1    -
 4    5    1    3    +
 5    8   -2    2    -
 6   13   -1    5    +
 7   21   -1    3    -
 8   34   -2    8    +
 9   55    1    5    -
10   89   -1   13    +
11  144    2    8    -
12  233    1   21    +
13  377    1   13    -
14  610    2   34    +

The first striking thing is that T.n seems to take only a limited number of values. Indeed, a simple computer experiment shows that:

tau[0] = 1; tau[1] = 1;
tau[n_Integer] := tau[n] = (-1)^n tau[n - 1] + tau[n - 2]

Table[tau[i], {i, 0, 20}]
  {1, 1, 2, -1, 1, -2, -1, -1, -2, 1, -1, 2, 1, 1, 2, -1, 1, -2, -1, -1, -2}

and the hypothesis is that T.n has period 12: T.(n+12) = T.n. There are essentially three strategies to prove this:

  1. Show that |T.(n+3)| = |T.n| and that sgn.(T.(n+6)) = -sgn.(T.n), that is, the sequence of absolute values has period 3 while the signs alternate every 6 terms. Then, the overall period of the sequence must be 12
  2. Expand successively T.(n+12), applying the recursive definition and simplifying
  3. Calculate successively T.(n+2), T.(n+3), … T.(n+12) and show that the last is indeed T.n
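Before attempting any of the three, the hypothesis can be checked by machine somewhat further than the table goes (a throwaway C++ sketch of mine):

```cpp
#include <cassert>
#include <vector>

// T.0 = T.1 = 1, T.n = (-1)^n * T.(n-1) + T.(n-2), for count >= 2 terms.
std::vector<long> negabonacci(int count) {
    std::vector<long> t(count);
    t[0] = t[1] = 1;
    for (int n = 2; n < count; ++n)
        t[n] = (n % 2 == 0 ? 1 : -1) * t[n - 1] + t[n - 2];
    return t;
}
```

Over the first 200 terms, t[n + 12] == t[n] holds throughout.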

Addendum: As an anonymous commenter remarks, there is an obvious strategy (obvious, that is, if you don't let yourself get sidetracked by pretty patterns, as I apparently was): use induction. You can read all of it in the comments, but the insight that a finitely-generated recurrence needs as many base cases for the induction as it has terms is all that is needed: assume m < n with m ≡ n (mod 2); then T.m = T.n ∧ T.(m+1) = T.(n+1) ⇒ T.(m+k) = T.(n+k), for k ≥ 0. A simple expansion of the recurrence on both sides of the equality shows that, since m and n have the same parity, the signs of the first terms are the same and hence, by an appeal to induction, the equality stands.

While I concede that this proof could hardly be simpler, I derive no great insight from it other than, indeed, the recurrence is periodic; what I proved originally gives me the terms of all such sequences obtained by freely choosing the original terms A, B.

Proving (1) is no easier than proving (2) or (3). Setting myself to prove (2), I quickly realized that the process would be exceedingly tedious. I used Mathematica to assist me:

taurule = tau[n_ + i_Integer] :>
  (-1)^(n + i) tau[n + (i - 1)] + tau[n + (i - 2)] /; i ≥ 2;

FullSimplify[tau[n + 12] //. taurule, n ∈ Integers && n ≥ 2]

where //. (or ReplaceRepeated) repeatedly applies the reduction rule until the expression doesn't change anymore. After a while, Mathematica computes the result tau[n], proving the theorem mechanically.

After some thought, I realized that even if the top-down approach is tedious to perform by hand, a bottom-up calculation can be done in at most twelve steps since the theorem is indeed true. Call A = T.n, B = T.(n+1), s = (-1)ⁿ (and note that s² = 1). Then:

  1. T.(n+2) = (-1)ⁿ⁺²·T.(n+1) + T.n = s·B + A, by definition
  2. T.(n+3) = -s·T.(n+2) + T.(n+1) = -s·A

    This establishes the first lemma needed to prove the theorem.

  3. T.(n+4) = s·T.(n+3) + T.(n+2) = s·B
  4. T.(n+5) = -s·T.(n+4) + T.(n+3) = -s·A - B
  5. T.(n+6) = s·T.(n+5) + T.(n+4) = -A

    This establishes the second lemma needed to prove the theorem. There is no need to proceed further, but for completeness, these are the next steps:

  6. T.(n+7) = -B
  7. T.(n+8) = -s·B - A
  8. T.(n+9) = s·A
  9. T.(n+10) = -s·B
  10. T.(n+11) = s·A + B
  11. T.(n+12) = A

Hence, at any point, the sequence has the shape:

A, B, s·B + A, -s·A, s·B, -s·A - B, -A, -B, -s·B - A, s·A, -s·B, s·A + B, A

where s is +1 or -1, depending on where the subsequence starts.
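The closed list of values can itself be checked for arbitrary seeds (a quick C++ sketch of mine; it runs the recurrence from an even index, so s = +1, and confirms it closes after twelve steps):

```cpp
#include <cassert>

// With A = T.n and B = T.(n+1) for an even n (so s = (-1)^n = +1),
// applying the recurrence twelve times must return to (A, B).
bool period_twelve(long A, long B) {
    long prev = A, cur = B;
    for (int i = 2; i <= 13; ++i) {   // compute T.(n+2) .. T.(n+13)
        long next = (i % 2 == 0 ? 1 : -1) * cur + prev;  // sign is (-1)^(n+i)
        prev = cur;
        cur = next;
    }
    return prev == A && cur == B;     // T.(n+12) = T.n, T.(n+13) = T.(n+1)
}
```

As the symbolic derivation above predicts, period_twelve holds whatever the seeds, e.g. for (1, 1), (5, -3) and (0, 7).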

What about U.n? Well, it seems that it is just two interleaved copies of F.n. Indeed:

U.(2·k)   = F.(k+1)
U.(2·k+1) = F.k

And this is quite simple to prove by induction. First, note that U.0 = 1 = F.1 and U.1 = 1 = F.0, establishing the base case. For the inductive step:

U.(2·k+1) = U.(2·k) + (-1)²ᵏ⁺¹·U.(2·k-1)
          = F.(k+1) - F.(k-1)
          = F.k + F.(k-1) - F.(k-1)
          = F.k
U.(2·k)   = U.(2·k-1) + (-1)²ᵏ·U.(2·k-2)
          = F.(k-1) + F.k
          = F.(k+1)

where the second step for each case is justified inductively by an appeal to the opposite case.
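The interleaving, too, can be checked by machine (a C++ sketch of mine, with F and U as defined above):

```cpp
#include <cassert>
#include <vector>

// F.0 = F.1 = 1; U.0 = U.1 = 1 with U.n = U.(n-1) + (-1)^n * U.(n-2).
// Check U.(2k) = F.(k+1) and U.(2k+1) = F.k for k = 0 .. count-1.
bool interleaves(int count) {
    std::vector<long> f(count + 2), u(2 * count + 2);
    f[0] = f[1] = 1;
    for (int n = 2; n < (int)f.size(); ++n)
        f[n] = f[n - 1] + f[n - 2];
    u[0] = u[1] = 1;
    for (int n = 2; n < (int)u.size(); ++n)
        u[n] = u[n - 1] + (n % 2 == 0 ? 1 : -1) * u[n - 2];
    for (int k = 0; k < count; ++k)
        if (u[2 * k] != f[k + 1] || u[2 * k + 1] != f[k]) return false;
    return true;
}
```

Forty interleaved pairs keep comfortably within the range of a long.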

There's always…

…a first time for everything. In my case, the year 2007 has been the year for my first cellular phone (!) and for my first blog. It's been a while since I wanted a place to hang my feuilles d'esquisses to dry. Nothing fancy (I am, after all, notfancy online!), nothing complicated.