2010-01-19

A Staggering Sequence

Happy new year! Inspired by N. J. A. Sloane's Seven Staggering Sequences, I started thinking about the Gijswijt sequence, A090822 in the OEIS. What appealed to me was the notion of "word logarithm" implicit in its definition:

We begin with b(1) = 1. The rule for computing the next term, b(n + 1), is again rather unusual. We write the sequence of numbers we have seen so far, b(1), b(2),… b(n) in the form of an initial string X, say (which can be the empty string ∅), followed by as many repetitions as possible of som nonempty string Y. That is, we write b(1), b(2),… b(n) = XYk, where k is as large as possible.

I wrote on paper a direct translation of the definition into an "implicit division" function in Haskell:

divide s y      = k
  where s       = x ++ power k y
        power k = concat . replicate k

This motivated me to try and plunge head-first into it! Given such a "division" operation of one string s by a non-empty suffix y, finding the logarithm is easy. First, we need some imports; then, the logarithm itself:


> import Data.List (tails)
> 
> logarithm s = maximum . map (divide s) . init . tails $ s

That is, find among all the suffixes of s the one that can be factored out from it the greatest number of times. Looking up the suffix repeatedly gives rise to a quadratic algorithm; but any problem on suffixes can be turned into a problem on the prefixes of the reversed strings. Because of this, I'll be sloppy and use prefixes and suffixes interchangeably. Counting the length of the longest common prefix of two lists is easy (edit simplified the original recursive function):


> prefixLength :: Eq a => [a] -> [a] -> Int
> prefixLength xs = length . takeWhile (uncurry (==)) . zip xs

In order to know how many times the string y is a prefix of s, it suffices to find the length of the common prefix between s and an infinity of repetitions of y:


> divide s y = n `div` length y
>   where n  = prefixLength (cycle . reverse $ y) (reverse s)

That's divide, now for the hard part. The sequence definition makes each element depend on all the preceding ones. In effect, it requires a function nest such that nest f e = [e, f [e], f [e, f [e]], f [e, f [e], f [e, f [e]]], …]. I thought of tying the knot on the tails of the result, like this (warning! broken code):

nest f e = xs where xs = e : (map f . tail . inits $ xs)

Unfortunately, inits is too strict, and goes off trying to match a corecursive stream against the empty list. The solution is to explicitly take longer and longer prefixes with a co-induction index:


> nest :: ([a] -> a) -> (a -> [a])
> nest f e       = xs
>   where xs     = e : from 1
>         from i = f (take i xs) : from (i + 1)

(note the funny type). With that in place, the sequence is:


> a090822 = nest logarithm 1

A quick check shows that everything works:

Main Data.Maybe Data.List> fromJust . elemIndex 4 $ a090822
219

(the quadratic complexity of next shows here). Don't try looking for the elemIndex 5 a090822. Read the paper for details.

Thanks to ddarius, opqdonut and Berengal on #haskell.