2008-06-09

Random Experiments

I often find it is much more rewarding to tackle an algorithmic or mathematical problem from an experiential perspective rather than from a formal one. Trying to get some test data for an earlier program, I thought about a couple of ways of generating random increasing sequences.

Intent on generating n random points 0 = x0x1 ≤ … ≤ xn-1 = 1 on the unit interval, I thought the most efficient way to do it was to at each step randomly split the remaining interval. This suggested itself:

rnd0[n_Integer?Positive] := 
 Append[NestList[# + RandomReal[1 - #]&, 0., n - 2], 1.]

(NestList generates a list of n+1 elements.) The result was less than satisfying:

ListLinePlot[rnd0[10], PlotRange -> All]

Not good; in fact, more points show that it indeed wasn't what I wanted:

ListLinePlot[rnd0[100], PlotRange -> All]

It is apparent that the not after long a pretty large chunk of the interval is carved off, and the remaining points compete for less and less room, giving rise to a knee. Well, back to the drawing board. It then occurred to me that instead of dividing the remainder I could step away from the last point a random distance. The problem with this is that I couldn't know in advance how far I'd end; hence, I needed a way to normalize the resulting random walk:

scale[l_List] := With[{h = l[[-1]] - l[[1]]}, (l - l[[1]])/h]

The sequence could be generated simply with:

rnd1[n_Integer?Positive] := 
 scale[Accumulate[Array[RandomReal[1]&, n]]]

(Accumulate gives the list of partial running sums.) To see how I fared, I plotted some points:

ListLinePlot[rnd1[10], PlotRange -> All]

Not bad; however, 100 points showed that the result was a bit too smooth:

ListLinePlot[rnd1[100], PlotRange -> All]

What to do now? Sometimes the most obvious solution glares so intently to you that you can't help but notice it the last, if at all. Since the difference between a random sequence and a monotone random sequence is order, why not try to sort a plain old bunch of random points?

rnd2[n_Integer?Positive] := scale[Sort[Array[RandomReal[1] &, n]]]

(I used scale to ensure that the first and last points where 0 and 1 respectively. I could have added them to a sequence n-1 points long just as well.) Few points showed an encouraging picture:

ListLinePlot[rnd2[10], PlotRange -> All]

Indeed, a few big jumps nicely interspersed with smaller increments. More points still maintain the picture of satisfying roughness:

ListLinePlot[rnd2[100], PlotRange -> All]

Now I wasn't satisfied with this. Is there a way to generate a just-rough-enough sequence like this without having to do an O(n log n) sorting pass? Time to investigate a bit what exactly is the origin of the roughness. One thing I found that distracted me from seeing this quality was the monotonicity. Maybe by looking at the detrended sequences could shed some light into this. The detrending is simply achieved by subtracting the line between the first and last points:

norm[l_List] := 
 With[{h = l[[-1]] - l[[1]], n = Length[l] - 1}, 
  l - l[[1]] - Table[i*h/n, {i, 0, n}]]

(This is more general than I needed, as by construction the trend is the line y = i/n.) I didn't feel that investing analysis on rnd0 was necessary. This is what rnd1 looks like detrended:

ListLinePlot[norm[rnd1[100]], PlotRange -> All]

And this is rnd2:

ListLinePlot[norm[rnd2[100]], PlotRange -> All]

If there was a difference I couldn't see it. But maybe there is some other way to look at the data. Maybe this roughness is a quality of the differences, the jumps between values in the sequence. Now the difference themselves would also be random, albeit with a different underlying distribution. And that's exactly the right idea; since I'm after all accumulating differences, what should their underlying distribution be to achieve the right degree of roughness? I needed first a way to compute histograms of the differences:

hist[k_Integer?Positive, l_List] := With[{d = Differences[l]},
  With[{hi = Max[d], lo = Min[d]}, 
   With[{st = k/(hi - lo)}, 
    Module[{h = Array[0&, k + 1]}, 
     Do[h[[Floor[st (x - lo)] + 1]]++, {x, d}]; h]]]]

(The fact that With binds in parallel, like ML's let is not very practical, as Mathematica's syntax is a bit heavy.) The function hist counts frequencies in k bins. Now, after seeing this:

ListLinePlot[hist[50, rnd1[10000]], PlotRange -> All]

it struck me as obvious: by construction, rnd1 differences are uniformly distributed. The mean 200 is simply 10,000 points divided by 50 bins. The last bin is empty as, since it's evident from the code above, I used a last value as a sentinel for the maximum possible value. Here is rnd2:

ListLinePlot[hist[50, rnd2[10000]], PlotRange -> All]

Now I was getting somewhere. The histogram looked like an exponential distribution. I could have read the article and tried to determine λ to within some confidence interval, in effect testing the hypothesis that the differences were indeed exponentially distributed; or I could try fitting an exponential to the data to see if it would lead me to somewhere. This is what I finally tried:

j = hist[100, rnd2[10000]];

FindFit[j, a k Exp[-k x], {a, k}, x]
{a -> 10420.3, k -> 0.0990637}

A reasonable fit: the a⋅e is approximately 10,000, the number of points. To really see how good the fit was I needed, however, a picture:

Plot[Evaluate[ a k Exp[-k x] /. %], {x, 0, 50}, 
 Epilog -> MapIndexed[Point[{#2[[1]], #1}]&, j]]

Very tight fit, indeed. With this, I could directly generate a monotone sequence with the right distribution of jumps, in one pass without having to sort:

rnd3[n_Integer?Positive] := 
 scale[Accumulate[Array[-Log[1 - RandomReal[1]]/n&, n]]]

(I used the standard algorithm to generate an exponentially distributed variate.) Graphically, there was no appreciable qualitative difference between rnd2 and rnd3:

ListLinePlot[rnd3[100], PlotRange -> All]

And the histogram, as I expected it would be by construction, was likewise similar:

ListLinePlot[hist[50, rnd3[10000]], PlotRange -> All]

I was initially quite satisfied after this hands-on approach to discovering the right algorithm. This satisfaction would prove to be ephemeral, as I realized none of this explained the why. Maybe it's time to consult The Art of Computer Programming, but that would be for another post.

No comments: