I often find it is much more rewarding to tackle an algorithmic or mathematical problem from an experiential perspective rather than from a formal one. Trying to get some test data for an earlier program, I thought about a couple of ways of generating random increasing sequences.

Intent on generating `n` random points 0 = `x_0` ≤ `x_1` ≤ … ≤ `x_{n-1}` = 1 on the unit interval, I thought the most efficient way to do it was to randomly split the remaining interval at each step. This suggested itself:

rnd0[n_Integer?Positive] := Append[NestList[# + RandomReal[1 - #]&, 0., n - 2], 1.]

(`NestList[f, x, n]` generates a list of `n`+1 elements; with `n`-2 steps plus the appended 1., the result has `n` points.) The result was less than satisfying:

ListLinePlot[rnd0[10], PlotRange -> All]

Not good; in fact, more points showed that it indeed wasn't what I wanted:

ListLinePlot[rnd0[100], PlotRange -> All]
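For readers without *Mathematica* at hand, the interval-splitting idea can be sketched in Python (an illustrative translation, not the code that produced the plots):

```python
import random

def rnd0(n):
    """Split the remaining interval at a random point, n - 2 times."""
    xs = [0.0]
    for _ in range(n - 2):
        # Carve off a random chunk of whatever room remains up to 1.
        xs.append(xs[-1] + random.uniform(0.0, 1.0 - xs[-1]))
    xs.append(1.0)
    return xs
```

Since each step eats a uniform fraction of the remainder, the expected leftover shrinks geometrically, which is exactly the knee visible in the plot.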

It is apparent that before long a pretty large chunk of the interval is carved off, and the remaining points compete for less and less room, giving rise to a knee. Well, back to the drawing board. It then occurred to me that instead of dividing the remainder I could step away from the last point by a random distance. The problem with this is that I couldn't know in advance where I'd end up; hence, I needed a way to normalize the resulting random walk:

scale[l_List] := With[{h = l[[-1]] - l[[1]]}, (l - l[[1]])/h]

The sequence could be generated simply with:

rnd1[n_Integer?Positive] := scale[Accumulate[Array[RandomReal[1]&, n]]]

(`Accumulate` gives the list of partial running sums.) To see how I fared, I plotted some points:

ListLinePlot[rnd1[10], PlotRange -> All]

Not bad; however, 100 points showed that the result was a *bit* too smooth:

ListLinePlot[rnd1[100], PlotRange -> All]
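The random-walk construction translates just as directly; a hedged Python sketch of `scale` and `rnd1`:

```python
import random
from itertools import accumulate

def scale(l):
    # Shift and rescale so the sequence runs exactly from 0 to 1.
    h = l[-1] - l[0]
    return [(x - l[0]) / h for x in l]

def rnd1(n):
    # Random walk with uniform steps, normalized afterwards.
    return scale(list(accumulate(random.uniform(0.0, 1.0) for _ in range(n))))
```

`itertools.accumulate` plays the role of `Accumulate` here.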

What to do now? Sometimes the most obvious solution glares so intently at you that you can't help but notice it last, if at all. Since the difference between a random sequence and a *monotone* random sequence is order, why not try sorting a plain old bunch of random points?

rnd2[n_Integer?Positive] := scale[Sort[Array[RandomReal[1] &, n]]]

(I used `scale` to ensure that the first and last points were 0 and 1, respectively. I could just as well have added them to a sequence `n`-2 points long.) A few points showed an encouraging picture:

ListLinePlot[rnd2[10], PlotRange -> All]

Indeed, a few big jumps nicely interspersed with smaller increments. More points still maintained the picture of satisfying roughness:

ListLinePlot[rnd2[100], PlotRange -> All]
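The sort-based construction is a one-liner in most languages; a self-contained Python sketch (illustrative only):

```python
import random

def scale(l):
    # Shift and rescale so the sequence runs exactly from 0 to 1.
    h = l[-1] - l[0]
    return [(x - l[0]) / h for x in l]

def rnd2(n):
    # Sort n uniform variates, then pin the endpoints to 0 and 1.
    return scale(sorted(random.uniform(0.0, 1.0) for _ in range(n)))
```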

Now, I wasn't fully satisfied with this. Is there a way to generate a just-rough-enough sequence like this without an O(`n` log `n`) sorting pass? Time to investigate a bit *what* exactly is the origin of the roughness. One thing I found that distracted me from seeing this quality was the monotonicity itself. Maybe looking at the detrended sequences could shed some light on this. The detrending is simply achieved by subtracting the line between the first and last points:

norm[l_List] := With[{h = l[[-1]] - l[[1]], n = Length[l] - 1}, l - l[[1]] - Table[i*h/n, {i, 0, n}]]
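A Python sketch of the same detrending (an illustrative translation, not the original code):

```python
def norm(l):
    # Subtract the straight line through the first and last points.
    h = l[-1] - l[0]
    n = len(l) - 1
    return [x - l[0] - i * h / n for i, x in enumerate(l)]
```

Applied to a perfectly linear sequence it returns all zeros, so whatever survives is exactly the deviation from the trend.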

(This is more general than I needed, as by construction the trend is the line `y` = `i`/`n`.) I didn't feel it was necessary to invest any analysis in `rnd0`. This is what `rnd1` looks like detrended:

ListLinePlot[norm[rnd1[100]], PlotRange -> All]

And this is `rnd2`:

ListLinePlot[norm[rnd2[100]], PlotRange -> All]

If there was a difference I couldn't see it. But maybe there is some other way to look at the data. Maybe this roughness is a quality of the *differences*, the jumps between successive values in the sequence. Now, the differences themselves would also be random, albeit with a different underlying distribution. And that's exactly the right idea: since I'm after all accumulating differences, what should their underlying distribution be to achieve the right degree of roughness? First, I needed a way to compute histograms of the differences:

hist[k_Integer?Positive, l_List] := With[{d = Differences[l]}, With[{hi = Max[d], lo = Min[d]}, With[{st = k/(hi - lo)}, Module[{h = Array[0&, k + 1]}, Do[h[[Floor[st (x - lo)] + 1]]++, {x, d}]; h]]]]
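A line-by-line Python rendering of the histogram (an illustrative sketch; like the original, it assumes the differences aren't all identical):

```python
def hist(k, l):
    # Frequencies of the successive differences of l in k bins,
    # plus one extra bin that only the maximum difference can reach.
    d = [b - a for a, b in zip(l, l[1:])]
    lo, hi = min(d), max(d)
    st = k / (hi - lo)
    h = [0] * (k + 1)
    for x in d:
        h[int(st * (x - lo))] += 1
    return h
```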

(The fact that `With` binds in parallel, like ML's `let`, is not very practical, as *Mathematica*'s syntax makes the required nesting a bit heavy.) The function `hist` counts frequencies in `k` bins. Now, after seeing this:

ListLinePlot[hist[50, rnd1[10000]], PlotRange -> All]

it struck me as obvious: by construction, `rnd1`'s differences are uniformly distributed. The mean of 200 is simply 10,000 points divided by 50 bins. The last bin is all but empty since, as is evident from the code above, it acts as a sentinel that catches only the maximum value. Here is `rnd2`:

ListLinePlot[hist[50, rnd2[10000]], PlotRange -> All]

Now I was getting somewhere. The histogram looked like an exponential distribution. I could have read the article and tried to determine λ to within some confidence interval, in effect testing the hypothesis that the differences were indeed exponentially distributed; or I could try fitting an exponential to the data to see where it would lead me. This is what I finally tried:

j = hist[100, rnd2[10000]]; FindFit[j, a k Exp[-k x], {a, k}, x]

{a -> 10420.3, k -> 0.0990637}

A reasonable fit: the area under `a k Exp[-k x]` is just `a`, which is approximately 10,000, the number of points. To really *see* how good the fit was I needed, however, a picture:

Plot[Evaluate[ a k Exp[-k x] /. %], {x, 0, 50}, Epilog -> MapIndexed[Point[{#2[[1]], #1}]&, j]]
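Lacking `FindFit`, a hedged Python sketch can do a comparable job with an ordinary log-linear least-squares fit over the non-empty bins (`fit_exponential` is my name for this stand-in; fitting log `c` = log `a` - `k x` is equivalent up to absorbing the extra factor of `k` into `a`):

```python
import math
import random

def scale(l):
    h = l[-1] - l[0]
    return [(x - l[0]) / h for x in l]

def rnd2(n):
    return scale(sorted(random.uniform(0.0, 1.0) for _ in range(n)))

def hist(k, l):
    d = [b - a for a, b in zip(l, l[1:])]
    lo, hi = min(d), max(d)
    st = k / (hi - lo)
    h = [0] * (k + 1)
    for x in d:
        h[int(st * (x - lo))] += 1
    return h

def fit_exponential(counts):
    # Least-squares line through (x, log count) for the non-empty bins:
    # counts ~ a * exp(-k * x)  =>  log counts ~ log a - k * x.
    pts = [(x, math.log(c)) for x, c in enumerate(counts) if c > 0]
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return math.exp(intercept), -slope  # (a, k)

a, k = fit_exponential(hist(100, rnd2(10000)))
```

Unlike `FindFit`'s nonlinear fit, the log-linear version weights the noisy tail bins heavily, so the recovered `k` is only in the same ballpark; it suffices to confirm the exponential shape.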

Very tight fit, indeed. With this, I could directly generate a monotone sequence with the right distribution of jumps, in one pass without having to sort:

rnd3[n_Integer?Positive] := scale[Accumulate[Array[-Log[1 - RandomReal[1]]/n&, n]]]

(I used the standard algorithm to generate an exponentially distributed variate.) Graphically, there was no appreciable qualitative difference between `rnd2` and `rnd3`:

ListLinePlot[rnd3[100], PlotRange -> All]
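The inverse-transform step above (the standard way to turn a uniform variate into an exponential one) sketches out like this in Python, again as an illustrative translation:

```python
import math
import random
from itertools import accumulate

def scale(l):
    h = l[-1] - l[0]
    return [(x - l[0]) / h for x in l]

def rnd3(n):
    # -log(1 - U)/n is exponentially distributed with mean 1/n,
    # so the accumulated walk already lands near 1 before scaling.
    return scale(list(accumulate(
        -math.log(1.0 - random.random()) / n for _ in range(n))))
```

Note that `random.random()` returns values in [0, 1), so 1 - U never reaches 0 and the logarithm is always defined.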

And the histogram, as I expected it would be by construction, was likewise similar:

ListLinePlot[hist[50, rnd3[10000]], PlotRange -> All]

I was initially quite satisfied after this hands-on approach to discovering the right algorithm. This satisfaction would prove to be ephemeral, as I realized none of this explained the *why*. Maybe it's time to consult The Art of Computer Programming, but that would be for another post.
