2012-01-06

Perlin's Simplex Noise

Graphics pioneer Ken Perlin is well-known for his work on procedural generation of textures. Recently he rewrote his classic Noise algorithm to reduce the complexity of generating pseudorandom gradients in higher dimensions by transforming a problem on N-cells to one on N-simplices. I couldn't find on his NYU home page a write-up by his own hand; apparently the only implementation exists in his Java applet. There is a well-known exegesis by Stefan Gustavson that completely rewrites the algorithm; I've also found a write-up by Perlin himself in a collection of course notes. The latter shows an extremely terse implementation in Java (Appendix B) that purports to correspond directly with hardware operations implemented in a specialized pipelined processor. I'll show an unraveled re-implementation of that Java algorithm that exactly corresponds to Prof. Perlin's, in the sense that the outputs are the same, but with the magic hopefully explained clearly. I will assume, however, that you are familiar with the original algorithm; if not, read Gustavson's analysis for a clear introduction to its workings.

The key to Perlin's Noise is that it interpolates pseudo-random gradients attached to lattice points. In the original algorithm, those gradients are computed by successive lookups using a permutation table. The new algorithm is geared towards an 8.8 fixed-point format; instead of tables it uses bit-slicing operations to compute two pseudo-random 3-bit hashes from the lattice points:

```let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7
```

I've expressed the `patterns` in octal to stress the fact that the result is a 6-bit number. `btst` simlpy extracts the b-th bit from n. `bmix` takes a 3-bit slice from the b-th bits of i, j and k, in that order, and uses it to index on `patterns`. `shuffle` adds up `bmix`es of eight circular rotations of (i, j, k) for each bit 0, 1 … 7. The procedure seems rather ad-hoc, but in fact produces rather well-distributed 3-bit fields among all the (28)3 = 16 Mi combinations of (i, j, k):

NP[Lower 3 bits = N]P[Upper 3 bits = N]
00.1244812011718750.125709712505340576
10.12536621093750.126831173896789551
20.1250.128595232963562
30.12463378906250.127514779567718506
40.1255187988281250.123155772686004639
50.12463378906250.120522260665893555
60.1250.12257838249206543
70.12536621093750.125092685222625732

These 3-bit hashes are used to compute a magnitude for vector (x, y, z) by selecting each coordinate with probability 3/4, applying a random sign to each with probability 1/2 and adding the resulting values:

```let magnitude h (x, y, z) =
let p, q, r = match h land 7 with
| 0 -> z , x , y
| 1 -> x , y , 0.
| 2 -> y , z , 0.
| 3 -> z , x , 0.
| 4 -> z , x , y
| 5 -> x , 0., z
| 6 -> y , 0., x
| 7 -> z , 0., y
| _ -> assert false
in match (h lsr 3) land 7 with
| 0 -> -. p -. q +. r
| 1 -> +. p -. q -. r
| 2 -> -. p +. q -. r
| 3 -> +. p +. q +. r
| 4 -> +. p +. q -. r
| 5 -> -. p +. q +. r
| 6 -> +. p -. q +. r
| 7 -> -. p -. q -. r
| _ -> assert false
```

This amounts to finding the dot product of the vector (x, y, z) with a random gradient on the unit cell. Generating all combinations and extracting the gradient vectors we get the following diagram:

Each dot shows the frequency with which that vector is selected; the diagonal vectors (±1, ±1, ±1) occur half as frequently as the mid-edge vectors (±1, ±1, 0) —and rotations—, to account for their greater magnitude and thus weight. All this is done without multiplying! If you study Perlin's paper and code, you will notice that he uses bit operations to select each of the x, y and z coordinates and to add or subtract them, to avoid using tables; however, the code doesn't quite correspond to the explanation given in the text. I've unrolled the conditionals into two jump tables in a way that duplicates the effect of the code; thus the first jump table is not equivalent to the one given in the text.

The key to the efficiency of this algorithm is not just in the use of bit operations for generating pseudo-random gradients but in the subdivision strategy used to combine them in ℝN. In the original algorithm, each of the 2N vertices in the N-cell are associated with a gradient which is cubically interpolated between pairs of hyperplanes. This not only generates visual artifacts since the interpolation scheme is not C²-continuous, it requires O(2N) work. The new algorithm mitigates the first problem by using a quartic interpolant, and solves the second by decomposing the N-cell into N! simplices:

```let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]
```

This requires interpolating among N + 1 vertices taken in pairs, for O(N²) work total. The simplex in which a point (u, v, w) in the unit cell lies is determined by the permutation index of coordinates (u, v, w):

```let permindex (u, v, w) =
if u >= w then
if u >= v then
if v >= w then 0 else 1
else 2
else
if v >= w then 3 else
if u >= v then 4 else 5
```

Perlin's code uses a different method to directly travel along the simplex edges by incrementing each coordinate according to the permutation index; I've opted for using a table to simplify the code and eliminate the need for global variables. Each point (x, y, z) is decomposed into a sum of a lattice vector (i, j, k) and a point in the unit cell (u, v, w). To account for the different volumes in the simplicial decomposition of the unit N-cell, the vectors are `skew`ed:

```let int x =
if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
let s = (x +. y +. z) /. 3. in
let i = int (x +. s)
and j = int (y +. s)
and k = int (z +. s) in
(i, j, k)
```

and `unskew`ed:

```let unskew (x, y, z) (i, j, k) =
let s = float (i + j + k) /. 6. in
let u = x -. float i +. s
and v = y -. float j +. s
and w = z -. float k +. s in
(u, v, w)
```

along the main diagonal of the N-cell. This amounts to a change of basis between a cubic and a tetrahedral grid and back. In general, the skewing factor is (√(N+1) - 1)/N and the unskewing factor is (N + 1 - √(N+1))/(N(N+1)). A couple of simple functions will simplify the main routine:

```let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')
```

Now given a point p, `noise` finds the lattice vector l and the unit-cell coordinates x corresponding to it. It then selects the simplex s corresponding to x. Then, for each of its vertices v it finds the simplicial coordinates y of x relative to v. It computes in t a radial basis for y such that only the gradients attached to the simplex's vertices contribute to the final result. If it does, it computes the hash h corresponding to v, uses it to find the pseudo-random gradient applied to y and accumulates the corresponding contribution for this vertex:

```let noise p =
let l = skew   p   in
let x = unskew p l in
let s = simplices.(permindex x) in
let f = ref 0. in
for i = 0 to 3 do
let v = s.(i) in
let y = unskew x v in
let t = 0.6 -. norm2 y in
if t > 0. then
let h = shuffle (addi3 l v) in
let t = t *. t in
f := !f +. 8. *. t *. t *. magnitude h y
done;
!f
```

That's it. `noise` returns a floating-point number in the interval [-1, 1]; in order to use it in pixmaps it is frequently more useful to `rescale` it to an unsigned byte:

```let clampb n =
(n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
clampb (int (0.5 +. ldexp (f +. 1.) 7))
```

`clampb` is a nice little branchless routine to constrain an integer to the range [0, 255]. Note that in OCaml, integers are `Sys.word_size - 1` bits long; in Java, for instance, the code would be:

```public static int clampb(int n) {
return (n | ((255-n) >>> 31)) & ~(n >>> 31) & 255;
}
```

(teasing that code apart is a nice exercise in bit twiddling.) This implementation produces the same result, pixel for pixel, as Perlin's code. This is a slice in the interval (-2, -2, 0)–(2, 2, 0) as a GIF file computed with this code:

and this is the same slice as a PNG file computed in Java with the class given by Perlin in his Appendix B:

There are two avenues open for further explorations of this algorithm. The first is to generalize it to an arbitrary number of dimensions; Gustavson's paper is a good starting point for that. The other is to go further with Perlin's original intention of building a hardware generator and eliminate the use of floating-point operations. This is mostly mechanical, except for the calculation of t in the inner loop, which requires at least a 0.24 fixed-point format.

phil tomson said...

Could you post a link to the full code that is able to produce the output in your first figure?

Matías Giovannini said...

@Phil: will do. I've used the GIF encoder I blogges about in October: http://alaska-kamtchatka.blogspot.com/2011/10/first-principles-gif-encoder.html

Matías Giovannini said...

*blogged