Perlin's Simplex Noise

Graphics pioneer Ken Perlin is well known for his work on the procedural generation of textures. Recently he rewrote his classic Noise algorithm to reduce the complexity of generating pseudorandom gradients in higher dimensions, by transforming a problem on N-cells into one on N-simplices. I couldn't find a write-up by his own hand on his NYU home page; apparently the only implementation exists in his Java applet. There is a well-known exegesis by Stefan Gustavson that completely rewrites the algorithm; I've also found a write-up by Perlin himself in a collection of course notes. The latter shows an extremely terse Java implementation (Appendix B) that purports to correspond directly to hardware operations implemented in a specialized pipelined processor. I'll show an unraveled re-implementation of that Java algorithm that corresponds exactly to Prof. Perlin's, in the sense that the outputs are the same, but with the magic hopefully explained clearly. I will assume, however, that you are familiar with the original algorithm; if not, read Gustavson's analysis for a clear introduction to its workings.

The key to Perlin's Noise is that it interpolates pseudo-random gradients attached to lattice points. In the original algorithm, those gradients are computed by successive lookups using a permutation table. The new algorithm is geared towards an 8.8 fixed-point format; instead of tables it uses bit-slicing operations to compute two pseudo-random 3-bit hashes from the lattice points:

let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

I've expressed the patterns in octal to stress the fact that each entry is a 6-bit number. btst simply extracts the b-th bit of n. bmix takes a 3-bit slice formed from the b-th bits of i, j and k, in that order, and uses it to index into patterns. shuffle adds up bmixes of the three circular rotations of (i, j, k), cycling through them for each bit 0, 1 … 7. The procedure seems rather ad hoc, but in fact produces rather well-distributed 3-bit fields among all the (2⁸)³ = 16 Mi combinations of (i, j, k):

(Table: for each 3-bit value N, the frequencies P[Lower 3 bits = N] and P[Upper 3 bits = N].)
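As a quick sanity check, the hash can be probed for a known value and for its range; a standalone sketch that simply repeats the definitions above:

```ocaml
(* Standalone copy of the hash above, for experimentation. *)
let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

let () =
  (* at the origin every 3-bit slice is zero, so all eight terms are
     patterns.(0) = 0o25 = 21, and the hash is 8 * 21 = 168 *)
  assert (shuffle (0, 0, 0) = 168);
  (* the sum of eight 6-bit entries is at most 8 * 0o70 = 448; only
     the low six bits are consumed by the gradient selection *)
  for i = 0 to 255 do
    let h = shuffle (i, 255 - i, i lxor 0xa5) in
    assert (0 <= h && h <= 448)
  done
```

Only the low six bits of the sum matter downstream, where they are split into the two 3-bit hashes.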

These 3-bit hashes are used to compute a magnitude for vector (x, y, z) by selecting each coordinate with probability 3/4, applying a random sign to each with probability 1/2 and adding the resulting values:

let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

This amounts to finding the dot product of the vector (x, y, z) with a random gradient on the unit cell. Generating all combinations and extracting the gradient vectors we get the following diagram:

Distribution of the random basis vectors

Each dot shows the frequency with which that vector is selected; the diagonal vectors (±1, ±1, ±1) occur half as frequently as the mid-edge vectors (±1, ±1, 0) and their rotations, to account for their greater magnitude and thus weight. All this is done without a single multiplication! If you study Perlin's paper and code, you will notice that he uses bit operations to select each of the x, y and z coordinates and to add or subtract them, avoiding tables altogether; however, the code doesn't quite correspond to the explanation given in the text. I've unrolled the conditionals into two jump tables in a way that duplicates the effect of the code; as a consequence, the first jump table is not equivalent to the one given in the text.
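The frequency claim can be verified mechanically: since magnitude h is linear in its vector argument, probing it with the three basis vectors recovers the gradient selected by each of the 64 hashes. A standalone sketch (repeating magnitude from above; the tally is my own, not part of Perlin's code):

```ocaml
(* Standalone copy of the gradient selection above. *)
let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

(* magnitude h is linear in (x, y, z), so probing with the basis
   vectors recovers the gradient selected by hash h *)
let gradient h =
  (magnitude h (1., 0., 0.), magnitude h (0., 1., 0.), magnitude h (0., 0., 1.))

let () =
  let tally = Hashtbl.create 20 in
  for h = 0 to 63 do
    let g = gradient h in
    let n = try Hashtbl.find tally g with Not_found -> 0 in
    Hashtbl.replace tally g (n + 1)
  done;
  (* 8 diagonal gradients + 12 mid-edge gradients *)
  assert (Hashtbl.length tally = 20);
  Hashtbl.iter
    (fun (x, y, z) n ->
      let diagonal = abs_float x +. abs_float y +. abs_float z = 3. in
      (* diagonals occur twice each, mid-edge vectors four times each *)
      assert (n = if diagonal then 2 else 4))
    tally
```

The tally confirms the half-frequency weighting: each of the 8 diagonal gradients is hit by 2 of the 64 hashes, each of the 12 mid-edge gradients by 4.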

The key to the efficiency of this algorithm is not just the use of bit operations for generating pseudo-random gradients but the subdivision strategy used to combine them in ℝᴺ. In the original algorithm, each of the 2ᴺ vertices of the N-cell is associated with a gradient, and the gradients are cubically interpolated between pairs of hyperplanes. This not only generates visual artifacts, since the interpolation scheme is not C²-continuous; it also requires O(2ᴺ) work. The new algorithm mitigates the first problem by using a quartic interpolant, and solves the second by decomposing the N-cell into N! simplices:

let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]

This requires interpolating among N + 1 vertices taken in pairs, for O(N²) work total. The simplex in which a point (u, v, w) in the unit cell lies is determined by the permutation index of coordinates (u, v, w):

let permindex (u, v, w) =
  if u >= w then
    if u >= v then
      if v >= w then 0 else 1
    else 2
  else if v >= w then 3
  else if u >= v then 4
  else 5

Perlin's code uses a different method to directly travel along the simplex edges by incrementing each coordinate according to the permutation index; I've opted for using a table to simplify the code and eliminate the need for global variables. Each point (x, y, z) is decomposed into a sum of a lattice vector (i, j, k) and a point in the unit cell (u, v, w). To account for the different volumes in the simplicial decomposition of the unit N-cell, the vectors are skewed:

let int x =
  if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
  let s = (x +. y +. z) /. 3. in
  let i = int (x +. s)
  and j = int (y +. s)
  and k = int (z +. s) in
  (i, j, k)

and unskewed:

let unskew (x, y, z) (i, j, k) =
  let s = float (i + j + k) /. 6. in
  let u = x -. float i +. s
  and v = y -. float j +. s
  and w = z -. float k +. s in
  (u, v, w)

along the main diagonal of the N-cell. This amounts to a change of basis between a cubic and a tetrahedral grid and back. In general, the skewing factor is (√(N+1) - 1)/N and the unskewing factor is (N + 1 - √(N+1))/(N(N+1)). A couple of simple functions will simplify the main routine:

let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')
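Before assembling the main routine, the general formulas for the skewing factors can be checked numerically against the constants hard-coded above; a small standalone sketch of my own (the function names are hypothetical):

```ocaml
(* General skewing and unskewing factors for dimension n; for n = 3
   these reduce to the 1/3 and 1/6 used in skew and unskew above. *)
let skew_factor n =
  let n = float n in (sqrt (n +. 1.) -. 1.) /. n

let unskew_factor n =
  let n = float n in (n +. 1. -. sqrt (n +. 1.)) /. (n *. (n +. 1.))

let () =
  (* n = 3: (sqrt 4 - 1) / 3 = 1/3 and (4 - sqrt 4) / 12 = 1/6 *)
  assert (abs_float (skew_factor 3 -. 1. /. 3.) < 1e-12);
  assert (abs_float (unskew_factor 3 -. 1. /. 6.) < 1e-12);
  (* n = 2 gives the familiar 2-D simplex-noise constants *)
  assert (abs_float (skew_factor 2 -. (sqrt 3. -. 1.) /. 2.) < 1e-12)
```

These two functions would be the starting point for generalizing the code to other dimensions.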

Now given a point p, noise finds the lattice vector l and the unit-cell coordinates x corresponding to it. It then selects the simplex s corresponding to x. Then, for each of its vertices v it finds the simplicial coordinates y of x relative to v. From y it computes a radial attenuation t, chosen so that only the gradients attached to the vertices of the enclosing simplex contribute to the final result. If t is positive, it computes the hash h corresponding to v, uses it to select the pseudo-random gradient applied to y, and accumulates the corresponding contribution for this vertex:

let noise p =
  let l = skew   p   in
  let x = unskew p l in
  let s = simplices.(permindex x) in
  let f = ref 0. in
  for i = 0 to 3 do
    let v = s.(i) in
    let y = unskew x v in
    let t = 0.6 -. norm2 y in
    if t > 0. then
      let h = shuffle (addi3 l v) in
      let t = t *. t in
      f := !f +. 8. *. t *. t *. magnitude h y
  done;
  !f

That's it. noise returns a floating-point number in the interval [-1, 1]; in order to use it in pixmaps it is frequently more useful to rescale it to an unsigned byte:

let clampb n =
  (n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
  clampb (int (0.5 +. ldexp (f +. 1.) 7))

clampb is a nice little branchless routine to constrain an integer to the range [0, 255]. Note that in OCaml, integers are Sys.word_size - 1 bits long; in Java, for instance, the code would be:

public static int clampb(int n) {
  return (n | ((255-n) >>> 31)) & ~(n >>> 31) & 255;
}
(teasing that code apart is a nice exercise in bit twiddling.) This implementation produces the same result, pixel for pixel, as Perlin's code. This is a slice in the interval (-2, -2, 0)–(2, 2, 0) as a GIF file computed with this code:

Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in OCaml

and this is the same slice as a PNG file computed in Java with the class given by Perlin in his Appendix B:

Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in Java

There are two avenues open for further explorations of this algorithm. The first is to generalize it to an arbitrary number of dimensions; Gustavson's paper is a good starting point for that. The other is to go further with Perlin's original intention of building a hardware generator and eliminate the use of floating-point operations. This is mostly mechanical, except for the calculation of t in the inner loop, which requires at least a 0.24 fixed-point format.
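For a taste of the second avenue, here is a hypothetical 0.24 fixed-point multiplication (the names of_fix24, to_fix24 and fmul24 are my own, not Perlin's), enough to compute the t·t·t·t factor of the inner loop; on a 64-bit OCaml the 48-bit intermediate product fits comfortably in a native int:

```ocaml
(* Hypothetical 0.24 fixed-point helpers: a nonnegative value x in
   [0, 1) is represented by the integer nearest to x * 2^24. *)
let of_fix24 x = int_of_float (ldexp x 24 +. 0.5)
let to_fix24 n = ldexp (float n) (-24)

(* Product of two 0.24 values, renormalized; the 48-bit intermediate
   fits in OCaml's 63-bit native int on 64-bit platforms. *)
let fmul24 a b = (a * b) asr 24

let () =
  (* t = 0.6 - |y|^2 as in the inner loop of noise *)
  let t  = 0.4 in
  let t' = of_fix24 t in
  let t2 = fmul24 t' t' in
  let t4 = fmul24 t2 t2 in
  (* agrees with the floating-point computation to fixed-point precision *)
  assert (abs_float (to_fix24 t4 -. t *. t *. t *. t) < 1e-5)
```

The narrow part is precisely here: squaring twice discards 24 fractional bits at each step, which is why anything coarser than a 0.24 format loses the small contributions near a simplex boundary.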


phil tomson said...

Could you post a link to the full code that is able to produce the output in your first figure?

Matías Giovannini said...

@Phil: will do. I've used the GIF encoder I blogged about in October: http://alaska-kamtchatka.blogspot.com/2011/10/first-principles-gif-encoder.html

Matías Giovannini said...