2012-01-11

Eighteen Million Noises

Update: Here is the PNGPack source-code archive:

noise.tgz

Per second. Yes, it's benchmark time. I pitted Prof. Perlin's original implementation against Stefan Gustavson's, and the results are in! I ran two sets of benchmarks, one in Java and the other in OCaml. In both cases I compared three versions of Simplex Noise: Perlin's pipelined 3D Noise, and Gustavson's 2D Noise and 3D Noise:


  • noisep: Perlin's 3D Simplex Noise
  • noise2: Gustavson's 2D Simplex Noise
  • noise3: Gustavson's 3D Simplex Noise

The Java programs are copied directly from the linked papers (except for one modification, noted below); the OCaml versions are pixel-faithful translations. Each benchmarked function calls noise 1,000 times in a tight loop. The benchmark runs the functions three times for 10 seconds each. I ran the benchmark on a lightly loaded, cool MacBook (2 GHz Intel Core 2 Duo, 2 GB 533 Mhz DDR2 SDRAM running Mac OS X 10.6.8). By cool I mean "left to cool down"; the temperature reached 37 °C yesterday. These are the best-of-three timings for each function:


FunctionCalls/s
OCamlJava
noisep 2,123,490 3,417,300
noise218,206,09018,239,810
noise310,506,63010,404,280

The timings relative to Java's noisep are:


FunctionRelative speed
OCamlJava
noisep0.62141.0000
noise25.32765.3375
noise33.07453.0446

The big difference between OCaml's and Java's noisep is attributable to the fact that my translation is more structured than Perlin's version; for the other two functions the difference is within 1% (which I find quite amazing in and of itself).


In the case of Gustavson's Noise, I modified the code to randomly choose among 16 gradients on the unit cell, not 12 as in the code given in the paper. In contrast to Perlin's suggestion of duplicating existing vectors, I added four cell vertices so that the 2D projection of the gradient vectors onto the XY plane is unbiased. This allows indexing the gradient vector with a bit-mask instead of a modulo operation:


let grad3 = [|
( 1., 1., 0.);(-1., 1., 0.);( 1.,-1., 0.);(-1.,-1., 0.);
( 1., 0., 1.);(-1., 0., 1.);( 1., 0.,-1.);(-1., 0.,-1.);
( 0., 1., 1.);( 0.,-1., 1.);( 0., 1.,-1.);( 0.,-1.,-1.);
( 1., 1., 1.);(-1., 1., 1.);( 1.,-1.,-1.);(-1.,-1., 1.);
|]

These are the raw timings, for OCaml:


noisep: 10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2122.11/s (n=22184)
        10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2123.49/s (n=22184)
        10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2122.31/s (n=22184)
noise2: 10.51 WALL (10.50 usr +  0.01 sys = 10.51 CPU) @ 18198.59/s (n=191223)
        10.50 WALL (10.50 usr +  0.01 sys = 10.50 CPU) @ 18206.09/s (n=191223)
        10.50 WALL (10.50 usr +  0.01 sys = 10.50 CPU) @ 18204.68/s (n=191223)
noise3: 10.51 WALL (10.51 usr +  0.01 sys = 10.51 CPU) @ 10504.05/s (n=110443)
        10.51 WALL (10.51 usr +  0.01 sys = 10.51 CPU) @ 10506.63/s (n=110443)
        10.57 WALL (10.56 usr +  0.01 sys = 10.57 CPU) @ 10449.81/s (n=110443)

          Rate     noisep noise3 noise2
noisep  2123+- 1/s     --   -80%   -88%
noise3 10487+-25/s   394%     --   -42%
noise2 18203+- 3/s   758%    74%     --

and for Java:


noisep: 10.04 WALL @ 3386.12/s (n=34000)
noisep: 10.24 WALL @ 3417.30/s (n=35000)
noisep: 10.24 WALL @ 3416.97/s (n=35000)
noise2: 10.01 WALL @ 18181.82/s (n=182000)
noise2: 10.00 WALL @ 18100.00/s (n=181000)
noise2: 10.03 WALL @ 18239.81/s (n=183000)
noise3: 10.04 WALL @ 10357.53/s (n=104000)
noise3: 10.09 WALL @ 10404.28/s (n=105000)
noise3: 10.01 WALL @ 10390.65/s (n=104000)

I used the following benchmark harness in OCaml:


let rand w = w *. (Random.float 1. -. 0.5)

let max_iter = 1_000
let width    = 4.

let noisep () =
  let p = (rand width, rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise p) done

let noise2 () =
  let p = (rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise2 p) done

let noise3 () =
  let p = (rand width, rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise3 p) done

let () =
  let open Benchmark in
  let bm = throughputN ~style:Auto ~repeat:3 10 [
    ("noisep", noisep, ());
    ("noise2", noise2, ());
    ("noise3", noise3, ());
  ] in
    print_newline();
  tabulate bm;

It uses Christophe Troestler's ocaml-benchmark module. The Java harness is similar but hand-coded:


public class NoiseBench {
  private static final double WIDTH = 4.0;
  private static final int NITER = 1000;
  private static final int NTEST = 3;
  private static final int NSECS = 10;

  private static void benchmark(String name, Runnable test) {
    for (int i = 0; i != NTEST; i++) {
      long now = System.currentTimeMillis(), lap = 0;
      int count = 0;
      while (lap < NSECS * 1000) {
        for (int j = 0; j != NITER; j++)
          test.run();
        lap = System.currentTimeMillis() - now;
        count += NITER;
      }
      System.out.printf("%s: %.2f WALL @ %.2f/s (n=%d)\n",
        name,
        (double) lap / 1000.0,
        (double) count * 1000. / (double) lap,
        count);
      System.out.flush();
    }
  }

  private static double rand() {
    return WIDTH * (Math.random() - 0.5);
  }

  private static void noisep() {
    double x = rand(), y = rand(), z = rand();
    for (int i = 0; i != NITER; i++)
      Noise3.noise(x, y, z);
  }

  private static void noise2() {
    double x = rand(), y = rand();
    for (int i = 0; i != NITER; i++)
      SimplexNoise.noise(x, y);
  }

  private static void noise3() {
    double x = rand(), y = rand(), z = rand();
    for (int i = 0; i != NITER; i++)
      SimplexNoise.noise(x, y, z);
  }

  public static void main(String[] args) {
    benchmark("noisep", new Runnable() { public void run() { noisep(); } });
    benchmark("noise2", new Runnable() { public void run() { noise2(); } });
    benchmark("noise3", new Runnable() { public void run() { noise3(); } });
  }
}

No comments: