Update: Here is the
PNGPack
source-code archive:
Per second. Yes, it's benchmark time. I pitted Prof. Perlin's original implementation against Stefan Gustavson's, and the results are in! I ran two sets of benchmarks, one in Java and the other in OCaml. In both cases I compared three versions of Simplex Noise: Perlin's pipelined 3D Noise, and Gustavson's 2D Noise and 3D Noise:
noisep
: Perlin's 3D Simplex Noisenoise2
: Gustavson's 2D Simplex Noisenoise3
: Gustavson's 3D Simplex Noise
The Java programs are copied directly from the linked papers (except for one modification, noted below); the OCaml versions are pixel-faithful translations. Each benchmarked function calls noise 1,000 times in a tight loop. The benchmark runs the functions three times for 10 seconds each. I ran the benchmark on a lightly loaded, cool MacBook (2 GHz Intel Core 2 Duo, 2 GB 533 Mhz DDR2 SDRAM running Mac OS X 10.6.8). By cool I mean "left to cool down"; the temperature reached 37 °C yesterday. These are the best-of-three timings for each function:
Function | Calls/s | |
---|---|---|
OCaml | Java | |
noisep | 2,123,490 | 3,417,300 |
noise2 | 18,206,090 | 18,239,810 |
noise3 | 10,506,630 | 10,404,280 |
The timings relative to Java's noisep
are:
Function | Relative speed | |
---|---|---|
OCaml | Java | |
noisep | 0.6214 | 1.0000 |
noise2 | 5.3276 | 5.3375 |
noise3 | 3.0745 | 3.0446 |
The big difference between OCaml's and Java's noisep
is attributable to the fact that my translation is more structured than Perlin's version; for the other two functions the difference is within 1% (which I find quite amazing in and of itself).
In the case of Gustavson's Noise, I modified the code to randomly choose among 16 gradients on the unit cell, not 12 as in the code given in the paper. In contrast to Perlin's suggestion of duplicating existing vectors, I added four cell vertices so that the 2D projection of the gradient vectors onto the XY plane is unbiased. This allows indexing the gradient vector with a bit-mask instead of a modulo operation:
let grad3 = [| ( 1., 1., 0.);(-1., 1., 0.);( 1.,-1., 0.);(-1.,-1., 0.); ( 1., 0., 1.);(-1., 0., 1.);( 1., 0.,-1.);(-1., 0.,-1.); ( 0., 1., 1.);( 0.,-1., 1.);( 0., 1.,-1.);( 0.,-1.,-1.); ( 1., 1., 1.);(-1., 1., 1.);( 1.,-1.,-1.);(-1.,-1., 1.); |]
These are the raw timings, for OCaml:
noisep: 10.45 WALL (10.44 usr + 0.01 sys = 10.45 CPU) @ 2122.11/s (n=22184) 10.45 WALL (10.44 usr + 0.01 sys = 10.45 CPU) @ 2123.49/s (n=22184) 10.45 WALL (10.44 usr + 0.01 sys = 10.45 CPU) @ 2122.31/s (n=22184) noise2: 10.51 WALL (10.50 usr + 0.01 sys = 10.51 CPU) @ 18198.59/s (n=191223) 10.50 WALL (10.50 usr + 0.01 sys = 10.50 CPU) @ 18206.09/s (n=191223) 10.50 WALL (10.50 usr + 0.01 sys = 10.50 CPU) @ 18204.68/s (n=191223) noise3: 10.51 WALL (10.51 usr + 0.01 sys = 10.51 CPU) @ 10504.05/s (n=110443) 10.51 WALL (10.51 usr + 0.01 sys = 10.51 CPU) @ 10506.63/s (n=110443) 10.57 WALL (10.56 usr + 0.01 sys = 10.57 CPU) @ 10449.81/s (n=110443) Rate noisep noise3 noise2 noisep 2123+- 1/s -- -80% -88% noise3 10487+-25/s 394% -- -42% noise2 18203+- 3/s 758% 74% --
and for Java:
noisep: 10.04 WALL @ 3386.12/s (n=34000) noisep: 10.24 WALL @ 3417.30/s (n=35000) noisep: 10.24 WALL @ 3416.97/s (n=35000) noise2: 10.01 WALL @ 18181.82/s (n=182000) noise2: 10.00 WALL @ 18100.00/s (n=181000) noise2: 10.03 WALL @ 18239.81/s (n=183000) noise3: 10.04 WALL @ 10357.53/s (n=104000) noise3: 10.09 WALL @ 10404.28/s (n=105000) noise3: 10.01 WALL @ 10390.65/s (n=104000)
I used the following benchmark harness in OCaml:
let rand w = w *. (Random.float 1. -. 0.5) let max_iter = 1_000 let width = 4. let noisep () = let p = (rand width, rand width, rand width) in for i = 1 to max_iter do ignore (Noise.noise p) done let noise2 () = let p = (rand width, rand width) in for i = 1 to max_iter do ignore (Noise.noise2 p) done let noise3 () = let p = (rand width, rand width, rand width) in for i = 1 to max_iter do ignore (Noise.noise3 p) done let () = let open Benchmark in let bm = throughputN ~style:Auto ~repeat:3 10 [ ("noisep", noisep, ()); ("noise2", noise2, ()); ("noise3", noise3, ()); ] in print_newline(); tabulate bm;
It uses Christophe Troestler's ocaml-benchmark
module. The Java harness is similar but hand-coded:
public class NoiseBench { private static final double WIDTH = 4.0; private static final int NITER = 1000; private static final int NTEST = 3; private static final int NSECS = 10; private static void benchmark(String name, Runnable test) { for (int i = 0; i != NTEST; i++) { long now = System.currentTimeMillis(), lap = 0; int count = 0; while (lap < NSECS * 1000) { for (int j = 0; j != NITER; j++) test.run(); lap = System.currentTimeMillis() - now; count += NITER; } System.out.printf("%s: %.2f WALL @ %.2f/s (n=%d)\n", name, (double) lap / 1000.0, (double) count * 1000. / (double) lap, count); System.out.flush(); } } private static double rand() { return WIDTH * (Math.random() - 0.5); } private static void noisep() { double x = rand(), y = rand(), z = rand(); for (int i = 0; i != NITER; i++) Noise3.noise(x, y, z); } private static void noise2() { double x = rand(), y = rand(); for (int i = 0; i != NITER; i++) SimplexNoise.noise(x, y); } private static void noise3() { double x = rand(), y = rand(), z = rand(); for (int i = 0; i != NITER; i++) SimplexNoise.noise(x, y, z); } public static void main(String[] args) { benchmark("noisep", new Runnable() { public void run() { noisep(); } }); benchmark("noise2", new Runnable() { public void run() { noise2(); } }); benchmark("noise3", new Runnable() { public void run() { noise3(); } }); } }
No comments:
Post a Comment