2012-01-12

File Sharing on the Spot

(now with highlighting!) Last time I complained that I couldn't find an easy way to share a source code archive that didn't involve signing up for a service I didn't care for. Blogging platforms only make easy to attach images to posts, so why not pack a file as a PNG? Enter PNGPack. I tried finding OCaml bindings to libpng; after a disheartening exploration I realized that Java is (for me at least) the ideal platform to make this possible, since it has a rich standard library to resort to:

import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.imageio.ImageIO;

PNGPack is the simplest command-line utility I could get away with. It processes its arguments one by one, checking that they actually are regular files and determining by their extension whether to pack them as PNGs or to extract them:

public class PNGPack {
  public static void main(String[] args) {
    if (args.length == 0) {
      System.err.println("usage - java PNGPack <file>...");
      System.exit(2);
    }
    for (int i = 0; i != args.length; i++) {
      final File file = new File(args[i]);
      if (!file.isFile()) {
        System.err.printf("File \"%s\" is not a regular file\n", file.toString());
        continue;
      }
      final String fileName = file.getName();
      final int index = fileName.lastIndexOf('.');
      final String baseName, extension;
      if (index < 1) {
        baseName  = fileName;
        extension = null;
      } else {
        baseName  = fileName.substring(0, index);
        extension = fileName.substring(index + 1).toLowerCase();
      }
      try {
        if ("png".equals(extension)) {
          final File out = new File(file.getParentFile(), baseName);
          decode(file, out);
        } else {
          final File out = new File(file.getParentFile(), fileName + ".png");
          encode(file, out);
        }
      } catch (IOException e) {
        System.err.printf("Can't read \"%s\"\n", file.toString());
        e.printStackTrace(System.err);
      }
    }
  }

The image is headed by 24 bytes comprised by:

OffsetLengthField
04Signature 'PNGP'
44File length (big endian)
816MD5 digest
  private static final int PNGP_SIG  = 0x504e4750; // 'PNGP'
  private static final int HEADER_SZ = 24;

The image dimensions are selected so that the result is as square as possible. The image uses 4-byte ABGR pixels for maximum compactness and round-trip fidelity. The image is padded to size with 0 bytes, and the MD5 digest is computed over all the contents, padding included to avoid opening a steganographic channel:

  private static void encode(File inp, File out) throws IOException {
    final long length = inp.length();
    if (length >= 0x80000000L)
      throw new IOException("Overlong file");
    final int pixels = (int) (length + HEADER_SZ) >> 2;
    final int height = (int) Math.floor(Math.sqrt((double) pixels));
    final int width  = (int) Math.ceil((double) pixels / (double) height);
    final BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_4BYTE_ABGR);
    final byte[] frame = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    final MessageDigest md5 = getMD5Digest();
    final InputStream is = new DigestInputStream(new FileInputStream(inp), md5);
    try {
      int nread;
      int index = HEADER_SZ;
      while ( (nread = is.read(frame, index, frame.length - index)) != -1 )
        index += nread;
      for (int i = index; i != frame.length; i++)
        frame[i] = 0;
      md5.update(frame, index, frame.length - index);
    }
    finally {
      is.close();
    }
    final byte[] digest = md5.digest();
    assert (digest.length == 16);
    intToBytes(frame, 0, PNGP_SIG);
    intToBytes(frame, 4, (int) length);
    for (int i = 0; i != digest.length; i++)
      frame[8 + i] = digest[i];
    final OutputStream os = new BufferedOutputStream(new FileOutputStream(out));
    try {
      ImageIO.write(image, "PNG", os);
    } finally {
      os.close();
    }
    System.out.printf("<img width=\"%d\" height=\"%d\" src=\"%s\" alt=\"%s\" />\n",
      width, height, out.getName(), inp.getName());
  }

The encoding ends by outputting a handy <img> tag. Decoding an image is equally straightforward, except for a number of safety checks trying to ensure that only proper PNGPack images are decoded:

  private static void decode(File inp, File out) throws IOException {
    final InputStream is = new BufferedInputStream(new FileInputStream(inp));
    final BufferedImage image;
    try {
      image = ImageIO.read(is);
    } finally {
      is.close();
    }
    if (image.getType() != BufferedImage.TYPE_4BYTE_ABGR)
      throw new IOException("Invalid PNGPack image (bad image type)");
    final byte[] frame = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();
    if (bytesToInt(frame, 0) != PNGP_SIG)
      throw new IOException("Invalid PNGPack image (bad signature)");
    final int length = bytesToInt(frame, 4);
    final int pixels = (int) (length + HEADER_SZ) >> 2;
    final int height = (int) Math.floor(Math.sqrt((double) pixels));
    final int width  = (int) Math.ceil((double) pixels / (double) height);
    if (!(height == image.getHeight() && width == image.getWidth()))
      throw new IOException("Invalid PNGPack image (bad dimensions)");
    final MessageDigest md5 = getMD5Digest();
    md5.update(frame, HEADER_SZ, frame.length - HEADER_SZ);
    final byte[] digest = md5.digest();
    assert (digest.length == 16);
    for (int i = 0; i != digest.length; i += 4)
      if (frame[8 + i] != digest[i])
        throw new IOException("Invalid PNGPack image (bad MD5 digest)");
    final OutputStream os = new FileOutputStream(out);
    try {
      os.write(frame, HEADER_SZ, length);
    } finally {
      os.close();
    }
  }

The code is hyper-compact in order to minimize the number of lines shown in this post. Feel free to add whitespace and comments to suit taste. The only thing left is a couple of helper functions:

  private static void intToBytes(byte[] buf, int off, int n) {
    buf[off + 0] = (byte) ((n >> 24) & 255);
    buf[off + 1] = (byte) ((n >> 16) & 255);
    buf[off + 2] = (byte) ((n >>  8) & 255);
    buf[off + 3] = (byte) ( n        & 255);
  }

  private static int bytesToInt(byte[] buf, int off) {
    return (((int) buf[off + 0] & 255) << 24)
      |  (((int) buf[off + 1] & 255) << 16)
      |  (((int) buf[off + 2] & 255) <<  8)
      |   ((int) buf[off + 3] & 255);
  }

  private static MessageDigest getMD5Digest() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException _) {
      System.err.println("No instance of MD5 digest algorithm!");
      System.exit(1);
      return null; // not reached
    }
  }
}

I've updated the last post to include the PNGPacked code archive. Use it responsibly, and enjoy!

2012-01-11

Eighteen Million Noises

Update: Here is the PNGPack source-code archive:

noise.tgz

Per second. Yes, it's benchmark time. I pitted Prof. Perlin's original implementation against Stefan Gustavson's, and the results are in! I ran two sets of benchmarks, one in Java and the other in OCaml. In both cases I compared three versions of Simplex Noise: Perlin's pipelined 3D Noise, and Gustavson's 2D Noise and 3D Noise:


  • noisep: Perlin's 3D Simplex Noise
  • noise2: Gustavson's 2D Simplex Noise
  • noise3: Gustavson's 3D Simplex Noise

The Java programs are copied directly from the linked papers (except for one modification, noted below); the OCaml versions are pixel-faithful translations. Each benchmarked function calls noise 1,000 times in a tight loop. The benchmark runs the functions three times for 10 seconds each. I ran the benchmark on a lightly loaded, cool MacBook (2 GHz Intel Core 2 Duo, 2 GB 533 Mhz DDR2 SDRAM running Mac OS X 10.6.8). By cool I mean "left to cool down"; the temperature reached 37 °C yesterday. These are the best-of-three timings for each function:


FunctionCalls/s
OCamlJava
noisep 2,123,490 3,417,300
noise218,206,09018,239,810
noise310,506,63010,404,280

The timings relative to Java's noisep are:


FunctionRelative speed
OCamlJava
noisep0.62141.0000
noise25.32765.3375
noise33.07453.0446

The big difference between OCaml's and Java's noisep is attributable to the fact that my translation is more structured than Perlin's version; for the other two functions the difference is within 1% (which I find quite amazing in and of itself).


In the case of Gustavson's Noise, I modified the code to randomly choose among 16 gradients on the unit cell, not 12 as in the code given in the paper. In contrast to Perlin's suggestion of duplicating existing vectors, I added four cell vertices so that the 2D projection of the gradient vectors onto the XY plane is unbiased. This allows indexing the gradient vector with a bit-mask instead of a modulo operation:


let grad3 = [|
( 1., 1., 0.);(-1., 1., 0.);( 1.,-1., 0.);(-1.,-1., 0.);
( 1., 0., 1.);(-1., 0., 1.);( 1., 0.,-1.);(-1., 0.,-1.);
( 0., 1., 1.);( 0.,-1., 1.);( 0., 1.,-1.);( 0.,-1.,-1.);
( 1., 1., 1.);(-1., 1., 1.);( 1.,-1.,-1.);(-1.,-1., 1.);
|]

These are the raw timings, for OCaml:


noisep: 10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2122.11/s (n=22184)
        10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2123.49/s (n=22184)
        10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2122.31/s (n=22184)
noise2: 10.51 WALL (10.50 usr +  0.01 sys = 10.51 CPU) @ 18198.59/s (n=191223)
        10.50 WALL (10.50 usr +  0.01 sys = 10.50 CPU) @ 18206.09/s (n=191223)
        10.50 WALL (10.50 usr +  0.01 sys = 10.50 CPU) @ 18204.68/s (n=191223)
noise3: 10.51 WALL (10.51 usr +  0.01 sys = 10.51 CPU) @ 10504.05/s (n=110443)
        10.51 WALL (10.51 usr +  0.01 sys = 10.51 CPU) @ 10506.63/s (n=110443)
        10.57 WALL (10.56 usr +  0.01 sys = 10.57 CPU) @ 10449.81/s (n=110443)

          Rate     noisep noise3 noise2
noisep  2123+- 1/s     --   -80%   -88%
noise3 10487+-25/s   394%     --   -42%
noise2 18203+- 3/s   758%    74%     --

and for Java:


noisep: 10.04 WALL @ 3386.12/s (n=34000)
noisep: 10.24 WALL @ 3417.30/s (n=35000)
noisep: 10.24 WALL @ 3416.97/s (n=35000)
noise2: 10.01 WALL @ 18181.82/s (n=182000)
noise2: 10.00 WALL @ 18100.00/s (n=181000)
noise2: 10.03 WALL @ 18239.81/s (n=183000)
noise3: 10.04 WALL @ 10357.53/s (n=104000)
noise3: 10.09 WALL @ 10404.28/s (n=105000)
noise3: 10.01 WALL @ 10390.65/s (n=104000)

I used the following benchmark harness in OCaml:


let rand w = w *. (Random.float 1. -. 0.5)

let max_iter = 1_000
let width    = 4.

let noisep () =
  let p = (rand width, rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise p) done

let noise2 () =
  let p = (rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise2 p) done

let noise3 () =
  let p = (rand width, rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise3 p) done

let () =
  let open Benchmark in
  let bm = throughputN ~style:Auto ~repeat:3 10 [
    ("noisep", noisep, ());
    ("noise2", noise2, ());
    ("noise3", noise3, ());
  ] in
    print_newline();
  tabulate bm;

It uses Christophe Troestler's ocaml-benchmark module. The Java harness is similar but hand-coded:


public class NoiseBench {
  private static final double WIDTH = 4.0;
  private static final int NITER = 1000;
  private static final int NTEST = 3;
  private static final int NSECS = 10;

  private static void benchmark(String name, Runnable test) {
    for (int i = 0; i != NTEST; i++) {
      long now = System.currentTimeMillis(), lap = 0;
      int count = 0;
      while (lap < NSECS * 1000) {
        for (int j = 0; j != NITER; j++)
          test.run();
        lap = System.currentTimeMillis() - now;
        count += NITER;
      }
      System.out.printf("%s: %.2f WALL @ %.2f/s (n=%d)\n",
        name,
        (double) lap / 1000.0,
        (double) count * 1000. / (double) lap,
        count);
      System.out.flush();
    }
  }

  private static double rand() {
    return WIDTH * (Math.random() - 0.5);
  }

  private static void noisep() {
    double x = rand(), y = rand(), z = rand();
    for (int i = 0; i != NITER; i++)
      Noise3.noise(x, y, z);
  }

  private static void noise2() {
    double x = rand(), y = rand();
    for (int i = 0; i != NITER; i++)
      SimplexNoise.noise(x, y);
  }

  private static void noise3() {
    double x = rand(), y = rand(), z = rand();
    for (int i = 0; i != NITER; i++)
      SimplexNoise.noise(x, y, z);
  }

  public static void main(String[] args) {
    benchmark("noisep", new Runnable() { public void run() { noisep(); } });
    benchmark("noise2", new Runnable() { public void run() { noise2(); } });
    benchmark("noise3", new Runnable() { public void run() { noise3(); } });
  }
}

2012-01-09

Putting Noise to the Test

So how do you use Perlin's Simplex Noise? By using the GIF encoder, of course! I generated the test image shown in the last post:


Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in OCaml

with a minimal application of both modules in test.ml (source code follows). For a visually richer but not much more complicated example, I also tried replicating Iñigo Quílez's turbulent textures in fbm.ml. The animation traces a tiny loop in noise space so that it seamlessly repeats over and over. The first frame looks like this:



I'd like to know of a very easy way to share source tarballs with you (no, Forges require waiting for new project approval; no, Github requires adhering to a philosophy of sharing for which I don't care; no, Google Code forces me to give up my own version control; no, file hosting is not an option. I'm open to suggestions.); in the meantime, here's a wall of code with the complete project.


gif.mli


val fold_string : ('a -> char -> 'a) -> 'a -> string -> 'a

type color = { red : int; green : int; blue : int; }

type palette

val palette : ?background:int -> color array -> palette

val num_colors : palette -> int

type t

val make    : ?ct:palette -> width:int -> height:int
           -> (t -> unit) -> (out_channel -> unit)

val image   : ?ct:palette -> ?transparent:int -> ?fps:int
           -> ?x:int -> ?y:int -> ?width:int -> ?height:int
           -> string -> t -> unit

val comment : string -> t -> unit

val repeat  : ?count:int -> t -> unit

val verify_gif : in_channel -> bool

val grayscale   : palette
val heatmap     : palette
val rainbow     : palette
val black_white : palette

val unwind : protect:('a -> unit) -> ('a -> 'b) -> ('a -> 'b)
val with_output_file : (out_channel -> 'a) -> (string -> 'a)
val with_input_file  : (in_channel  -> 'a) -> (string -> 'a)

gif.ml


let unwind ~(protect : 'a -> unit) f x =
  try let y = f x in protect x; y
  with e -> protect x; raise e

class output otch = object (self)
  val         buffer  =  String.create 255
  val mutable current = -1

  method private flush =
    if current > 0 then begin
      output_byte otch current;
      output      otch buffer 0 current;
      current <- 0
    end

  method byte n =
    let b = n land 255 in
    if current == -1 then output_byte otch b else begin
      if current == 255 then self#flush;
      buffer.[current] <- char_of_int b;
      current <- current + 1
    end

  method le16 n =
    self#byte  n;
    self#byte (n lsr 8)

  method string str =
    String.iter (fun c -> self#byte (int_of_char c)) str

  method begin_block =
    if current != -1 then failwith "begin_block";
    current <- 0

  method end_block =
    if current == -1 then failwith "end_block" else
    self#flush;
    current <- -1;
    self#byte 0

  method close =
    if current != -1 then self#end_block;
    flush otch
end

let fold_string f e s =
  let res = ref e in
  String.iter (fun c -> res := f !res c) s;
  !res

class lzw ?(bits=8) out =
  let count       = 1 lsl bits in
  let clear_code  = count
  and end_of_info = count + 1 in
object (self)
  val table : (int * int, int) Hashtbl.t = Hashtbl.create 5003
  val mutable code_size = 0
  val mutable next_code = 0
  val mutable buffer    = 0
  val mutable length    = 0

  method append n =
    buffer <- buffer lor ((n land pred (1 lsl code_size)) lsl length);
    length <- length + code_size;
    while length >= 8 do
      out#byte (buffer land 0xff);
      buffer <- buffer lsr 8;
      length <- length - 8
    done

  method finish =
    if length > 0 then
      out#byte (buffer land pred (1 lsl length))

  method private reset =
    Hashtbl.clear table;
    for i = 0 to count - 1 do
      Hashtbl.add table (-1, i) i
    done;
    code_size <- bits  + 1;
    next_code <- count + 2

  method private add_string s =
    (* Check code for overflow *)
    if next_code == 1 lsl code_size then
      code_size <- code_size + 1;
    Hashtbl.add table s next_code;
    next_code <- next_code + 1;
    (* Limit maximum code size to 12 bits *)
    if next_code == 0x1000 then begin
      self#append clear_code;
      self#reset;
    end

  method compress data =
    self#reset;
    self#append clear_code;
    let last = fold_string (fun prefix c ->
      let  k = int_of_char c in
      try  Hashtbl.find table (prefix, k)
      with Not_found ->
        self#append prefix;
        self#add_string (prefix, k);
        k) (-1) data in
    self#append last;
    self#append end_of_info;
    self#finish
end

(* 0 <= red, green, blue < 256 *)
type color = { red : int; green : int; blue : int; }

type palette = { background : int; bits : int; table : color array; }

let num_colors { table; _ } = Array.length table

let palette ?(background=0) table =
  let size = Array.length table in
  if size < 2
  || size > 256
  || size land (size - 1) != 0
  || background < 0
  || background >= size
  then invalid_arg "color_table" else
  let bits = truncate (log (float size) /. log 2.) in
  assert (size == 1 lsl bits);
  { background; bits; table = Array.copy table; }

type t = {
  width  : int;
  height : int;
  ct     : palette option;
  out    : output;
}

let header { out; _ } = out#string "GIF89a"

let trailer { out; _ } = out#byte 0x3b

let color_table { table; _ } { out; _ } =
  Array.iter (fun { red; green; blue } ->
    out#byte red  ;
    out#byte green;
    out#byte blue) table

let logical_screen ({ width; height; ct; out } as gif) =
  out#le16   width;
  out#le16   height;
  match ct with
  | None ->
    out#byte 0;
    out#byte 0;
    out#byte 0 (* default aspect ratio (1:1 = 49) *)
  | Some ({ background; bits; _ } as ct) ->
    out#byte (0xf0 lor pred bits);
    out#byte background;
    out#byte 0;
    color_table ct gif

let make ?ct ~width ~height proc otch =
  unwind ~protect:(fun { out; _ } -> out#close) (fun gif ->
    header gif;
    logical_screen gif;
    proc gif;
    trailer gif)
    { width; height; ct; out = new output otch; }

let graphics_control_extension transparent fps { out; _ } =
  let delay = if fps = 0 then 0 else (200 + fps) / (fps + fps) in
  out#byte 0x21; (* GIF Extension Code *)
  out#byte 0xf9; (* Graphic Control Label *)
  out#byte 0x04; (* Length of Data Sub-Block *)
  out#byte (match transparent with None -> 0 | Some _ -> 9);
  out#le16 delay;
  out#byte (match transparent with None -> 0 | Some c -> c);
  out#byte 0x00 (* Data Sub-Block Terminator *)

let image_descriptor ct x y width height ({ out; _ } as gif) =
  out#byte 0x2c;
  out#le16 x;
  out#le16 y;
  out#le16 width;
  out#le16 height;
  match ct with
  | None ->
    out#byte 0x00
  | Some ({ bits; _ } as ct) ->
    out#byte (0x80 lor pred bits);
    color_table ct gif

let image ?ct ?transparent ?(fps=0) ?(x=0) ?(y=0) ?width ?height bytes ({ out; _ } as gif) =
  let w = match width  with None -> gif.width  | Some w -> w
  and h = match height with None -> gif.height | Some h -> h in
  if x < 0 || x + w  > gif.width
  || y < 0 || x + h > gif.height
  || String.length bytes != w * h
    then invalid_arg "image";
  let bits = match ct, gif.ct with
  | Some ct, _
  | None   , Some ct -> max 2 ct.bits
  | None   , None    -> invalid_arg "image"
  in
  (match transparent, fps with
  | None, 0 -> ()
  | _   , _ -> graphics_control_extension transparent fps gif);
  image_descriptor ct x y w h gif;
  out#byte bits;
  out#begin_block;
  (new lzw ~bits out)#compress bytes;
  out#end_block

let comment text { out; _ } =
  out#byte 0x21;         (* GIF Extension Code *)
  out#byte 0xfe;         (* Comment Label *)
  out#begin_block;
  out#string text;
  out#end_block

let repeat ?(count=0) { out; _ } =
  out#byte   0x21;       (* GIF Extension code *)
  out#byte   0xff;       (* Application Extension Label *)
  out#byte   0x0b;       (* Length of application block *)
  out#string "NETSCAPE"; (* Application Identifier *)
  out#string "2.0";      (* Appl. Authentication Code *)
  out#byte   0x03;       (* Length of Data Sub-Block *)
  out#byte   0x01;       (* Loop sub-block ID *)
  out#le16   count;      (* Loop count (0 = forever) *)
  out#byte   0x00        (* Data Sub-Block Terminator *)

let verify_gif inch =
  let failf   fmt = Printf.kprintf failwith fmt in
  let check p fmt = if not p then failf fmt in
  let buffer = Buffer.create 16 in
  let input_le16 inch  =
    let b0 = input_byte inch in
    let b1 = input_byte inch in
    b0 lor (b1 lsl 8)
  in
  let verify_header inch =
    let buf = String.create 6 in
    really_input inch buf 0 6;
    check (buf = "GIF89a") "Expected GIF header"
  in
  let verify_color_table len inch =
    let cnt = 3 * (1 lsl len) in
    for i = 1 to cnt do
      ignore (input_byte inch)
    done;
    Printf.printf "CT   %d colors in %d bytes\n" (cnt / 3) cnt
  in
  let verify_blocks inch =
    let tot = ref 0 in
    Buffer.clear buffer;
    try while true do
      let cnt = input_byte inch in
      if cnt == 0 then raise Exit else
      Buffer.add_channel buffer inch cnt;
      incr tot
    done; assert false with Exit ->
      let contents = Buffer.contents buffer in
      Printf.printf "%d bytes in %d blocks, EOB = %010x\n%!"
        (String.length contents) !tot (pos_in inch);
      contents
  in
  let verify_logical_screen_descriptor inch =
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let fields = input_byte inch in
    let backgr = input_byte inch in
    let aspect = input_byte inch in
    Printf.printf "LSD  w = %d, h = %d, f = %2x, b = %d, a = %d\n"
      width height fields backgr aspect;
    if fields land 0x80 == 0x80 then
      verify_color_table (1 + fields land 0x07) inch
  in
  let verify_image_descriptor inch =
    let left   = input_le16 inch in
    let top    = input_le16 inch in
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let fields = input_byte inch in
    Printf.printf "ID   x = %d, y = %d, w = %d, h = %d, f = %2x\n"
      left top width height fields;
    if fields land 0x80 = 0x80 then
      verify_color_table (1 + fields land 0x07) inch
  in
  let verify_table_based_image inch =
    verify_image_descriptor inch;
    let bits = input_byte inch in
    Printf.printf "IMG  code size = %d, " bits;
    ignore (verify_blocks inch)
  in
  let verify_plain_text_extension inch =
    check (12 == input_byte inch) "Expected block size = 12\n";
    let left   = input_le16 inch in
    let top    = input_le16 inch in
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let _celwid = input_byte inch in
    let _celhgt = input_byte inch in
    let _fgcol  = input_byte inch in
    let _bgcol  = input_byte inch in
    Printf.printf "PTE  l = %d, t = %d, w = %d, h = %d "
      left top width height;
    Printf.printf "  \"%s\"\n%!" (verify_blocks inch)
  in
  let verify_application_extension inch =
    check (11 == input_byte inch) "Expected block size = 11\n";
    let label = String.create 11 in
    really_input inch label 0 11;
    Printf.printf "AE   %s " label;
    ignore (verify_blocks inch)
  in
  let verify_graphic_control_extension inch =
    check (4 == input_byte inch) "Expected block size = 4\n";
    let fields = input_byte inch in
    let delay  = input_le16 inch in
    let transp = input_byte inch in
    check (0x00 == input_byte inch) "Expected block terminator\n";
    Printf.printf "GCE  f = %2x, d = %d, t = %d\n"
      fields delay transp
  in
  let verify_comment_extension inch =
    Printf.printf "CE   ";
    Printf.printf "  \"%s\"\n%!" (verify_blocks inch)
  in
  let verify_extension inch =
    match input_byte inch with
    | 0xff -> verify_application_extension     inch
    | 0xfe -> verify_comment_extension         inch
    | 0xf9 -> verify_graphic_control_extension inch
    | 0x01 -> verify_plain_text_extension      inch
    | b    -> failf "Unknown extension introducer %2x" b
  in
  let verify_eof inch =
    try ignore (input_char inch); failf "Extra contents after EOF"
    with End_of_file -> ()
  in
  let rec verify_data inch =
    match input_byte inch with
    | 0x3b -> verify_eof inch
    | 0x2c -> verify_table_based_image inch; verify_data inch
    | 0x21 -> verify_extension         inch; verify_data inch
    | b    -> failf "Unknown content block %2x" b
  in
  try
    verify_header inch;
    verify_logical_screen_descriptor inch;
    verify_data inch;
    true
  with Failure e ->
    let off = pos_in inch in
    Printf.printf "Offset %5d (%010x): %s\n%!" off off e;
    false
  | End_of_file ->
    let off = pos_in inch in
    Printf.printf "Offset %5d (%010x): Unexpected EOF\n%!" off off;
    false

let grayscale = palette (Array.init 256 (fun i ->
  { red = i; green = i; blue = i }))

let heatmap =
  let ramp i l h =
    let j = 255 * (i - l) / (h - l) in
    if j < 0 then 0 else if j > 255 then 255 else j
  in palette (Array.init 256 (fun i ->
  { red = ramp i 0 84; green = ramp i 85 170; blue = ramp i 171 255 }))

let rainbow =
  let pi   = 3.14159265358979324
  and q1_3 = 0.333333333333333333 in
  let cs x =
    let c = cos (pi *. x) in
    truncate (255. *. c *. c +. 0.5)
  in palette (Array.init 256 (fun i ->
  let x = float i /. float 256 in
  { red = cs x; green = cs (x -. q1_3); blue = cs (x +. q1_3) }))

let black_white = palette [|
  { red = 255; green = 255; blue = 255; };
  { red =   0; green =   0; blue =   0; };
|]

let with_output_file proc fname =
  unwind ~protect:close_out proc (open_out_bin fname)

let with_input_file proc fname =
  unwind ~protect:close_in proc (open_in_bin fname)

noise.mli


val noise : float * float * float -> float
val fBM : ?octaves:int -> float * float * float -> float

(** Clamp an integer into the range [0, 255] *)
val clampb : int -> int
(** Convert a float in the range [-1, 1] into an integer in the range [0, 255] *)
val rescale : float -> int

noise.ml


let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]

let permindex (u, v, w) =
  if u >= w then
    if u >= v then
      if v >= w then 0 else 1
    else 2
  else
  if v >= w then 3 else
  if u >= v then 4 else 5

let int x =
  if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
  let s = (x +. y +. z) /. 3. in
  let i = int (x +. s)
  and j = int (y +. s)
  and k = int (z +. s) in
  (i, j, k)

let unskew (x, y, z) (i, j, k) =
  let s = float (i + j + k) /. 6. in
  let u = x -. float i +. s
  and v = y -. float j +. s
  and w = z -. float k +. s in
  (u, v, w)

let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')

let noise p =
  let l = skew   p   in
  let x = unskew p l in
  let s = simplices.(permindex x) in
  let f = ref 0. in
  for i = 0 to 3 do
    let v = s.(i) in
    let y = unskew x v in
    let t = 0.6 -. norm2 y in
    if t > 0. then
      let h = shuffle (addi3 l v) in
      let t = t *. t in
      f := !f +. 8. *. t *. t *. magnitude h y
  done;
  !f

let fBM ?(octaves=5) =
  let rec go f w i (x, y, z as p) =
    if i == 0 then f else
    let f = f +. w *. noise p in
    go f (0.5 *. w) (pred i) (2. *. x, 2. *. y, 2. *. z)
  in go 0. 1. octaves

let clampb n =
  (n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
  clampb (int (0.5 +. ldexp (f +. 1.) 7))

test.ml


let () =
  let width  = 256
  and height = 256 in
  let pixels = String.create (width * height) in
  let x0, y0, z0 = -2., -2., 0.
  and sc = 4.0 in
  Gif.with_output_file
    (Gif.make ~ct:Gif.grayscale ~width ~height (fun gif ->
      for i = 0 to height - 1 do
        let y = y0 +. sc *. float i /. float height in
        for j = 0 to width - 1 do
          let x = x0 +. sc *. float j /. float width in
          let p = Noise.noise (x, y, z0) in
          pixels.[i * width + j] <- char_of_int (Noise.rescale p)
        done
      done;
      Gif.image pixels gif)) "perlin.gif"

fbm.ml


let ( +/ ) (x0, y0, z0) (x1, y1, z1) = (x0 +. x1, y0 +. y1, z0 +. z1)
let ( */ ) k (x, y, z) = (k *. x, k *. y, k *. z)

let two_pi = 6.283185307179586476925286766552

let () =
  let count  = 100 in
  let width  = 256
  and height = 256 in
  let pixels = String.create (width * height) in
  Gif.with_output_file
    (Gif.make ~ct:Gif.grayscale ~width ~height (fun gif ->
      if count > 1 then Gif.repeat gif;
      let sc = 1.0 in
      for c = 0 to count - 1 do
        let q = two_pi *. float c /. float count in
        let sq = 0.05 *. sin q
        and cq = 0.05 *. cos q in
        for i = 0 to height - 1 do
          let y = sc *. float i /. float height in
          for j = 0 to width - 1 do
            let x = sc *. float j /. float width in
            let p = (x, y, 0.) in
            let q = p +/ 2. */ (
              Noise.fBM (p +/ (   cq, sq, 0.)),
              Noise.fBM (p +/ (-. sq, cq, 0.)),
              Noise.fBM (p +/ (   0., 0., 1.))) in
            let r = p +/ 2. */ (
              Noise.fBM (q +/ (   cq, 0., sq)),
              Noise.fBM (q +/ (   0., 1., 0.)),
              Noise.fBM (q +/ (-. sq, 0., cq))) in
            let f = 2. *. Noise.fBM r  in
            pixels.[i * width + j] <- char_of_int (Noise.rescale f)
          done
        done;
        Printf.printf "%d/%d\n%!" (succ c) count;
        Gif.image ~fps:25 pixels gif
      done))
    "fbm.gif"

Makefile


# :mode=makefile:
OCAMLC=    ocamlfind ocamlc   -w Aelz -g
OCAMLOPT=  ocamlfind ocamlopt -w Aelz -unsafe -inline 1000 -ccopt -O2 -ccopt -fomit-frame-pointer
OCAMLLEX=  ocamllex.opt
OCAMLYACC= ocamlyacc
OCAMLDEP=  ocamldep $(PP)
OCAMLMKLIB=ocamlmklib #-custom -linkall

COMN=\
    gif.ml noise.ml

SRCS=\
    $(COMN) test.ml fbm.ml

OBJS=$(COMN:.ml=.cmo)

LIBS=

PROGRAMS=\
    test fbm

all: $(PROGRAMS) $(PROGRAMS:=.bc)

fbm.bc: $(OBJS) fbm.cmo
    $(OCAMLC) -o $@ $(LIBS) $^
fbm: $(OBJS:.cmo=.cmx) fbm.cmx
    $(OCAMLOPT) -o $@ $(LIBS:.cma=.cmxa) $^

test.bc: $(OBJS) test.cmo
    $(OCAMLC) -o $@ $(LIBS) $^
test: $(OBJS:.cmo=.cmx) test.cmx
    $(OCAMLOPT) -o $@ $(LIBS:.cma=.cmxa) $^

clean:
    rm -f *.cm[iox] *.o *.a *~
distclean: clean
    rm -f $(PROGRAMS) $(PROGRAMS:=.bc) *.gif

.SUFFIXES: .cmo .cmi .cmx .ml .mli .mll .mly

.mly.ml:
.mly.mli:
    $(OCAMLYACC) $<

.mll.ml:
    $(OCAMLLEX) $<

.ml.cmo:
    $(OCAMLC) -c $<

.mli.cmi:
    $(OCAMLC) -c $<

.ml.cmx:
    $(OCAMLOPT) -c $<

.depend:
    $(OCAMLDEP) $(SRCS) > .depend

include .depend

2012-01-06

Perlin's Simplex Noise

Graphics pioneer Ken Perlin is well-known for his work on procedural generation of textures. Recently he rewrote his classic Noise algorithm to reduce the complexity of generating pseudorandom gradients in higher dimensions by transforming a problem on N-cells to one on N-simplices. I couldn't find on his NYU home page a write-up by his own hand; apparently the only implementation exists in his Java applet. There is a well-known exegesis by Stefan Gustavson that completely rewrites the algorithm; I've also found a write-up by Perlin himself in a collection of course notes. The latter shows an extremely terse implementation in Java (Appendix B) that purports to correspond directly with hardware operations implemented in a specialized pipelined processor. I'll show an unraveled re-implementation of that Java algorithm that exactly corresponds to Prof. Perlin's, in the sense that the outputs are the same, but with the magic hopefully explained clearly. I will assume, however, that you are familiar with the original algorithm; if not, read Gustavson's analysis for a clear introduction to its workings.


The key to Perlin's Noise is that it interpolates pseudo-random gradients attached to lattice points. In the original algorithm, those gradients are computed by successive lookups using a permutation table. The new algorithm is geared towards an 8.8 fixed-point format; instead of tables it uses bit-slicing operations to compute two pseudo-random 3-bit hashes from the lattice points:


let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

I've expressed the patterns in octal to stress the fact that the result is a 6-bit number. btst simlpy extracts the b-th bit from n. bmix takes a 3-bit slice from the b-th bits of i, j and k, in that order, and uses it to index on patterns. shuffle adds up bmixes of eight circular rotations of (i, j, k) for each bit 0, 1 … 7. The procedure seems rather ad-hoc, but in fact produces rather well-distributed 3-bit fields among all the (28)3 = 16 Mi combinations of (i, j, k):


NP[Lower 3 bits = N]P[Upper 3 bits = N]
00.1244812011718750.125709712505340576
10.12536621093750.126831173896789551
20.1250.128595232963562
30.12463378906250.127514779567718506
40.1255187988281250.123155772686004639
50.12463378906250.120522260665893555
60.1250.12257838249206543
70.12536621093750.125092685222625732

These 3-bit hashes are used to compute a magnitude for vector (x, y, z) by selecting each coordinate with probability 3/4, applying a random sign to each with probability 1/2 and adding the resulting values:


let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

This amounts to finding the dot product of the vector (x, y, z) with a random gradient on the unit cell. Generating all combinations and extracting the gradient vectors we get the following diagram:


Distribution of the random basis vectors

Each dot shows the frequency with which that vector is selected; the diagonal vectors (±1, ±1, ±1) occur half as frequently as the mid-edge vectors (±1, ±1, 0) —and rotations—, to account for their greater magnitude and thus weight. All this is done without multiplying! If you study Perlin's paper and code, you will notice that he uses bit operations to select each of the x, y and z coordinates and to add or subtract them, to avoid using tables; however, the code doesn't quite correspond to the explanation given in the text. I've unrolled the conditionals into two jump tables in a way that duplicates the effect of the code; thus the first jump table is not equivalent to the one given in the text.


The key to the efficiency of this algorithm is not just in the use of bit operations for generating pseudo-random gradients but in the subdivision strategy used to combine them in ℝN. In the original algorithm, each of the 2N vertices in the N-cell are associated with a gradient which is cubically interpolated between pairs of hyperplanes. This not only generates visual artifacts since the interpolation scheme is not C²-continuous, it requires O(2N) work. The new algorithm mitigates the first problem by using a quartic interpolant, and solves the second by decomposing the N-cell into N! simplices:


let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]

This requires interpolating among N + 1 vertices taken in pairs, for O(N²) work total. The simplex in which a point (u, v, w) in the unit cell lies is determined by the permutation index of coordinates (u, v, w):


let permindex (u, v, w) =
  if u >= w then
    if u >= v then
      if v >= w then 0 else 1
    else 2
  else
  if v >= w then 3 else
  if u >= v then 4 else 5

Perlin's code uses a different method to directly travel along the simplex edges by incrementing each coordinate according to the permutation index; I've opted for using a table to simplify the code and eliminate the need for global variables. Each point (x, y, z) is decomposed into a sum of a lattice vector (i, j, k) and a point in the unit cell (u, v, w). To account for the different volumes in the simplicial decomposition of the unit N-cell, the vectors are skewed:


let int x =
  if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
  let s = (x +. y +. z) /. 3. in
  let i = int (x +. s)
  and j = int (y +. s)
  and k = int (z +. s) in
  (i, j, k)

and unskewed:


let unskew (x, y, z) (i, j, k) =
  let s = float (i + j + k) /. 6. in
  let u = x -. float i +. s
  and v = y -. float j +. s
  and w = z -. float k +. s in
  (u, v, w)

along the main diagonal of the N-cell. This amounts to a change of basis between a cubic and a tetrahedral grid and back. In general, the skewing factor is (√(N+1) - 1)/N and the unskewing factor is (N + 1 - √(N+1))/(N(N+1)). A couple of simple functions will simplify the main routine:


let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')

Now given a point p, noise finds the lattice vector l and the unit-cell coordinates x corresponding to it. It then selects the simplex s corresponding to x. Then, for each of its vertices v it finds the simplicial coordinates y of x relative to v. It computes in t a radial basis for y such that only the gradients attached to the simplex's vertices contribute to the final result. If it does, it computes the hash h corresponding to v, uses it to find the pseudo-random gradient applied to y and accumulates the corresponding contribution for this vertex:


let noise p =
  let l = skew   p   in
  let x = unskew p l in
  let s = simplices.(permindex x) in
  let f = ref 0. in
  for i = 0 to 3 do
    let v = s.(i) in
    let y = unskew x v in
    let t = 0.6 -. norm2 y in
    if t > 0. then
      let h = shuffle (addi3 l v) in
      let t = t *. t in
      f := !f +. 8. *. t *. t *. magnitude h y
  done;
  !f

That's it. noise returns a floating-point number in the interval [-1, 1]; in order to use it in pixmaps it is frequently more useful to rescale it to an unsigned byte:


let clampb n =
  (n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
  clampb (int (0.5 +. ldexp (f +. 1.) 7))

clampb is a nice little branchless routine to constrain an integer to the range [0, 255]. Note that in OCaml, integers are Sys.word_size - 1 bits long; in Java, for instance, the code would be:


public static int clampb(int n) {
  return (n | ((255-n) >>> 31)) & ~(n >>> 31) & 255;
}

(teasing that code apart is a nice exercise in bit twiddling.) This implementation produces the same result, pixel for pixel, as Perlin's code. This is a slice in the interval (-2, -2, 0)–(2, 2, 0) as a GIF file computed with this code:


Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in OCaml

and this is the same slice as a PNG file computed in Java with the class given by Perlin in his Appendix B:


Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in Java

There are two avenues open for further explorations of this algorithm. The first is to generalize it to an arbitrary number of dimensions; Gustavson's paper is a good starting point for that. The other is to go further with Perlin's original intention of building a hardware generator and eliminate the use of floating-point operations. This is mostly mechanical, except for the calculation of t in the inner loop, which requires at least a 0.24 fixed-point format.