Showing posts with label OCaml. Show all posts
Showing posts with label OCaml. Show all posts

2012-10-25

How to Write a Simple Web Application Using Ocamlnet

This post might seem to be in apparent contradiction: Ocamlnet is a large, very opinionated framework for network programming that solves most, if not all, those infrastructure issues that need to be taken care of when writing a networked, distributed, fault-tolerant server, from process management to string decoding and including protocol parsing and building. The fact is that Ocamlnet is not particularly complicated for all that it does, and it does quite a bit; but there are (for me at least) three hurdles to overcome in order to start using it effectively:

  • It is big. The project includes Posix wrappers, string handling, process management, CGI programming, various protocols, RPC registration, interprocess communication…
  • It is strange. Most of the API is object oriented rather than functional, using inheritance extensively
  • It is underdocumented. While the API documents are complete, and the project page includes some cookbook examples, for most non-trivial needs you have to go deep into the source code

In this instance I'll follow a tutorial, top-down style and I won't necessarily show complete compile-and-run code, which means that you'll have to reconstruct the source code to a compilable state, but I hope it will still be useful as a starting point and guide to writing HTTP services with Ocamlnet. This tutorial assumes you have installed OCaml 4.0 with Findlib, Ocamlnet 3.6, Yojson and its dependencies and PCRE-Ocaml.

Also, I won't attempt to compare it with Ocsigen since I've never used it, not least because I'm not really fond of convention-over-configuration-style frameworks I don't know it at all, and maybe I'm irrationally prejudiced :-). If you want full control of your code and feel comfortable with a large number of tweakable knobs, then Ocamlnet is for you. If you have work that needs to be done quickly and you're used to RoR or Django, Ocamlnet will definitely set your hair on fire.

Setting up a Netplex server: http.ml

The architecture I use is a Web application written in HTML that consumes JSON services published in the back-end. Ocamlnet lets me serve both the static content (HTML, CSS, JavaScript and graphic assets) and route a number of dynamic services. I can use just that for development, and have a production Apache server that serves the static content and proxies the dynamic requests to the Ocamlnet-published services. Another simplification I make is that I manually route the method URLs instead of letting Ocamlnet do it itself, as it is otherwise perfectly capable to. This makes it simpler to configure the services, at the cost of having to explicitely handle routing in code.

Let's begin by taking a look at the main function:

let main () =
  let database f =
    db_file := Io.get_absolute_path f;
    if not (Sys.file_exists !db_file) then raise (Arg.Bad ("File not found: " ^ f))
  in
  let usage = "usage: " ^ (Filename.basename Sys.executable_name) ^ " [options]" in
  let opt_list, cmdline_cfg = Netplex_main.args () in
  let opt_list = Arg.align ([
"-db"        , Arg.String database,                   "<file>  Database file";
  ] @ opt_list) in
  Arg.parse opt_list (fun s -> raise (Arg.Bad ("Invalid argument: " ^ s))) usage;
  Netsys_signal.init ();
  Netplex_main.startup
    ~late_initializer:(fun _ _container ->
      Netlog.logf `Notice "Starting up")
    (Netplex_mp.mp ())
    Netplex_log.logger_factories
    Netplex_workload.workload_manager_factories
    [
      service_factory ();
    ]
    cmdline_cfg

let () = if not !Sys.interactive then main ()

Netplex is the part of Ocamlnet that orchestrates the management and intercommunication between the processes that make up a network service. It has a number of command-line options for configuration, most notably -fg to launch the service in the foreground instead of as a detached dæmon. Netplex_main.args gives back a list of needed options upon which to add program-specific ones. In this case the only option is to pass a database file. Every filesystem resource must be accessed by absolute path, since Netplex changes the working directory to / upon startup. This file is stored in a global reference:

let db_file = ref (Io.get_absolute_path "myfile.db")

Once the command line is parsed, the service is created. First, Ocamlnet has to take over signal handling to give the service an orderly lifecycle (Netsys is the collection of modules providing cross-platform POSIX functionality). Netplex is then started with the multi-process parallelizer, the standard log and workload factories that set up themselves out of the configuration file, and a single custom service factory that will create the HTTP services themselves:

let service_factory = Nethttpd_plex.nethttpd_factory
  ~name:"nethttpd"
  ~hooks:(new environment_hooks)
  ~config_cgi:Netcgi.({
    default_config with
    default_exn_handler = false;
    permitted_input_content_types = [
      "application/json";
    ] @ default_config.permitted_input_content_types
  })
  ~handlers:[
    "json_service", Net.json_service ~dyn_uri:"/cgi" [
      "daterange", with_db Daterange.json;
      "calculate", with_db Calculate.json;
      "calendar" , with_db Calendar.json;
    ];
  ]
  ~services:Nethttpd_plex.default_services

The service name must correspond with that defined in the config file. In order to arrange for the workers to have access to the database file I intercept the service creation with hooks to their lifecycle process to open it and close it as needed. Netcgi sets up the environment that each HTTP service requires to function; in this case I use a minimal configuration that augments valid POST MIME types with a type for JSON requests (not otherwise used in this case) and opt out of the standard exception handler in exchange for my own. I configure a single "json_service" handler that will dispatch to the relevant methods of type cgi_activation → Yojson.Basic.json. The Netplex services for this service are the default Nethttpd_plex ones required by the infrastructure in order to manage the lifecycle of the process group: creation, tear-down and IPC. Note well that the factory is a thunk, not a data structure, the resulting type is unit → Netplex_types.processor_factory.

The lifecycle hooks are specified as a subclass of Netplex_kit.empty_processor_hooks. It uses the Netplex environment plug-in to store a shared reference to the database in a way that both thread- and process-based services can access in an uniform manner:

class environment_hooks = object
  inherit Netplex_kit.empty_processor_hooks ()
  method! post_add_hook _ container =
    container#add_plugin Netplex_mutex.plugin;
    container#add_plugin Netplex_sharedvar.plugin
  method! post_start_hook container =
    Netlog.logf `Info "(%s) opening database \"%s\"" (Net.cur_sys_id ()) !db_file;
    try  set_db (DB.open_database !db_file)
    with DE42x.Error (msg) -> container#log `Crit msg
  method! pre_finish_hook _container =
    Netlog.logf `Info "(%s) closing database" (Net.cur_sys_id ());
    match get_db () with
    | Some db -> DB.close_db db
    | None    -> ()
end

In this case I open and close database connections (represented here as an open file descriptor) which are stored in a per-process environment:

let get_db, set_db =
  let env_id = "MyService.localenv" in
  let module E = Netplex_cenv.Make_var_type (struct type t = DB.t end) in
  (fun () -> try Some (E.get env_id) with Netplex_cenv.Container_variable_not_found _ -> None),
  E.set env_id

Neplex_cenv makes a strongly-typed accessor for shared variables; in this case I have just one keyed by env_id. As a utility I arrange for my service methods to be closed over a reference to the database (cf the handler setup above):

let with_db proc arg = match get_db () with
| None    -> Net.failf "no database!"
| Some db -> proc db arg

A Nethttp JSON framework: net.ml

Every Nethttpd_plex-based service follows the same structure, while the specifics will make up for the bulk of the application. In this example these details have to do with utilities that make consuming and producing JSON data easier. I have a Net module with a number of helpers, of which I've used two already, cur_sys_id and failf:

let failf fmt = Printf.ksprintf failwith fmt
and argf  fmt = Printf.ksprintf invalid_arg fmt

let cur_sys_id () = match Netplex_cenv.current_sys_id () with
| `Process pid -> Printf.sprintf "PID %d" pid
| `Thread  tid -> Printf.sprintf "TID %d" tid

Another useful function is an encoding-safe string wrapper:

let text = Netencoding.Html.encode_from_latin1

Normally, Nethttp sends HTML 4.01-formatted error messages. In a JSON-based application it is preferable to have standardized JSON errors:

let error_json (env : Netcgi.cgi_environment) status fields cause message =
  let json_of_header hdr =
    try `String (env#input_header_field hdr)
    with Not_found -> `Null
  in
  try
    let script_name = env#cgi_script_name in
    let code = Nethttp.int_of_http_status status in
    env#log_error (Printf.sprintf "%s: %s (Status %i)" script_name message code);
    env#set_output_header_fields []; (* reset *)
    env#set_output_header_field "Content-type" "application/json; charset=utf-8";
    env#set_status status;
    env#set_multiple_output_header_field "Cache-control"
      [ "max-age=0"; "must-revalidate" ];
    let secs = Netdate.mk_mail_date (Unix.time ()) in
    env#set_output_header_field "Expires" secs;
    List.iter (fun (n,v) -> env#set_multiple_output_header_field n v) fields;
    env#send_output_header();
    if env#cgi_request_method <> "HEAD" then
      Yojson.Basic.to_output env#out_channel (`Assoc [
        "status"       , `Int    code;
        "statusText"   , `String (Nethttp.string_of_http_status status);
        "cause"        , `String cause;
        "message"      , `String message;
        "scriptName"   , `String script_name;
        "requestMethod", `String env#cgi_request_method;
        "queryString"  , `String env#cgi_query_string;
        "referrer"     ,  json_of_header "referer"
      ]);
    env#out_channel#flush ();
    env#out_channel#close_out ()
  with e ->
    Netlog.logf `Crit "Unexpected exception %s" (Printexc.to_string e)

This is a good example of how to use the cgi_environment to query the CGI execution and to exert maximum control over the HTTP response. I raise standard Ocaml exceptions from the method handlers and translate them into the relevant HTTP status codes by wrapping them in a higher-order protective function:

let protect handler (cgi : Netcgi.cgi_activation) =
  try handler cgi
  with
  | Netcgi.Argument.Oversized ->
    error_json cgi#environment `Request_entity_too_large []
      "Oversized" "A POST parameter exceeds maximum allowed size"
  | Invalid_argument msg ->
    error_json cgi#environment `Bad_request []
      "Bad request" (text msg)
  | Failure msg ->
    error_json cgi#environment `Internal_server_error []
      "Method failure" (text msg)
  | Not_found ->
    error_json cgi#environment `Not_implemented []
      "Not implemented" "The requested operation is not implemented"
  | exn ->
    let msg = Printexc.to_string exn in
    error_json cgi#environment `Internal_server_error []
      "Internal server error" ("Unexpected exception: " ^ text msg)

It is not normally necessary to manipulate the cgi_environment in such a detailed, low-level manner; the cgi_activation does pretty much the same thing in a easier-to-use way:

let send_json json (cgi : Netcgi.cgi_activation) =
  cgi#set_header ~content_type:"application/json; charset=utf-8" ~cache:`No_cache ();
  Yojson.Basic.to_output cgi#out_channel json;
  cgi#output#commit_work ()

Note well that Yojson doesn't provide a streaming interface: you must build the entire JSON value, which gets serialized in bulk; this makes necessary to configure the HTTP service so that the cgi_activations it creates are at least buffered:

let json_service ?dyn_uri handlers =
  let dyn_translator path =
      let len  = String.length path in
      let path = if len != 0 && path.[0] = '/'
        then String.sub path 1 (len - 1)
        else path
      in
      Pcre.replace ~pat:"/" ~templ:"_" path
  and dyn_handler env cgi =
    protect (fun cgi ->
      let h = List.assoc env#cgi_path_translated handlers in
      send_json (h cgi) cgi) cgi
  in Nethttpd_services.({
    dyn_handler;
    dyn_activation = std_activation `Std_activation_buffered;
    dyn_uri;
    dyn_translator;
    dyn_accept_all_conditionals = true;
  })

The dynamic path translator removes the leading slash and turns subsequent slashes into underscores, so that a method in the namespace /cgi, say can serve as a look-up key in the list of handlers: a call to /cgi/my/method/name will turn into a key my_method_name. This is, of course, a completely arbitrary decision. The dynamic handler in turn looks up the method handler (recall, of type cgi_activation → Yojson.Basic.json) by this key, calls it with the cgi_activtion expecting a JSON response and sends it out. Since the handling is protected against exceptions, any missing method, parameter validation error or exceptional condition is sent out as the corresponding HTTP error response.

Speaking of parameter extraction, I don't use anything fancy like parsing combinators, just plain old higher-order functions and regular expressions validating the result of CGI accessor functions:

let with_arg arg f = Io.unwind ~protect:(fun arg -> arg#finalize ()) f arg

let get_arg cgi name =
  try  Some (with_arg (cgi#argument name) (fun arg -> arg#value))
  with Not_found -> None

let parse ?default ~validate ~parse cgi name =
  match default, get_arg cgi name with
  | None  , None    -> argf "Missing parameter \"%s\"" name
  | Some v, None    -> v
  | _     , Some p  ->
    try  parse (Pcre.extract ~rex:validate ~full_match:false p)
    with Not_found -> argf "Invalid parameter \"%s\"" name

Since CGI arguments can be, if large, buffered into a temporary file, Netcgi requires explicit finalization. Every error is signaled with an Invalid_argument exception which protect catches and translates into a HTTP 400 (Bad Request) via error_json. Parsing specific argument types is straightforward:

let re_char     = Pcre.regexp "^(.)$"
let re_bool     = Pcre.regexp "^(true|false)$"
let re_int      = Pcre.regexp "^([-+]?\\d+)$"
let re_float    = Pcre.regexp "^([-+]?\\d+(?:.\\d*))$"
let re_date     = Pcre.regexp "^(\\d{4})-(\\d{2})-(\\d{2})$"
let re_datetime = Pcre.regexp "^(\\d{4})-(\\d{2})-(\\d{2})[ T](\\d{2}):(\\d{2}):(\\d{2})(Z|[-+]\\d{4})$"

let parse_char ?default cgi = parse ?default ~validate:re_char
  ~parse:(fun res -> res.(0).[0]) cgi

let parse_bool cgi = parse ~default:false ~validate:re_bool
  ~parse:(fun res -> bool_of_string res.(0)) cgi

let parse_int ?default cgi = parse ?default ~validate:re_int
  ~parse:(fun res -> int_of_string res.(0)) cgi

let parse_float ?default cgi = parse ?default ~validate:re_float
  ~parse:(fun res -> float_of_string res.(0)) cgi

let parse_date ?default cgi = parse ?default ~validate:re_date ~parse:(fun res ->
  let year   = int_of_string res.(0)
  and month  = int_of_string res.(1)
  and day    = int_of_string res.(2)
  in let dummy = Netdate.create 0. in
  Netdate.({ dummy with year; month; day; })) cgi

let parse_datetime ?default cgi = parse ?default ~validate:re_datetime ~parse:(fun res ->
  let year   = int_of_string res.(0)
  and month  = int_of_string res.(1)
  and day    = int_of_string res.(2)
  and hour   = int_of_string res.(3)
  and minute = int_of_string res.(4)
  and second = int_of_string res.(5)
  and zone   = match res.(6) with
  |"Z" -> 0
  | dt ->
    let hrs = int_of_string (String.sub dt 1 2)
    and mns = int_of_string (String.sub dt 3 2) in
    let off = 60 * hrs + mns in
    if dt.[0] == '+' then off else -off
  in let dummy = Netdate.create 0. in
  Netdate.({ dummy with year; month; day; hour; minute; second; zone; })) cgi

Writing the JSON methods: myservice.ml

That is the infrastructure, in broad strokes. I put each method in its own module, following a simple convention:

  • I define a type t of parsed method arguments:

    type t = {
      lon : float;
      lat : float;
      dt  : Netdate.t;
      jd  : Jd.t;
      tz  : int;
      lim : Jd.t option;
    }
    

    (in this instance, Jd.t is the type of dates represented as Julian dates)

  • I define a validate function to parse the CGI arguments into a value of type t:

    let validate db cgi =
      let open Net in
      let lon = parse_float    cgi "lon"
      and lat = parse_float    cgi "lat"
      and dt  = parse_datetime cgi "dt" in
      let jd  = jd_of_netdate   dt in
      let tz  = parse_int      cgi "tz"  ~default:dt.Netdate.zone in
      let lim = if parse_bool  cgi "tab"
        then Some (Jd.min db.DB.max_date (jd_of_netdate (Net.parse_date cgi "lim")))
        else None
      in
      if not (-180. <= lon && lon <= 180.) then
        Net.argf "Longitude out of range"
      else
      if not (-66.56 <= lat && lat <= 66.56) then
        Net.argf "Latitude out of range"
      else
      if Jd.compare jd db.DB.min_date < 0 then
        Net.argf "Date too early"
      else
      if Jd.compare jd db.DB.max_date > 0 then
        Net.argf "Date too late"
      else
      { lon = lon /. Trig.radian;
        lat = lat /. Trig.radian;
        dt;
        jd;
        tz;
        lim; }
    
  • I define and export a json function to generate the actual output:

    let json db cgi =
      let req = validate db cgi in
      let tz  = req.dt.Netdate.zone / 60 in
      let tdt = Jd.dynamic_time req.jd in
      (* … *)
      `Assoc [
        "jd"           , `Float  t;
        "dt"           ,  Net.time   Jd.(tdt <-> req.jd);
        "lst"          ,  Net.time  (lst /. Jd.secs_day);
        "lon"          ,  Net.angle  ~reduce:false req.lon;
        "lat"          ,  Net.angle  ~reduce:false req.lat;
        (* … *)
      ]
    

    (functions Net.time and Net.angle return appropriate JSON values). This exported function goes into the dynamic method map, as seen in the service_factory above.

Configuring the Netplex server: myservice.conf

That is mostly it, code-wise. It remains the detail of configuring Netplex. I use a simple myservice.conf file:

netplex {
  controller {
    max_level = "info";
    logging {
      type = "file";
      file = "/var/log/myservice.log";
      component = "*";
      subchannel = "*";
      max_level = "info";
    };
  };

  service {
    name = "myservice";
    protocol {
      name = "http";
      tcp_nodelay = true;
      address {
        type = "internet";
        bind = "0.0.0.0:8080";
      };
    };
    processor {
      type = "nethttpd";
      timeout = 60.0;
      timeout_next_request = 6.0;
      access_log = "enabled";
      suppress_broken_pipe = true;
      host {
        pref_name = "localhost";
        pref_port = 8080;
        names = "127.0.0.1:0";
        uri {
          path = "/";
          service {
            type = "file";
            docroot = "/path/to/static/";
            media_types_file = "/etc/mime.types";
            default_media_type = "application/xhtml+xml";
            enable_gzip = true;
            enable_listings = false;
            index_files = "index.html";
          };
        };
        uri {
          path = "/cgi";
          method {
            allow = "GET";
            service {
              type = "dynamic";
              handler = "json_service";
            };
          };
        };
      };
    };
    workload_manager {
      type = "dynamic";
      max_jobs_per_thread = 1;
      min_free_jobs_capacity = 2;
      max_free_jobs_capacity = 5;
      max_threads = 50;
    };
  };
}

Note that the Nethttpd_plex section declares two URIs: the root path maps to a file service that will serve the static content, defaulting to XHTML, while the /cgi prefix will map to the dynamic JSON handler. This is useful for development, since it only requires launching myservice -fg and trying it with a Web browser on http://127.0.0.1:8080/. In production I set up Apache with mod_proxy like this:

Alias /myservice /path/to/static

<Directory /path/to/static>
  Options FollowSymLinks
  AllowOverride All
  Order allow,deny
  Allow from all
</Directory>

<Location /myservice/>
  AuthType Digest
  AuthName "SERVICE"
  AuthDigestDomain /myservice/
  AuthUserFile /etc/httpd/passwd
  Require valid-user
</Location>

ProxyPass /myservice/cgi  http://127.0.0.1:8080/cgi

(where /path/to/static and /cgi must match what is configured in myservice.conf). Of course you can map your application to the HTTP root, in this case I have a single Apache instance serving various sub-paths.

Compiling: Makefile

It is not necessary to complicate the build process with anything more than a properly written Makefile. In this case I have one interface and one implementation for each JSON method (which you will note don't quite correspond to the dynamic service set-up I've shown first). Note well the list of PACKAGES required for building:

OCAMLFLAGS        = -thread -w @a -unsafe
OCAMLOPTFLAGS     = $(OCAMLFLAGS) -inline 10000
OCAMLLIBS         = unix.cma
CFLAGS            = -I/opt/ocaml/lib/ocaml -arch x86_64 -O3 -Wall -Wextra
PACKAGES          = -package threads,pcre,yojson,netplex,netcgi2,nethttpd

SRC  =            \
    net.ml        \
    myservice.ml  \
    http.ml

PROGS = myservice

all: $(PROGS)

myservice: $(SRC:%.ml=%.cmx)
    ocamlfind ocamlopt $(OCAMLOPTFLAGS) $(PACKAGES) -linkpkg $+ -o $@

%.cmi: %.mli
    ocamlfind ocamlc $(OCAMLFLAGS) $(PACKAGES) -c $<

%.cmo: %.ml
    ocamlfind ocamlc $(OCAMLFLAGS) $(PACKAGES) -c $<

%.cmx: %.ml
    ocamlfind ocamlopt $(OCAMLOPTFLAGS) $(PACKAGES) -c $<

%.o: %.c
    ocamlfind ocamlc -ccopt "$(CFLAGS)" -c $<

clean:
    /bin/rm -rf *.o *.a *.so *.cmi *.cmo *.cmx *~ *.log

distclean: clean
    /bin/rm -rf $(PROGS) depend

depend:
    $(OCAMLDEP) -one-line $(OCAMLLIBS) *.ml *.mli > depend

include depend

.PHONY: clean distclean all

Conclusion

There are a number of more advanced issues I'd like to address in the future. As it is, this framework can handle simple GET and POST requests but won't parse multipart attachments nor handle file transfers. Another thing it doesn't handle is HTTP Authorization; for simple requirements a simple filter can work, while for more complicated set-ups the best way to go is, in my opinion, to leverage Apache as I did here.

For those of you that have to interoperate with SOAP Web Services, the same architectural strategy is perfectly applicable with the aid of PXP and perhaps OC-SOAP.

A big field for further exploration is how to structure a complex Web application into independent services; Netplex makes that possible, if not necessarily easy. There is a hot architectural trend making some noise now called Command-Query Separation (CQS); this pattern can be profitably implemented with a single Netplex RPC service that handles all commands to which the Nethttpd_plex workers delegate. The advantages of this separation are enforced separation of concerns and automatic, transparent fault tolerance and distribution, both of which are the guiding principles behind Ocamlnet's design.

A closing remark that I don't want to miss on making is that the payoff of using Ocamlnet's process model is that it is really fast. My "production" server is an ancient 400 MHz PPC with 384 MB RAM which is perfectly capable of producing and serving really computationally-intensive content with minimal environmental requirements. This is something that I simply couldn't hope to pull off with PHP or Java. I encourage you to try Ocamlnet and see if you find, like I do, that Ocaml is the language of choice of discriminating Web developers.

2012-08-02

A Helping Phantom Hand

You don't have to be writing an interpreter or some other kind of abstract code to profit from some phantom types. Suppose you have two or more functions that work by "cooking" a simple value (a float, say) with a lengthy computation before proceeding:

let sun_geometric_longitude j =
  let je = to_jcen (to_jde j) in
  (* … computation with je … *)

let sun_apparent_longitude j =
  let je = to_jcen (to_jde j) in
  let q  = sun_geometric_longitude j in
  (* … computation with je … *)

In this case j is a date expressed in Julian Days as a float, and to_jde computes the Ephemeris Time as a 63-term trigonometric polynomial correction on it. sun_apparent_longitude calls sun_geometric_longitude and both call to_jde. Obviously this unnecessary duplication can be factored out:

let sun_geometric_longitude je =
  let t  = to_jcen je in
  (* … computation with je … *)

let sun_apparent_longitude je =
  let q  = sun_geometric_longitude je in
  let t  = to_jcen je in
  (* … computation with je … *)

let foo j =
  let je = to_jde j in
  let l  = sun_apparent_longitude je in
  (* … *)

(to_jcen is cheap and not worth factoring out.) But now a naked float represent two different things, Universal Time and Ephemeris Time, and we have a valid concern of mixing them up. We can wrap the time in an ADT:

type dt = UT of float | TD of float

let to_jcen j = match j with
| UT j ->
  (* … lengthy computation … *)
  TD j
| TD _ -> invalid_arg "to_jcen"

let sun_geometric_longitude je = match je with
| TD je ->
  let t  = to_jcen je in
  (* … computation with je … *)
| UT _  -> invalid_arg "sun_geometric_longitude"

let sun_apparent_longitude je = match je with
| TD je ->
  let q  = sun_geometric_longitude je in
  let t  = to_jcen je in
  (* … computation with je … *)
| UT _  -> invalid_arg "sun_apparent_longitude"

let foo j =
  let je = to_jde j in
  (* … computation with sun_apparent_longitude je … *)

but this forces us to check at run-time whether we mixed times up in our code. A better technique is to use a phantom type. First fix an abstract signature for the module implementing these functions:

module Test : sig
  type 'a dt

  val datetime : yy:int -> mm:int -> dd:int -> hh:int -> nn:int -> ss:float -> [ `JD ] dt
  val to_jde   : [ `JD ] dt -> [ `JDE ] dt
  val to_jcen  : 'a dt -> float

  val sun_geometric_longitude : [ `JDE ] dt -> float
  val sun_apparent_longitude  : [` JDE ] dt -> float
end = struct
  (* … *)

We have a way to construct our type α dt from a Universal datetime, a way to convert it to Dynamical Time with to_jde and operations that respect the kind of measure. The implementation is as before:

  (* … *)
  type 'a dt = float (* phantom type! *)

  let datetime ~yy ~mm ~dd ~hh ~nn ~ss = (* … *)

  let to_jde  j = (* … *)
  let to_jcen j = (* … *)

  let sun_geometric_longitude je =
    (* … computation with a statically checked je … *)

  let sun_apparent_longitude je =
    let q  = sun_geometric_longitude je in
    (* … computation with a statically checked je … *)
end

Now the compiler checks for us that we don't mix up measures. The only inconvenient of this approach is that the type α dt is fully abstract, and you must provide coercions, string_ofs and pretty printers for it if you need to show them or debug your code. There is a way out, though; just make it a private type abbreviation:

module Test : sig
  type 'a dt = private float
  (* … signature exactly as before … *)
end = struct
  type 'a dt = float (* phantom type! *)
  (* … implementation exactly as before … *)
end

Now α dt will be shown in the top-level, can be printed with a coercion (je :> float), etc.

For another simple example, suppose you want to represent sets as lists. The best way to do that is to keep them sorted so that set operations run in linear time. If you want to provide some operations that temporarily destroy the ordering, a phantom type can keep track of the invariant "this list is sorted":

module Set : sig
  type ('a, 'b) t = private 'a list

  val of_list : 'a list -> ('a, [ `S ]) t
  val as_set  : ('a, [ `U ]) t -> ('a, [ `S ]) t
  val empty   : ('a, [ `S ]) t
  val mem     : 'a -> ('a, [ `S ]) t -> bool
  val add     : 'a -> ('a, [ `S ]) t -> ('a, [ `S ]) t
  val union   : ('a, [ `S ]) t -> ('a, [ `S ]) t -> ('a, [ `S ]) t
  val inter   : ('a, [ `S ]) t -> ('a, [ `S ]) t -> ('a, [ `S ]) t
  val append  : ('a, 'b) t -> ('a, 'c) t -> ('a, [ `U ]) t
end = struct
  type ('a, 'b) t = 'a list

  let of_list l   = List.sort compare l
  and as_set  l   = List.sort compare l
  and empty       = []
  let union   l r = (* … linear merge … *)
  and inter   l r = (* … linear merge … *)
  let mem     e   = List.mem e
  and add     e   = union [e]
  and append      = List.append
end

The phantom type [ `S | `U ] tracks the sortedness of the list. Note that in the case of append the argument lists can have any ordering but the result is known to be unsorted. Note also how the fact that the empty list is by definition sorted is directly reflected in the type.

2012-07-19

Theorems for Free: The Monad Edition

This is for the record, since the derivations took me a while and I'd rather not lose them.

A functor is the signature:

module type FUNCTOR = sig
  type 'a t
  val fmap : ('a -> 'b) -> ('a t -> 'b t)
end

satisfying the following laws:

Identity:    fmap id      ≡ id
Composition: fmap (f ∘ g) ≡ fmap f ∘ fmap g

An applicative structure or idiom is the signature:

module type APPLICATIVE = sig
  type 'a t
  val pure : 'a -> 'a t
  val ap : ('a -> 'b) t -> ('a t -> 'b t)
end

satisfying the following laws:

Identity:     ap (pure id)                  ≡ id
Composition:  ap (ap (ap (pure (∘)) u) v) w ≡ ap u (ap v w)
Homomorphism: ap (pure f) ∘ pure            ≡ pure ∘ f
Interchange:  ap u (pure x)                 ≡ ap (pure (λf.f x)) u

An applicative functor is the structure:

module type APPLICATIVE_FUNCTOR = sig
  type 'a t
  include FUNCTOR     with type 'a t := 'a t
  include APPLICATIVE with type 'a t := 'a t
end

that is simultaneously a functor and an applicative structure, satisfying the additional law:

Fmap: fmap ≡ ap ∘ pure

A monad is the structure:

module type MONAD = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : ('a -> 'b t) -> ('a t -> 'b t)
end

satisfying the following laws:

Right unit:    bind return     ≡ id
Left unit:     bind f ∘ return ≡ f
Associativity: bind f ∘ bind g ≡ bind (bind f ∘ g)

Every monad is an applicative functor:

module ApplicativeFunctor (M : MONAD)
: APPLICATIVE_FUNCTOR with type 'a t = 'a M.t
= struct
  type 'a t = 'a M.t
  open M
  let fmap f = bind (fun x -> return (f x))
  let pure   = return
  let ap u v = bind (fun f -> fmap f v) u
end

This can be proved by easy (but tedious) equational reasoning. That the proof is rigorous is Wadler's celebrated result.

Proof of the Functor Identity:

  fmap id
≡ { definition }
  bind (return ∘ id)
≡ { composition }
  bind return
≡ { Monad Right unit }
  id

Proof of the Functor Composition:

  fmap f ∘ fmap g
≡ { definition }
  bind (return ∘ f) ∘ bind (return ∘ g)
≡ { Monad Associativity }
  bind (bind (return ∘ f) ∘ return ∘ g)
≡ { Monad Left unit }
  bind (return ∘ f ∘ g)
≡ { definition }
  fmap (f ∘ g)

A number of naturality conditions are simple equations between λ-terms. I'll need these later:

Lemma 1 (Yoneda):

  fmap f ∘ (λh. fmap h x)
≡ { defn. ∘, β-reduction }
  λg. fmap f (fmap g x)
≡ { defn. ∘ }
  λg. (fmap f ∘ fmap g) x
≡ { Functor Composition }
  λg. fmap (f ∘ g) x
≡ { abstract }
  λg. (λh. fmap h x) (f ∘ g)
≡ { defn. ∘, η-contraction }
  (λh. fmap h x) ∘ (f ∘)

Lemma 2:

  fmap f ∘ return
≡ { definition }
  bind (return ∘ f) ∘ return
≡ { Monad Left unit }
  return ∘ f

Lemma 3:

  bind f ∘ fmap g
≡ { definition fmap }
  bind f ∘ bind (return ∘ g)
≡ { Monad Associativity }
  bind (bind f ∘ return ∘ g)
≡ { Monad Left unit }
  bind (f ∘ g)

Lemma 4:

  bind (fmap f ∘ g)
≡ { definition fmap }
  bind (bind (return ∘ f) ∘ g)
≡ { Monad Associativity }
  bind (return ∘ f) ∘ bind g
≡ { definition fmap }
  fmap f ∘ bind g

The Applicative Functor condition is easy to prove and comes in as a handy lemma:

  ap ∘ pure
≡ { definition }
  λv. bind (λf. fmap f v) ∘ return
≡ { Monad Left unit }
  λv. λf. fmap f v
≡ { η-contraction }
  fmap

Proof of the Applicative Identity:

  ap (pure id)
≡ { Applicative Functor }
  fmap id
≡ { Functor Identity }
  id

Proof of the Applicative Homomorphism:

  ap (pure f) ∘ pure
≡ { Applicative Functor }
  fmap f ∘ pure
≡ { Lemma 2, defn. pure }
  pure ∘ f

Proof of the Applicative Interchange:

  ap (pure (λf.f x)) u
≡ { Applicative Functor }
  fmap (λf.f x) u
≡ { definition }
  bind (return ∘ (λf.f x)) u
≡ { defn. ∘, β-reduction }
  bind (λf. return (f x)) u
≡ { Lemma 2 }
  bind (λf. fmap f (return x)) u
≡ { definition }
  ap u (pure x)

The proof of the Applicative Composition condition is the least straightforward of the lot, as it requires ingenuity to choose the reduction to apply at each step. I started with a long, tedious derivation that required forward and backward reasoning; at the end I refactored it in byte-sized lemmas in order to simplify it as much as I could. As a heuristic, I always try to start from the most complicated expression to avoid having to guess where and what to abstract (that is, applying elimination rules requires neatness, while applying introduction rules requires backtracking):

  ap (ap (ap (pure (∘)) u) v) w
≡ { Applicative Functor }
  ap (ap (fmap (∘) u) v) w
≡ { definition }
  bind (λf. fmap f w) (bind (λf. fmap f v) (fmap (∘) u))
≡ { Lemma 3 with f := λf. fmap f v, g := (∘) }
  bind (λf. fmap f w) (bind ((λf. fmap f v) ∘ (∘)) u)
≡ { Monad Associativity with f := λf. fmap f w, g := (λf. fmap f v) ∘ (∘) }
  bind (bind (λf. fmap f w) ∘ (λf. fmap f v) ∘ (∘)) u
≡ { defn. ∘ (at right) }
  bind (λf. (bind (λf. fmap f w) ∘ (λf. fmap f v)) (f ∘)) u
≡ { defn. ∘, β-reduction }
  bind (λf. bind (λf. fmap f w) (fmap (f ∘) v)) u
≡ { Lemma 3 with f := λf. fmap f w, g := (f ∘) }
  bind (λf. bind ((λf. fmap f w) ∘ (f ∘)) v) u
≡ { Lemma 1 }
  bind (λf. bind (fmap f ∘ (λf. fmap f w)) v) u
≡ { Lemma 4 with g := λf. fmap f w }
  bind (λf. fmap f (bind (λf. fmap f w) v)) u
≡ { definition }
  ap u (ap v w)

What is this good for? I don't really know; Haskellers can't seem to be able to live without them while I can't seem to justify their application. I suspect laziness has a lot to do with it; in any case, the machinery is more palatable with the appropriate combinators:

module Functor (F : FUNCTOR) = struct
  include F
  let ( <$> ) = fmap
end

module Applicative (A : APPLICATIVE) = struct
  include A
  let ( <*> ) = ap
end

module Monad (M : MONAD) = struct
  include M
  include (ApplicativeFunctor (M)
         : APPLICATIVE_FUNCTOR with type 'a t := 'a t)
  let ( <$> )     = fmap
  let ( <*> )     = ap
  let ( >>= ) m f = bind f m
end

2012-07-12

A minor branch off Braun Trees

Revisiting the post on Braun Trees I noticed that, while pedagogical, the implementation of the root replacement operation rep can be streamlined a bit. By inlining the mutually recursive siftdown, specializing on the matches and adding guards, the result is as follows:

let rec rep compare e = function
| N (_, (N (el, _, _) as l), E)
  when compare el e  < 0 ->
  N (el, rep compare e l, E)
| N (_, (N (el, _, _) as l), (N (er, _, _) as r))
  when compare el e  < 0
    || compare er e  < 0 ->
    if compare el er < 0
      then N (el, rep compare e l, r)
      else N (er, l, rep compare e r)
| N (_, l, r) -> N (e, l, r)
| E           -> failwith "empty heap"

The guards are the strongest possible in order to minimize the needs to descend to the children. The conditional under the second case could be in turn lifted to a guard, since (el < eer < e) ∧ el < erel < eel < er, and similarly for the else case, but the loss of simplicity is quite substantial in my opinion.

2012-07-09

Existential Crisis

In a long discussion at LtU, user Jules Jacobs advances a use-case that for him would have been difficult to solve in a statically-typed language. I focus on his first two points:

  1. A data structure that is a container of T plus a function on T's. I would have needed existential types to hide the T, which I don't get in many languages.
  2. A couple of functions that take heterogeneous list of items that the function will convert to the interface it needs the elements to be with an extensible conversion function. […]

It's true, not many languages have existential types but OCaml happens to do, or at least allows existentials to be encoded. I'll take a big hint from MLton and I'll try to solve both problems (I won't pretend to solve his actual problems; just my interpretation of the problems he posed). Another reason why I write this post is because I always forget what the essence of existential types is. Pierce writes:

Similarly, there are two different ways of looking at an existential type, written {∃X,T}. The logical intuition is that an element of {∃X,T} is a value of type [XS]T, for some type S. The operational intuition, on the other hand, is that an element of {∃X,T} is a pair, written {*S,t}, of a type S and a term t of type [XS]T.

[…] To understand existential types, we need to know two things: how to build (or introduce, in the jargon of §9.4) elements that inhabit them, and how to use (or eliminate) these values in computations.

An existentially typed value is introduced by pairing a type with a term, written {*S,t}. A useful concrete intuition is to think of a value {*S,t} of type {∃X,T} as a simple form of package or module with one (hidden) type component and one term component. The type S is often called the hidden representation type, or sometimes (to emphasize a connection with logic, cf. §9.4) the witness type of the package. For example, the package p = {*Nat, {a=5, f=λx:Nat. succ(x)}} has the existential type {∃X, {a:X, f:XX}}. The type component of p is Nat, and the value component is a record containing a field a of type X and a field f of type XX, for some X (namely Nat).

(Types and Programming Languages, p. 363–364) I quote at some length because Pierce's presentation is so condensed that for me it really bears repeating; also, because every time I want a refresher I can come to this post instead of cracking open the physical book. The gist of it is pithily and more memorably stated in Mitchell and Plotkin's slogan, abstract types have existential type.

The complication is that in ML types are not values, so in order to pack an existential we must reify the type component as a term, for instance as a pair of functions, an injection and a projection:

module type UNIV = sig
  type t
  val embed : unit -> ('a -> t) * (t -> 'a)
end

Why UNIV when we're talking about existential types? Go ask Oleg. Actually, there is a type isomorphism analogous to the logical equivalence between (∃x. P x) ⇒ Q and x.(P x ⇒ Q); as Jeremy Gibbons writes …the justification being that a datatype declaration such as type e = ∃t. E of t foo introduces a constructor E : (∃t. t foo) → e, and this type is isomorphic to t.(t foo → e) because e is independent of t (Haskellisms paraphrased).

In any case, here the existential type is t, and embed produces a (inj, prj) pair that can be applied to values of some type α. Only the prj of the pair can recover the injected value; the use of any other prj will fail. There are at least two possible implementations, one using references and another one using exceptions (which are values of a single open, extensible, generative type). The latter is very clever:

module Univ : UNIV = struct
  type t = exn
  let embed (type u) () =
    let module E = struct
      exception E of u
      let inj x = E x
      let prj = function E x -> x | _ -> raise Not_found
    end in E.(inj, prj)
end

The use of the named type u and a local module are 3.12 features that allow declaring a polymorphic exception (compare the SML solution in MLton's website). Since the exception constructor E is different for every invocation of embed (this is the "generative" bit referred to above), only the prj of the pair can recover the value:

let () =
  let inj_int  , prj_int   = Univ.embed ()
  and inj_float, prj_float = Univ.embed () in
  let r = ref (inj_int 13) in
  let s1 = try string_of_int   (prj_int   !r) with Not_found -> "None" in
  r := inj_float 13.0;
  let s2 = try string_of_int   (prj_int   !r) with Not_found -> "None" in
  let s3 = try string_of_float (prj_float !r) with Not_found -> "None" in
  Printf.printf "%s %s %s\n" s1 s2 s3

Note that the reference r holds values "of different types" via the corresponding inj. This code typechecks and when run outputs:

13 None 13.

On top of this "universal" existential type we can build heterogeneous property lists à la Lisp (see again MLton's site):

module PList : sig
  type t
  val make     : unit -> t
  val property : unit -> (t -> 'a -> unit) * (t -> 'a)
end = struct
  type t = Univ.t list ref

  let make () = ref []

  let property (type u) () =
    let inj, prj = Univ.embed () in
    let put r v = r := inj v :: !r
    and get r   =
      let rec go = function
      | []      -> raise Not_found
      | x :: xs -> try prj x with Not_found -> go xs
      in go !r
    in (put, get)
end

Each property must be created explicitly but independently of any list using it. They encapsulate an existentially-typed value; look-up just proceeds by attempting to project out the corresponding value. These lists really are magical:

let () =
  let put_name  , get_name   = PList.property ()
  and put_age   , get_age    = PList.property ()
  and put_weight, get_weight = PList.property () in
  let show p = Printf.printf "%s: %d years, %.1f kg\n"
    (get_name   p)
    (get_age    p)
    (get_weight p)
  in
  let boy, girl = PList.make (), PList.make () in
  put_name   boy  "Tim";
  put_age    boy  13;
  put_weight boy  44.0;
  put_name   girl "Una";
  put_age    girl 12;
  put_weight girl 39.0;
  List.iter show [boy; girl]

Here boy and girl are two different, independent property lists that act as extensible records with labels get_name, get_age and get_weight. The list iteration prints the properties uniformly via show, without having to cast or do any strange contortions (at least not any outside the definitions). The output is:

Tim: 13 years, 44.0 kg
Una: 12 years, 39.0 kg

Of course nothing says that the property lists must be homogeneous in the properties they contain; looking for an inexistent property will simply fail. On the other hand, probably the preferred way to handle extensible records in OCaml is via objects, using structural subtyping in the same way dynamically-typed languages would use duck typing. This would make solving the original problem a little more familiar to Python or Ruby programmers; but then, recovering the original objects from the lists would be impossible without downcasts.

2012-07-02

Assessing Abstractions

Back to OCaml! Catching up with StackOverflow's OCaml questions, I found an interesting one about private type abbreviations in module signatures. One thing in that conversation that struck me as odd was the assertion that the compiler optimizes single-constructor variants, thereby subsuming the semantics of Haskell's all three declarators, data, type and newtype, into one. Definitive proof would be obtainable by diving into the compiler's source. I decided instead to read the output of ocamlopt to try to catch a glimpse of this, even if I know it doesn't constitute definitive proof. What I found is both good and bad, or as we say here una de cal y una de arena.

Consider the following module definitions:

module S = struct
  type t = P of int * int
  let make x y = P (x, y)
end

module T = struct
  type t = int * int
  let make x y = x, y
end

Module S (for "single sum type") declares a single-variant type, while T (for "tuple type") declares an abbreviation. Putting them in a file and compiling it with ocamlopt -S -c generates the following listing (prelude and other specificities omitted):

S.makeT.make
_camlAbbrev__make_1033:
        subq    $8, %rsp
        movq    %rax, %rdi
.L101:  subq    $24, %r15
        movq    _caml_young_limit@GOTPCREL(%rip), %rax
        cmpq    (%rax), %r15
        jb      .L102
        leaq    8(%r15), %rax
        movq    $2048, -8(%rax)
        movq    %rdi, (%rax)
        movq    %rbx, 8(%rax)
        addq    $8, %rsp
        ret
.L102:  call    _caml_call_gc
        jmp     .L101
_camlAbbrev__make_1038:
        subq    $8, %rsp
        movq    %rax, %rdi
.L105:  subq    $24, %r15
        movq    _caml_young_limit@GOTPCREL(%rip), %rax
        cmpq    (%rax), %r15
        jb      .L106
        leaq    8(%r15), %rax
        movq    $2048, -8(%rax)
        movq    %rdi, (%rax)
        movq    %rbx, 8(%rax)
        addq    $8, %rsp
        ret
.L106:  call    _caml_call_gc
        jmp     .L105

(ocamlopt -version is 3.12.1). The compiler generates the exact same code in both cases. The bolded instructions build the heap-allocated 3-word pair as tag, first component, second component. In order to test modular abstraction, I coerce these implementations into the following signatures:

module A : sig
  type t = int * int
  val make : int -> int -> t
end = T

module B : sig
  type t = private int * int
  val make : int -> int -> t
end = T

module X : sig
  type t = P of int * int
  val make : int -> int -> t
end = S

module Y : sig
  type t = private P of int * int
  val make : int -> int -> t
end = S

Modules A and B coerce module T with a public and a private type definition, respectively. Modules X and Y do the same with module S. The following code shows how to use each variant of type t in a destructuring pattern match:

let a () = A.(let   (x, y) =  make 2 3               in x + y)
let b () = B.(let   (x, y) = (make 2 3 :> int * int) in x + y)
let x () = X.(let P (x, y) =  make 2 3               in x + y)
let y () = Y.(let P (x, y) =  make 2 3               in x + y)

As explained in the manual, a private type abbreviation requires an explicit coercion for the types to be compatible. The use of a single-variant type makes for not only safer coding (the intent behind Haskell's newtype), it also allows for lighter syntax. I was surprised by the assembly code generated by ocamlopt (somewhat simplified):

ab
_camlAbbrev__a_1060:
        subq    $8, %rsp
.L109:  subq    $24, %r15
        movq    _caml_young_limit@GOTPCREL(%rip), %rax
        cmpq    (%rax), %r15
        jb      .L110
        leaq    8(%r15), %rax
        movq    $2048, -8(%rax)
        movq    $5, (%rax)
        movq    $7, 8(%rax)
        movq    8(%rax), %rbx
        movq    (%rax), %rax
        leaq    -1(%rax, %rbx), %rax
        addq    $8, %rsp
        ret
.L110:  call    _caml_call_gc
        jmp     .L109
_camlAbbrev__b_1063:
        subq    $8, %rsp
.L113:  subq    $24, %r15
        movq    _caml_young_limit@GOTPCREL(%rip), %rax
        cmpq    (%rax), %r15
        jb      .L114
        leaq    8(%r15), %rax
        movq    $2048, -8(%rax)
        movq    $5, (%rax)
        movq    $7, 8(%rax)
        movq    8(%rax), %rbx
        movq    (%rax), %rax
        leaq    -1(%rax, %rbx), %rax
        addq    $8, %rsp
        ret
.L114:  call    _caml_call_gc
        jmp     .L113
cd
_camlAbbrev__x_1066:
        subq    $8, %rsp
.L117:  subq    $24, %r15
        movq    _caml_young_limit@GOTPCREL(%rip), %rax
        cmpq    (%rax), %r15
        jb      .L118
        leaq    8(%r15), %rax
        movq    $2048, -8(%rax)
        movq    $5, (%rax)
        movq    $7, 8(%rax)
        movq    8(%rax), %rbx
        movq    (%rax), %rax
        leaq    -1(%rax, %rbx), %rax
        addq    $8, %rsp
        ret
.L118:  call    _caml_call_gc
        jmp     .L117
_camlAbbrev__y_1070:
        subq    $8, %rsp
.L121:  subq    $24, %r15
        movq    _caml_young_limit@GOTPCREL(%rip), %rax
        cmpq    (%rax), %r15
        jb      .L122
        leaq    8(%r15), %rax
        movq    $2048, -8(%rax)
        movq    $5, (%rax)
        movq    $7, 8(%rax)
        movq    8(%rax), %rbx
        movq    (%rax), %rax
        leaq    -1(%rax, %rbx), %rax
        addq    $8, %rsp
        ret
.L122:  call    _caml_call_gc
        jmp     .L121

Identical machine code generated in all four cases, which I find very encouraging. Here the compiler inlines the tupling constructor make (the three first bolded lines; note that an integer n is tagged as 2·n + 1 so that 2 is represented by 5 and 3 by 7). What I find a bummer is that the tuple is immediately destructured (into rbx and rax, next two bolded lines) and operated on its components (last bolded line; note in passing how addition of tagged integers is done with leaq, since 2·(m + n) + 1 = (2·m + 1) + (2·n + 1) - 1), without any kind of CSE performed. This is perfectly summarized by Pascal Cuoq in a comment to another StackOverflow question:

[…] Last time I checked it didn't even optimize the allocation of a, b in match a, b with x, y -> .... If you wish to check by yourself, I found that reading the x86 assembly generated by ocamlopt -S was convenient for me because I didn't have to learn a new representation.

This, I suspect, is an artifact of inlining in the presence of code emission that must be correct in the face of separate compilation. Disturbingly enough, ocamlopt seems to inline even across module boundaries. Separating the test into an abbrev.ml implementation:

module S = struct
  type t = P of int * int
  let make x y = P (x, y)
end

module T = struct
  type t = int * int
  let make x y = x, y
end

module A = T
module B = T
module X = S
module Y = S

with interface abbrev.mli:

module A : sig type t =              int * int val make : int -> int -> t end
module B : sig type t = private      int * int val make : int -> int -> t end
module X : sig type t =         P of int * int val make : int -> int -> t end
module Y : sig type t = private P of int * int val make : int -> int -> t end

the following code in test.ml:

open Abbrev

let a () = A.(let   (x, y) =  make 2 3               in x + y)
let b () = B.(let   (x, y) = (make 2 3 :> int * int) in x + y)
let x () = X.(let P (x, y) =  make 2 3               in x + y)
let y () = Y.(let P (x, y) =  make 2 3               in x + y)

compiles to the same assembly as above! Xavier Leroy spells out the conditions for ocamlopt to do cross-module inlining; I think it obvious that these tests do fulfill them. An interesting tidbit in that message is the command-line argument -dcmm to dump a RTL-like representation of the generated code:

(function camlTest__a_1038 (param/1056: addr)
 (let match/1057 (alloc 2048 5 7)
   (+ (+ (load match/1057) (load (+a match/1057 8))) -1)))

(function camlTest__b_1041 (param/1058: addr)
 (let match/1059 (alloc 2048 5 7)
   (+ (+ (load match/1059) (load (+a match/1059 8))) -1)))

(function camlTest__x_1044 (param/1060: addr)
 (let match/1061 (alloc 2048 5 7)
   (+ (+ (load match/1061) (load (+a match/1061 8))) -1)))

(function camlTest__y_1048 (param/1062: addr)
 (let match/1063 (alloc 2048 5 7)
   (+ (+ (load match/1063) (load (+a match/1063 8))) -1)))

I think it is not very productive to worry too much about the cost of abstractions in OCaml, as it probably is offset by the naïveté of the code generator.

2012-02-22

For the Right Hand

Useful generic functions are usually the result of composing useful generic functions; the fact that these are easy to write is, however, not an excuse for not having them in a rich prelude. Given a total function for splitting a list at a given point:


let rec split_at n = function
| []            -> [], []
| l when n == 0 -> [], l
| x :: xs       ->
  let l, r = split_at (pred n) xs in
  x :: l, r

and the anamorphism on lists, the basic functional iteration scheme giving the list of results:


let rec unfold f e = match f e with
| None         -> []
| Some (x, e') -> x :: unfold f e'

then splitting a list into chunks just requires some assembly:


let chunks n = unfold (fun l -> match split_at n l with [], [] -> None | p -> Some p)

As another example, suppose we need a Procrustean function fit that truncates a list or pads it with a value to a given length. Given an way to build an infinite list:


let all x = let rec l = x :: l in l

the desired function is just:


let fit n e l = fst (split_at n (l @ all e))

The only good thing about the paucity of Pervasives is that it forces one to practice often this kind of finger exercises. At least they are pretty.

2012-01-11

Eighteen Million Noises

Update: Here is the PNGPack source-code archive:

noise.tgz

Per second. Yes, it's benchmark time. I pitted Prof. Perlin's original implementation against Stefan Gustavson's, and the results are in! I ran two sets of benchmarks, one in Java and the other in OCaml. In both cases I compared three versions of Simplex Noise: Perlin's pipelined 3D Noise, and Gustavson's 2D Noise and 3D Noise:


  • noisep: Perlin's 3D Simplex Noise
  • noise2: Gustavson's 2D Simplex Noise
  • noise3: Gustavson's 3D Simplex Noise

The Java programs are copied directly from the linked papers (except for one modification, noted below); the OCaml versions are pixel-faithful translations. Each benchmarked function calls noise 1,000 times in a tight loop. The benchmark runs the functions three times for 10 seconds each. I ran the benchmark on a lightly loaded, cool MacBook (2 GHz Intel Core 2 Duo, 2 GB 533 Mhz DDR2 SDRAM running Mac OS X 10.6.8). By cool I mean "left to cool down"; the temperature reached 37 °C yesterday. These are the best-of-three timings for each function:


FunctionCalls/s
OCamlJava
noisep 2,123,490 3,417,300
noise218,206,09018,239,810
noise310,506,63010,404,280

The timings relative to Java's noisep are:


FunctionRelative speed
OCamlJava
noisep0.62141.0000
noise25.32765.3375
noise33.07453.0446

The big difference between OCaml's and Java's noisep is attributable to the fact that my translation is more structured than Perlin's version; for the other two functions the difference is within 1% (which I find quite amazing in and of itself).


In the case of Gustavson's Noise, I modified the code to randomly choose among 16 gradients on the unit cell, not 12 as in the code given in the paper. In contrast to Perlin's suggestion of duplicating existing vectors, I added four cell vertices so that the 2D projection of the gradient vectors onto the XY plane is unbiased. This allows indexing the gradient vector with a bit-mask instead of a modulo operation:


let grad3 = [|
( 1., 1., 0.);(-1., 1., 0.);( 1.,-1., 0.);(-1.,-1., 0.);
( 1., 0., 1.);(-1., 0., 1.);( 1., 0.,-1.);(-1., 0.,-1.);
( 0., 1., 1.);( 0.,-1., 1.);( 0., 1.,-1.);( 0.,-1.,-1.);
( 1., 1., 1.);(-1., 1., 1.);( 1.,-1.,-1.);(-1.,-1., 1.);
|]

These are the raw timings, for OCaml:


noisep: 10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2122.11/s (n=22184)
        10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2123.49/s (n=22184)
        10.45 WALL (10.44 usr +  0.01 sys = 10.45 CPU) @ 2122.31/s (n=22184)
noise2: 10.51 WALL (10.50 usr +  0.01 sys = 10.51 CPU) @ 18198.59/s (n=191223)
        10.50 WALL (10.50 usr +  0.01 sys = 10.50 CPU) @ 18206.09/s (n=191223)
        10.50 WALL (10.50 usr +  0.01 sys = 10.50 CPU) @ 18204.68/s (n=191223)
noise3: 10.51 WALL (10.51 usr +  0.01 sys = 10.51 CPU) @ 10504.05/s (n=110443)
        10.51 WALL (10.51 usr +  0.01 sys = 10.51 CPU) @ 10506.63/s (n=110443)
        10.57 WALL (10.56 usr +  0.01 sys = 10.57 CPU) @ 10449.81/s (n=110443)

          Rate     noisep noise3 noise2
noisep  2123+- 1/s     --   -80%   -88%
noise3 10487+-25/s   394%     --   -42%
noise2 18203+- 3/s   758%    74%     --

and for Java:


noisep: 10.04 WALL @ 3386.12/s (n=34000)
noisep: 10.24 WALL @ 3417.30/s (n=35000)
noisep: 10.24 WALL @ 3416.97/s (n=35000)
noise2: 10.01 WALL @ 18181.82/s (n=182000)
noise2: 10.00 WALL @ 18100.00/s (n=181000)
noise2: 10.03 WALL @ 18239.81/s (n=183000)
noise3: 10.04 WALL @ 10357.53/s (n=104000)
noise3: 10.09 WALL @ 10404.28/s (n=105000)
noise3: 10.01 WALL @ 10390.65/s (n=104000)

I used the following benchmark harness in OCaml:


let rand w = w *. (Random.float 1. -. 0.5)

let max_iter = 1_000
let width    = 4.

let noisep () =
  let p = (rand width, rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise p) done

let noise2 () =
  let p = (rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise2 p) done

let noise3 () =
  let p = (rand width, rand width, rand width) in
  for i = 1 to max_iter do ignore (Noise.noise3 p) done

let () =
  let open Benchmark in
  let bm = throughputN ~style:Auto ~repeat:3 10 [
    ("noisep", noisep, ());
    ("noise2", noise2, ());
    ("noise3", noise3, ());
  ] in
    print_newline();
  tabulate bm;

It uses Christophe Troestler's ocaml-benchmark module. The Java harness is similar but hand-coded:


public class NoiseBench {
  private static final double WIDTH = 4.0;
  private static final int NITER = 1000;
  private static final int NTEST = 3;
  private static final int NSECS = 10;

  private static void benchmark(String name, Runnable test) {
    for (int i = 0; i != NTEST; i++) {
      long now = System.currentTimeMillis(), lap = 0;
      int count = 0;
      while (lap < NSECS * 1000) {
        for (int j = 0; j != NITER; j++)
          test.run();
        lap = System.currentTimeMillis() - now;
        count += NITER;
      }
      System.out.printf("%s: %.2f WALL @ %.2f/s (n=%d)\n",
        name,
        (double) lap / 1000.0,
        (double) count * 1000. / (double) lap,
        count);
      System.out.flush();
    }
  }

  private static double rand() {
    return WIDTH * (Math.random() - 0.5);
  }

  private static void noisep() {
    double x = rand(), y = rand(), z = rand();
    for (int i = 0; i != NITER; i++)
      Noise3.noise(x, y, z);
  }

  private static void noise2() {
    double x = rand(), y = rand();
    for (int i = 0; i != NITER; i++)
      SimplexNoise.noise(x, y);
  }

  private static void noise3() {
    double x = rand(), y = rand(), z = rand();
    for (int i = 0; i != NITER; i++)
      SimplexNoise.noise(x, y, z);
  }

  public static void main(String[] args) {
    benchmark("noisep", new Runnable() { public void run() { noisep(); } });
    benchmark("noise2", new Runnable() { public void run() { noise2(); } });
    benchmark("noise3", new Runnable() { public void run() { noise3(); } });
  }
}

2012-01-09

Putting Noise to the Test

So how do you use Perlin's Simplex Noise? By using the GIF encoder, of course! I generated the test image shown in the last post:


Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in OCaml

with a minimal application of both modules in test.ml (source code follows). For a visually richer but not much more complicated example, I also tried replicating Iñigo Quílez's turbulent textures in fbm.ml. The animation traces a tiny loop in noise space so that it seamlessly repeats over and over. The first frame looks like this:



I'd like to know of a very easy way to share source tarballs with you (no, Forges require waiting for new project approval; no, Github requires adhering to a philosophy of sharing for which I don't care; no, Google Code forces me to give up my own version control; no, file hosting is not an option. I'm open to suggestions.); in the meantime, here's a wall of code with the complete project.


gif.mli


val fold_string : ('a -> char -> 'a) -> 'a -> string -> 'a

type color = { red : int; green : int; blue : int; }

type palette

val palette : ?background:int -> color array -> palette

val num_colors : palette -> int

type t

val make    : ?ct:palette -> width:int -> height:int
           -> (t -> unit) -> (out_channel -> unit)

val image   : ?ct:palette -> ?transparent:int -> ?fps:int
           -> ?x:int -> ?y:int -> ?width:int -> ?height:int
           -> string -> t -> unit

val comment : string -> t -> unit

val repeat  : ?count:int -> t -> unit

val verify_gif : in_channel -> bool

val grayscale   : palette
val heatmap     : palette
val rainbow     : palette
val black_white : palette

val unwind : protect:('a -> unit) -> ('a -> 'b) -> ('a -> 'b)
val with_output_file : (out_channel -> 'a) -> (string -> 'a)
val with_input_file  : (in_channel  -> 'a) -> (string -> 'a)

gif.ml


let unwind ~(protect : 'a -> unit) f x =
  try let y = f x in protect x; y
  with e -> protect x; raise e

class output otch = object (self)
  val         buffer  =  String.create 255
  val mutable current = -1

  method private flush =
    if current > 0 then begin
      output_byte otch current;
      output      otch buffer 0 current;
      current <- 0
    end

  method byte n =
    let b = n land 255 in
    if current == -1 then output_byte otch b else begin
      if current == 255 then self#flush;
      buffer.[current] <- char_of_int b;
      current <- current + 1
    end

  method le16 n =
    self#byte  n;
    self#byte (n lsr 8)

  method string str =
    String.iter (fun c -> self#byte (int_of_char c)) str

  method begin_block =
    if current != -1 then failwith "begin_block";
    current <- 0

  method end_block =
    if current == -1 then failwith "end_block" else
    self#flush;
    current <- -1;
    self#byte 0

  method close =
    if current != -1 then self#end_block;
    flush otch
end

let fold_string f e s =
  let res = ref e in
  String.iter (fun c -> res := f !res c) s;
  !res

class lzw ?(bits=8) out =
  let count       = 1 lsl bits in
  let clear_code  = count
  and end_of_info = count + 1 in
object (self)
  val table : (int * int, int) Hashtbl.t = Hashtbl.create 5003
  val mutable code_size = 0
  val mutable next_code = 0
  val mutable buffer    = 0
  val mutable length    = 0

  method append n =
    buffer <- buffer lor ((n land pred (1 lsl code_size)) lsl length);
    length <- length + code_size;
    while length >= 8 do
      out#byte (buffer land 0xff);
      buffer <- buffer lsr 8;
      length <- length - 8
    done

  method finish =
    if length > 0 then
      out#byte (buffer land pred (1 lsl length))

  method private reset =
    Hashtbl.clear table;
    for i = 0 to count - 1 do
      Hashtbl.add table (-1, i) i
    done;
    code_size <- bits  + 1;
    next_code <- count + 2

  method private add_string s =
    (* Check code for overflow *)
    if next_code == 1 lsl code_size then
      code_size <- code_size + 1;
    Hashtbl.add table s next_code;
    next_code <- next_code + 1;
    (* Limit maximum code size to 12 bits *)
    if next_code == 0x1000 then begin
      self#append clear_code;
      self#reset;
    end

  method compress data =
    self#reset;
    self#append clear_code;
    let last = fold_string (fun prefix c ->
      let  k = int_of_char c in
      try  Hashtbl.find table (prefix, k)
      with Not_found ->
        self#append prefix;
        self#add_string (prefix, k);
        k) (-1) data in
    self#append last;
    self#append end_of_info;
    self#finish
end

(* 0 <= red, green, blue < 256 *)
type color = { red : int; green : int; blue : int; }

type palette = { background : int; bits : int; table : color array; }

let num_colors { table; _ } = Array.length table

let palette ?(background=0) table =
  let size = Array.length table in
  if size < 2
  || size > 256
  || size land (size - 1) != 0
  || background < 0
  || background >= size
  then invalid_arg "color_table" else
  let bits = truncate (log (float size) /. log 2.) in
  assert (size == 1 lsl bits);
  { background; bits; table = Array.copy table; }

type t = {
  width  : int;
  height : int;
  ct     : palette option;
  out    : output;
}

let header { out; _ } = out#string "GIF89a"

let trailer { out; _ } = out#byte 0x3b

let color_table { table; _ } { out; _ } =
  Array.iter (fun { red; green; blue } ->
    out#byte red  ;
    out#byte green;
    out#byte blue) table

let logical_screen ({ width; height; ct; out } as gif) =
  out#le16   width;
  out#le16   height;
  match ct with
  | None ->
    out#byte 0;
    out#byte 0;
    out#byte 0 (* default aspect ratio (1:1 = 49) *)
  | Some ({ background; bits; _ } as ct) ->
    out#byte (0xf0 lor pred bits);
    out#byte background;
    out#byte 0;
    color_table ct gif

let make ?ct ~width ~height proc otch =
  unwind ~protect:(fun { out; _ } -> out#close) (fun gif ->
    header gif;
    logical_screen gif;
    proc gif;
    trailer gif)
    { width; height; ct; out = new output otch; }

let graphics_control_extension transparent fps { out; _ } =
  let delay = if fps = 0 then 0 else (200 + fps) / (fps + fps) in
  out#byte 0x21; (* GIF Extension Code *)
  out#byte 0xf9; (* Graphic Control Label *)
  out#byte 0x04; (* Length of Data Sub-Block *)
  out#byte (match transparent with None -> 0 | Some _ -> 9);
  out#le16 delay;
  out#byte (match transparent with None -> 0 | Some c -> c);
  out#byte 0x00 (* Data Sub-Block Terminator *)

let image_descriptor ct x y width height ({ out; _ } as gif) =
  out#byte 0x2c;
  out#le16 x;
  out#le16 y;
  out#le16 width;
  out#le16 height;
  match ct with
  | None ->
    out#byte 0x00
  | Some ({ bits; _ } as ct) ->
    out#byte (0x80 lor pred bits);
    color_table ct gif

let image ?ct ?transparent ?(fps=0) ?(x=0) ?(y=0) ?width ?height bytes ({ out; _ } as gif) =
  let w = match width  with None -> gif.width  | Some w -> w
  and h = match height with None -> gif.height | Some h -> h in
  if x < 0 || x + w  > gif.width
  || y < 0 || x + h > gif.height
  || String.length bytes != w * h
    then invalid_arg "image";
  let bits = match ct, gif.ct with
  | Some ct, _
  | None   , Some ct -> max 2 ct.bits
  | None   , None    -> invalid_arg "image"
  in
  (match transparent, fps with
  | None, 0 -> ()
  | _   , _ -> graphics_control_extension transparent fps gif);
  image_descriptor ct x y w h gif;
  out#byte bits;
  out#begin_block;
  (new lzw ~bits out)#compress bytes;
  out#end_block

let comment text { out; _ } =
  out#byte 0x21;         (* GIF Extension Code *)
  out#byte 0xfe;         (* Comment Label *)
  out#begin_block;
  out#string text;
  out#end_block

let repeat ?(count=0) { out; _ } =
  out#byte   0x21;       (* GIF Extension code *)
  out#byte   0xff;       (* Application Extension Label *)
  out#byte   0x0b;       (* Length of application block *)
  out#string "NETSCAPE"; (* Application Identifier *)
  out#string "2.0";      (* Appl. Authentication Code *)
  out#byte   0x03;       (* Length of Data Sub-Block *)
  out#byte   0x01;       (* Loop sub-block ID *)
  out#le16   count;      (* Loop count (0 = forever) *)
  out#byte   0x00        (* Data Sub-Block Terminator *)

let verify_gif inch =
  let failf   fmt = Printf.kprintf failwith fmt in
  let check p fmt = if not p then failf fmt in
  let buffer = Buffer.create 16 in
  let input_le16 inch  =
    let b0 = input_byte inch in
    let b1 = input_byte inch in
    b0 lor (b1 lsl 8)
  in
  let verify_header inch =
    let buf = String.create 6 in
    really_input inch buf 0 6;
    check (buf = "GIF89a") "Expected GIF header"
  in
  let verify_color_table len inch =
    let cnt = 3 * (1 lsl len) in
    for i = 1 to cnt do
      ignore (input_byte inch)
    done;
    Printf.printf "CT   %d colors in %d bytes\n" (cnt / 3) cnt
  in
  let verify_blocks inch =
    let tot = ref 0 in
    Buffer.clear buffer;
    try while true do
      let cnt = input_byte inch in
      if cnt == 0 then raise Exit else
      Buffer.add_channel buffer inch cnt;
      incr tot
    done; assert false with Exit ->
      let contents = Buffer.contents buffer in
      Printf.printf "%d bytes in %d blocks, EOB = %010x\n%!"
        (String.length contents) !tot (pos_in inch);
      contents
  in
  let verify_logical_screen_descriptor inch =
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let fields = input_byte inch in
    let backgr = input_byte inch in
    let aspect = input_byte inch in
    Printf.printf "LSD  w = %d, h = %d, f = %2x, b = %d, a = %d\n"
      width height fields backgr aspect;
    if fields land 0x80 == 0x80 then
      verify_color_table (1 + fields land 0x07) inch
  in
  let verify_image_descriptor inch =
    let left   = input_le16 inch in
    let top    = input_le16 inch in
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let fields = input_byte inch in
    Printf.printf "ID   x = %d, y = %d, w = %d, h = %d, f = %2x\n"
      left top width height fields;
    if fields land 0x80 = 0x80 then
      verify_color_table (1 + fields land 0x07) inch
  in
  let verify_table_based_image inch =
    verify_image_descriptor inch;
    let bits = input_byte inch in
    Printf.printf "IMG  code size = %d, " bits;
    ignore (verify_blocks inch)
  in
  let verify_plain_text_extension inch =
    check (12 == input_byte inch) "Expected block size = 12\n";
    let left   = input_le16 inch in
    let top    = input_le16 inch in
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let _celwid = input_byte inch in
    let _celhgt = input_byte inch in
    let _fgcol  = input_byte inch in
    let _bgcol  = input_byte inch in
    Printf.printf "PTE  l = %d, t = %d, w = %d, h = %d "
      left top width height;
    Printf.printf "  \"%s\"\n%!" (verify_blocks inch)
  in
  let verify_application_extension inch =
    check (11 == input_byte inch) "Expected block size = 11\n";
    let label = String.create 11 in
    really_input inch label 0 11;
    Printf.printf "AE   %s " label;
    ignore (verify_blocks inch)
  in
  let verify_graphic_control_extension inch =
    check (4 == input_byte inch) "Expected block size = 4\n";
    let fields = input_byte inch in
    let delay  = input_le16 inch in
    let transp = input_byte inch in
    check (0x00 == input_byte inch) "Expected block terminator\n";
    Printf.printf "GCE  f = %2x, d = %d, t = %d\n"
      fields delay transp
  in
  let verify_comment_extension inch =
    Printf.printf "CE   ";
    Printf.printf "  \"%s\"\n%!" (verify_blocks inch)
  in
  let verify_extension inch =
    match input_byte inch with
    | 0xff -> verify_application_extension     inch
    | 0xfe -> verify_comment_extension         inch
    | 0xf9 -> verify_graphic_control_extension inch
    | 0x01 -> verify_plain_text_extension      inch
    | b    -> failf "Unknown extension introducer %2x" b
  in
  let verify_eof inch =
    try ignore (input_char inch); failf "Extra contents after EOF"
    with End_of_file -> ()
  in
  let rec verify_data inch =
    match input_byte inch with
    | 0x3b -> verify_eof inch
    | 0x2c -> verify_table_based_image inch; verify_data inch
    | 0x21 -> verify_extension         inch; verify_data inch
    | b    -> failf "Unknown content block %2x" b
  in
  try
    verify_header inch;
    verify_logical_screen_descriptor inch;
    verify_data inch;
    true
  with Failure e ->
    let off = pos_in inch in
    Printf.printf "Offset %5d (%010x): %s\n%!" off off e;
    false
  | End_of_file ->
    let off = pos_in inch in
    Printf.printf "Offset %5d (%010x): Unexpected EOF\n%!" off off;
    false

let grayscale = palette (Array.init 256 (fun i ->
  { red = i; green = i; blue = i }))

let heatmap =
  let ramp i l h =
    let j = 255 * (i - l) / (h - l) in
    if j < 0 then 0 else if j > 255 then 255 else j
  in palette (Array.init 256 (fun i ->
  { red = ramp i 0 84; green = ramp i 85 170; blue = ramp i 171 255 }))

let rainbow =
  let pi   = 3.14159265358979324
  and q1_3 = 0.333333333333333333 in
  let cs x =
    let c = cos (pi *. x) in
    truncate (255. *. c *. c +. 0.5)
  in palette (Array.init 256 (fun i ->
  let x = float i /. float 256 in
  { red = cs x; green = cs (x -. q1_3); blue = cs (x +. q1_3) }))

let black_white = palette [|
  { red = 255; green = 255; blue = 255; };
  { red =   0; green =   0; blue =   0; };
|]

let with_output_file proc fname =
  unwind ~protect:close_out proc (open_out_bin fname)

let with_input_file proc fname =
  unwind ~protect:close_in proc (open_in_bin fname)

noise.mli


val noise : float * float * float -> float
val fBM : ?octaves:int -> float * float * float -> float

(** Clamp an integer into the range [0, 255] *)
val clampb : int -> int
(** Convert a float in the range [-1, 1] into an integer in the range [0, 255] *)
val rescale : float -> int

noise.ml


let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]

let permindex (u, v, w) =
  if u >= w then
    if u >= v then
      if v >= w then 0 else 1
    else 2
  else
  if v >= w then 3 else
  if u >= v then 4 else 5

let int x =
  if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
  let s = (x +. y +. z) /. 3. in
  let i = int (x +. s)
  and j = int (y +. s)
  and k = int (z +. s) in
  (i, j, k)

let unskew (x, y, z) (i, j, k) =
  let s = float (i + j + k) /. 6. in
  let u = x -. float i +. s
  and v = y -. float j +. s
  and w = z -. float k +. s in
  (u, v, w)

let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')

let noise p =
  let l = skew   p   in
  let x = unskew p l in
  let s = simplices.(permindex x) in
  let f = ref 0. in
  for i = 0 to 3 do
    let v = s.(i) in
    let y = unskew x v in
    let t = 0.6 -. norm2 y in
    if t > 0. then
      let h = shuffle (addi3 l v) in
      let t = t *. t in
      f := !f +. 8. *. t *. t *. magnitude h y
  done;
  !f

let fBM ?(octaves=5) =
  let rec go f w i (x, y, z as p) =
    if i == 0 then f else
    let f = f +. w *. noise p in
    go f (0.5 *. w) (pred i) (2. *. x, 2. *. y, 2. *. z)
  in go 0. 1. octaves

let clampb n =
  (n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
  clampb (int (0.5 +. ldexp (f +. 1.) 7))

test.ml


let () =
  let width  = 256
  and height = 256 in
  let pixels = String.create (width * height) in
  let x0, y0, z0 = -2., -2., 0.
  and sc = 4.0 in
  Gif.with_output_file
    (Gif.make ~ct:Gif.grayscale ~width ~height (fun gif ->
      for i = 0 to height - 1 do
        let y = y0 +. sc *. float i /. float height in
        for j = 0 to width - 1 do
          let x = x0 +. sc *. float j /. float width in
          let p = Noise.noise (x, y, z0) in
          pixels.[i * width + j] <- char_of_int (Noise.rescale p)
        done
      done;
      Gif.image pixels gif)) "perlin.gif"

fbm.ml


let ( +/ ) (x0, y0, z0) (x1, y1, z1) = (x0 +. x1, y0 +. y1, z0 +. z1)
let ( */ ) k (x, y, z) = (k *. x, k *. y, k *. z)

let two_pi = 6.283185307179586476925286766552

let () =
  let count  = 100 in
  let width  = 256
  and height = 256 in
  let pixels = String.create (width * height) in
  Gif.with_output_file
    (Gif.make ~ct:Gif.grayscale ~width ~height (fun gif ->
      if count > 1 then Gif.repeat gif;
      let sc = 1.0 in
      for c = 0 to count - 1 do
        let q = two_pi *. float c /. float count in
        let sq = 0.05 *. sin q
        and cq = 0.05 *. cos q in
        for i = 0 to height - 1 do
          let y = sc *. float i /. float height in
          for j = 0 to width - 1 do
            let x = sc *. float j /. float width in
            let p = (x, y, 0.) in
            let q = p +/ 2. */ (
              Noise.fBM (p +/ (   cq, sq, 0.)),
              Noise.fBM (p +/ (-. sq, cq, 0.)),
              Noise.fBM (p +/ (   0., 0., 1.))) in
            let r = p +/ 2. */ (
              Noise.fBM (q +/ (   cq, 0., sq)),
              Noise.fBM (q +/ (   0., 1., 0.)),
              Noise.fBM (q +/ (-. sq, 0., cq))) in
            let f = 2. *. Noise.fBM r  in
            pixels.[i * width + j] <- char_of_int (Noise.rescale f)
          done
        done;
        Printf.printf "%d/%d\n%!" (succ c) count;
        Gif.image ~fps:25 pixels gif
      done))
    "fbm.gif"

Makefile


# :mode=makefile:
OCAMLC=    ocamlfind ocamlc   -w Aelz -g
OCAMLOPT=  ocamlfind ocamlopt -w Aelz -unsafe -inline 1000 -ccopt -O2 -ccopt -fomit-frame-pointer
OCAMLLEX=  ocamllex.opt
OCAMLYACC= ocamlyacc
OCAMLDEP=  ocamldep $(PP)
OCAMLMKLIB=ocamlmklib #-custom -linkall

COMN=\
    gif.ml noise.ml

SRCS=\
    $(COMN) test.ml fbm.ml

OBJS=$(COMN:.ml=.cmo)

LIBS=

PROGRAMS=\
    test fbm

all: $(PROGRAMS) $(PROGRAMS:=.bc)

fbm.bc: $(OBJS) fbm.cmo
    $(OCAMLC) -o $@ $(LIBS) $^
fbm: $(OBJS:.cmo=.cmx) fbm.cmx
    $(OCAMLOPT) -o $@ $(LIBS:.cma=.cmxa) $^

test.bc: $(OBJS) test.cmo
    $(OCAMLC) -o $@ $(LIBS) $^
test: $(OBJS:.cmo=.cmx) test.cmx
    $(OCAMLOPT) -o $@ $(LIBS:.cma=.cmxa) $^

clean:
    rm -f *.cm[iox] *.o *.a *~
distclean: clean
    rm -f $(PROGRAMS) $(PROGRAMS:=.bc) *.gif

.SUFFIXES: .cmo .cmi .cmx .ml .mli .mll .mly

.mly.ml:
.mly.mli:
    $(OCAMLYACC) $<

.mll.ml:
    $(OCAMLLEX) $<

.ml.cmo:
    $(OCAMLC) -c $<

.mli.cmi:
    $(OCAMLC) -c $<

.ml.cmx:
    $(OCAMLOPT) -c $<

.depend:
    $(OCAMLDEP) $(SRCS) > .depend

include .depend

2012-01-06

Perlin's Simplex Noise

Graphics pioneer Ken Perlin is well-known for his work on procedural generation of textures. Recently he rewrote his classic Noise algorithm to reduce the complexity of generating pseudorandom gradients in higher dimensions by transforming a problem on N-cells to one on N-simplices. I couldn't find on his NYU home page a write-up by his own hand; apparently the only implementation exists in his Java applet. There is a well-known exegesis by Stefan Gustavson that completely rewrites the algorithm; I've also found a write-up by Perlin himself in a collection of course notes. The latter shows an extremely terse implementation in Java (Appendix B) that purports to correspond directly with hardware operations implemented in a specialized pipelined processor. I'll show an unraveled re-implementation of that Java algorithm that exactly corresponds to Prof. Perlin's, in the sense that the outputs are the same, but with the magic hopefully explained clearly. I will assume, however, that you are familiar with the original algorithm; if not, read Gustavson's analysis for a clear introduction to its workings.


The key to Perlin's Noise is that it interpolates pseudo-random gradients attached to lattice points. In the original algorithm, those gradients are computed by successive lookups using a permutation table. The new algorithm is geared towards an 8.8 fixed-point format; instead of tables it uses bit-slicing operations to compute two pseudo-random 3-bit hashes from the lattice points:


let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

I've expressed the patterns in octal to stress the fact that the result is a 6-bit number. btst simlpy extracts the b-th bit from n. bmix takes a 3-bit slice from the b-th bits of i, j and k, in that order, and uses it to index on patterns. shuffle adds up bmixes of eight circular rotations of (i, j, k) for each bit 0, 1 … 7. The procedure seems rather ad-hoc, but in fact produces rather well-distributed 3-bit fields among all the (28)3 = 16 Mi combinations of (i, j, k):


NP[Lower 3 bits = N]P[Upper 3 bits = N]
00.1244812011718750.125709712505340576
10.12536621093750.126831173896789551
20.1250.128595232963562
30.12463378906250.127514779567718506
40.1255187988281250.123155772686004639
50.12463378906250.120522260665893555
60.1250.12257838249206543
70.12536621093750.125092685222625732

These 3-bit hashes are used to compute a magnitude for vector (x, y, z) by selecting each coordinate with probability 3/4, applying a random sign to each with probability 1/2 and adding the resulting values:


let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

This amounts to finding the dot product of the vector (x, y, z) with a random gradient on the unit cell. Generating all combinations and extracting the gradient vectors we get the following diagram:


Distribution of the random basis vectors

Each dot shows the frequency with which that vector is selected; the diagonal vectors (±1, ±1, ±1) occur half as frequently as the mid-edge vectors (±1, ±1, 0) —and rotations—, to account for their greater magnitude and thus weight. All this is done without multiplying! If you study Perlin's paper and code, you will notice that he uses bit operations to select each of the x, y and z coordinates and to add or subtract them, to avoid using tables; however, the code doesn't quite correspond to the explanation given in the text. I've unrolled the conditionals into two jump tables in a way that duplicates the effect of the code; thus the first jump table is not equivalent to the one given in the text.


The key to the efficiency of this algorithm is not just in the use of bit operations for generating pseudo-random gradients but in the subdivision strategy used to combine them in ℝN. In the original algorithm, each of the 2N vertices in the N-cell are associated with a gradient which is cubically interpolated between pairs of hyperplanes. This not only generates visual artifacts since the interpolation scheme is not C²-continuous, it requires O(2N) work. The new algorithm mitigates the first problem by using a quartic interpolant, and solves the second by decomposing the N-cell into N! simplices:


let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]

This requires interpolating among N + 1 vertices taken in pairs, for O(N²) work total. The simplex in which a point (u, v, w) in the unit cell lies is determined by the permutation index of coordinates (u, v, w):


let permindex (u, v, w) =
  if u >= w then
    if u >= v then
      if v >= w then 0 else 1
    else 2
  else
  if v >= w then 3 else
  if u >= v then 4 else 5

Perlin's code uses a different method to directly travel along the simplex edges by incrementing each coordinate according to the permutation index; I've opted for using a table to simplify the code and eliminate the need for global variables. Each point (x, y, z) is decomposed into a sum of a lattice vector (i, j, k) and a point in the unit cell (u, v, w). To account for the different volumes in the simplicial decomposition of the unit N-cell, the vectors are skewed:


let int x =
  if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
  let s = (x +. y +. z) /. 3. in
  let i = int (x +. s)
  and j = int (y +. s)
  and k = int (z +. s) in
  (i, j, k)

and unskewed:


let unskew (x, y, z) (i, j, k) =
  let s = float (i + j + k) /. 6. in
  let u = x -. float i +. s
  and v = y -. float j +. s
  and w = z -. float k +. s in
  (u, v, w)

along the main diagonal of the N-cell. This amounts to a change of basis between a cubic and a tetrahedral grid and back. In general, the skewing factor is (√(N+1) - 1)/N and the unskewing factor is (N + 1 - √(N+1))/(N(N+1)). A couple of simple functions will simplify the main routine:


let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')

Now given a point p, noise finds the lattice vector l and the unit-cell coordinates x corresponding to it. It then selects the simplex s corresponding to x. Then, for each of its vertices v it finds the simplicial coordinates y of x relative to v. It computes in t a radial basis for y such that only the gradients attached to the simplex's vertices contribute to the final result. If it does, it computes the hash h corresponding to v, uses it to find the pseudo-random gradient applied to y and accumulates the corresponding contribution for this vertex:


let noise p =
  let l = skew   p   in
  let x = unskew p l in
  let s = simplices.(permindex x) in
  let f = ref 0. in
  for i = 0 to 3 do
    let v = s.(i) in
    let y = unskew x v in
    let t = 0.6 -. norm2 y in
    if t > 0. then
      let h = shuffle (addi3 l v) in
      let t = t *. t in
      f := !f +. 8. *. t *. t *. magnitude h y
  done;
  !f

That's it. noise returns a floating-point number in the interval [-1, 1]; in order to use it in pixmaps it is frequently more useful to rescale it to an unsigned byte:


let clampb n =
  (n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
  clampb (int (0.5 +. ldexp (f +. 1.) 7))

clampb is a nice little branchless routine to constrain an integer to the range [0, 255]. Note that in OCaml, integers are Sys.word_size - 1 bits long; in Java, for instance, the code would be:


public static int clampb(int n) {
  return (n | ((255-n) >>> 31)) & ~(n >>> 31) & 255;
}

(teasing that code apart is a nice exercise in bit twiddling.) This implementation produces the same result, pixel for pixel, as Perlin's code. This is a slice in the interval (-2, -2, 0)–(2, 2, 0) as a GIF file computed with this code:


Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in OCaml

and this is the same slice as a PNG file computed in Java with the class given by Perlin in his Appendix B:


Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in Java

There are two avenues open for further explorations of this algorithm. The first is to generalize it to an arbitrary number of dimensions; Gustavson's paper is a good starting point for that. The other is to go further with Perlin's original intention of building a hardware generator and eliminate the use of floating-point operations. This is mostly mechanical, except for the calculation of t in the inner loop, which requires at least a 0.24 fixed-point format.