Alaska Ataca a Kamtchatka: Programming

Showing posts with label Programming. Show all posts

2012-10-25

How to Write a Simple Web Application Using Ocamlnet

This post might seem to be in apparent contradiction: Ocamlnet is a large, very opinionated framework for network programming that solves most, if not all, those infrastructure issues that need to be taken care of when writing a networked, distributed, fault-tolerant server, from process management to string decoding and including protocol parsing and building. The fact is that Ocamlnet is not particularly complicated for all that it does, and it does quite a bit; but there are (for me at least) three hurdles to overcome in order to start using it effectively:

It is big. The project includes Posix wrappers, string handling, process management, CGI programming, various protocols, RPC registration, interprocess communication…
It is strange. Most of the API is object oriented rather than functional, using inheritance extensively
It is underdocumented. While the API documents are complete, and the project page includes some cookbook examples, for most non-trivial needs you have to go deep into the source code

In this instance I'll follow a tutorial, top-down style and I won't necessarily show complete compile-and-run code, which means that you'll have to reconstruct the source code to a compilable state, but I hope it will still be useful as a starting point and guide to writing HTTP services with Ocamlnet. This tutorial assumes you have installed OCaml 4.0 with Findlib, Ocamlnet 3.6, Yojson and its dependencies and PCRE-Ocaml.

Also, I won't attempt to compare it with Ocsigen since I've never used it, not least because ~~I'm not really fond of convention-over-configuration-style frameworks~~ I don't know it at all, and maybe I'm irrationally prejudiced :-). If you want full control of your code and feel comfortable with a large number of tweakable knobs, then Ocamlnet is for you. If you have work that needs to be done quickly and you're used to RoR or Django, Ocamlnet will definitely set your hair on fire.

Setting up a Netplex server: `http.ml`

The architecture I use is a Web application written in HTML that consumes JSON services published in the back-end. Ocamlnet lets me serve both the static content (HTML, CSS, JavaScript and graphic assets) and route a number of dynamic services. I can use just that for development, and have a production Apache server that serves the static content and proxies the dynamic requests to the Ocamlnet-published services. Another simplification I make is that I manually route the method URLs instead of letting Ocamlnet do it itself, as it is otherwise perfectly capable to. This makes it simpler to configure the services, at the cost of having to explicitely handle routing in code.

Let's begin by taking a look at the main function:

let main () =
  let database f =
    db_file := Io.get_absolute_path f;
    if not (Sys.file_exists !db_file) then raise (Arg.Bad ("File not found: " ^ f))
  in
  let usage = "usage: " ^ (Filename.basename Sys.executable_name) ^ " [options]" in
  let opt_list, cmdline_cfg = Netplex_main.args () in
  let opt_list = Arg.align ([
"-db"        , Arg.String database,                   "<file>  Database file";
  ] @ opt_list) in
  Arg.parse opt_list (fun s -> raise (Arg.Bad ("Invalid argument: " ^ s))) usage;
  Netsys_signal.init ();
  Netplex_main.startup
    ~late_initializer:(fun _ _container ->
      Netlog.logf `Notice "Starting up")
    (Netplex_mp.mp ())
    Netplex_log.logger_factories
    Netplex_workload.workload_manager_factories
    [
      service_factory ();
    ]
    cmdline_cfg

let () = if not !Sys.interactive then main ()

Netplex is the part of Ocamlnet that orchestrates the management and intercommunication between the processes that make up a network service. It has a number of command-line options for configuration, most notably -fg to launch the service in the foreground instead of as a detached dæmon. Netplex_main.args gives back a list of needed options upon which to add program-specific ones. In this case the only option is to pass a database file. Every filesystem resource must be accessed by absolute path, since Netplex changes the working directory to / upon startup. This file is stored in a global reference:

let db_file = ref (Io.get_absolute_path "myfile.db")

Once the command line is parsed, the service is created. First, Ocamlnet has to take over signal handling to give the service an orderly lifecycle (Netsys is the collection of modules providing cross-platform POSIX functionality). Netplex is then started with the multi-process parallelizer, the standard log and workload factories that set up themselves out of the configuration file, and a single custom service factory that will create the HTTP services themselves:

let service_factory = Nethttpd_plex.nethttpd_factory
  ~name:"nethttpd"
  ~hooks:(new environment_hooks)
  ~config_cgi:Netcgi.({
    default_config with
    default_exn_handler = false;
    permitted_input_content_types = [
      "application/json";
    ] @ default_config.permitted_input_content_types
  })
  ~handlers:[
    "json_service", Net.json_service ~dyn_uri:"/cgi" [
      "daterange", with_db Daterange.json;
      "calculate", with_db Calculate.json;
      "calendar" , with_db Calendar.json;
    ];
  ]
  ~services:Nethttpd_plex.default_services

The service name must correspond with that defined in the config file. In order to arrange for the workers to have access to the database file I intercept the service creation with hooks to their lifecycle process to open it and close it as needed. Netcgi sets up the environment that each HTTP service requires to function; in this case I use a minimal configuration that augments valid POST MIME types with a type for JSON requests (not otherwise used in this case) and opt out of the standard exception handler in exchange for my own. I configure a single "json_service" handler that will dispatch to the relevant methods of type cgi_activation → Yojson.Basic.json. The Netplex services for this service are the default Nethttpd_plex ones required by the infrastructure in order to manage the lifecycle of the process group: creation, tear-down and IPC. Note well that the factory is a thunk, not a data structure, the resulting type is unit → Netplex_types.processor_factory.

The lifecycle hooks are specified as a subclass of Netplex_kit.empty_processor_hooks. It uses the Netplex environment plug-in to store a shared reference to the database in a way that both thread- and process-based services can access in an uniform manner:

class environment_hooks = object
  inherit Netplex_kit.empty_processor_hooks ()
  method! post_add_hook _ container =
    container#add_plugin Netplex_mutex.plugin;
    container#add_plugin Netplex_sharedvar.plugin
  method! post_start_hook container =
    Netlog.logf `Info "(%s) opening database \"%s\"" (Net.cur_sys_id ()) !db_file;
    try  set_db (DB.open_database !db_file)
    with DE42x.Error (msg) -> container#log `Crit msg
  method! pre_finish_hook _container =
    Netlog.logf `Info "(%s) closing database" (Net.cur_sys_id ());
    match get_db () with
    | Some db -> DB.close_db db
    | None    -> ()
end

In this case I open and close database connections (represented here as an open file descriptor) which are stored in a per-process environment:

let get_db, set_db =
  let env_id = "MyService.localenv" in
  let module E = Netplex_cenv.Make_var_type (struct type t = DB.t end) in
  (fun () -> try Some (E.get env_id) with Netplex_cenv.Container_variable_not_found _ -> None),
  E.set env_id

Neplex_cenv makes a strongly-typed accessor for shared variables; in this case I have just one keyed by env_id. As a utility I arrange for my service methods to be closed over a reference to the database (cf the handler setup above):

let with_db proc arg = match get_db () with
| None    -> Net.failf "no database!"
| Some db -> proc db arg

A `Nethttp` JSON framework: `net.ml`

Every Nethttpd_plex-based service follows the same structure, while the specifics will make up for the bulk of the application. In this example these details have to do with utilities that make consuming and producing JSON data easier. I have a Net module with a number of helpers, of which I've used two already, cur_sys_id and failf:

let failf fmt = Printf.ksprintf failwith fmt
and argf  fmt = Printf.ksprintf invalid_arg fmt

let cur_sys_id () = match Netplex_cenv.current_sys_id () with
| `Process pid -> Printf.sprintf "PID %d" pid
| `Thread  tid -> Printf.sprintf "TID %d" tid

Another useful function is an encoding-safe string wrapper:

let text = Netencoding.Html.encode_from_latin1

Normally, Nethttp sends HTML 4.01-formatted error messages. In a JSON-based application it is preferable to have standardized JSON errors:

let error_json (env : Netcgi.cgi_environment) status fields cause message =
  let json_of_header hdr =
    try `String (env#input_header_field hdr)
    with Not_found -> `Null
  in
  try
    let script_name = env#cgi_script_name in
    let code = Nethttp.int_of_http_status status in
    env#log_error (Printf.sprintf "%s: %s (Status %i)" script_name message code);
    env#set_output_header_fields []; (* reset *)
    env#set_output_header_field "Content-type" "application/json; charset=utf-8";
    env#set_status status;
    env#set_multiple_output_header_field "Cache-control"
      [ "max-age=0"; "must-revalidate" ];
    let secs = Netdate.mk_mail_date (Unix.time ()) in
    env#set_output_header_field "Expires" secs;
    List.iter (fun (n,v) -> env#set_multiple_output_header_field n v) fields;
    env#send_output_header();
    if env#cgi_request_method <> "HEAD" then
      Yojson.Basic.to_output env#out_channel (`Assoc [
        "status"       , `Int    code;
        "statusText"   , `String (Nethttp.string_of_http_status status);
        "cause"        , `String cause;
        "message"      , `String message;
        "scriptName"   , `String script_name;
        "requestMethod", `String env#cgi_request_method;
        "queryString"  , `String env#cgi_query_string;
        "referrer"     ,  json_of_header "referer"
      ]);
    env#out_channel#flush ();
    env#out_channel#close_out ()
  with e ->
    Netlog.logf `Crit "Unexpected exception %s" (Printexc.to_string e)

This is a good example of how to use the cgi_environment to query the CGI execution and to exert maximum control over the HTTP response. I raise standard Ocaml exceptions from the method handlers and translate them into the relevant HTTP status codes by wrapping them in a higher-order protective function:

let protect handler (cgi : Netcgi.cgi_activation) =
  try handler cgi
  with
  | Netcgi.Argument.Oversized ->
    error_json cgi#environment `Request_entity_too_large []
      "Oversized" "A POST parameter exceeds maximum allowed size"
  | Invalid_argument msg ->
    error_json cgi#environment `Bad_request []
      "Bad request" (text msg)
  | Failure msg ->
    error_json cgi#environment `Internal_server_error []
      "Method failure" (text msg)
  | Not_found ->
    error_json cgi#environment `Not_implemented []
      "Not implemented" "The requested operation is not implemented"
  | exn ->
    let msg = Printexc.to_string exn in
    error_json cgi#environment `Internal_server_error []
      "Internal server error" ("Unexpected exception: " ^ text msg)

It is not normally necessary to manipulate the cgi_environment in such a detailed, low-level manner; the cgi_activation does pretty much the same thing in a easier-to-use way:

let send_json json (cgi : Netcgi.cgi_activation) =
  cgi#set_header ~content_type:"application/json; charset=utf-8" ~cache:`No_cache ();
  Yojson.Basic.to_output cgi#out_channel json;
  cgi#output#commit_work ()

Note well that Yojson doesn't provide a streaming interface: you must build the entire JSON value, which gets serialized in bulk; this makes necessary to configure the HTTP service so that the cgi_activations it creates are at least buffered:

let json_service ?dyn_uri handlers =
  let dyn_translator path =
      let len  = String.length path in
      let path = if len != 0 && path.[0] = '/'
        then String.sub path 1 (len - 1)
        else path
      in
      Pcre.replace ~pat:"/" ~templ:"_" path
  and dyn_handler env cgi =
    protect (fun cgi ->
      let h = List.assoc env#cgi_path_translated handlers in
      send_json (h cgi) cgi) cgi
  in Nethttpd_services.({
    dyn_handler;
    dyn_activation = std_activation `Std_activation_buffered;
    dyn_uri;
    dyn_translator;
    dyn_accept_all_conditionals = true;
  })

The dynamic path translator removes the leading slash and turns subsequent slashes into underscores, so that a method in the namespace /cgi, say can serve as a look-up key in the list of handlers: a call to /cgi/my/method/name will turn into a key my_method_name. This is, of course, a completely arbitrary decision. The dynamic handler in turn looks up the method handler (recall, of type cgi_activation → Yojson.Basic.json) by this key, calls it with the cgi_activtion expecting a JSON response and sends it out. Since the handling is protected against exceptions, any missing method, parameter validation error or exceptional condition is sent out as the corresponding HTTP error response.

Speaking of parameter extraction, I don't use anything fancy like parsing combinators, just plain old higher-order functions and regular expressions validating the result of CGI accessor functions:

let with_arg arg f = Io.unwind ~protect:(fun arg -> arg#finalize ()) f arg

let get_arg cgi name =
  try  Some (with_arg (cgi#argument name) (fun arg -> arg#value))
  with Not_found -> None

let parse ?default ~validate ~parse cgi name =
  match default, get_arg cgi name with
  | None  , None    -> argf "Missing parameter \"%s\"" name
  | Some v, None    -> v
  | _     , Some p  ->
    try  parse (Pcre.extract ~rex:validate ~full_match:false p)
    with Not_found -> argf "Invalid parameter \"%s\"" name

Since CGI arguments can be, if large, buffered into a temporary file, Netcgi requires explicit finalization. Every error is signaled with an Invalid_argument exception which protect catches and translates into a HTTP 400 (Bad Request) via error_json. Parsing specific argument types is straightforward:

let re_char     = Pcre.regexp "^(.)$"
let re_bool     = Pcre.regexp "^(true|false)$"
let re_int      = Pcre.regexp "^([-+]?\\d+)$"
let re_float    = Pcre.regexp "^([-+]?\\d+(?:.\\d*))$"
let re_date     = Pcre.regexp "^(\\d{4})-(\\d{2})-(\\d{2})$"
let re_datetime = Pcre.regexp "^(\\d{4})-(\\d{2})-(\\d{2})[ T](\\d{2}):(\\d{2}):(\\d{2})(Z|[-+]\\d{4})$"

let parse_char ?default cgi = parse ?default ~validate:re_char
  ~parse:(fun res -> res.(0).[0]) cgi

let parse_bool cgi = parse ~default:false ~validate:re_bool
  ~parse:(fun res -> bool_of_string res.(0)) cgi

let parse_int ?default cgi = parse ?default ~validate:re_int
  ~parse:(fun res -> int_of_string res.(0)) cgi

let parse_float ?default cgi = parse ?default ~validate:re_float
  ~parse:(fun res -> float_of_string res.(0)) cgi

let parse_date ?default cgi = parse ?default ~validate:re_date ~parse:(fun res ->
  let year   = int_of_string res.(0)
  and month  = int_of_string res.(1)
  and day    = int_of_string res.(2)
  in let dummy = Netdate.create 0. in
  Netdate.({ dummy with year; month; day; })) cgi

let parse_datetime ?default cgi = parse ?default ~validate:re_datetime ~parse:(fun res ->
  let year   = int_of_string res.(0)
  and month  = int_of_string res.(1)
  and day    = int_of_string res.(2)
  and hour   = int_of_string res.(3)
  and minute = int_of_string res.(4)
  and second = int_of_string res.(5)
  and zone   = match res.(6) with
  |"Z" -> 0
  | dt ->
    let hrs = int_of_string (String.sub dt 1 2)
    and mns = int_of_string (String.sub dt 3 2) in
    let off = 60 * hrs + mns in
    if dt.[0] == '+' then off else -off
  in let dummy = Netdate.create 0. in
  Netdate.({ dummy with year; month; day; hour; minute; second; zone; })) cgi

Writing the JSON methods: `myservice.ml`

That is the infrastructure, in broad strokes. I put each method in its own module, following a simple convention:

I define a type t of parsed method arguments:
```
type t = {
  lon : float;
  lat : float;
  dt  : Netdate.t;
  jd  : Jd.t;
  tz  : int;
  lim : Jd.t option;
}
```
(in this instance, Jd.t is the type of dates represented as Julian dates)

I define a validate function to parse the CGI arguments into a value of type t:

let validate db cgi =
  let open Net in
  let lon = parse_float    cgi "lon"
  and lat = parse_float    cgi "lat"
  and dt  = parse_datetime cgi "dt" in
  let jd  = jd_of_netdate   dt in
  let tz  = parse_int      cgi "tz"  ~default:dt.Netdate.zone in
  let lim = if parse_bool  cgi "tab"
    then Some (Jd.min db.DB.max_date (jd_of_netdate (Net.parse_date cgi "lim")))
    else None
  in
  if not (-180. <= lon && lon <= 180.) then
    Net.argf "Longitude out of range"
  else
  if not (-66.56 <= lat && lat <= 66.56) then
    Net.argf "Latitude out of range"
  else
  if Jd.compare jd db.DB.min_date < 0 then
    Net.argf "Date too early"
  else
  if Jd.compare jd db.DB.max_date > 0 then
    Net.argf "Date too late"
  else
  { lon = lon /. Trig.radian;
    lat = lat /. Trig.radian;
    dt;
    jd;
    tz;
    lim; }

I define and export a json function to generate the actual output:

let json db cgi =
  let req = validate db cgi in
  let tz  = req.dt.Netdate.zone / 60 in
  let tdt = Jd.dynamic_time req.jd in
  (* … *)
  `Assoc [
    "jd"           , `Float  t;
    "dt"           ,  Net.time   Jd.(tdt <-> req.jd);
    "lst"          ,  Net.time  (lst /. Jd.secs_day);
    "lon"          ,  Net.angle  ~reduce:false req.lon;
    "lat"          ,  Net.angle  ~reduce:false req.lat;
    (* … *)
  ]

(functions Net.time and Net.angle return appropriate JSON values). This exported function goes into the dynamic method map, as seen in the service_factory above.

Configuring the `Netplex` server: `myservice.conf`

That is mostly it, code-wise. It remains the detail of configuring Netplex. I use a simple myservice.conf file:

netplex {
  controller {
    max_level = "info";
    logging {
      type = "file";
      file = "/var/log/myservice.log";
      component = "*";
      subchannel = "*";
      max_level = "info";
    };
  };

  service {
    name = "myservice";
    protocol {
      name = "http";
      tcp_nodelay = true;
      address {
        type = "internet";
        bind = "0.0.0.0:8080";
      };
    };
    processor {
      type = "nethttpd";
      timeout = 60.0;
      timeout_next_request = 6.0;
      access_log = "enabled";
      suppress_broken_pipe = true;
      host {
        pref_name = "localhost";
        pref_port = 8080;
        names = "127.0.0.1:0";
        uri {
          path = "/";
          service {
            type = "file";
            docroot = "/path/to/static/";
            media_types_file = "/etc/mime.types";
            default_media_type = "application/xhtml+xml";
            enable_gzip = true;
            enable_listings = false;
            index_files = "index.html";
          };
        };
        uri {
          path = "/cgi";
          method {
            allow = "GET";
            service {
              type = "dynamic";
              handler = "json_service";
            };
          };
        };
      };
    };
    workload_manager {
      type = "dynamic";
      max_jobs_per_thread = 1;
      min_free_jobs_capacity = 2;
      max_free_jobs_capacity = 5;
      max_threads = 50;
    };
  };
}

Note that the Nethttpd_plex section declares two URIs: the root path maps to a file service that will serve the static content, defaulting to XHTML, while the /cgi prefix will map to the dynamic JSON handler. This is useful for development, since it only requires launching myservice -fg and trying it with a Web browser on http://127.0.0.1:8080/. In production I set up Apache with mod_proxy like this:

Alias /myservice /path/to/static

<Directory /path/to/static>
  Options FollowSymLinks
  AllowOverride All
  Order allow,deny
  Allow from all
</Directory>

<Location /myservice/>
  AuthType Digest
  AuthName "SERVICE"
  AuthDigestDomain /myservice/
  AuthUserFile /etc/httpd/passwd
  Require valid-user
</Location>

ProxyPass /myservice/cgi  http://127.0.0.1:8080/cgi

(where /path/to/static and /cgi must match what is configured in myservice.conf). Of course you can map your application to the HTTP root, in this case I have a single Apache instance serving various sub-paths.

Compiling: `Makefile`

It is not necessary to complicate the build process with anything more than a properly written Makefile. In this case I have one interface and one implementation for each JSON method (which you will note don't quite correspond to the dynamic service set-up I've shown first). Note well the list of PACKAGES required for building:

OCAMLFLAGS        = -thread -w @a -unsafe
OCAMLOPTFLAGS     = $(OCAMLFLAGS) -inline 10000
OCAMLLIBS         = unix.cma
CFLAGS            = -I/opt/ocaml/lib/ocaml -arch x86_64 -O3 -Wall -Wextra
PACKAGES          = -package threads,pcre,yojson,netplex,netcgi2,nethttpd

SRC  =            \
    net.ml        \
    myservice.ml  \
    http.ml

PROGS = myservice

all: $(PROGS)

myservice: $(SRC:%.ml=%.cmx)
    ocamlfind ocamlopt $(OCAMLOPTFLAGS) $(PACKAGES) -linkpkg $+ -o $@

%.cmi: %.mli
    ocamlfind ocamlc $(OCAMLFLAGS) $(PACKAGES) -c $<

%.cmo: %.ml
    ocamlfind ocamlc $(OCAMLFLAGS) $(PACKAGES) -c $<

%.cmx: %.ml
    ocamlfind ocamlopt $(OCAMLOPTFLAGS) $(PACKAGES) -c $<

%.o: %.c
    ocamlfind ocamlc -ccopt "$(CFLAGS)" -c $<

clean:
    /bin/rm -rf *.o *.a *.so *.cmi *.cmo *.cmx *~ *.log

distclean: clean
    /bin/rm -rf $(PROGS) depend

depend:
    $(OCAMLDEP) -one-line $(OCAMLLIBS) *.ml *.mli > depend

include depend

.PHONY: clean distclean all

Conclusion

There are a number of more advanced issues I'd like to address in the future. As it is, this framework can handle simple GET and POST requests but won't parse multipart attachments nor handle file transfers. Another thing it doesn't handle is HTTP Authorization; for simple requirements a simple filter can work, while for more complicated set-ups the best way to go is, in my opinion, to leverage Apache as I did here.

For those of you that have to interoperate with SOAP Web Services, the same architectural strategy is perfectly applicable with the aid of PXP and perhaps OC-SOAP.

A big field for further exploration is how to structure a complex Web application into independent services; Netplex makes that possible, if not necessarily easy. There is a hot architectural trend making some noise now called Command-Query Separation (CQS); this pattern can be profitably implemented with a single Netplex RPC service that handles all commands to which the Nethttpd_plex workers delegate. The advantages of this separation are enforced separation of concerns and automatic, transparent fault tolerance and distribution, both of which are the guiding principles behind Ocamlnet's design.

A closing remark that I don't want to miss on making is that the payoff of using Ocamlnet's process model is that it is really fast. My "production" server is an ancient 400 MHz PPC with 384 MB RAM which is perfectly capable of producing and serving really computationally-intensive content with minimal environmental requirements. This is something that I simply couldn't hope to pull off with PHP or Java. I encourage you to try Ocamlnet and see if you find, like I do, that Ocaml is the language of choice of discriminating Web developers.

2012-02-22

For the Right Hand

Useful generic functions are usually the result of composing useful generic functions; the fact that these are easy to write is, however, not an excuse for not having them in a rich prelude. Given a total function for splitting a list at a given point:

let rec split_at n = function
| []            -> [], []
| l when n == 0 -> [], l
| x :: xs       ->
  let l, r = split_at (pred n) xs in
  x :: l, r

and the anamorphism on lists, the basic functional iteration scheme giving the list of results:

let rec unfold f e = match f e with
| None         -> []
| Some (x, e') -> x :: unfold f e'

then splitting a list into chunks just requires some assembly:

let chunks n = unfold (fun l -> match split_at n l with [], [] -> None | p -> Some p)

As another example, suppose we need a Procrustean function fit that truncates a list or pads it with a value to a given length. Given an way to build an infinite list:

let all x = let rec l = x :: l in l

the desired function is just:

let fit n e l = fst (split_at n (l @ all e))

The only good thing about the paucity of Pervasives is that it forces one to practice often this kind of finger exercises. At least they are pretty.

2012-01-09

Putting Noise to the Test

So how do you use Perlin's Simplex Noise? By using the GIF encoder, of course! I generated the test image shown in the last post:

Perlin simplex noise in the interval (-2, -2, 0)-(2, 2, 0) computed in OCaml

with a minimal application of both modules in test.ml (source code follows). For a visually richer but not much more complicated example, I also tried replicating Iñigo Quílez's turbulent textures in fbm.ml. The animation traces a tiny loop in noise space so that it seamlessly repeats over and over. The first frame looks like this:

I'd like to know of a very easy way to share source tarballs with you (no, Forges require waiting for new project approval; no, Github requires adhering to a philosophy of sharing for which I don't care; no, Google Code forces me to give up my own version control; no, file hosting is not an option. I'm open to suggestions.); in the meantime, here's a wall of code with the complete project.

`gif.mli`

val fold_string : ('a -> char -> 'a) -> 'a -> string -> 'a

type color = { red : int; green : int; blue : int; }

type palette

val palette : ?background:int -> color array -> palette

val num_colors : palette -> int

type t

val make    : ?ct:palette -> width:int -> height:int
           -> (t -> unit) -> (out_channel -> unit)

val image   : ?ct:palette -> ?transparent:int -> ?fps:int
           -> ?x:int -> ?y:int -> ?width:int -> ?height:int
           -> string -> t -> unit

val comment : string -> t -> unit

val repeat  : ?count:int -> t -> unit

val verify_gif : in_channel -> bool

val grayscale   : palette
val heatmap     : palette
val rainbow     : palette
val black_white : palette

val unwind : protect:('a -> unit) -> ('a -> 'b) -> ('a -> 'b)
val with_output_file : (out_channel -> 'a) -> (string -> 'a)
val with_input_file  : (in_channel  -> 'a) -> (string -> 'a)

`gif.ml`

let unwind ~(protect : 'a -> unit) f x =
  try let y = f x in protect x; y
  with e -> protect x; raise e

class output otch = object (self)
  val         buffer  =  String.create 255
  val mutable current = -1

  method private flush =
    if current > 0 then begin
      output_byte otch current;
      output      otch buffer 0 current;
      current <- 0
    end

  method byte n =
    let b = n land 255 in
    if current == -1 then output_byte otch b else begin
      if current == 255 then self#flush;
      buffer.[current] <- char_of_int b;
      current <- current + 1
    end

  method le16 n =
    self#byte  n;
    self#byte (n lsr 8)

  method string str =
    String.iter (fun c -> self#byte (int_of_char c)) str

  method begin_block =
    if current != -1 then failwith "begin_block";
    current <- 0

  method end_block =
    if current == -1 then failwith "end_block" else
    self#flush;
    current <- -1;
    self#byte 0

  method close =
    if current != -1 then self#end_block;
    flush otch
end

let fold_string f e s =
  let res = ref e in
  String.iter (fun c -> res := f !res c) s;
  !res

class lzw ?(bits=8) out =
  let count       = 1 lsl bits in
  let clear_code  = count
  and end_of_info = count + 1 in
object (self)
  val table : (int * int, int) Hashtbl.t = Hashtbl.create 5003
  val mutable code_size = 0
  val mutable next_code = 0
  val mutable buffer    = 0
  val mutable length    = 0

  method append n =
    buffer <- buffer lor ((n land pred (1 lsl code_size)) lsl length);
    length <- length + code_size;
    while length >= 8 do
      out#byte (buffer land 0xff);
      buffer <- buffer lsr 8;
      length <- length - 8
    done

  method finish =
    if length > 0 then
      out#byte (buffer land pred (1 lsl length))

  method private reset =
    Hashtbl.clear table;
    for i = 0 to count - 1 do
      Hashtbl.add table (-1, i) i
    done;
    code_size <- bits  + 1;
    next_code <- count + 2

  method private add_string s =
    (* Check code for overflow *)
    if next_code == 1 lsl code_size then
      code_size <- code_size + 1;
    Hashtbl.add table s next_code;
    next_code <- next_code + 1;
    (* Limit maximum code size to 12 bits *)
    if next_code == 0x1000 then begin
      self#append clear_code;
      self#reset;
    end

  method compress data =
    self#reset;
    self#append clear_code;
    let last = fold_string (fun prefix c ->
      let  k = int_of_char c in
      try  Hashtbl.find table (prefix, k)
      with Not_found ->
        self#append prefix;
        self#add_string (prefix, k);
        k) (-1) data in
    self#append last;
    self#append end_of_info;
    self#finish
end

(* 0 <= red, green, blue < 256 *)
type color = { red : int; green : int; blue : int; }

type palette = { background : int; bits : int; table : color array; }

let num_colors { table; _ } = Array.length table

let palette ?(background=0) table =
  let size = Array.length table in
  if size < 2
  || size > 256
  || size land (size - 1) != 0
  || background < 0
  || background >= size
  then invalid_arg "color_table" else
  let bits = truncate (log (float size) /. log 2.) in
  assert (size == 1 lsl bits);
  { background; bits; table = Array.copy table; }

type t = {
  width  : int;
  height : int;
  ct     : palette option;
  out    : output;
}

let header { out; _ } = out#string "GIF89a"

let trailer { out; _ } = out#byte 0x3b

let color_table { table; _ } { out; _ } =
  Array.iter (fun { red; green; blue } ->
    out#byte red  ;
    out#byte green;
    out#byte blue) table

let logical_screen ({ width; height; ct; out } as gif) =
  out#le16   width;
  out#le16   height;
  match ct with
  | None ->
    out#byte 0;
    out#byte 0;
    out#byte 0 (* default aspect ratio (1:1 = 49) *)
  | Some ({ background; bits; _ } as ct) ->
    out#byte (0xf0 lor pred bits);
    out#byte background;
    out#byte 0;
    color_table ct gif

let make ?ct ~width ~height proc otch =
  unwind ~protect:(fun { out; _ } -> out#close) (fun gif ->
    header gif;
    logical_screen gif;
    proc gif;
    trailer gif)
    { width; height; ct; out = new output otch; }

let graphics_control_extension transparent fps { out; _ } =
  let delay = if fps = 0 then 0 else (200 + fps) / (fps + fps) in
  out#byte 0x21; (* GIF Extension Code *)
  out#byte 0xf9; (* Graphic Control Label *)
  out#byte 0x04; (* Length of Data Sub-Block *)
  out#byte (match transparent with None -> 0 | Some _ -> 9);
  out#le16 delay;
  out#byte (match transparent with None -> 0 | Some c -> c);
  out#byte 0x00 (* Data Sub-Block Terminator *)

let image_descriptor ct x y width height ({ out; _ } as gif) =
  out#byte 0x2c;
  out#le16 x;
  out#le16 y;
  out#le16 width;
  out#le16 height;
  match ct with
  | None ->
    out#byte 0x00
  | Some ({ bits; _ } as ct) ->
    out#byte (0x80 lor pred bits);
    color_table ct gif

let image ?ct ?transparent ?(fps=0) ?(x=0) ?(y=0) ?width ?height bytes ({ out; _ } as gif) =
  let w = match width  with None -> gif.width  | Some w -> w
  and h = match height with None -> gif.height | Some h -> h in
  if x < 0 || x + w  > gif.width
  || y < 0 || x + h > gif.height
  || String.length bytes != w * h
    then invalid_arg "image";
  let bits = match ct, gif.ct with
  | Some ct, _
  | None   , Some ct -> max 2 ct.bits
  | None   , None    -> invalid_arg "image"
  in
  (match transparent, fps with
  | None, 0 -> ()
  | _   , _ -> graphics_control_extension transparent fps gif);
  image_descriptor ct x y w h gif;
  out#byte bits;
  out#begin_block;
  (new lzw ~bits out)#compress bytes;
  out#end_block

let comment text { out; _ } =
  out#byte 0x21;         (* GIF Extension Code *)
  out#byte 0xfe;         (* Comment Label *)
  out#begin_block;
  out#string text;
  out#end_block

let repeat ?(count=0) { out; _ } =
  out#byte   0x21;       (* GIF Extension code *)
  out#byte   0xff;       (* Application Extension Label *)
  out#byte   0x0b;       (* Length of application block *)
  out#string "NETSCAPE"; (* Application Identifier *)
  out#string "2.0";      (* Appl. Authentication Code *)
  out#byte   0x03;       (* Length of Data Sub-Block *)
  out#byte   0x01;       (* Loop sub-block ID *)
  out#le16   count;      (* Loop count (0 = forever) *)
  out#byte   0x00        (* Data Sub-Block Terminator *)

let verify_gif inch =
  let failf   fmt = Printf.kprintf failwith fmt in
  let check p fmt = if not p then failf fmt in
  let buffer = Buffer.create 16 in
  let input_le16 inch  =
    let b0 = input_byte inch in
    let b1 = input_byte inch in
    b0 lor (b1 lsl 8)
  in
  let verify_header inch =
    let buf = String.create 6 in
    really_input inch buf 0 6;
    check (buf = "GIF89a") "Expected GIF header"
  in
  let verify_color_table len inch =
    let cnt = 3 * (1 lsl len) in
    for i = 1 to cnt do
      ignore (input_byte inch)
    done;
    Printf.printf "CT   %d colors in %d bytes\n" (cnt / 3) cnt
  in
  let verify_blocks inch =
    let tot = ref 0 in
    Buffer.clear buffer;
    try while true do
      let cnt = input_byte inch in
      if cnt == 0 then raise Exit else
      Buffer.add_channel buffer inch cnt;
      incr tot
    done; assert false with Exit ->
      let contents = Buffer.contents buffer in
      Printf.printf "%d bytes in %d blocks, EOB = %010x\n%!"
        (String.length contents) !tot (pos_in inch);
      contents
  in
  let verify_logical_screen_descriptor inch =
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let fields = input_byte inch in
    let backgr = input_byte inch in
    let aspect = input_byte inch in
    Printf.printf "LSD  w = %d, h = %d, f = %2x, b = %d, a = %d\n"
      width height fields backgr aspect;
    if fields land 0x80 == 0x80 then
      verify_color_table (1 + fields land 0x07) inch
  in
  let verify_image_descriptor inch =
    let left   = input_le16 inch in
    let top    = input_le16 inch in
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let fields = input_byte inch in
    Printf.printf "ID   x = %d, y = %d, w = %d, h = %d, f = %2x\n"
      left top width height fields;
    if fields land 0x80 = 0x80 then
      verify_color_table (1 + fields land 0x07) inch
  in
  let verify_table_based_image inch =
    verify_image_descriptor inch;
    let bits = input_byte inch in
    Printf.printf "IMG  code size = %d, " bits;
    ignore (verify_blocks inch)
  in
  let verify_plain_text_extension inch =
    check (12 == input_byte inch) "Expected block size = 12\n";
    let left   = input_le16 inch in
    let top    = input_le16 inch in
    let width  = input_le16 inch in
    let height = input_le16 inch in
    let _celwid = input_byte inch in
    let _celhgt = input_byte inch in
    let _fgcol  = input_byte inch in
    let _bgcol  = input_byte inch in
    Printf.printf "PTE  l = %d, t = %d, w = %d, h = %d "
      left top width height;
    Printf.printf "  \"%s\"\n%!" (verify_blocks inch)
  in
  let verify_application_extension inch =
    check (11 == input_byte inch) "Expected block size = 11\n";
    let label = String.create 11 in
    really_input inch label 0 11;
    Printf.printf "AE   %s " label;
    ignore (verify_blocks inch)
  in
  let verify_graphic_control_extension inch =
    check (4 == input_byte inch) "Expected block size = 4\n";
    let fields = input_byte inch in
    let delay  = input_le16 inch in
    let transp = input_byte inch in
    check (0x00 == input_byte inch) "Expected block terminator\n";
    Printf.printf "GCE  f = %2x, d = %d, t = %d\n"
      fields delay transp
  in
  let verify_comment_extension inch =
    Printf.printf "CE   ";
    Printf.printf "  \"%s\"\n%!" (verify_blocks inch)
  in
  let verify_extension inch =
    match input_byte inch with
    | 0xff -> verify_application_extension     inch
    | 0xfe -> verify_comment_extension         inch
    | 0xf9 -> verify_graphic_control_extension inch
    | 0x01 -> verify_plain_text_extension      inch
    | b    -> failf "Unknown extension introducer %2x" b
  in
  let verify_eof inch =
    try ignore (input_char inch); failf "Extra contents after EOF"
    with End_of_file -> ()
  in
  let rec verify_data inch =
    match input_byte inch with
    | 0x3b -> verify_eof inch
    | 0x2c -> verify_table_based_image inch; verify_data inch
    | 0x21 -> verify_extension         inch; verify_data inch
    | b    -> failf "Unknown content block %2x" b
  in
  try
    verify_header inch;
    verify_logical_screen_descriptor inch;
    verify_data inch;
    true
  with Failure e ->
    let off = pos_in inch in
    Printf.printf "Offset %5d (%010x): %s\n%!" off off e;
    false
  | End_of_file ->
    let off = pos_in inch in
    Printf.printf "Offset %5d (%010x): Unexpected EOF\n%!" off off;
    false

let grayscale = palette (Array.init 256 (fun i ->
  { red = i; green = i; blue = i }))

let heatmap =
  let ramp i l h =
    let j = 255 * (i - l) / (h - l) in
    if j < 0 then 0 else if j > 255 then 255 else j
  in palette (Array.init 256 (fun i ->
  { red = ramp i 0 84; green = ramp i 85 170; blue = ramp i 171 255 }))

let rainbow =
  let pi   = 3.14159265358979324
  and q1_3 = 0.333333333333333333 in
  let cs x =
    let c = cos (pi *. x) in
    truncate (255. *. c *. c +. 0.5)
  in palette (Array.init 256 (fun i ->
  let x = float i /. float 256 in
  { red = cs x; green = cs (x -. q1_3); blue = cs (x +. q1_3) }))

let black_white = palette [|
  { red = 255; green = 255; blue = 255; };
  { red =   0; green =   0; blue =   0; };
|]

let with_output_file proc fname =
  unwind ~protect:close_out proc (open_out_bin fname)

let with_input_file proc fname =
  unwind ~protect:close_in proc (open_in_bin fname)

`noise.mli`

val noise : float * float * float -> float
val fBM : ?octaves:int -> float * float * float -> float

(** Clamp an integer into the range [0, 255] *)
val clampb : int -> int
(** Convert a float in the range [-1, 1] into an integer in the range [0, 255] *)
val rescale : float -> int

`noise.ml`

let patterns = [| 0o25; 0o70; 0o62; 0o54; 0o15; 0o23; 0o07; 0o52 |]

let btst n b = (n lsr b) land 1

let bmix i j k b = patterns.((btst i b lsl 2) lor (btst j b lsl 1) lor (btst k b))

let shuffle (i, j, k) =
  bmix i j k 0 + bmix j k i 1 + bmix k i j 2 + bmix i j k 3 +
  bmix j k i 4 + bmix k i j 5 + bmix i j k 6 + bmix j k i 7

let magnitude h (x, y, z) =
  let p, q, r = match h land 7 with
  | 0 -> z , x , y
  | 1 -> x , y , 0.
  | 2 -> y , z , 0.
  | 3 -> z , x , 0.
  | 4 -> z , x , y
  | 5 -> x , 0., z
  | 6 -> y , 0., x
  | 7 -> z , 0., y
  | _ -> assert false
  in match (h lsr 3) land 7 with
  | 0 -> -. p -. q +. r
  | 1 -> +. p -. q -. r
  | 2 -> -. p +. q -. r
  | 3 -> +. p +. q +. r
  | 4 -> +. p +. q -. r
  | 5 -> -. p +. q +. r
  | 6 -> +. p -. q +. r
  | 7 -> -. p -. q -. r
  | _ -> assert false

let simplices = [|
[| (0,0,0); (1,0,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (1,0,0); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,1,0); (1,1,0); (1,1,1) |];
[| (0,0,0); (0,1,0); (0,1,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (1,0,1); (1,1,1) |];
[| (0,0,0); (0,0,1); (0,1,1); (1,1,1) |]
|]

let permindex (u, v, w) =
  if u >= w then
    if u >= v then
      if v >= w then 0 else 1
    else 2
  else
  if v >= w then 3 else
  if u >= v then 4 else 5

let int x =
  if x < 0. then pred (truncate x) else truncate x

let skew (x, y, z) =
  let s = (x +. y +. z) /. 3. in
  let i = int (x +. s)
  and j = int (y +. s)
  and k = int (z +. s) in
  (i, j, k)

let unskew (x, y, z) (i, j, k) =
  let s = float (i + j + k) /. 6. in
  let u = x -. float i +. s
  and v = y -. float j +. s
  and w = z -. float k +. s in
  (u, v, w)

let norm2 (x, y, z) = x *. x +. y *. y +. z *. z

let addi3 (i, j, k) (i', j', k') = (i + i', j + j', k + k')

let noise p =
  let l = skew   p   in
  let x = unskew p l in
  let s = simplices.(permindex x) in
  let f = ref 0. in
  for i = 0 to 3 do
    let v = s.(i) in
    let y = unskew x v in
    let t = 0.6 -. norm2 y in
    if t > 0. then
      let h = shuffle (addi3 l v) in
      let t = t *. t in
      f := !f +. 8. *. t *. t *. magnitude h y
  done;
  !f

let fBM ?(octaves=5) =
  let rec go f w i (x, y, z as p) =
    if i == 0 then f else
    let f = f +. w *. noise p in
    go f (0.5 *. w) (pred i) (2. *. x, 2. *. y, 2. *. z)
  in go 0. 1. octaves

let clampb n =
  (n lor ((255-n) asr (Sys.word_size-2))) land lnot (n asr (Sys.word_size-2)) land 255

let rescale f =
  clampb (int (0.5 +. ldexp (f +. 1.) 7))

`test.ml`

let () =
  let width  = 256
  and height = 256 in
  let pixels = String.create (width * height) in
  let x0, y0, z0 = -2., -2., 0.
  and sc = 4.0 in
  Gif.with_output_file
    (Gif.make ~ct:Gif.grayscale ~width ~height (fun gif ->
      for i = 0 to height - 1 do
        let y = y0 +. sc *. float i /. float height in
        for j = 0 to width - 1 do
          let x = x0 +. sc *. float j /. float width in
          let p = Noise.noise (x, y, z0) in
          pixels.[i * width + j] <- char_of_int (Noise.rescale p)
        done
      done;
      Gif.image pixels gif)) "perlin.gif"

`fbm.ml`

let ( +/ ) (x0, y0, z0) (x1, y1, z1) = (x0 +. x1, y0 +. y1, z0 +. z1)
let ( */ ) k (x, y, z) = (k *. x, k *. y, k *. z)

let two_pi = 6.283185307179586476925286766552

let () =
  let count  = 100 in
  let width  = 256
  and height = 256 in
  let pixels = String.create (width * height) in
  Gif.with_output_file
    (Gif.make ~ct:Gif.grayscale ~width ~height (fun gif ->
      if count > 1 then Gif.repeat gif;
      let sc = 1.0 in
      for c = 0 to count - 1 do
        let q = two_pi *. float c /. float count in
        let sq = 0.05 *. sin q
        and cq = 0.05 *. cos q in
        for i = 0 to height - 1 do
          let y = sc *. float i /. float height in
          for j = 0 to width - 1 do
            let x = sc *. float j /. float width in
            let p = (x, y, 0.) in
            let q = p +/ 2. */ (
              Noise.fBM (p +/ (   cq, sq, 0.)),
              Noise.fBM (p +/ (-. sq, cq, 0.)),
              Noise.fBM (p +/ (   0., 0., 1.))) in
            let r = p +/ 2. */ (
              Noise.fBM (q +/ (   cq, 0., sq)),
              Noise.fBM (q +/ (   0., 1., 0.)),
              Noise.fBM (q +/ (-. sq, 0., cq))) in
            let f = 2. *. Noise.fBM r  in
            pixels.[i * width + j] <- char_of_int (Noise.rescale f)
          done
        done;
        Printf.printf "%d/%d\n%!" (succ c) count;
        Gif.image ~fps:25 pixels gif
      done))
    "fbm.gif"

`Makefile`

# :mode=makefile:
OCAMLC=    ocamlfind ocamlc   -w Aelz -g
OCAMLOPT=  ocamlfind ocamlopt -w Aelz -unsafe -inline 1000 -ccopt -O2 -ccopt -fomit-frame-pointer
OCAMLLEX=  ocamllex.opt
OCAMLYACC= ocamlyacc
OCAMLDEP=  ocamldep $(PP)
OCAMLMKLIB=ocamlmklib #-custom -linkall

COMN=\
    gif.ml noise.ml

SRCS=\
    $(COMN) test.ml fbm.ml

OBJS=$(COMN:.ml=.cmo)

LIBS=

PROGRAMS=\
    test fbm

all: $(PROGRAMS) $(PROGRAMS:=.bc)

fbm.bc: $(OBJS) fbm.cmo
    $(OCAMLC) -o $@ $(LIBS) $^
fbm: $(OBJS:.cmo=.cmx) fbm.cmx
    $(OCAMLOPT) -o $@ $(LIBS:.cma=.cmxa) $^

test.bc: $(OBJS) test.cmo
    $(OCAMLC) -o $@ $(LIBS) $^
test: $(OBJS:.cmo=.cmx) test.cmx
    $(OCAMLOPT) -o $@ $(LIBS:.cma=.cmxa) $^

clean:
    rm -f *.cm[iox] *.o *.a *~
distclean: clean
    rm -f $(PROGRAMS) $(PROGRAMS:=.bc) *.gif

.SUFFIXES: .cmo .cmi .cmx .ml .mli .mll .mly

.mly.ml:
.mly.mli:
    $(OCAMLYACC) $<

.mll.ml:
    $(OCAMLLEX) $<

.ml.cmo:
    $(OCAMLC) -c $<

.mli.cmi:
    $(OCAMLC) -c $<

.ml.cmx:
    $(OCAMLOPT) -c $<

.depend:
    $(OCAMLDEP) $(SRCS) > .depend

include .depend

2011-12-30

(One by) Four by Nine

The four nines puzzle asks which positive numbers can be obtained from arithmetic expressions involving four "9" digits and an assortment of operations, minimally "+", "-" (binary and unary), "×" and "/", also frequently "√" and sometimes "!" (factorial). I'll show how to find them by brute force. In this case, I'll forgo using factorials; this means that not every number under 100 can be so obtained. As it is, at 130-odd lines, this is a longish program.

Expressions are labeled trees, where type α is the type of labels and type β is the type of leaves:

type ('a, 'b) expr =
| Con of 'a * 'b
| Neg of 'a * ('a, 'b) expr
| Sqr of 'a * ('a, 'b) expr
| Add of 'a * ('a, 'b) expr * ('a, 'b) expr
| Sub of 'a * ('a, 'b) expr * ('a, 'b) expr
| Mul of 'a * ('a, 'b) expr * ('a, 'b) expr
| Div of 'a * ('a, 'b) expr * ('a, 'b) expr

Later I'll make clear what labels are useful for. For now, the simplest operation on an expression is extracting its label:

let label = function
| Con (l, _    )
| Neg (l, _    )
| Sqr (l, _    )
| Add (l, _, _ )
| Sub (l, _, _ )
| Mul (l, _, _ )
| Div (l, _, _ ) -> l

Note that this is not a recursive function; it just extracts the label of the root. Converting expression trees into algebraic syntax is a tedious exercise in unparsing a precedence operator grammar. In this case, since the leaves are of an arbitrary type, format needs a conversion function cnv:

let format cnv e =
  let buf = Buffer.create 10 in
  let rec go level e =
    let prf op prec e =
      Buffer.add_string buf op;
      if prec < level then Buffer.add_char buf '(';
      go prec e;
      if prec < level then Buffer.add_char buf ')'
    in
    let bin op prec e e' =
      if prec < level then Buffer.add_char buf '(';
      go prec e;
      Buffer.add_char   buf ' ';
      Buffer.add_string buf op;
      Buffer.add_char   buf ' ';
      go prec e';
      if prec < level then Buffer.add_char buf ')'
    in match e with
    | Con (_, x    ) -> Buffer.add_string buf (cnv x)
    | Neg (_, e    ) -> prf "-" 10 e
    | Sqr (_, e    ) -> prf "\xe2\x88\x9a" 10 e
    | Add (_, e, e') -> bin "+"  1 e e'
    | Sub (_, e, e') -> bin "-"  2 e e'
    | Mul (_, e, e') -> bin "*"  5 e e'
    | Div (_, e, e') -> bin "/"  6 e e'
  in go 0 e; Buffer.contents buf

The inner function prf formats prefix operators with precedence prec, while bin formats binary operators. In both cases, if op's binding power prec is less than the current precedence level, the whole expression is surrounded by parentheses. Note that I use the UTF-8 representation of the radical sign U+221A; who said that OCaml doesn't do Unicode?

If expressions are labeled with their values, they can be evaluated in constant time with label; to avoid losing precision, I use rational labels. If any expression is out of range, I use the "fraction" 1/0, infinity_ratio, as a sentinel. For this to work with OCaml ratios, I must turn off error checking on null denominators:

let () = Arith_status.set_error_when_null_denominator false

let infinity_ratio = Ratio.(inverse_ratio (ratio_of_int 0))

How to compute the square root of a fraction? If both the numerator and denominator are positive perfect squares, the answer is clear. In any other case, I signal failure via infinity_ratio:

let sqrt_ratio r =
  let open Ratio in
  let open Big_int in
  match sign_ratio r with
  | -1 -> infinity_ratio
  |  0 -> r
  |  1 ->
    let r1, r2 = numerator_ratio r, denominator_ratio r in
    let q1, q2 = sqrt_big_int   r1, sqrt_big_int     r2 in
    let s1, s2 = square_big_int q1, square_big_int   q2 in
    if eq_big_int r1 s1 && eq_big_int r2 s2
      then div_big_int_ratio q1 (ratio_of_big_int q2)
      else infinity_ratio
  |  _ -> assert false

Low-level but straightforward. Now smart constructors make sure that every expression is correctly labeled with its value:

let con n    = Con (Ratio.ratio_of_int        n            , n    )
and neg e    = Neg (Ratio.minus_ratio  (label e)           , e    )
and sqr e    = Sqr (      sqrt_ratio   (label e)           , e    )
and add e e' = Add (Ratio.add_ratio    (label e) (label e'), e, e')
and sub e e' = Sub (Ratio.sub_ratio    (label e) (label e'), e, e')
and mul e e' = Mul (Ratio.mult_ratio   (label e) (label e'), e, e')
and div e e' = Div (Ratio.div_ratio    (label e) (label e'), e, e')
and abs e    =
  let r = label e in
  if Ratio.sign_ratio r != -1 then e else
  Neg (Ratio.minus_ratio r, e)

The stage is set for generating all the expressions. If nines size generates all the expressions using size nines, the recursion schema it obeys is something like this:

  if size == 1 then [   9] else
  if size == 2 then [  99] @ mix (nines 1) (nines 1) else
  if size == 3 then [ 999] @ mix (nines 2) (nines 1) @ mix (nines 1) (nines 2) else
  if size == 4 then [9999] @ mix (nines 3) (nines 1) @ mix (nines 2) (nines 2) @ mix (nines 1) (nines 3) else...

where mix is a hypothetical function that merges two lists of expressions using all the possible binary operators. In each case, the constant 10^size - 1 = 99…9 is the base of the recursion, which proceeds by building all binary trees of a given size by partitioning size in two summands. OK, maybe it is simpler to show the actual code than trying to explain it in English. A small function generates the number having its n digits equal to d:

let rec rep d n = if n == 0 then 0 else d + 10 * (rep d (pred n))

(Yes, the same code can solve the four fours puzzle, or the nine nines puzzle, or…) Another basic function generates a list of integers in a given range:

let range i j =
  let rec go l i j =
    if i > j then l else go (j :: l) i (pred j)
  in go [] i j

(This one is tail recursive just because.) Now, given an expression e, I will count it as valid by adding it to the list es of expressions if it is finite and positive. Furthermore, if it is valid and it has a rational square root, I will count the latter as valid too:

let with_root e es =
  let open Ratio in
  let e = abs e in
  if null_denominator (label e)
  || sign_ratio (label e) == 0 then es else
  let q = sqr e in
  if null_denominator (label q) then e :: es else q :: e :: es

Finally, the workhorse:

let rec puzzle digit size =
  List.fold_right (fun i ->
    let j = size - i in
    List.fold_right (fun e0 ->
      List.fold_right (fun e1 ->
        List.fold_right (fun op ->
          with_root (op e0 e1)
        ) [add; sub; mul; div]
      ) (puzzle digit j)
    ) (puzzle digit i)
  ) (range 1 (size - 1))
  (with_root (con (rep digit size)) [])

It looks more complicated than it is, really; just a list comprehension in a different shape that forces one to read it from the outside in, alas. It all starts with the constant "dd… d", together with_(its)root if it is valid, as a seed for the list of expressions. Then for each i in the range from 1 to size - 1, it recursively builds solutions e0 of size i and e1 of size j = size - i. For each binary operation op, it builds the expression op e0 e1 and adds it to the resulting list of solutions, together with_(its)root if they are valid.

Almost done! The problem is that there could be many possible expressions for a given number; it would be best to find just one exemplar for it. I've written about grouping lists in another era:

let rec group ?(by=(=)) =
  let rec filter e l = match l with
  | [] -> [], []
  | x :: xs ->
    if not (by e x) then [], l else
    let gs, ys = filter e xs in
    x :: gs, ys
  in function
  | [] -> []
  | x :: xs ->
    let gs, ys = filter x xs in
    (x :: gs) :: group ~by ys

A bit of syntax will let me build a filtering pipeline to select the best candidates:

let (|>) x f = f x

Best in this case means that, out of all the expressions evaluating to a given integer, I prefer the one having the shortest representation. Now from all expressions I filter those that have integral value, decorate the expression as a string with its value, sort them and group them by value and select the first:

let fournines =
  let cmp (n, s) (n', s') =
    let c = Pervasives.compare n n' in
    if c != 0 then c else
    let c = Pervasives.compare (String.length s) (String.length s') in
    if c != 0 then c else
    Pervasives.compare s s'
  in puzzle 9 4
  |> List.filter (fun e ->  Ratio.is_integer_ratio (label e))
  |> List.map    (fun e -> (Ratio.int_of_ratio (label e), format string_of_int e))
  |> List.sort cmp
  |> group ~by:(fun (n, _) (n', _) -> n == n')
  |> List.map List.hd

Very operational. It only remains to format the list to standard output:

let () = List.iter (fun (n, s) -> Printf.printf "%4d = %s\n" n s) fournines

Behold, in all its glory, the solution to the puzzle:

1	1 =	99 / 99
2	2 =	99 / 9 - 9
3	3 =	(9 + 9 + 9) / 9
4	4 =	9 / 9 + 9 / √9
5	5 =	9 - 9 / 9 - √9
6	6 =	9 * 9 / 9 - √9
7	7 =	9 - (9 + 9) / 9
8	8 =	99 / 9 - √9
9	9 =	√(99 - 9 - 9)
10	10 =	(99 - 9) / 9
11	11 =	(9 + 9) / 9 + 9
12	12 =	(9 + 99) / 9
13	13 =	9 + 9 / 9 + √9
14	14 =	99 / 9 + √9
15	15 =	9 + 9 - 9 / √9
16	17 =	9 + 9 - 9 / 9
17	18 =	99 - 9 * 9
18	19 =	9 + 9 + 9 / 9
19	20 =	9 + 99 / 9
20	21 =	9 + 9 + 9 / √9
21	24 =	99 / √9 - 9
22	26 =	9 * √9 - 9 / 9
23	27 =	9 * 9 * √9 / 9
24	28 =	9 * √9 + 9 / 9
25	30 =	(99 - 9) / √9
26	32 =	(99 - √9) / √9
27	33 =	99 * √9 / 9
28	34 =	(99 + √9) / √9
29	36 =	9 + 9 + 9 + 9
30	39 =	9 * √9 + 9 + √9
31	42 =	9 + 99 / √9
32	45 =	9 * √9 + 9 + 9
33	51 =	(9 + 9) * √9 - √9
34	54 =	9 * 9 - 9 * √9
35	57 =	(9 + 9) * √9 + √9
36	63 =	9 * 9 - 9 - 9
37	69 =	9 * 9 - 9 - √9
38	72 =	99 - 9 * √9
39	75 =	9 * 9 - 9 + √9
40	78 =	9 * 9 - 9 / √9
41	80 =	9 * 9 - 9 / 9
42	81 =	99 - 9 - 9
43	82 =	9 * 9 + 9 / 9
44	84 =	9 * 9 + 9 / √9
45	87 =	99 - 9 - √9
46	90 =	(9 + 9 / 9) * 9
47	93 =	99 - 9 + √9
48	96 =	99 - 9 / √9
49	98 =	99 - 9 / 9
50	99 =	9 * 99 / 9
51	100 =	9 / 9 + 99
52	102 =	9 / √9 + 99
53	105 =	9 + 99 - √9
54	108 =	99 + √(9 * 9)
55	111 =	999 / 9
56	117 =	9 + 9 + 99
57	126 =	9 * √9 + 99
58	135 =	(9 + 9 - √9) * 9
59	144 =	(9 + √9) * (9 + √9)
60	153 =	(9 + 9) * 9 - 9
61	159 =	(9 + 9) * 9 - √9
62	162 =	9 * 9 + 9 * 9
63	165 =	(9 + 9) * 9 + √9
64	171 =	(9 + 9) * 9 + 9
65	180 =	9 * 9 + 99
66	189 =	(9 + 9 + √9) * 9
67	198 =	99 + 99
68	216 =	(9 * 9 - 9) * √9
69	234 =	9 * 9 * √9 - 9
70	240 =	9 * 9 * √9 - √9
71	243 =	(9 + 9 + 9) * 9
72	246 =	9 * 9 * √9 + √9
73	252 =	9 * 9 * √9 + 9
74	270 =	(99 - 9) * √9
75	288 =	99 * √9 - 9
76	294 =	99 * √9 - √9
77	297 =	9 * 99 / √9
78	300 =	99 * √9 + √9
79	306 =	9 + 99 * √9
80	324 =	(9 + 99) * √9
81	333 =	999 / √9
82	486 =	(9 + 9) * 9 * √9
83	594 =	(9 - √9) * 99
84	648 =	(9 * 9 - 9) * 9
85	702 =	(9 * 9 - √9) * 9
86	720 =	9 * 9 * 9 - 9
87	726 =	9 * 9 * 9 - √9
88	729 =	9 * 9 * √(9 * 9)
89	732 =	9 * 9 * 9 + √9
90	738 =	9 * 9 * 9 + 9
91	756 =	(9 * 9 + √9) * 9
92	810 =	(99 - 9) * 9
93	864 =	(99 - √9) * 9
94	882 =	9 * 99 - 9
95	888 =	9 * 99 - √9
96	891 =	99 * √(9 * 9)
97	894 =	9 * 99 + √9
98	900 =	9 * 99 + 9
99	918 =	(99 + √9) * 9
100	972 =	(9 + 99) * 9
101	990 =	999 - 9
102	996 =	999 - √9
103	1002 =	999 + √9
104	1008 =	9 + 999
105	1188 =	(9 + √9) * 99
106	1458 =	(9 + 9) * 9 * 9
107	1782 =	(9 + 9) * 99
108	2187 =	9 * 9 * 9 * √9
109	2673 =	9 * 99 * √9
110	2997 =	999 * √9
111	6561 =	9 * 9 * 9 * 9
112	8019 =	9 * 9 * 99
113	8991 =	9 * 999
114	9801 =	99 * 99
115	9999 =	9999

(The table is built out of the actual output to Mac OS Terminal. The Unicode characters are printed perfectly.) Using factorials to fill in the gaps is left as an exercise to the reader (it is not simple).

2011-12-21

A Better (Gauss) Error Function

If you do statistics you know of erf and erfc; if you work in OCaml you surely miss them. It is not very difficult to port the canonical implementation given by Numerical Recipes (which I won't show and not just for licensing reasons); if you Google for a coefficient you'll see that this approximation is, indeed, ubiquitous. There exists a better approximation in the literature, one that is more than 40 years old. Since I find it inconceivable that it is not more widely used (again, Google for a coefficient), I present a straightforward implementation of it.

This approximation boasts a relative error of 10^-19.5 in the range |x| ≤ 0.46875 = 15/32, of 10^-18.59 in the range 0.46875 ≤ |x| ≤ 4, and of 10^-18.24 in the range 4 ≤ |x|. The only difficulty is to accurately transcribe the coefficient tables into code; I've double-checked them and ran some tests to assess the proper functioning of the code:

let r0 = [|
3.20937_75891_38469_47256_2e+03, 2.84423_68334_39170_62227_3e+03;
3.77485_23768_53020_20813_7e+02, 1.28261_65260_77372_27564_5e+03;
1.13864_15415_10501_55649_5e+02, 2.44024_63793_44441_73305_6e+02;
3.16112_37438_70565_59694_7e+00, 2.36012_90952_34412_09349_9e+01;
1.85777_70618_46031_52673_0e-01, 1.00000_00000_00000_00000_0e+00;
|]

let r1 = [|
1.23033_93547_97997_25272e+03, 1.23033_93548_03749_42043e+03;
2.05107_83778_26071_46532e+03, 3.43936_76741_43721_63696e+03;
1.71204_76126_34070_58314e+03, 4.36261_90901_43247_15820e+03;
8.81952_22124_17690_90411e+02, 3.29079_92357_33459_62678e+03;
2.98635_13819_74001_31132e+02, 1.62138_95745_66690_18874e+03;
6.61191_90637_14162_94775e+01, 5.37181_10186_20098_57509e+02;
8.88314_97943_88375_94118e+00, 1.17693_95089_13124_99305e+02;
5.64188_49698_86700_89180e-01, 1.57449_26110_70983_47253e+01;
2.15311_53547_44038_46343e-08, 1.00000_00000_00000_00000e+00;
|]

let r2 = [|
-6.58749_16152_98378_03157e-04, 2.33520_49762_68691_85443e-03;
-1.60837_85148_74227_66278e-02, 6.05183_41312_44131_91178e-02;
-1.25781_72611_12292_46204e-01, 5.27905_10295_14284_12248e-01;
-3.60344_89994_98044_39429e-01, 1.87295_28499_23460_47209e+00;
-3.05326_63496_12323_44035e-01, 2.56852_01922_89822_42072e+00;
-1.63153_87137_30209_78498e-02, 1.00000_00000_00000_00000e+00;
|]

The tables are laid out as pairs of coefficients (p_i, q_i) for the numerator and denominator polynomials, respectively, in ascending order of monomial power (that is, the independent coefficients are in the first position of the arrays). The rational functions can be evaluated with a suitable variation of Horner's schema:

let horner2 r x =
  let n = Array.length r in
  let s = ref 0.
  and t = ref 0. in
  for i = n - 1 downto 0 do
    let p, q = Array.unsafe_get r i in
    s := !s *. x +. p;
    t := !t *. x +. q
  done;
  !s /. !t

For clarity I haven't inlined Horner's recursion in the following code; this is not very difficult to do, so I've left it as an exercise for the implementor:

let iqpi = 5.64189_58354_77562_86948_1e-01

let erfc x =
  let z  = abs_float x in
  let z2 = z *. z in
  let y =
    if z < 0.46875 then   1. -.         z   *. horner2 r0 z2 else
    if z < 4.      then         exp (-. z2) *. horner2 r1 z  else
    let z'  = 1. /. z  in
    let z'2 = z' *. z' in       exp (-. z2) *. z' *. (iqpi +. z'2 *. horner2 r2 z'2)
  in if x < 0. then 2. -. y else y

As explained in the paper, erfc x must be everywhere equal to 1. -. erf x:

let erf x = 1. -. erfc x

If you have Mathematica™ lying around, you might want to output a list of points to test the accuracy of the approximation:

let mma f x0 x1 n =
  let buf = Buffer.create 256 in
  Buffer.add_string buf "l = Map[N[\
(1 - 2 BitAnd[1, BitShiftRight[#, 63]])\
2^(BitAnd[16^^7ff, BitShiftRight[#, 52]] - 1023)\
(1 +  2^-52 BitAnd[16^^fffffffffffff, #]),\
$MachinePrecision]&, #, {2}]&@{";
  for i = 0 to n do
    let x = (x0 *. float (n - i) +. x1 *. float i) /. float n in
    let y = f x in
    if i != 0 then Buffer.add_char buf ',';
    Printf.bprintf buf "{16^^%Lx,16^^%Lx}"
      (Int64.bits_of_float x)
      (Int64.bits_of_float y)
  done;
  Buffer.add_string buf "};";
  Buffer.contents buf

In order to exactly marshal the IEEE 754 double values I convert them to rational numbers equivalent to machine floats on Mathematica's end. My informal tests show that around the cut-off points Mathematica evaluates erf to the same exact values this code does (that is, the absolute error is zero). When evaluated to sufficient precision, the maximum absolute error among 5,000 equispaced points in the interval [3.99, 4.01] is 5.5437×10^-17.

To mathematical libraries implementors everywhere, I urge you: please steal this code!

References

W. J. Cody. Rational Chebyshev Approximations for the Error Function, Mathematics of Computation Vol. 23, No. 107 (Jul., 1969), pp. 631-63. Online.

2011-11-17

Brainfuck in Java

(I don't feel like writing a punning title.) Last time I showed how the use of functors allows for modular and type-safe specification of semantics. I wrote I can't imagine this flexibility in object-oriented languages like Java or Python; fighting words for which I was justly taken to task. Here's my penance: a morally equivalent reimplementation in Java of the Brainfuck interpreter. The translation was mechanic, but I realized with some surprise that the result is more flexible than the original OCaml: whereas the latter only allows static composition of semantics (unless you use first-class modules), Java allows fully dynamic, run-time stacking of semantics.

Even though the Java is better decoupled than the original (which uses the same memory representation for every interpreter variant), this is more by necessity than design. In the process, the interpreter suffered more than a 300% code expansion by line count at 520 lines. Bear with me! This is a single-file, multiple-class program; feel free to package and refactor it to your heart's content. The external requirements are minimal:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

As advertised, the top-level Brainfuck class parses, prints and evaluates Brainfuck programs (a List of Instructions) using fully compositional semantics:

public final class Brainfuck {
    public static void main(String[] args) {
        String hello =
">+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]"+
"<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+.";
        Semantics semantics = new S3_(5000, new S2(new S1_(new S0())));
        List<Instruction> program = Brainfuck.parse(hello);
        System.out.print(Brainfuck.print(program));
        Brainfuck.evaluate(semantics, program);
    }

The class is completely static and reentrant; the main operations delegate to appropriate visitors:

private Brainfuck() { }

    public static void evaluate(Semantics semantics, List<Instruction> program) {
        new Evaluator(semantics).evaluate(program);
    }

    public static String print(List<Instruction> program) {
        return new Printer().print(program);
    }

The parser is standard, recursive-descent but with explicitly-managed recursion. For simple (concrete) instructions it just adds the corresponding Instruction to the current program. When it sees an opening bracket it uses a Stack to hold the partially-parsed List of Instructions and starts fresh. When it encounters a closing bracket, it builds a Repeat instruction from the just-parsed program, pops the context and adds the former to the latter:

public static List<Instruction> parse(String text) {
        final Stack<List<Instruction> > stack = new Stack<List<Instruction> >();
        List<Instruction> program = new ArrayList<Instruction>();

        for (int i = 0; i != text.length(); i++)
            switch (text.charAt(i)) {
            case '>': program.add(Instruction.FWD); break;
            case '<': program.add(Instruction.BAK); break;
            case '+': program.add(Instruction.INC); break;
            case '-': program.add(Instruction.DEC); break;
            case '.': program.add(Instruction.OUT); break;
            case ',': program.add(Instruction.INP); break;
            case '[':
                stack.push(program);
                program = new ArrayList<Instruction>();
                break;
            case ']':
                if (stack.empty())
                    throw new IllegalArgumentException("unmatched `]'");
                final Instruction rep = Instruction.REP(program);
                program = stack.pop();
                program.add(rep);
                break;
            }
        if (!stack.empty())
            throw new IllegalArgumentException("unmatched `['");
        return program;
    }
}

A textbook example that completes the class. Now for the real program. An Instruction is an abstract class implementing the case class pattern à la Scala (cf Wadler's expression problem). Every instruction supports two operations, evaluation and printing, by using the visitor pattern:

abstract class Instruction {
    private Instruction() { }

    public abstract void evaluate(Evaluator evaluator);
    public abstract void print(Printer printer);

Each concrete Instruction is tucked away inside this class. They simply forward the call to the passed-in visitor. The first six are mind-numbingly straightforward:

private static final class Forward extends Instruction {
        private Forward() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateForward();
        }

        @Override
        public void print(Printer printer) {
            printer.printForward();
        }

    private static final class Backward extends Instruction {
        private Backward() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateBackward();
        }

        @Override
        public void print(Printer printer) {
            printer.printBackward();
        }

    private static final class Increment extends Instruction {
        private Increment() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateIncrement();
        }

        @Override
        public void print(Printer printer) {
            printer.printIncrement();
        }

    private static final class Decrement extends Instruction {
        private Decrement() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateDecrement();
        }

        @Override
        public void print(Printer printer) {
            printer.printDecrement();
        }

    private static final class Output extends Instruction {
        private Output() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateOutput();
        }

        @Override
        public void print(Printer printer) {
            printer.printOutput();
        }

    private static final class Input extends Instruction {
        private Input() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateInput();
        }

        @Override
        public void print(Printer printer) {
            printer.printInput();
        }

The last Instruction is fractionally more interesting in that it has a program as a parameter:

private static final class Repeat extends Instruction {
        private final List<Instruction> program;

        private Repeat(List<Instruction> program) {
            this.program = program;
        }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateRepeat(program);
        }

        @Override
        public void print(Printer printer) {
            printer.printRepeat(program);
        }

These classes are fully private to the enclosing case class Instruction; the Brainfuck interpreter uses static members to refer to each:

public static final Instruction FWD = new Forward();
    public static final Instruction BAK = new Backward();
    public static final Instruction INC = new Increment();
    public static final Instruction DEC = new Decrement();
    public static final Instruction OUT = new Output();
    public static final Instruction INP = new Input();

    public static Instruction REP(List<Instruction> program) {
        return new Repeat(program);
    }
}

This is all the (essentially boilerplate) code necessary just to reproduce an algebraic data type over which OCaml code would pattern match. The visitors implement each arm of this match, so more boilerplate follows. The first one is the Evaluator:

class Evaluator {
    private final Semantics semantics;
    private final Memory memory;

It takes a Semantics and sets itself to operate upon an initial Memory:

public Evaluator(Semantics semantics) {
        this.semantics = semantics;
        this.memory = semantics.initial();
    }

This is all a bit abstract because both Semantics and Memory are interfaces that can be plugged-in at will. To evaluate a program, it visits each Instruction in turn:

public void evaluate(List<Instruction> program) {
        for (Instruction instruction : program)
            instruction.evaluate(this);
    }

Again, this is a bog-standard visitor. Each visiting method simply delegates to the chosen Semantics with the current state of the Memory:

public void evaluateForward() {
        semantics.forward(this.memory);
    }

    public void evaluateBackward() {
        semantics.backward(this.memory);
    }

    public void evaluateIncrement() {
        semantics.increment(this.memory);
    }

    public void evaluateDecrement() {
        semantics.decrement(this.memory);
    }

    public void evaluateOutput() {
        semantics.output(this.memory);
    }

    public void evaluateInput() {
        semantics.input(this.memory);
    }

The Repeat instruction is an Action over a Memory that corresponds to evaluate-ing the repeated program:

public void evaluateRepeat(final List<Instruction> program) {
        semantics.repeat(new Action<Memory>() {
            @Override
            public void apply(Memory _unused) {
                evaluate(program);
            }
        }, this.memory);
    }
}

(Why is the Memory _unused? Because evaluate already has it in scope, but the Semantics are fully parametric with respect to the store. If you find an elegant solution to this that doesn't involve each Instruction passing the Memory around, let me know.) This is the moral equivalent of a higher-order closure; someday you might be able to write a lambda instead of that. The second visitor is the Printer:

class Printer {
    private final StringBuffer buffer = new StringBuffer(72);
    private int linelen;

It formats the Brainfuck program in neat lines of 72 characters:

public Printer() {
        linelen = 0;
    }

    private void putc(char c)  {
        if (linelen == 72 || c == '\n') {
            buffer.append('\n');
            linelen = 0;
        }
        if (c != '\n') {
            buffer.append(c);
            linelen++;
        }
    }

The visitor sets the buffer up, prints each instruction by visiting them in turn and closes the buffer:

public String print(List<Instruction> program) {
        buffer.setLength(0);
        for (Instruction instruction : program)
            instruction.print(this);
        putc('\n');
        return buffer.toString();
    }

Each visited Instruction accumulates its representation in the buffer:

public void printForward()   { putc('>'); }
    public void printBackward()  { putc('<'); }
    public void printIncrement() { putc('+'); }
    public void printDecrement() { putc('-'); }
    public void printOutput()    { putc('.'); }
    public void printInput()     { putc(','); }

Aside: It is not the responsibility of the Instruction to know its external representation since it denotes Brainfuck's abstract syntax. Do not be misled by phony invocations to the Law of Demeter.

Repeat just emits its list of Instructions between brackets by mutual recursion between it and this visitor:

public void printRepeat(List<Instruction> program) {
        putc('[');
        for (Instruction instruction : program)
            instruction.print(this);
        putc(']');
    }
}

Neat. Now for the raison d'être of the exercise: A Semantics encapsulates the effects of each Brainfuck instruction upon the Memory:

interface Semantics {
    Memory initial();
    void forward(Memory memory);
    void backward(Memory memory);
    void increment(Memory memory);
    void decrement(Memory memory);
    void output(Memory memory);
    void input(Memory memory);
    void repeat(Action<Memory> program, Memory memory);
}

The effect of a repetition depends on the program being repeated; this is represented by an Action taking the Memory as a parameter:

interface Action<T> {
    void apply(T argument);
}

Aside: Don't confuse a program as a List of Instructions with its meaning as an Action or effect. It wouldn't be appropriate for repeat to take anything else.

In object-oriented languages there are two reuse mechanisms: inheritance and composition. The first is static, compile-time and determined by the providing (i.e. library) code; the second is dynamic, run-time and controlled by the consuming (i.e. client) code. Coplien's envelope pattern allows to combine both in a sort of dynamic inheritance:

abstract class DelegatingSemantics implements Semantics {
    private final Semantics delegate;

    public DelegatingSemantics(Semantics delegate) {
        this.delegate = delegate;
    }

    @Override
    public Memory initial()              { return delegate.initial(); }

    @Override
    public void forward(Memory memory)   { delegate.forward(memory); }

    @Override
    public void backward(Memory memory)  { delegate.backward(memory); }

    @Override
    public void increment(Memory memory) { delegate.increment(memory); }

    @Override
    public void decrement(Memory memory) { delegate.decrement(memory); }

    @Override
    public void output(Memory memory)    { delegate.output(memory); }

    @Override
    public void input(Memory memory)     { delegate.input(memory); }

    @Override
    public void repeat(Action<Memory> program, Memory memory) {
        delegate.repeat(program, memory);
    }
}

Envelopes take a delegating letter to which to forward by default the implemented methods; subclasses need only override the methods of interest and let the envelope handle the rest. The initial semantics S0 is, however, fully concrete. It operates on an unbounded memory of machine integers:

class S0 implements Semantics {
    @Override
    public Memory initial()              { return new UnboundedMemory(); }

    @Override
    public void forward(Memory memory)   { memory.next(); }

    @Override
    public void backward(Memory memory)  { memory.prev(); }

    @Override
    public void increment(Memory memory) { memory.set(memory.get() + 1); }

    @Override
    public void decrement(Memory memory) { memory.set(memory.get() - 1); }

    @Override
    public void output(Memory memory) {
        System.out.print((char) (memory.get() & 255));
    }

The input behavior on EOF is to leave the memory untouched:

@Override
    public void input(Memory memory) {
        try {
            final int c = System.in.read();
            if (c != -1) memory.set(c);
        } catch (IOException e) {
            System.err.println(e);
        }
    }

The meaning of repeat is to execute the enclosed program until the current memory cell is zero:

@Override
    public void repeat(Action<Memory> program, Memory memory) {
        int c;
        while ( (c = memory.get()) != 0 )
            program.apply(memory);
    }
}

This interpretation of repeat will probably never change (but consider a non-Turing-complete version of Brainfuck with bounded repetition, e.g. for server-side Core Wars competitions). The first difference, however, is on input:

class S1 extends DelegatingSemantics {
    public S1(Semantics delegate) { super(delegate); }

    @Override
    public void input(Memory memory) {
        try {
            final int c = System.in.read();
            memory.set(c == -1 ? 0 : c);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

The S1 interpretation overrides a Semantics so that it sets the current cell to zero on EOF. The other variant is to set it to -1:

class S1_ extends DelegatingSemantics {
    public S1_(Semantics delegate) { super(delegate); }

    @Override
    public void input(Memory memory) {
        try {
            final int c = System.in.read();
            memory.set(c);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

The second difference is in the memory cell size, in this case 8-bit-wide:

class S2 extends DelegatingSemantics {
    public S2(Semantics delegate) { super(delegate); }

    @Override
    public void increment(Memory memory) {
        memory.set((memory.get() + 1) & 255);
    }

    @Override
    public void decrement(Memory memory) {
        memory.set((memory.get() - 1) & 255);
    }
}

Although it uses the delegate Semantics's memory, it truncates it on mutation. The third difference is in the organization of the Memory itself:

class S3 extends DelegatingSemantics {
    protected final int length;

    public S3(int length, Semantics delegate) {
        super(delegate);
        this.length = length;
    }

    @Override
    public Memory initial() { return new BoundedMemory(length); }
}

It uses a bounded memory of the size provided, which disallows moving past its bounds. A variation is to wrap around the current cell when reaching either end:

class S3_ extends DelegatingSemantics {
    protected final int length;

    public S3_(int length, Semantics delegate) {
        super(delegate);
        this.length = length;
    }

    @Override
    public Memory initial() { return new CircularMemory(length); }
}

The same code except for the underlying store selected. The Memory interface itself is simple:

interface Memory {
    int get();
    void set(int value);
    void next();
    void prev();
}

The only operations on it are getting and setting the current cell's value and moving it one position in each direction. The implementation of an UnboundedMemory is, however, a bit sophisticated. It uses the standard Turing Machine trick of simulating a doubly-infinite tape with two semi-infinite ones:

class UnboundedMemory implements Memory {
    private final List<Integer> left  = new ArrayList<Integer>();
    private final List<Integer> right = new ArrayList<Integer>();
    private int cursor;

The right tape contains cells with index 0, 1, 2… while the left tape contains cells indexed by -1, -2, -3… Initially only the zero-th cell exists:

public UnboundedMemory() {
        right.add(0);
        cursor = 0;
    }

The class maintains the invariant that the current cell, to which cursor points, always exists:

@Override
    public int get() {
        if (cursor < 0)
            return left.get(-1 - cursor);
        else
            return right.get(cursor);
    }

    @Override
    public void set(int value) {
        if (cursor < 0)
            left.set(-1 - cursor, value);
        else
            right.set(cursor, value);
    }

New cells are created on demand on the appropriate tape upon cursor movement:

@Override
    public void next() {
        cursor++;
        if (cursor >= 0 && right.size() == cursor)
            right.add(0);
    }

    @Override
    public void prev() {
        --cursor;
        if (cursor < 0 && left.size() == -1 - cursor)
            left.add(0);
    }
}

A BoundedMemory is, by contrast, much simpler:

class BoundedMemory implements Memory {
    protected final int[] tape;
    protected int cursor;

    public BoundedMemory(int length) {
        this.tape = new int[length];
        cursor = 0;
    }

    @Override
    public int get() {
        return tape[cursor];
    }

    @Override
    public void set(int value) {
        tape[cursor] = value;
    }

The only intelligence required is in avoiding moving the cursor outside its bounds:

@Override
    public void next() {
        if (cursor != tape.length - 1)
            cursor++;
    }

    @Override
    public void prev() {
        if (cursor != 0)
            --cursor;
    }
}

A CircularMemory is just a BoundedMemory with a different out-of-bounds behavior:

class CircularMemory extends BoundedMemory {
    public CircularMemory(int length) {
        super(length);
    }

    @Override
    public void next() {
        cursor++;
        if (cursor == tape.length)
            cursor = 0;
    }

    @Override
    public void prev() {
        if (cursor == 0)
            cursor = tape.length;
        --cursor;
    }
}

tl; dr: writing interpreters in Java is tedious. Interestingly enough, I find the object-oriented expression of the Semantics over the different types of Memory very natural, more so even than what modular programming allows. The downsides I find, at least for Java, are two:

The sheer amount of boilerplate necessary to express the problem. Even allowing for the lack of closures, algebraic data types and pattern matching, it lacks an automatic delegation mechanism. This is actually a shortcoming of every mainstream object-oriented language (and no, metaprogramming is not a substitute for a language construct that can be checked and verified statically for a number of properties that run-time reflection would not)
The need to name everything. Nominal type disciplines not only suck on principle, they actually encourage naming trivial program constructs (classes, interfaces, methods) that therefore take an artificial existence of their own, bloating (module) interfaces and opening new ports for coupling

Still with me? Let me thank you and express my admiration for your fortitude.

Alaska Ataca a Kamtchatka

2012-10-25

How to Write a Simple Web Application Using Ocamlnet

Setting up a Netplex server: `http.ml`

A `Nethttp` JSON framework: `net.ml`

Writing the JSON methods: `myservice.ml`

Configuring the `Netplex` server: `myservice.conf`

Compiling: `Makefile`

Conclusion

2012-02-22

For the Right Hand

2012-01-09

Putting Noise to the Test

`gif.mli`

`gif.ml`

`noise.mli`

`noise.ml`

`test.ml`

`fbm.ml`

`Makefile`

2011-12-30

(One by) Four by Nine

2011-12-21

A Better (Gauss) Error Function

References

2011-11-17

Brainfuck in Java

About Me

Statement of Rights

Labels

Blog Archive

Alaska Ataca a Kamtchatka

2012-10-25

How to Write a Simple Web Application Using Ocamlnet

Setting up a Netplex server: http.ml

A Nethttp JSON framework: net.ml

Writing the JSON methods: myservice.ml

Configuring the Netplex server: myservice.conf

Compiling: Makefile

Conclusion

2012-02-22

For the Right Hand

2012-01-09

Putting Noise to the Test

gif.mli

gif.ml

noise.mli

noise.ml

test.ml

fbm.ml

Makefile

2011-12-30

(One by) Four by Nine

2011-12-21

A Better (Gauss) Error Function

References

2011-11-17

Brainfuck in Java

About Me

Statement of Rights

Labels

Blog Archive

Setting up a Netplex server: `http.ml`

A `Nethttp` JSON framework: `net.ml`

Writing the JSON methods: `myservice.ml`

Configuring the `Netplex` server: `myservice.conf`

Compiling: `Makefile`

`gif.mli`

`gif.ml`

`noise.mli`

`noise.ml`

`test.ml`

`fbm.ml`

`Makefile`