This post might seem to be in apparent contradiction: Ocamlnet
is a large, very opinionated framework for network programming that solves most, if not all, those infrastructure issues that need to be taken care of when writing a networked, distributed, fault-tolerant server, from process management to string decoding and including protocol parsing and building. The fact is that Ocamlnet
is not particularly complicated for all that it does, and it does quite a bit; but there are (for me at least) three hurdles to overcome in order to start using it effectively:
- It is big. The project includes Posix wrappers, string handling, process management, CGI programming, various protocols, RPC registration, interprocess communication…
- It is strange. Most of the API is object oriented rather than functional, using inheritance extensively
- It is underdocumented. While the API documents are complete, and the project page includes some cookbook examples, for most non-trivial needs you have to go deep into the source code
In this instance I'll follow a tutorial, top-down style and I won't necessarily show complete compile-and-run code, which means that you'll have to reconstruct the source code to a compilable state, but I hope it will still be useful as a starting point and guide to writing HTTP services with Ocamlnet
. This tutorial assumes you have installed OCaml 4.0 with Findlib, Ocamlnet 3.6, Yojson and its dependencies and PCRE-Ocaml.
Also, I won't attempt to compare it with Ocsigen since I've never used it, not least because I'm not really fond of convention-over-configuration-style frameworks I don't know it at all, and maybe I'm irrationally prejudiced :-)
. If you want full control of your code and feel comfortable with a large number of tweakable knobs, then Ocamlnet
is for you. If you have work that needs to be done quickly and you're used to RoR or Django, Ocamlnet
will definitely set your hair on fire.
Setting up a Netplex server: http.ml
The architecture I use is a Web application written in HTML that consumes JSON services published in the back-end. Ocamlnet
lets me serve both the static content (HTML, CSS, JavaScript and graphic assets) and route a number of dynamic services. I can use just that for development, and have a production Apache server that serves the static content and proxies the dynamic requests to the Ocamlnet
-published services. Another simplification I make is that I manually route the method URLs instead of letting Ocamlnet
do it itself, as it is otherwise perfectly capable to. This makes it simpler to configure the services, at the cost of having to explicitely handle routing in code.
Let's begin by taking a look at the main function:
let main () = let database f = db_file := Io.get_absolute_path f; if not (Sys.file_exists !db_file) then raise (Arg.Bad ("File not found: " ^ f)) in let usage = "usage: " ^ (Filename.basename Sys.executable_name) ^ " [options]" in let opt_list, cmdline_cfg = Netplex_main.args () in let opt_list = Arg.align ([ "-db" , Arg.String database, "<file> Database file"; ] @ opt_list) in Arg.parse opt_list (fun s -> raise (Arg.Bad ("Invalid argument: " ^ s))) usage; Netsys_signal.init (); Netplex_main.startup ~late_initializer:(fun _ _container -> Netlog.logf `Notice "Starting up") (Netplex_mp.mp ()) Netplex_log.logger_factories Netplex_workload.workload_manager_factories [ service_factory (); ] cmdline_cfg let () = if not !Sys.interactive then main ()
Netplex
is the part of Ocamlnet
that orchestrates the management and intercommunication between the processes that make up a network service. It has a number of command-line options for configuration, most notably -fg
to launch the service in the foreground instead of as a detached dæmon. Netplex_main.args
gives back a list of needed options upon which to add program-specific ones. In this case the only option is to pass a database file. Every filesystem resource must be accessed by absolute path, since Netplex
changes the working directory to /
upon startup. This file is stored in a global reference:
let db_file = ref (Io.get_absolute_path "myfile.db")
Once the command line is parsed, the service is created. First, Ocamlnet
has to take over signal handling to give the service an orderly lifecycle (Netsys
is the collection of modules providing cross-platform POSIX functionality). Netplex
is then started with the multi-process parallelizer, the standard log and workload factories that set up themselves out of the configuration file, and a single custom service factory that will create the HTTP services themselves:
let service_factory = Nethttpd_plex.nethttpd_factory ~name:"nethttpd" ~hooks:(new environment_hooks) ~config_cgi:Netcgi.({ default_config with default_exn_handler = false; permitted_input_content_types = [ "application/json"; ] @ default_config.permitted_input_content_types }) ~handlers:[ "json_service", Net.json_service ~dyn_uri:"/cgi" [ "daterange", with_db Daterange.json; "calculate", with_db Calculate.json; "calendar" , with_db Calendar.json; ]; ] ~services:Nethttpd_plex.default_services
The service name
must correspond with that defined in the config
file. In order to arrange for the workers to have access to the database file I intercept the service creation with hooks
to their lifecycle process to open it and close it as needed. Netcgi
sets up the environment that each HTTP service requires to function; in this case I use a minimal configuration that augments valid POST
MIME types with a type for JSON requests (not otherwise used in this case) and opt out of the standard exception handler in exchange for my own. I configure a single "json_service"
handler that will dispatch to the relevant methods of type cgi_activation → Yojson.Basic.json
. The Netplex
services for this service are the default Nethttpd_plex
ones required by the infrastructure in order to manage the lifecycle of the process group: creation, tear-down and IPC. Note well that the factory is a thunk, not a data structure, the resulting type is unit → Netplex_types.processor_factory
.
The lifecycle hooks are specified as a subclass of Netplex_kit.empty_processor_hooks
. It uses the Netplex
environment plug-in to store a shared reference to the database in a way that both thread- and process-based services can access in an uniform manner:
class environment_hooks = object inherit Netplex_kit.empty_processor_hooks () method! post_add_hook _ container = container#add_plugin Netplex_mutex.plugin; container#add_plugin Netplex_sharedvar.plugin method! post_start_hook container = Netlog.logf `Info "(%s) opening database \"%s\"" (Net.cur_sys_id ()) !db_file; try set_db (DB.open_database !db_file) with DE42x.Error (msg) -> container#log `Crit msg method! pre_finish_hook _container = Netlog.logf `Info "(%s) closing database" (Net.cur_sys_id ()); match get_db () with | Some db -> DB.close_db db | None -> () end
In this case I open and close database connections (represented here as an open file descriptor) which are stored in a per-process environment:
let get_db, set_db = let env_id = "MyService.localenv" in let module E = Netplex_cenv.Make_var_type (struct type t = DB.t end) in (fun () -> try Some (E.get env_id) with Netplex_cenv.Container_variable_not_found _ -> None), E.set env_id
Neplex_cenv
makes a strongly-typed accessor for shared variables; in this case I have just one keyed by env_id
. As a utility I arrange for my service methods to be closed over a reference to the database (cf the handler setup above):
let with_db proc arg = match get_db () with | None -> Net.failf "no database!" | Some db -> proc db arg
A Nethttp
JSON framework: net.ml
Every Nethttpd_plex
-based service follows the same structure, while the specifics will make up for the bulk of the application. In this example these details have to do with utilities that make consuming and producing JSON data easier. I have a Net
module with a number of helpers, of which I've used two already, cur_sys_id
and failf
:
let failf fmt = Printf.ksprintf failwith fmt and argf fmt = Printf.ksprintf invalid_arg fmt let cur_sys_id () = match Netplex_cenv.current_sys_id () with | `Process pid -> Printf.sprintf "PID %d" pid | `Thread tid -> Printf.sprintf "TID %d" tid
Another useful function is an encoding-safe string
wrapper:
let text = Netencoding.Html.encode_from_latin1
Normally, Nethttp
sends HTML 4.01-formatted error messages. In a JSON-based application it is preferable to have standardized JSON errors:
let error_json (env : Netcgi.cgi_environment) status fields cause message = let json_of_header hdr = try `String (env#input_header_field hdr) with Not_found -> `Null in try let script_name = env#cgi_script_name in let code = Nethttp.int_of_http_status status in env#log_error (Printf.sprintf "%s: %s (Status %i)" script_name message code); env#set_output_header_fields []; (* reset *) env#set_output_header_field "Content-type" "application/json; charset=utf-8"; env#set_status status; env#set_multiple_output_header_field "Cache-control" [ "max-age=0"; "must-revalidate" ]; let secs = Netdate.mk_mail_date (Unix.time ()) in env#set_output_header_field "Expires" secs; List.iter (fun (n,v) -> env#set_multiple_output_header_field n v) fields; env#send_output_header(); if env#cgi_request_method <> "HEAD" then Yojson.Basic.to_output env#out_channel (`Assoc [ "status" , `Int code; "statusText" , `String (Nethttp.string_of_http_status status); "cause" , `String cause; "message" , `String message; "scriptName" , `String script_name; "requestMethod", `String env#cgi_request_method; "queryString" , `String env#cgi_query_string; "referrer" , json_of_header "referer" ]); env#out_channel#flush (); env#out_channel#close_out () with e -> Netlog.logf `Crit "Unexpected exception %s" (Printexc.to_string e)
This is a good example of how to use the cgi_environment
to query the CGI execution and to exert maximum control over the HTTP response. I raise standard Ocaml exceptions from the method handlers and translate them into the relevant HTTP status codes by wrapping them in a higher-order protect
ive function:
let protect handler (cgi : Netcgi.cgi_activation) = try handler cgi with | Netcgi.Argument.Oversized -> error_json cgi#environment `Request_entity_too_large [] "Oversized" "A POST parameter exceeds maximum allowed size" | Invalid_argument msg -> error_json cgi#environment `Bad_request [] "Bad request" (text msg) | Failure msg -> error_json cgi#environment `Internal_server_error [] "Method failure" (text msg) | Not_found -> error_json cgi#environment `Not_implemented [] "Not implemented" "The requested operation is not implemented" | exn -> let msg = Printexc.to_string exn in error_json cgi#environment `Internal_server_error [] "Internal server error" ("Unexpected exception: " ^ text msg)
It is not normally necessary to manipulate the cgi_environment
in such a detailed, low-level manner; the cgi_activation
does pretty much the same thing in a easier-to-use way:
let send_json json (cgi : Netcgi.cgi_activation) = cgi#set_header ~content_type:"application/json; charset=utf-8" ~cache:`No_cache (); Yojson.Basic.to_output cgi#out_channel json; cgi#output#commit_work ()
Note well that Yojson doesn't provide a streaming interface: you must build the entire JSON value, which gets serialized in bulk; this makes necessary to configure the HTTP service so that the cgi_activation
s it creates are at least buffered:
let json_service ?dyn_uri handlers = let dyn_translator path = let len = String.length path in let path = if len != 0 && path.[0] = '/' then String.sub path 1 (len - 1) else path in Pcre.replace ~pat:"/" ~templ:"_" path and dyn_handler env cgi = protect (fun cgi -> let h = List.assoc env#cgi_path_translated handlers in send_json (h cgi) cgi) cgi in Nethttpd_services.({ dyn_handler; dyn_activation = std_activation `Std_activation_buffered; dyn_uri; dyn_translator; dyn_accept_all_conditionals = true; })
The dynamic path translator removes the leading slash and turns subsequent slashes into underscores, so that a method in the namespace /cgi
, say can serve as a look-up key in the list of handlers
: a call to /cgi/my/method/name
will turn into a key my_method_name
. This is, of course, a completely arbitrary decision. The dynamic handler in turn looks up the method handler (recall, of type cgi_activation → Yojson.Basic.json
) by this key, calls it with the cgi_activtion
expecting a JSON response and sends it out. Since the handling is protect
ed against exceptions, any missing method, parameter validation error or exceptional condition is sent out as the corresponding HTTP error response.
Speaking of parameter extraction, I don't use anything fancy like parsing combinators, just plain old higher-order functions and regular expressions validating the result of CGI accessor functions:
let with_arg arg f = Io.unwind ~protect:(fun arg -> arg#finalize ()) f arg let get_arg cgi name = try Some (with_arg (cgi#argument name) (fun arg -> arg#value)) with Not_found -> None let parse ?default ~validate ~parse cgi name = match default, get_arg cgi name with | None , None -> argf "Missing parameter \"%s\"" name | Some v, None -> v | _ , Some p -> try parse (Pcre.extract ~rex:validate ~full_match:false p) with Not_found -> argf "Invalid parameter \"%s\"" name
Since CGI arguments can be, if large, buffered into a temporary file, Netcgi
requires explicit finalization. Every error is signaled with an Invalid_argument
exception which protect
catches and translates into a HTTP 400 (Bad Request) via error_json
. Parsing specific argument types is straightforward:
let re_char = Pcre.regexp "^(.)$" let re_bool = Pcre.regexp "^(true|false)$" let re_int = Pcre.regexp "^([-+]?\\d+)$" let re_float = Pcre.regexp "^([-+]?\\d+(?:.\\d*))$" let re_date = Pcre.regexp "^(\\d{4})-(\\d{2})-(\\d{2})$" let re_datetime = Pcre.regexp "^(\\d{4})-(\\d{2})-(\\d{2})[ T](\\d{2}):(\\d{2}):(\\d{2})(Z|[-+]\\d{4})$" let parse_char ?default cgi = parse ?default ~validate:re_char ~parse:(fun res -> res.(0).[0]) cgi let parse_bool cgi = parse ~default:false ~validate:re_bool ~parse:(fun res -> bool_of_string res.(0)) cgi let parse_int ?default cgi = parse ?default ~validate:re_int ~parse:(fun res -> int_of_string res.(0)) cgi let parse_float ?default cgi = parse ?default ~validate:re_float ~parse:(fun res -> float_of_string res.(0)) cgi let parse_date ?default cgi = parse ?default ~validate:re_date ~parse:(fun res -> let year = int_of_string res.(0) and month = int_of_string res.(1) and day = int_of_string res.(2) in let dummy = Netdate.create 0. in Netdate.({ dummy with year; month; day; })) cgi let parse_datetime ?default cgi = parse ?default ~validate:re_datetime ~parse:(fun res -> let year = int_of_string res.(0) and month = int_of_string res.(1) and day = int_of_string res.(2) and hour = int_of_string res.(3) and minute = int_of_string res.(4) and second = int_of_string res.(5) and zone = match res.(6) with |"Z" -> 0 | dt -> let hrs = int_of_string (String.sub dt 1 2) and mns = int_of_string (String.sub dt 3 2) in let off = 60 * hrs + mns in if dt.[0] == '+' then off else -off in let dummy = Netdate.create 0. in Netdate.({ dummy with year; month; day; hour; minute; second; zone; })) cgi
Writing the JSON methods: myservice.ml
That is the infrastructure, in broad strokes. I put each method in its own module, following a simple convention:
I define a type
t
of parsed method arguments:type t = { lon : float; lat : float; dt : Netdate.t; jd : Jd.t; tz : int; lim : Jd.t option; }
(in this instance,
Jd.t
is the type of dates represented as Julian dates)I define a
validate
function to parse the CGI arguments into a value of typet
:let validate db cgi = let open Net in let lon = parse_float cgi "lon" and lat = parse_float cgi "lat" and dt = parse_datetime cgi "dt" in let jd = jd_of_netdate dt in let tz = parse_int cgi "tz" ~default:dt.Netdate.zone in let lim = if parse_bool cgi "tab" then Some (Jd.min db.DB.max_date (jd_of_netdate (Net.parse_date cgi "lim"))) else None in if not (-180. <= lon && lon <= 180.) then Net.argf "Longitude out of range" else if not (-66.56 <= lat && lat <= 66.56) then Net.argf "Latitude out of range" else if Jd.compare jd db.DB.min_date < 0 then Net.argf "Date too early" else if Jd.compare jd db.DB.max_date > 0 then Net.argf "Date too late" else { lon = lon /. Trig.radian; lat = lat /. Trig.radian; dt; jd; tz; lim; }
I define and export a
json
function to generate the actual output:let json db cgi = let req = validate db cgi in let tz = req.dt.Netdate.zone / 60 in let tdt = Jd.dynamic_time req.jd in (* … *) `Assoc [ "jd" , `Float t; "dt" , Net.time Jd.(tdt <-> req.jd); "lst" , Net.time (lst /. Jd.secs_day); "lon" , Net.angle ~reduce:false req.lon; "lat" , Net.angle ~reduce:false req.lat; (* … *) ]
(functions
Net.time
andNet.angle
return appropriate JSON values). This exported function goes into the dynamic method map, as seen in theservice_factory
above.
Configuring the Netplex
server: myservice.conf
That is mostly it, code-wise. It remains the detail of configuring Netplex
. I use a simple myservice.conf file:
netplex { controller { max_level = "info"; logging { type = "file"; file = "/var/log/myservice.log"; component = "*"; subchannel = "*"; max_level = "info"; }; }; service { name = "myservice"; protocol { name = "http"; tcp_nodelay = true; address { type = "internet"; bind = "0.0.0.0:8080"; }; }; processor { type = "nethttpd"; timeout = 60.0; timeout_next_request = 6.0; access_log = "enabled"; suppress_broken_pipe = true; host { pref_name = "localhost"; pref_port = 8080; names = "127.0.0.1:0"; uri { path = "/"; service { type = "file"; docroot = "/path/to/static/"; media_types_file = "/etc/mime.types"; default_media_type = "application/xhtml+xml"; enable_gzip = true; enable_listings = false; index_files = "index.html"; }; }; uri { path = "/cgi"; method { allow = "GET"; service { type = "dynamic"; handler = "json_service"; }; }; }; }; }; workload_manager { type = "dynamic"; max_jobs_per_thread = 1; min_free_jobs_capacity = 2; max_free_jobs_capacity = 5; max_threads = 50; }; }; }
Note that the Nethttpd_plex
section declares two URIs: the root path maps to a file
service that will serve the static content, defaulting to XHTML, while the /cgi prefix will map to the dynamic JSON handler. This is useful for development, since it only requires launching myservice -fg and trying it with a Web browser on http://127.0.0.1:8080/. In production I set up Apache with mod_proxy
like this:
Alias /myservice /path/to/static <Directory /path/to/static> Options FollowSymLinks AllowOverride All Order allow,deny Allow from all </Directory> <Location /myservice/> AuthType Digest AuthName "SERVICE" AuthDigestDomain /myservice/ AuthUserFile /etc/httpd/passwd Require valid-user </Location> ProxyPass /myservice/cgi http://127.0.0.1:8080/cgi
(where /path/to/static and /cgi must match what is configured in myservice.conf). Of course you can map your application to the HTTP root, in this case I have a single Apache instance serving various sub-paths.
Compiling: Makefile
It is not necessary to complicate the build process with anything more than a properly written Makefile. In this case I have one interface and one implementation for each JSON method (which you will note don't quite correspond to the dynamic service set-up I've shown first). Note well the list of PACKAGES
required for building:
OCAMLFLAGS = -thread -w @a -unsafe OCAMLOPTFLAGS = $(OCAMLFLAGS) -inline 10000 OCAMLLIBS = unix.cma CFLAGS = -I/opt/ocaml/lib/ocaml -arch x86_64 -O3 -Wall -Wextra PACKAGES = -package threads,pcre,yojson,netplex,netcgi2,nethttpd SRC = \ net.ml \ myservice.ml \ http.ml PROGS = myservice all: $(PROGS) myservice: $(SRC:%.ml=%.cmx) ocamlfind ocamlopt $(OCAMLOPTFLAGS) $(PACKAGES) -linkpkg $+ -o $@ %.cmi: %.mli ocamlfind ocamlc $(OCAMLFLAGS) $(PACKAGES) -c $< %.cmo: %.ml ocamlfind ocamlc $(OCAMLFLAGS) $(PACKAGES) -c $< %.cmx: %.ml ocamlfind ocamlopt $(OCAMLOPTFLAGS) $(PACKAGES) -c $< %.o: %.c ocamlfind ocamlc -ccopt "$(CFLAGS)" -c $< clean: /bin/rm -rf *.o *.a *.so *.cmi *.cmo *.cmx *~ *.log distclean: clean /bin/rm -rf $(PROGS) depend depend: $(OCAMLDEP) -one-line $(OCAMLLIBS) *.ml *.mli > depend include depend .PHONY: clean distclean all
Conclusion
There are a number of more advanced issues I'd like to address in the future. As it is, this framework can handle simple GET
and POST
requests but won't parse multipart attachments nor handle file transfers. Another thing it doesn't handle is HTTP Authorization; for simple requirements a simple filter can work, while for more complicated set-ups the best way to go is, in my opinion, to leverage Apache as I did here.
For those of you that have to interoperate with SOAP Web Services, the same architectural strategy is perfectly applicable with the aid of PXP and perhaps OC-SOAP.
A big field for further exploration is how to structure a complex Web application into independent services; Netplex
makes that possible, if not necessarily easy. There is a hot architectural trend making some noise now called Command-Query Separation (CQS); this pattern can be profitably implemented with a single Netplex
RPC service that handles all commands to which the Nethttpd_plex
workers delegate. The advantages of this separation are enforced separation of concerns and automatic, transparent fault tolerance and distribution, both of which are the guiding principles behind Ocamlnet
's design.
A closing remark that I don't want to miss on making is that the payoff of using Ocamlnet
's process model is that it is really fast. My "production" server is an ancient 400 MHz PPC with 384 MB RAM which is perfectly capable of producing and serving really computationally-intensive content with minimal environmental requirements. This is something that I simply couldn't hope to pull off with PHP or Java. I encourage you to try Ocamlnet
and see if you find, like I do, that Ocaml is the language of choice of discriminating Web developers.
2 comments:
Can you elaborate on convention over configuration? What convention does Ocsigen impose? They say everywhere that they don't want to impose anything and want to provide the full set of possibilities the technology offers.
Hi Matías,
There was some time that you posted something. Hope you are doing well. A webserver can also be written using Cohttp and run on Mirage (possibly on Cubieboards.
Post a Comment