## 2012-08-02

### A Helping Phantom Hand

You don't have to be writing an interpreter or some other kind of abstract code to profit from some phantom types. Suppose you have two or more functions that work by "cooking" a simple value (a `float`, say) with a lengthy computation before proceeding:

```let sun_geometric_longitude j =
let je = to_jcen (to_jde j) in
(* … computation with je … *)

let sun_apparent_longitude j =
let je = to_jcen (to_jde j) in
let q  = sun_geometric_longitude j in
(* … computation with je … *)
```

In this case j is a date expressed in Julian Days as a `float`, and `to_jde` computes the Ephemeris Time as a 63-term trigonometric polynomial correction on it. `sun_apparent_longitude` calls `sun_geometric_longitude` and both call `to_jde`. Obviously this unnecessary duplication can be factored out:

```let sun_geometric_longitude je =
let t  = to_jcen je in
(* … computation with je … *)

let sun_apparent_longitude je =
let q  = sun_geometric_longitude je in
let t  = to_jcen je in
(* … computation with je … *)

let foo j =
let je = to_jde j in
let l  = sun_apparent_longitude je in
(* … *)
```

(`to_jcen` is cheap and not worth factoring out.) But now a naked `float` represent two different things, Universal Time and Ephemeris Time, and we have a valid concern of mixing them up. We can wrap the time in an ADT:

```type dt = UT of float | TD of float

let to_jcen j = match j with
| UT j ->
(* … lengthy computation … *)
TD j
| TD _ -> invalid_arg "to_jcen"

let sun_geometric_longitude je = match je with
| TD je ->
let t  = to_jcen je in
(* … computation with je … *)
| UT _  -> invalid_arg "sun_geometric_longitude"

let sun_apparent_longitude je = match je with
| TD je ->
let q  = sun_geometric_longitude je in
let t  = to_jcen je in
(* … computation with je … *)
| UT _  -> invalid_arg "sun_apparent_longitude"

let foo j =
let je = to_jde j in
(* … computation with sun_apparent_longitude je … *)
```

but this forces us to check at run-time whether we mixed times up in our code. A better technique is to use a phantom type. First fix an abstract signature for the module implementing these functions:

```module Test : sig
type 'a dt

val datetime : yy:int -> mm:int -> dd:int -> hh:int -> nn:int -> ss:float -> [ `JD ] dt
val to_jde   : [ `JD ] dt -> [ `JDE ] dt
val to_jcen  : 'a dt -> float

val sun_geometric_longitude : [ `JDE ] dt -> float
val sun_apparent_longitude  : [` JDE ] dt -> float
end = struct
(* … *)
```

We have a way to construct our type `α dt` from a Universal `datetime`, a way to convert it to Dynamical Time with `to_jde` and operations that respect the kind of measure. The implementation is as before:

```  (* … *)
type 'a dt = float (* phantom type! *)

let datetime ~yy ~mm ~dd ~hh ~nn ~ss = (* … *)

let to_jde  j = (* … *)
let to_jcen j = (* … *)

let sun_geometric_longitude je =
(* … computation with a statically checked je … *)

let sun_apparent_longitude je =
let q  = sun_geometric_longitude je in
(* … computation with a statically checked je … *)
end
```

Now the compiler checks for us that we don't mix up measures. The only inconvenient of this approach is that the type `α dt` is fully abstract, and you must provide coercions, `string_of`s and pretty printers for it if you need to show them or debug your code. There is a way out, though; just make it a `private` type abbreviation:

```module Test : sig
type 'a dt = private float
(* … signature exactly as before … *)
end = struct
type 'a dt = float (* phantom type! *)
(* … implementation exactly as before … *)
end
```

Now `α dt` will be shown in the top-level, can be printed with a coercion `(je :> float)`, etc.

For another simple example, suppose you want to represent sets as `list`s. The best way to do that is to keep them sorted so that set operations run in linear time. If you want to provide some operations that temporarily destroy the ordering, a phantom type can keep track of the invariant "this `list` is sorted":

```module Set : sig
type ('a, 'b) t = private 'a list

val of_list : 'a list -> ('a, [ `S ]) t
val as_set  : ('a, [ `U ]) t -> ('a, [ `S ]) t
val empty   : ('a, [ `S ]) t
val mem     : 'a -> ('a, [ `S ]) t -> bool
val add     : 'a -> ('a, [ `S ]) t -> ('a, [ `S ]) t
val union   : ('a, [ `S ]) t -> ('a, [ `S ]) t -> ('a, [ `S ]) t
val inter   : ('a, [ `S ]) t -> ('a, [ `S ]) t -> ('a, [ `S ]) t
val append  : ('a, 'b) t -> ('a, 'c) t -> ('a, [ `U ]) t
end = struct
type ('a, 'b) t = 'a list

let of_list l   = List.sort compare l
and as_set  l   = List.sort compare l
and empty       = []
let union   l r = (* … linear merge … *)
and inter   l r = (* … linear merge … *)
let mem     e   = List.mem e
and add     e   = union [e]
and append      = List.append
end
```

The phantom type `[ `S | `U ]` tracks the sortedness of the `list`. Note that in the case of `append` the argument lists can have any ordering but the result is known to be unsorted. Note also how the fact that the empty `list` is by definition sorted is directly reflected in the type.

Pierre-Alexandre said...

I'm not alone !

I'm writing a 10k sloc commercial application on top on OcamlNet, which get and send JSon data, using Atdgen to serialize/deserialize.

OcamlNet isn't simple, it's tree, but it'as great framework which scale transparently. By using a Memcache for global memory, my app can scale from 100 request to 1 million easily.

Matías Giovannini said...

Pierre-Alexandre, methinks you commented in the wrong entry… maybe you're commenting to this one?