2009-05-26

Is Immutable Data Slow?

Is immutatble data always slower than mutable state? I was challenged in a discussion over at Reddit about this issue, and I decided to perform a little experiment to back up a rough estimate with some numbers. I was surprised and delighted to see that I could make my case very strong indeed by showing that no, immutable data can be twice as fast as mutable data in a modern functional language.

I concocted a trivial synthetic benchmark using simple records intended to emulate typical business entities:

module P0 = struct
  type person = { name: string; age: int; balance: float }
end

module P1 = struct
  type person = { name: string; mutable age: int; mutable balance: float }
end

let test0 =
  let module E = struct
    open P0
    let rec test p iters =
      if iters == 0 then p.balance else
      test { p with balance = p.balance +. 1. } (iters - 1)
  end in E.test { P0.name="John Doe"; P0.age=19; P0.balance=100.}

let test1 =
  let module E = struct
    open P1
    let test p iters =
      for i = 1 to iters do
        p.balance <- p.balance +. 1.
      done
  end in E.test { P1.name="John Doe"; P1.age=19; P1.balance=100.}

The module P0 encapsulates an immutable business record; test0 is the attendant test that uses functional record updating and recursion in a classicaly pure idiom. The module P1, on the other hand, with its corresponding test test1 show the typical imperative use of a mutable record. The test and timing harness for both functions is simple enough:

let test times f iters =
  let best = ref infinity in
  for i = 1 to times do
    let t = Sys.time () in
    let _ = f iters in
    let t = Sys.time () -. t in
    if !best > t then
      best := t
  done;
  float iters /. !best

let () =
  let iters = 1_000_000 in
  Printf.printf "Immutable record: %f updates/s\n"
    (test 5 test0 iters);
  Printf.printf "Mutable record:   %f updates/s\n"
    (test 5 test1 iters)

I tested this code with the OCaml bytecode compiler on a 1.6/2.0 GHz Intel Atom computer with Windows XP, and compiled with both the bytecode and native compilers, without optimization, on a 2.2 GHz Core2 Duo MacBook Pro computer with Mac OS X 10.5.7. The results are as follows:

SystemImmutable (upd/s)Mutable (upd/s)Speedup (Δ%)
1.6 GHz Atom/Win XP/Bytecode3,048,7803,558,71916.73
2.0 GHz Atom/Win XP/Bytecode4,000,0004,566,210 14.16
2.2 GHz Core2 Duo/Mac OS/Bytecode17,750,32420,418,58115.03
2.2 GHz Core2 Duo/Mac OS/Native128,882,58858,400,981-54.69

The speedup afforded by imperative mutation in bytecode is modest, in all cases around 15%. On the other hand, the penalty incurred by mutation under the native compiler is more than double.

Of course, my correspondent was right in that the situation with current object-oriented languages is very different. I translated the benchmark straightforwardly into Java, and the result is what "common sense" would let us expect: Java is heavily oriented to mutation, or rather, it penalizes immutability quite heavily. I tried both client and server VMs, under the Windows environment detailed above:

SystemImmutable (upd/s)Mutable (upd/s)Speedup (×)
1.6 GHz Atom/Win XP/Client22,831,050107,526,8824.71
1.6 GHz Atom/Win XP/Server37,735,849322,580,6458.55

The difference is dramatic: mutation is between 4.7 and 8.5 times faster than functional update in Java, and even twice as fast than native OCaml code on a faster machine. Moral of the story: test your assumptions with experiments. The benchmark code is:

public final class MutTest {
  private static final class Person0 {
    public final String name;
    public final int age;
    public final double balance;
    
    public Person0(String name, int age, double balance) {
      this.name = name;
      this.age = age;
      this.balance = balance;
    }
    
    public Person0 deposit(double amount) {
      return new Person0(this.name, this.age, this.balance + amount);
    }
  }
  
  private static final class Person1 {
    public String name;
    public int age;
    public double balance;
    
    public Person1(String name, int age, double balance) {
      this.name = name;
      this.age = age;
      this.balance = balance;
    }
    
    public void deposit(double amount) {
      this.balance += amount;
    }  
  }
  
  private interface Test {
    double test(int iters);
  }

  private static final Test TEST0 = new Test() {
    public double test(int iters) {
      Person0 person = new Person0("John Doe", 19, 100.0);
      for (int i = 0; i != iters; i++) {
        person = person.deposit(1.0);
      }
      return person.balance;
    }
  };
  
  private static final Test TEST1 = new Test() {
    public double test(int iters) {
      Person1 person = new Person1("John Doe", 19, 100.0);
      for (int i = 0; i != iters; i++) {
        person.deposit(1.0);
      }
      return person.balance;
    }
  };
  
  private static double test(int times, Test test, int iters) {
    long best = Long.MAX_VALUE;
    double balance;
    for (int i = 0; i != times; i++) {
      long now = System.currentTimeMillis();
      balance = test.test(iters);
      now = System.currentTimeMillis() - now;
      if (best > now)
        best = now;
    }
    return (double) iters / ((double) best / 1000.0);
  }

  public static void main(String[] args) {
    final int iters = 10000000;
    System.out.printf("Immutable record: %f updates/s\n", test(5, TEST0, iters));
    System.out.printf("Mutable record:  %f updates/s\n", test(5, TEST1, iters));
  }
}

2009-05-14

A Polymorphic Question

Chris Conway asked a question that I replied, unwittingly, with a reference to a thread he started on the OCaml mailing list exactly two years ago: what is the difference between types 'a . int -> 'a vec -> 'a and int -> 'a vec -> 'a?

Strictly speaking, the latter is, in isolation, ill formed: the type variable 'a is unbounded! The convention is to implicitly universally quantify all unbound type variables in the type. This is the so-called prenex restriction:

The most popular of these is the let-polymorphism of ML […] which is sometimes called prenex polymorphism because it can be viewed as a fragment of System F in which type variables range only over quantifier-free types (monotypes) and in which quantified types (polytypes, or type schemes) are not allowed to appear on the left-hand side of arrows.

Pierce, Benjamin C. Types and Programming Languages, p. 356.

Suppose the latter type were a complete type abbreviation, to avoid dealing with the label:

type 'a t = int -> 'a vec -> 'a 

which as a TaPL-style type (a "tree" type) would be equivalent to a definition

∀X.(T = int→Vec(X)→X)

You cannot directly use such a definition as is; the only thing you can do is to instantiate it with ("apply it" to) some type before expanding the definition, for instance:

   (∀X.(T = int→Vec(X)→X))[X:=U]
→β
   T = int→Vec(U)→U

In other words, there is no way to expand the right-hand side of T while keeping the variable X uninstantiated.

The first type, on the other hand, written as a type abbreviation would be:

type t = 'a . int -> 'a vec -> 'a 

which as a tree type would be equivalent to the definition:

T = ∀X.int→Vec(X)→X

Now with this equation you can immediately replace T by its right-hand side, which would introduce a universal quantification as a subterm of the type being substituted into.

Seen in another way, and to bring it closer to the actual examples being discussed, labels are functions from records to the corresponding type. For instance:

type 'a t = { index : int -> 'a vec -> 'a }

introduces the function index: ∀X.T→int→Vec(X)→X (note the position of the quantification). This is exactly the first definition I wrote, but with definitional equality replaced by the functional arrow. Now a polymorphic record type

type t = { index: 'a . int -> 'a vec -> 'a }

introduces a different function index: T→∀X.(int→Vec(X)→X), which corresponds term by term to the second "tree" definition.

To see it in a whole different way, game semantics help in understanding the differences between linear formulas. In game theoretic terms, the type ∀X.T→int→Vec(X)→X corresponds to the game: "You get to choose and fix an X whatsoever. I give you a function. You give me a T, and I'll give you another function. You give me an int and I'll give you yet another function. You give me a Vec of the Xs you yourself chose, and I'll give you a certain X."

The second type T→∀X.(int→Vec(X)→X) corresponds to a different game: "You give me a T, and I give you back a thing, with the proviso that I reserve the right to choose and fix an X, which if given to you would result in a function." Only then you can give me back an int and so on. This is what makes X opaque to you.

I hope this helps.

2009-05-13

Okasaki's Random Access Lists

To close the chapter on Okasaki's Random Access Lists, I must record that there is an errata on the bottom of page 145 of his Purely Functional Data Structures. The argument to the recursive call to update it must read Zero (update (i div 2, p, ps)) end. It took me a while to figure it out, thinking that I had made a mistake.

For completeness, here's the translation for the chapter's pseudo-SML into functional, typesafe OCaml:

module Vec : sig
  type 'a t
  val nil : 'a t
  val cons : 'a -> 'a t -> 'a t
  val head : 'a t -> 'a
  val tail : 'a t -> 'a t
  val length : 'a t -> int
  val index : int -> 'a t -> 'a
  val update : int -> 'a -> 'a t -> 'a t
end = struct
  type 'a t = Nil | Zero of ('a * 'a) t | One  of 'a * ('a * 'a) t

  let nil = Nil

  type cons = { cons : 'a . 'a -> 'a t -> 'a t }
  let cons =
    let rec cons = { cons = fun x l -> match l with
    | Nil         -> One (x, Nil)
    | Zero    ps  -> One (x, ps)
    | One (y, ps) -> Zero (cons.cons (x, y) ps)
    } in cons.cons

  type uncons = { uncons : 'a . 'a t -> 'a * 'a t }
  let uncons =
    let rec uncons = { uncons = function
    | Nil          -> failwith "uncons"
    | One (x, Nil) -> (x, Nil)
    | One (x, ps ) -> (x, Zero ps)
    | Zero    ps   ->
      let ((x, y), ps) = uncons.uncons ps in (x, One (y, ps))
    } in uncons.uncons

  let head l = fst (uncons l)
  and tail l = snd (uncons l)

  type length = { length : 'a . 'a t -> int }
  let length v =
    let rec length = { length = function
    | Nil         -> 0
    | One (_, ps) -> 1 + length.length (Zero ps)
    | Zero    ps  -> 2 * length.length ps
    } in length.length v

  type index = { index : 'a . int -> 'a t -> 'a }
  let index n =
    let rec index = { index = fun n l -> match l with
    | Nil                    -> failwith "index"
    | One (x, ps) when n = 0 -> x
    | One (x, ps)            -> index.index (n - 1) (Zero ps)
    | Zero    ps             ->
      let (l, r) = index.index (n / 2) ps in
      if n mod 2 = 0 then l else r
    } in index.index n

  type update = { update : 'a . int -> ('a -> 'a) -> 'a t -> 'a t }
  let update n e =
    let rec update = { update = fun n f l -> match l with
    | Nil                    -> failwith "update"
    | One (x, ps) when n = 0 -> One (f x, ps)
    | One (x, ps)            -> cons x (update.update (pred n) f (Zero ps))
    | Zero ps                ->
      let g (x, y) = if n mod 2 = 0 then (f x, y) else (x, f y) in
      Zero (update.update (n / 2) g ps)
    } in update.update n (fun _ -> e)
end

Enjoy!

2009-05-12

Rank-2 polymorphism and non-uniform types, revisited

I've committed a number of errors and inaccuracies in my last post that warrant clarification. The most egregious and urgent is that the definition I gave for length is utterly wrong: the type encodes the length from the inside out, so that its "most significant bit" is the inner constructor. The recursion must take the form of a right fold, not of a left fold as I wrote. The correct function is then:

type length = { length : 'a . 'a vec -> int }

let length v =
  let rec length = { length = function
    Nil         -> 0
  | One (_, ps) -> 1 + length.length (Zero ps)
  | Zero    ps  -> 2 * length.length ps
  } in length.length v

Second, the option -rectypes is not necessary in OCaml 3.11.0 in order to use non-uniform recursion with this technique. The language permits non-uniform type definitions just fine. But, as Okasaki remarks, "[a]lthough Standard ML allows the definition of non-uniform recursive datatypes, the type system disallows most of the interesting functions on such types." (Okasaki, Chris. Purely Functional Data Structures, p. 143). What -rectype does is to disable the so-called occurs check in the typechecking algorithm that forces an error if a type is found to be defined in terms of itself when reconstructing the type of an expression via unification. This is why the types resulting in the compilation of a function using non-uniform recursion have the form E[α] as α, for a type expression E.

Third, I spoke of the type of a non-uniform parameter as being existential, which is technically true in this particular case but not the usual characterization of such a type. The way in which the type t→∀α.E[α] is more commonly described is as rank-2:

A type is said to be of rank 2 if no path from its root to a ∀ quantifier passes to the left of 2 or more arrows, when the type is drawn as a tree. For example, (∀X.X→X)→Nat is of rank 2, as are Nat→Nat and Nat→(∀X.X→X)→Nat→Nat, but ((∀X.X→X)→Nat)→Nat is not.

(Pierce, Benjamin C. Types and Programming Languages, p. 359)

A type definition type α t = … is a function from α to the declared type, so it counts as an arrow. Furthermore, there is an encoding of existential types in terms of rank-2 types that for this application suffices. This is the source of my confusion. Abstract signatures implement existential types in OCaml, which is why I said that this technique can be applied by means of a module. Also, OCaml allows rank-2 polymorphism in exactly two places: in record type declarations and in class declarations. So I missed a third way to encode non-uniform recursion in OCaml, namely via objects.

In a comment to the last post, bluestorm shows examples of the three encodings, using local module definitions to tidy up the global namespace of types and record labels. I want to compare the efficiency of the three approaches. The following function serves as a simple timing loop:

let test lap f e =
  let now = Sys.time () in
  let rec go cnt =
    let dt = Sys.time () -. now in
    if dt > lap then dt /. float cnt else
    (ignore (f e); go (succ cnt))
  in go 0

(the toplevel in my machine has an overhead of 4.9µs to execute the empty loop). Note that the result of the function is thrown away; fortunately, OCaml is a strict language. The test would measure the time it takes for different versions of index to find an element. This will require some test data. The simplest way to build a long list is with range:

let range m n =
  let rec go l i =
    if i < m then l else
    go (cons i l) (pred i)
  in go Nil n

Now everything works as expected:

# length (range 1 500_000) ;;
- : int = 500000

The first version of index uses records to encode the rank-2 type:

let index1 n v =
  let module Index = struct
    type index = { index : 'a . int -> 'a vec -> 'a }
    let rec index = { index = fun n l -> match l with
      Nil                    -> failwith "index"
    | One (x, ps) when n = 0 -> x
    | One (x, ps)            -> index.index (n - 1) (Zero ps)
    | Zero    ps             ->
      let (l, r) = index.index (n / 2) ps in
      if n mod 2 = 0 then l else r
    }
  end in Index.index.Index.index n v

(Malkovich Malkovich Malkovich). This is what I wrote in the last post, but now using a local module to encapsulate the type definitions, as per bluestorm's suggestion. The second version uses an object to encode the rank-2 type:

let index2 n v =
  let index = object (self)
    method index : 'a . int -> 'a vec -> 'a =
      fun n l -> match l with
        Nil                    -> failwith "index"
      | One (x, ps) when n = 0 -> x
      | One (x, ps)            -> self#index (n - 1) (Zero ps)
      | Zero    ps             ->
        let (l, r) = self#index (n / 2) ps in
        if n mod 2 = 0 then l else r
  end in index#index n v

Objects in OCaml are always self-recursive. The third version uses a recursive module to encode the non-uniform recursion by forcing the type of the second argument to remain generic:

let index3 n v =
  let module M = struct
    module rec Index : sig
      val index : int -> 'a vec -> 'a
    end = struct
      let index n l = match l with
        Nil                    -> failwith "index"
      | One (x, ps) when n = 0 -> x
      | One (x, ps)            -> Index.index (n - 1) (Zero ps)
      | Zero    ps             ->
        let (l, r) = Index.index (n / 2) ps in
        if n mod 2 = 0 then l else r
    end
  end in M.Index.index n v

(alas, local modules cannot be recursive). Now, in order to eliminate interference by the garbage collector, I predeclare my test data:

let data = range 1 1_000_000 ;;

I test each function for two seconds:

# test 2. (index1 999_999) data ;;
- : float = 1.45106870782848256e-05
# test 2. (index2 999_999) data ;;
- : float = 1.60240840297084448e-05
# test 2. (index3 999_999) data ;;
- : float = 1.91896990107750189e-05

The encoding using an object is 10% slower than the one using a record, which is to be expected as the open recursion is performed through a method table. On the other hand, the encoding using a recursive module is almost 25% slower than the one using a record, which is completely unexpected as I would believe the call to be statically resolved. Note also that there is no need to use a local module if the code is in a module bound by an abstract signature. All in all, this shows that these lists are extremely fast: It takes less than 15µs to find the last element in the list.

2009-05-10

Polymorphic Recursion

Re-reading for the n-th time Chris Okasaki's From Fast Exponentiation to Square Matrices: An Adventure in Types, I tried my hand at seeing how does the latest OCaml version cope with non-regular types and the consequent polymorphic recursion. I first scoped the terrain and Googled what others had to say about this topic. I found a ten-year old OCaml Mailing List post where Pierre Weiss mentions work by Pessaux and Leroy that ended in the inclusion of support for recursive types. So I fired OCaml with the not-always-useful option -rectypes.

That post gives uses as a running example the type of non-regular lists, or functional vectors, which Okasaki calls "random-access lists" (Okasaki, Chris. Purely Functional Data Structures, p. 143 and ss.). The type in question encodes in constructors the length of the vector as a binary integer, with a recursive type argument:

type 'a vec =
  Nil
| Zero of ('a * 'a) vec
| One  of 'a * ('a * 'a) vec

(I depart from the names given in the above reference). This permits indexing in logarithmic, as opposed to linear, time. For instance, the vector [1; 2; 3; 4; 5] is represented as:

One (1, Zero (One (((2, 3), (4, 5)), Nil)))

The problem is with the function cons:

(* Doesn't work *)
let rec cons x = function
  Nil         -> One (x, Nil)
| Zero    ps  -> One (x, ps)
| One (y, ps) -> Zero (cons (x, y) ps)

Consing an element to a One (x, xs) list must "carry over" through a recursive call to cons whose argument is not α but α×α. The typechecker unifies both types in the recursive type α×α as α, whose only value is:

let rec omega = (omega, omega)

Such that cons omega Nil works, but the more useful cons 1 Nil doesn't. The correct type declaration for cons looks like:

val cons : 'a -> 'a vec -> 'a vec

and such a declaration would be required in Haskell, since polymorphic recursion is undecidable. It is, however, not quite right: the parameter is not universally quantified, since that would mean that the recursion must have the same type as the argument. The type variable should be, instead, existentially quantified.

Existential types are also undecidable and so need explicit declarations. OCaml has two ways to introduce existential types: abstract types in modules, and records. The second alternative is more lightweight:

type cons = { cons : 'a . 'a -> 'a vec -> 'a vec }

(I abuse the different namespaces in OCaml punning types and values). The parameter α is existential since it is "to the right" of a type arrow. Now I can write cons through such a record:

let cons =
  let rec f = { cons = fun x l -> match l with
    Nil         -> One (x, Nil)
  | Zero    ps  -> One (x, ps)
  | One (y, ps) -> Zero (f.cons (x, y) ps)
  } in f.cons

Voilà! Now I can write:

# cons 1 (cons 2 (cons 3 (cons 4 (cons 5 Nil)))) ;;
- : int vec = One (1, Zero (One (((2, 3), (4, 5)), Nil)))

As another example, the length of such a vector can be found in logarithmic time:

type length = { length : 'a . int -> 'a vec -> int }

let length v =
  let rec f = { length = fun n l -> match l with
    Nil         -> n
  | Zero    ps  -> f.length (    2 * n) ps
  | One (_, ps) -> f.length (1 + 2 * n) ps
  } in f.length 0 v

For a more challenging, and useful, example, indexing on a vector must take into account the position and the length of the current vector. I follow directly Okasaki's implementation (op. cit. p. 145):

type index = { index : 'a . int -> 'a vec -> 'a }

let index n v =
  let rec f = { index = fun n l -> match l with
    Nil         -> failwith "index"
  | One (x, ps) -> if n = 0 then x else f.index (n - 1) (Zero ps)
  | Zero    ps  ->
    let (l, r) = f.index (n / 2) ps in
    if n mod 2 = 0 then l else r
  } in f.index n v

Update: Thank you to SpiceGuid for pointing out the error that prevented index from typechecking. Also, be sure to read bluestorm's comment for alternatives to achieve the same effect.