## 2009-05-26

### Is Immutable Data Slow?

Is immutable data always slower than mutable state? I was challenged on this point in a discussion over at Reddit, and I decided to perform a little experiment to back up a rough estimate with some numbers. I was surprised and delighted to find that I could make my case very strong indeed by showing that, no, immutable data can be twice as fast as mutable data in a modern functional language.

I concocted a trivial synthetic benchmark using simple records intended to emulate typical business entities:

```ocaml
module P0 = struct
  type person = { name : string; age : int; balance : float }
end

module P1 = struct
  type person = { name : string; mutable age : int; mutable balance : float }
end

let test0 =
  let module E = struct
    open P0
    let rec test p iters =
      if iters = 0 then p.balance else
      test { p with balance = p.balance +. 1. } (iters - 1)
  end in
  E.test { P0.name = "John Doe"; P0.age = 19; P0.balance = 100. }

let test1 =
  let module E = struct
    open P1
    let test p iters =
      for i = 1 to iters do
        p.balance <- p.balance +. 1.
      done
  end in
  E.test { P1.name = "John Doe"; P1.age = 19; P1.balance = 100. }
```

The module P0 encapsulates an immutable business record; test0 is the attendant test, which uses functional record update and recursion in a classically pure idiom. The module P1, on the other hand, together with its corresponding test test1, shows the typical imperative use of a mutable record. The test-and-timing harness for both functions is simple enough:

```ocaml
let test times f iters =
  let best = ref infinity in
  for i = 1 to times do
    let t = Sys.time () in
    let _ = f iters in
    let t = Sys.time () -. t in
    if !best > t then best := t
  done;
  float iters /. !best

let () =
  let iters = 1_000_000 in
  Printf.printf "Immutable record: %.0f updates/s\n" (test 5 test0 iters);
  Printf.printf "Mutable record:   %.0f updates/s\n" (test 5 test1 iters)
```

I tested this code compiled with the OCaml bytecode compiler on a 1.6/2.0 GHz Intel Atom computer running Windows XP, and compiled with both the bytecode and native compilers, without optimization, on a 2.2 GHz Core 2 Duo MacBook Pro running Mac OS X 10.5.7. The results are as follows:

| System | Immutable (upd/s) | Mutable (upd/s) | Speedup (Δ%) |
|---|---:|---:|---:|
| 1.6 GHz Atom/Win XP/Bytecode | 3,048,780 | 3,558,719 | 16.73 |
| 2.0 GHz Atom/Win XP/Bytecode | 4,000,000 | 4,566,210 | 14.16 |
| 2.2 GHz Core2 Duo/Mac OS/Bytecode | 17,750,324 | 20,418,581 | 15.03 |
| 2.2 GHz Core2 Duo/Mac OS/Native | 128,882,588 | 58,400,981 | -54.69 |

The speedup afforded by imperative mutation in bytecode is modest, in all cases around 15%. Under the native compiler, on the other hand, mutation incurs a heavy penalty: the mutable version runs at less than half the speed of the immutable one.

Of course, my correspondent was right in that the situation with current object-oriented languages is very different. I translated the benchmark straightforwardly into Java, and the result is what "common sense" would lead us to expect: Java is heavily oriented towards mutation, or rather, it penalizes immutability quite heavily. I tried both the client and server VMs, under the Windows environment detailed above:

| System | Immutable (upd/s) | Mutable (upd/s) | Speedup (×) |
|---|---:|---:|---:|
| 1.6 GHz Atom/Win XP/Client | 22,831,050 | 107,526,882 | 4.71 |
| 1.6 GHz Atom/Win XP/Server | 37,735,849 | 322,580,645 | 8.55 |

The difference is dramatic: mutation is between 4.7 and 8.5 times faster than functional update in Java, and, even on the slower machine, more than twice as fast as native OCaml code on a faster one. Moral of the story: test your assumptions with experiments. The benchmark code is:

```java
public final class MutTest {
    private static final class Person0 {
        public final String name;
        public final int age;
        public final double balance;

        public Person0(String name, int age, double balance) {
            this.name = name;
            this.age = age;
            this.balance = balance;
        }

        public Person0 deposit(double amount) {
            return new Person0(this.name, this.age, this.balance + amount);
        }
    }

    private static final class Person1 {
        public String name;
        public int age;
        public double balance;

        public Person1(String name, int age, double balance) {
            this.name = name;
            this.age = age;
            this.balance = balance;
        }

        public void deposit(double amount) {
            this.balance += amount;
        }
    }

    private interface Test {
        double test(int iters);
    }

    private static final Test TEST0 = new Test() {
        public double test(int iters) {
            Person0 person = new Person0("John Doe", 19, 100.0);
            for (int i = 0; i != iters; i++) {
                person = person.deposit(1.0);
            }
            return person.balance;
        }
    };

    private static final Test TEST1 = new Test() {
        public double test(int iters) {
            Person1 person = new Person1("John Doe", 19, 100.0);
            for (int i = 0; i != iters; i++) {
                person.deposit(1.0);
            }
            return person.balance;
        }
    };

    private static double test(int times, Test test, int iters) {
        long best = Long.MAX_VALUE;
        double balance;
        for (int i = 0; i != times; i++) {
            long now = System.currentTimeMillis();
            balance = test.test(iters);
            now = System.currentTimeMillis() - now;
            if (best > now)
                best = now;
        }
        return (double) iters / ((double) best / 1000.0);
    }

    public static void main(String[] args) {
        final int iters = 10000000;
        System.out.printf("Immutable record: %f updates/s\n", test(5, TEST0, iters));
        System.out.printf("Mutable record:   %f updates/s\n", test(5, TEST1, iters));
    }
}
```

brab said...

Hello,

I'm the person who collects and writes the Caml Weekly News. For the last few weeks I've been including links to recent posts on the Planet Ocamlcore aggregator, which includes links to your posts. Unfortunately, the links you provide to ocamlcore seem to point to the comment feed of the article and not to the article itself. You may want to look into it.

And thanks for your very interesting posts!

Alan

Matías Giovannini said...

@brab:

Alan, I don't think I have a choice of feed configuration with Blogger, at least not one that I've found. In any case, I receive the Planet Ocamlcore feed and I see the posts correctly in Thunderbird. I've had problems in the past reading the feed with the Mac OS Mail reader; maybe Blogger is putting out different feed versions with different content? In any case, if you have recommendations I'd be happy to try them out. You can reach me at "mgiovann" AT "gmail dot com".

brab said...

Thanks for your reply. I've been digging a bit more into it, and it seems that Blogger includes many links with each RSS entry, and the (stupid) program I use to pick the correct link out of them does not work. I'm sure the problem is on my side; I'll look into it a bit further.

Thanks again,

Alan

Ketan said...

It's not accurate to say that Java penalizes immutability if Java's immutability is the fastest immutable implementation you've seen. At this point you have more data to support that the performance disparity is due to the idiom rather than the programming language.

Matías Giovannini said...

@Ketan: First, I cannot meaningfully compare Java and OCaml bytecode, except to say "wow, look at how much faster it is at this!" and leave it at that. I should not have written "twice as fast", because that invites all sorts of odious judgments that are not always warranted. So in the sense that I compare mutability against immutability within the same language/platform combination and not across them (except to check that the methodology is not wildly unsound), I do stand by my conclusion that mutability is better supported, in terms of speed and style, in Java.

Second, there is anecdotal evidence that I would have seen a disparity in favor of mutation in OCaml too, if I had benchmarked mutation to the integer field instead of to the float one. More analysis is warranted, but boxing and unboxing of floats seems to be dominating the time taken to mutate that field.
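As a rough sketch of that follow-up (this variant was not part of the original benchmark), mutating the immediate int field age instead of the boxed float field balance avoids allocating a fresh boxed float on every update:

```ocaml
(* Hypothetical variant of test1: mutate the immediate int field (age)
   rather than the float field (balance), so no boxing is involved. *)
module P2 = struct
  type person = { name : string; mutable age : int; mutable balance : float }
end

let test2 =
  let module E = struct
    open P2
    let test p iters =
      for _i = 1 to iters do
        p.age <- p.age + 1
      done;
      p.age
  end in
  E.test { P2.name = "John Doe"; P2.age = 19; P2.balance = 100. }
```

Timing this with the same harness should show whether float boxing, rather than mutation per se, accounts for the native-code penalty.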

SpiceGuid said...

Immutability is more a space consumer than a time consumer.

Immutability is potentially memory hungry and/or stresses the GC.
Examples include:
* immutable collections used in a volatile (non-persistent) way
* strict collections used in a lazy way (lists used as streams)

My policy is that I prefer immutability, yet I usually switch to mutability rather than resort to laziness. Whether we like it or not, OCaml is biased towards mutability: full-featured immutability requires laziness.
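For instance, a strict list consumed as a stream can be suspended with OCaml's built-in lazy, so that only the elements actually demanded are ever allocated; a minimal sketch:

```ocaml
(* A minimal lazy stream: each tail is a suspension, forced only on demand. *)
type 'a stream = Cons of 'a * 'a stream Lazy.t

(* The infinite stream of naturals starting at n. *)
let rec nats_from n = Cons (n, lazy (nats_from (n + 1)))

(* Force only the first k elements into a strict list. *)
let rec take k (Cons (x, rest)) =
  if k = 0 then [] else x :: take (k - 1) (Lazy.force rest)
```

Here take 3 (nats_from 0) yields [0; 1; 2], allocating only the suspensions actually forced.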

Bart czernicki said...

Why don't you compare mutability vs immutability on a "base" framework like .NET?

Use C# for the mutable types (you can create immutable ones as well) and use F# for the de facto immutable test.

Ketan said...

That's not what I meant. Suppose you'd been comparing math using integers to math using floating point numbers, and gotten similar results. Would you say that Java penalizes floating point math? That's a property of the machine, or rather, just a fact of life, since floating point math is fundamentally more complex.

At this point, you don't have the data to say what you said. You could just as easily say OCaml penalizes mutable data, or, less judgmentally, that it is less well-optimized for that use case. With the data you have you're only comparing implementations, not idioms, so you can only say that someone who is writing OCaml should consider the immutable idioms because the performance is relatively good.

Markus Kohler said...

Great post!
It would be interesting (at least for Java) to see what the memory usage and memory allocation rates are. Running the test in parallel threads could also be very interesting.

Matías Giovannini said...

@Ketan: you're absolutely right, and the discussion over at Reddit made it abundantly clear to me. I need to follow up on this with all that has been said here and there, but I'll wait a bit.

If you object to my use of "penalize" as implying "active harm", I agree that it was a poor choice of word. At this point, it is clear that there are better idioms in each language.

William Louth said...

[Markus] "I would be interesting (at least for java), what the memory usage/memory allocation rates are"

This is pretty simple with OpenCore's extensible metering system, which already has most of the counters you need to understand what is really being measured, though to be honest I could determine this myself from a quick scan of the code.

Running the tests with our "clock.time" meter, I have the mutable version 15× faster.

Using our "cpu.time" meter, the time for the mutable version is approximately its "clock.time" result, but for the immutable version I have a 20-30% discrepancy. It's easy to guess what is causing this, but I turned on our "gc.time" meter to confirm it. GC only happens in the immutable case (nothing new or unexpected here).

The "alloc.count" shows 1 object created for the mutable test, and one per iteration for the immutable test. The allocation size is 32 bytes for both person objects, but of course the number of objects is not comparable.

Our "field.write.count" shows 3 field writes per iteration for the immutable test and only 1 for the mutable test.

Our "field.read.count" has the same results as "field.write.count".

Using our "execution.count" meter, there are 2 methods called per immutable iteration compared to only 1 per mutable iteration. While this is easy to discern in a small code example like this one, it is much more useful when you have the typical (horrendous) Java class with various call/path branches that are state dependent.

So what exactly are we measuring? The speed (or slowness) of method executions, field reads/writes, object allocations, GC behavior, and so on.

Any further analysis must actually try to address what factors do in fact make the biggest contribution to the additional overhead and weight them accordingly.

William