My last post generated a ton of very interesting discussion. I'd like to summarize it here to bring the topic to closure.
In view of the excellent analysis performed by Mauricio Fernández, Ketan's comment:
At this point, you don't have the data to say what you said. You could just as easily say OCaml penalizes mutable data, or, less judgmentally, that it is less well-optimized for that use case.
is entirely justified. His first observation, that "[I have] more data to support that the performance disparity is due to the idiom rather than the programming language," was as misunderstood by me as it was insightful.
Mauricio did an in-depth analysis of the disparity and reported on it first on Reddit, then on his own blog (which I cannot recommend highly enough). His follow-up is more thorough than his first analysis, but I'll paraphrase it anyway for completeness's sake. OCaml's garbage collector is generational. This means that it relies on the generational hypothesis: the most recently created objects are also those most likely to become unreachable soon. To exploit this, it divides the heap into at least two regions: the young space, holding newly created objects that mostly point to each other and to older objects, and the old space, comprising objects that have survived a number of collections. Any time an old object is made to point to a young object, it must be added to the remembered set, a set of extra roots that lets the collector decide what is reachable in the young space without scanning the entire old space.
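To make the remembered set concrete, here is a toy model of a two-generation heap. It is a sketch under heavy simplifying assumptions, not how the OCaml runtime works: the names obj, alloc, write and minor_collection are hypothetical, every object has a single pointer field, and "collecting" only flips a generation tag. What it shows is that the write barrier records an object only when an old object is made to point at a young one, and that a minor collection can then find the young survivors from the roots plus the remembered set, without scanning the old space.

```ocaml
(* Toy model of a two-generation heap; NOT the OCaml runtime.
   Every name here is hypothetical and chosen for illustration. *)
type generation = Young | Old

(* Each object has a generation tag and a single pointer field. *)
type obj = { mutable gen : generation; mutable field : obj option }

(* The remembered set: old objects that may point into the young space. *)
let remembered : obj list ref = ref []

(* New objects are always allocated in the young space. *)
let alloc () = { gen = Young; field = None }

(* Write barrier: every pointer store goes through here, but only an
   old -> young store is recorded (the toy avoids duplicates with a
   linear List.memq scan). *)
let write src dst =
  src.field <- Some dst;
  if src.gen = Old && dst.gen = Young && not (List.memq src !remembered)
  then remembered := src :: !remembered

(* Minor collection: young objects reachable from the roots or from the
   remembered set are promoted; the old space itself is never scanned.
   Young objects that are not reached are simply left behind (dead). *)
let minor_collection roots =
  let rec promote o =
    if o.gen = Young then begin
      o.gen <- Old;
      Option.iter promote o.field
    end
  in
  List.iter promote roots;
  List.iter (fun o -> Option.iter promote o.field) !remembered;
  remembered := []

let () =
  let record = alloc () in
  minor_collection [ record ];   (* the record survives and becomes old *)
  let box = alloc () in          (* a fresh young object, like a boxed float *)
  write record box;              (* old -> young: recorded *)
  let scratch = alloc () in
  write scratch box;             (* young -> young: not recorded *)
  Printf.printf "remembered set size: %d\n" (List.length !remembered)
```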
Mauricio shows that my "synthetic business entity update benchmark" violates the generational hypothesis: since floats are in general boxed, updating a mutable float field allocates a fresh boxed cell holding the new value. After enough allocations, the single, long-lived mutable record is promoted to the old space, from which it points into the young space containing the box. Every mutation then adds an entry to the remembered set via the caml_modify write barrier. His post shows ways to ameliorate this penalty by exploiting the conditions under which OCaml unboxes floats, in particular by using records composed entirely of floats, mutable or immutable (in this case, a single float field suffices).
, mutable or immutable (in this case, just one suffices). He followed up with an in-depth analysis of the write barrier and a simple optimization that gains up to 50% the speed of synthetic benchmarks, and an excellent 32% on this particular benchmark.
It is hoped that the OCaml team keeps improving the performance of the garbage collector (indeed, some people are asking very vocally for various kinds of overhaul to it). Meanwhile, instead of recommending workarounds or speed-ups, I'd say "go with the idioms best supported by the language". In this case, it does pay off to use immutable records in OCaml. This shouldn't be extrapolated into a general methodology applicable to other programming languages.
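For comparison, the immutable idiom can be sketched like this, with a hypothetical entity type and made-up names rather than the original benchmark. Each update builds a new record with the { e with ... } syntax. The fresh record and its boxed float are both young and point only to young or older data, so no old object is ever made to point into the young space, and the previous version quickly becomes garbage: exactly the case the generational hypothesis favors.

```ocaml
(* Hypothetical immutable entity: no mutable fields at all. *)
type entity = { name : string; amount : float }

(* Functional update: allocates a new record (and a new boxed float);
   the old record is left untouched and soon becomes unreachable. *)
let deposit e x = { e with amount = e.amount +. x }

(* Apply [n] deposits of [x], threading the latest version through. *)
let rec deposit_n e x n =
  if n = 0 then e else deposit_n (deposit e x) x (n - 1)

let () =
  let final = deposit_n { name = "acme"; amount = 0.0 } 1.0 1_000_000 in
  Printf.printf "%s: %.1f\n" final.name final.amount (* prints "acme: 1000000.0" *)
```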