Algorithm-conscious, cache-oblivious

No matter how credentialed a writer is, he is bound to make a mistake in print sooner or later. Raymond Chen's latest entry was, for me, oddly timely and oddly at variance with my own investigations into Bentley and Sedgewick's ternary trees. His apologia for the hash table is technically sound, but only on the surface: there is not a single measurement to back up his assertions. Here is my data.

I pitted four different data structures intended as finite maps on strings: the standard Hashtbl, my implementation of Cuckoo Hash, the standard height-balanced Map, and my own implementation of ternary trees as Trie Maps. I used Mac OS X 10.5's /usr/share/dict/words file against a Spanish lexicon of 90,000 entries. I tested both successful and unsuccessful searches by looking up every entry of the original file first, and every entry of the alternate file second. In both cases the files were read into memory before running the algorithm benchmarks. I used a bit of generic programming so that the same benchmark code operates on all four data structures; in each case the mapped domain was unit, treating the maps effectively as sets.
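To make the generic-programming idea concrete, here is a minimal sketch of how such a unified interface might look. The module and function names are illustrative, not the ones used in the actual benchmark; the point is that each structure is rebound behind one signature, with unit as the mapped domain.

```ocaml
(* A common set-like signature so one benchmark drives every structure.
   Names here are hypothetical stand-ins for the benchmark's own. *)
module type SET = sig
  type t
  val create : unit -> t
  val insert : t -> string -> t
  val mem    : t -> string -> bool
end

module StringMap = Map.Make (String)

(* The purely functional Map, mapping each key to unit: *)
module MapSet : SET = struct
  type t = unit StringMap.t ref
  let create () = ref StringMap.empty
  let insert s k = s := StringMap.add k () !s; s
  let mem s k = StringMap.mem k !s
end

(* The imperative Hashtbl behind the very same signature: *)
module HashSet : SET = struct
  type t = (string, unit) Hashtbl.t
  let create () = Hashtbl.create 16
  let insert s k = Hashtbl.replace s k (); s
  let mem s k = Hashtbl.mem s k
end

(* One benchmark body for all structures: load the word list,
   then count how many queries hit. *)
let bench (module S : SET) words queries =
  let s = List.fold_left S.insert (S.create ()) words in
  List.length (List.filter (S.mem s) queries)
```

With first-class modules, `bench (module MapSet) words queries` and `bench (module HashSet) words queries` run the identical code path over both structures.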

Here are the results for reading the big file and testing against the small file:

Test of a 234,936-word set against 89,546 queries (3.91 % success rate). In ops/s.

Data Structure    Insert         Read           Lookup
Trie Map          118,932.34     770,795.15     991,782.69

As you can see, Hashtbl is competitive inserting data, and Cuckoo Hash is smoking fast looking up data. This is not a very fair comparison, as these are imperative data structures, whereas both the standard Map and the Trie Map are purely functional data structures. Even so, the Trie Map is slightly (3.5%) slower in successful searches and substantially (60%) faster in unsuccessful searches than Hashtbl.

Unfortunately the champion apparent, the Cuckoo Hash, has a rather fragile disposition. The Spanish lexicon thoroughly broke its insertion routine.

Aside: I already viewed with suspicion the lack of a termination analysis for the insertion routine, and the very scarce code floating around implementing Cuckoo Hashes. After seeing a practical case of non-terminating insertion with a dataset of fewer than 4,000 keys, I consider this algorithm unsuitable for use. I am confident that the universal hash I used is essentially random, as it passes standard tests on this data set, and that I implemented the published algorithm faithfully.
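To show where the non-termination comes from, here is an illustrative sketch of cuckoo insertion, not my benchmarked implementation: each insert evicts the occupant of one of the key's two candidate slots, and the chain of evictions can cycle forever unless it is explicitly bounded. The hash functions and table size below are toy stand-ins, not the universal hash from the benchmark.

```ocaml
exception Cuckoo_failure

let size = 11                          (* toy capacity, for illustration only *)
let h1 k = Hashtbl.hash k mod size
let h2 k = (Hashtbl.hash k / size) mod size

(* Two tables; a key lives either at t1.(h1 key) or t2.(h2 key). *)
let insert (t1 : string option array) (t2 : string option array) key =
  let max_kicks = 32 in                (* without this bound, a cycle of
                                          evictions loops forever *)
  let rec kick key table n =
    if n > max_kicks then raise Cuckoo_failure  (* caller must rehash *)
    else match table with
      | `T1 ->
        (match t1.(h1 key) with
         | None -> t1.(h1 key) <- Some key
         | Some evicted ->
           t1.(h1 key) <- Some key;    (* displace the occupant... *)
           kick evicted `T2 (n + 1))   (* ...and re-insert it in t2 *)
      | `T2 ->
        (match t2.(h2 key) with
         | None -> t2.(h2 key) <- Some key
         | Some evicted ->
           t2.(h2 key) <- Some key;
           kick evicted `T1 (n + 1))
  in
  kick key `T1 0

let mem t1 t2 key =
  t1.(h1 key) = Some key || t2.(h2 key) = Some key
```

The published algorithm's remedy on reaching the bound is to rehash everything with fresh hash functions; the point of the aside is that the bound is essential, because the eviction chain itself carries no termination guarantee.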

Even so, the Ternary Trie remains the victor in both successful and unsuccessful searches, being 60% faster than Hashtbl (and almost four times as fast as the height-balanced tree) in the latter:

Test of an 89,546-word set against 234,936 queries (1.49 % success rate). In ops/s.

Data Structure    Insert         Read           Lookup
Trie Map          108,718.90     889,076.24     1,300,517.50
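For readers unfamiliar with the structure, here is a purely functional sketch of a ternary trie in the spirit of Bentley and Sedgewick; this is an illustration, not my benchmarked Trie Map. Each node splits on a single character, with a less-than branch, a greater-than branch, and an "equal" branch that advances to the next character of the key.

```ocaml
type trie =
  | Empty
  | Node of trie * char * bool * trie * trie
    (* lo branch, split char, end-of-word flag, eq branch, hi branch *)

let rec add t s i =
  let n = String.length s in
  if i >= n then t
  else
    let c = s.[i] in
    match t with
    | Empty ->
      let eow = i = n - 1 in
      Node (Empty, c, eow, (if eow then Empty else add Empty s (i + 1)), Empty)
    | Node (lo, c', eow, eq, hi) ->
      if c < c' then Node (add lo s i, c', eow, eq, hi)
      else if c > c' then Node (lo, c', eow, eq, add hi s i)
      else if i = n - 1 then Node (lo, c', true, eq, hi)
      else Node (lo, c', eow, add eq s (i + 1), hi)

let rec mem t s i =
  match t with
  | Empty -> false
  | Node (lo, c', eow, eq, hi) ->
    let c = s.[i] in
    if c < c' then mem lo s i          (* stay on the same character *)
    else if c > c' then mem hi s i
    else if i = String.length s - 1 then eow
    else mem eq s (i + 1)              (* match: advance the key *)

let insert t s = if s = "" then t else add t s 0
let contains t s = s <> "" && mem t s 0
```

Note why unsuccessful searches are so cheap: a failed lookup typically bails out after comparing only a prefix of the query, character by character, whereas a hash table must first hash the entire key before probing.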

What is the take-away lesson here? Do not let expert advice lull you into complacency. Take the time to experiment and analyze for yourself: it is not only intellectually challenging and rewarding, it can also be a store of surprises (and of blogging material) for you.


Jon Harrop said...

This leaves a lot to be desired, IMHO.

Firstly, you've benchmarked only in OCaml which is specifically designed for high performance trees at the cost of imperative performance. Moreover, you're probably paying for functors or polymorphism, which has nothing to do with hash tables.

Secondly, single hash tables are only good for small keys. If your keys are long then you might try a trie of hash tables.

I suspect if you reran your benchmark code in F# you would find that hash tables are several times faster. Moreover, if you compared the results between OCaml and F# you would find the fastest overall solution to be hash tables in F#.

Note that the post you cited referred specifically to the CLR. I doubt the author had contemplated hash table performance in languages where they are penalized.

Unknown said...

@Jon: First of all, I welcome your opinion. I appreciate the candor and respect you show.

I'm aware that Chen discussed the performance of hash tables vis-à-vis ternary trees in C#, but in my opinion he cast doubt on the general appropriateness of the latter.

I don't doubt that CLR hash tables are fast. Given the discussion of cache effects on performance, I wouldn't be surprised to learn that they have low-level support in the runtime. Again, the post implies more for me than is strictly warranted.

I still wish his article had linked to the original benchmark, though.

With respect to the cost of abstraction, it is uniform for all four data structures, a simple rebinding of names to give them a unified functorial interface.

Anonymous said...

Your Cuckoo implementation allows for non-terminating inserts?

Surely your implementation is completely broken?

Are you rehashing to a same-sized data structure after N failures?

Perhaps you should consider rehashing into a large hashtable after a too-long insertion chain.