I want to write a birthday attack program in Haskell for a variant of SHA1 which only produces only a 50 bit hash. To do this I need a hash table capable of storing approx. 2^25 entries.
The keys in this map will be Int64
and the values will be short length strings (~ 16 bytes).
Any suggestions for which hash implementation to use?
(Disregard that last update - I would need a bit array of 2^50 elements.)
For 2^25 entries at 8 bytes a piece, you are looking at something like 768MB of storage for just the data, at most probably around 3 gigabytes with actual overhead for storing bytestrings -- guesstimating 80 bytes per bytestring, then you have the hashtable/map's internals to store, and the boxing for the key, etc.
This means you can store the entire thing resident in memory on a decent machine, which keeps the problem relatively sane, but your collection times will kind of suck.
I would recommend using a lot of smaller hash tables, by partitioning your keyspace, that way you can run lots of the updates in parallel regardless of the hash table you use.
As for implementation:
You can either wrap a bunch of immutable hash tables like the wide-fanout ones from unordered-containers in IORefs and use some kind of atomicModifyIORef or something like ryan newton's compare and swap primitive, or you can try to use the old Data.HashTable implementation in a straightforward imperative manner.
The latter will improve your asymptotics by a logarithmic factor over the hash-array mapped tries used by unordered-containers, but Data.HashTable has bad constants. At the scale of your problem these factors probably cancel out though.
I also posted same sort of question. And from some suggestion, I am using Kyoto Cabinet. It is pretty cool and gives nice performance also. You can check my posts also because I have similar issues. EX. one, two and three. Perhaps this may helpful.