Have you considered or tested using either closed hashing or linear array lookups as a replacement for your linked-list open hashing implementation? Years ago I significantly improved the speed of a color quantization operation that several other engineers had already optimized, by replacing it with a simpler closed hashing algorithm straight out of Knuth. More recently I've had success using arrays and linear search for small collections. This technique is used in Redis (see http://redis.io/topics/memory-optimization).
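For what it's worth, the linear-search idea can be sketched as a flat array scanned end to end. This is not dmd's code, just a minimal illustration (names and the capacity of 16 are arbitrary); for a handful of entries one contiguous scan often beats hashing entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "small map": a flat array searched linearly.
   No hash function at all; the whole structure is one
   contiguous, prefetcher-friendly block. */
typedef struct { int key; int value; } Entry;

typedef struct {
    Entry entries[16];   /* capacity chosen arbitrarily for the sketch */
    int count;
} SmallMap;

static int small_map_put(SmallMap *m, int key, int value) {
    for (int i = 0; i < m->count; i++) {
        if (m->entries[i].key == key) { m->entries[i].value = value; return 1; }
    }
    if (m->count == 16) return 0;  /* full; a real version would fall back to a hash table */
    m->entries[m->count].key = key;
    m->entries[m->count].value = value;
    m->count++;
    return 1;
}

static int small_map_get(const SmallMap *m, int key, int *out) {
    for (int i = 0; i < m->count; i++) {   /* plain linear scan */
        if (m->entries[i].key == key) { *out = m->entries[i].value; return 1; }
    }
    return 0;
}
```

Redis does essentially this (with packed encodings) for small hashes and lists before promoting them to real hash tables.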
As soon as pools are used (see my other comment here), chaining is much faster than storing all elements in the table behind the hash -- you can use a simpler hash function and still get better performance, even when the table is relatively full.
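The "pools" part might look like this. A purely illustrative sketch (the pool size and names are made up): chain nodes come out of one contiguous block by pointer bump, so allocation is nearly free and consecutive nodes sit next to each other in memory:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed pool for chain nodes: allocation is a
   pointer bump into one contiguous array, so nodes allocated
   together stay near each other in memory. No free(), which
   suits short-lived compiler data structures. */
typedef struct Node { int key; int value; struct Node *next; } Node;

#define POOL_CAP 1024   /* arbitrary size for the sketch */

typedef struct {
    Node storage[POOL_CAP];
    size_t used;
} NodePool;

static Node *pool_alloc(NodePool *p) {
    if (p->used == POOL_CAP) return NULL;  /* pool exhausted */
    return &p->storage[p->used++];         /* bump allocation: no malloc, no free-list */
}
```

Because successive allocations are adjacent, walking a chain that was built in insertion order tends to stay within a few cachelines.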
- alignment greater than 16 bytes, e.g. 128 bytes for isolated buffers.
- the hardware prefetcher likes to load the cachelines around the memory actually accessed, just in case. So chunks of data that will be accessed at the same time should be near each other, to make better use of the cache.
- memory access that does not follow a simple pattern is slower than access that is contiguous or has a simple stride.
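The points above can be combined in a small sketch (assuming C11 `aligned_alloc`; the 128-byte alignment mirrors the isolated-buffer case, and the function names are illustrative): allocate the data that is accessed together as one aligned, contiguous block and traverse it with a unit stride, so the prefetcher's speculative cacheline loads actually contain useful data:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n doubles in one contiguous, 128-byte-aligned block.
   C11 aligned_alloc requires the size to be a multiple of the
   alignment, so round it up. */
static double *make_contiguous_buffer(size_t n) {
    size_t bytes = n * sizeof(double);
    if (bytes % 128 != 0)
        bytes += 128 - bytes % 128;
    return aligned_alloc(128, bytes);
}

/* Unit-stride traversal: the simplest possible access pattern,
   which the prefetcher handles best. */
static double sum_buffer(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```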
I just wanted to say that when allocations are very cheap and kept "near" each other, keeping lists of elements linked from the hash array (http://en.wikipedia.org/wiki/Hash_table#Separate_chaining), instead of keeping all the elements in the hash array itself (http://en.wikipedia.org/wiki/Hash_table#Open_addressing), makes the hash table faster: the hash function can stay simple and fast, since the number of collisions inside the array drops. I don't know whether that would be applicable to dmd.
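To make the contrast concrete, here is a minimal separate-chaining sketch (tiny 8-bucket table, deliberately cheap hash, all names hypothetical). The table holds only head pointers; colliding keys go on a short per-bucket list, so collisions never cluster in the array the way they do with open addressing:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal separate-chaining table: the array stores only list
   heads, and colliding entries are pushed onto the bucket's
   list. Nodes are supplied by the caller (e.g. from a pool). */
typedef struct CNode { int key; int value; struct CNode *next; } CNode;

#define NBUCKETS 8                      /* tiny table for the sketch */

static size_t cheap_hash(int key) { return (size_t)key % NBUCKETS; }

static void chain_insert(CNode *table[NBUCKETS], CNode *n) {
    size_t b = cheap_hash(n->key);
    n->next = table[b];                 /* push onto the bucket's list */
    table[b] = n;
}

static CNode *chain_find(CNode *table[NBUCKETS], int key) {
    for (CNode *n = table[cheap_hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```

With open addressing, two colliding keys would also displace later insertions into neighbouring slots; here a collision only lengthens one bucket's list, which is why such a cheap hash remains workable.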