Have you considered or tested using either closed hashing or linear array lookups as a replacement for your linked-list open hashing implementation? Years ago I significantly improved the speed of a color quantization operation that several other engineers had already optimized, by replacing it with a simpler closed hashing algorithm straight out of Knuth. More recently I've had success using arrays and linear search for small collections. This technique is used in Redis (see http://redis.io/topics/memory-optimization).
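For what it's worth, the linear-search idea can be sketched as a flat array scanned end to end. This is not dmd's code, just a minimal illustration (names and the capacity of 16 are arbitrary); for a handful of entries one contiguous scan often beats hashing entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "small map": a flat array searched linearly.
   No hash function at all; the whole structure is one
   contiguous, prefetcher-friendly block. */
typedef struct { int key; int value; } Entry;

typedef struct {
    Entry entries[16];   /* capacity chosen arbitrarily for the sketch */
    int count;
} SmallMap;

static int small_map_put(SmallMap *m, int key, int value) {
    for (int i = 0; i < m->count; i++) {
        if (m->entries[i].key == key) { m->entries[i].value = value; return 1; }
    }
    if (m->count == 16) return 0;  /* full; a real version would fall back to a hash table */
    m->entries[m->count].key = key;
    m->entries[m->count].value = value;
    m->count++;
    return 1;
}

static int small_map_get(const SmallMap *m, int key, int *out) {
    for (int i = 0; i < m->count; i++) {   /* plain linear scan */
        if (m->entries[i].key == key) { *out = m->entries[i].value; return 1; }
    }
    return 0;
}
```

Redis does essentially this (with packed encodings) for small hashes and lists before promoting them to real hash tables.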
As soon as pools are used (see my other comment here), chaining is much faster than storing all elements in the table behind the hash -- you can use a simpler hash function and still get better performance, even when the table is relatively full.
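The "pools" part might look like this. A purely illustrative sketch (the pool size and names are made up): chain nodes come out of one contiguous block by pointer bump, so allocation is nearly free and consecutive nodes sit next to each other in memory:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed pool for chain nodes: allocation is a
   pointer bump into one contiguous array, so nodes allocated
   together stay near each other in memory. No free(), which
   suits short-lived compiler data structures. */
typedef struct Node { int key; int value; struct Node *next; } Node;

#define POOL_CAP 1024   /* arbitrary size for the sketch */

typedef struct {
    Node storage[POOL_CAP];
    size_t used;
} NodePool;

static Node *pool_alloc(NodePool *p) {
    if (p->used == POOL_CAP) return NULL;  /* pool exhausted */
    return &p->storage[p->used++];         /* bump allocation: no malloc, no free-list */
}
```

Because successive allocations are adjacent, walking a chain that was built in insertion order tends to stay within a few cachelines.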
- alignment greater than 16 bytes, e.g. 128 bytes for isolated buffers.
- the hardware prefetcher likes to load the cachelines around the memory actually accessed, just in case. So chunks of data that will be accessed at the same time should be near each other, to make better use of the cache.
- memory access that does not follow a simple pattern is slower than access that is contiguous or has a simple stride.
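The points above can be combined in a small sketch (assuming C11 `aligned_alloc`; the 128-byte alignment mirrors the isolated-buffer case, and the function names are illustrative): allocate the data that is accessed together as one aligned, contiguous block and traverse it with a unit stride, so the prefetcher's speculative cacheline loads actually contain useful data:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n doubles in one contiguous, 128-byte-aligned block.
   C11 aligned_alloc requires the size to be a multiple of the
   alignment, so round it up. */
static double *make_contiguous_buffer(size_t n) {
    size_t bytes = n * sizeof(double);
    if (bytes % 128 != 0)
        bytes += 128 - bytes % 128;
    return aligned_alloc(128, bytes);
}

/* Unit-stride traversal: the simplest possible access pattern,
   which the prefetcher handles best. */
static double sum_buffer(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```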
I just wanted to say that when allocations are very cheap and kept "near" each other, keeping lists of elements linked from the hash array (http://en.wikipedia.org/wiki/Hash_table#Separate_chaining), instead of keeping all the elements in the hash array itself (http://en.wikipedia.org/wiki/Hash_table#Open_addressing), makes the hash table faster: the hash function can stay simple and fast, since the number of collisions inside the array drops. I don't know whether that would be applicable to dmd.
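To make the contrast concrete, here is a minimal separate-chaining sketch (tiny 8-bucket table, deliberately cheap hash, all names hypothetical). The table holds only head pointers; colliding keys go on a short per-bucket list, so collisions never cluster in the array the way they do with open addressing:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal separate-chaining table: the array stores only list
   heads, and colliding entries are pushed onto the bucket's
   list. Nodes are supplied by the caller (e.g. from a pool). */
typedef struct CNode { int key; int value; struct CNode *next; } CNode;

#define NBUCKETS 8                      /* tiny table for the sketch */

static size_t cheap_hash(int key) { return (size_t)key % NBUCKETS; }

static void chain_insert(CNode *table[NBUCKETS], CNode *n) {
    size_t b = cheap_hash(n->key);
    n->next = table[b];                 /* push onto the bucket's list */
    table[b] = n;
}

static CNode *chain_find(CNode *table[NBUCKETS], int key) {
    for (CNode *n = table[cheap_hash(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```

With open addressing, two colliding keys would also displace later insertions into neighbouring slots; here a collision only lengthens one bucket's list, which is why such a cheap hash remains workable.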