I've always wondered why, when it comes time to build data structures, we rarely if ever spend a constant factor more space (say, double) to fix, or at least reduce, some of these issues.
I suppose you could call it a generational hash table, but really it's a Hash Table interface wrapped around 2, occasionally 3 hash tables.
Create a read-write hash table A. When A hits capacity, create a second read-write hash table B (scaled up by a constant factor n, 1.0 < n < 1.5) and stop writing to A. Create a read-only hash table R and dump A into it. During this time, all reads look at [B, A, R] in sequence. Once R is built, swap the read list to [B, R]. When B fills, create another read-write table C (again scaled up), dump R and B into a new read-only table S, and swap the list to [C, S].
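Something like this, as a single-threaded sketch (the names are mine, tombstones are elided, and the concurrent build of R is collapsed into an eager freeze-and-merge, so this only shows the bookkeeping):

```scala
import scala.collection.mutable

// Hypothetical single-threaded sketch of the scheme above; reads probe
// the read-write table first, then any frozen read-only tables.
final class GenerationalMap[K, V](growth: Double = 1.25, initialCapacity: Int = 8) {
  private var rw    = mutable.HashMap.empty[K, V] // current read-write table (A, then B, ...)
  private var older = List.empty[Map[K, V]]       // frozen read-only tables, newest first
  private var limit = initialCapacity

  def get(k: K): Option[V] =
    rw.get(k).orElse(older.iterator.flatMap(_.get(k)).nextOption())

  def put(k: K, v: V): Unit = {
    rw(k) = v
    if (rw.size >= limit) {                // the RW table hit capacity
      older = rw.toMap :: older            // freeze it; reads now also probe it
      rw = mutable.HashMap.empty           // fresh, larger RW table
      limit = math.ceil(limit * growth).toInt
      if (older.size > 1)                  // fold the frozen tables into a single RO table
        older = List(older.reverse.reduce(_ ++ _)) // newer entries shadow older ones
    }
  }
}
```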
During resizes you only ever have three places to look. The read-write hash has to handle tombstones (and keep a count of them, so resizing stays amortized O(1)). I believe the main thing not handled here is what happens when concurrent mutation traffic outpaces the copy into the read-only table. As near as I can tell, the paper cited here has no solution for this.
I need to re-read Cliff Click's description of his lock-free hash table. If memory serves it has flavors of this and also answers the back-pressure problem (amortization across writers). But I suspect you can do something rather pedestrian and get similar results.
I would have agreed with you until about a year ago, when my browser somehow broke crucial PDF features like search in its already plainly legacy PDF renderer…
The ACM website's online PDF viewer is terrible. The zoom buttons need to be pressed so many times to make any meaningful change, the document pages load lazily, there are many unnecessary animations, the scrollbar position keeps jumping around, etc. Good thing there is a download link so that I can use my browser's built-in PDF viewer or an external reader.
May have something to do with DRM. When you download a PDF, you get a watermarked version with a Rightslink banner; when you view it online, the banner isn't shown. They didn't want to ruin your viewing pleasure with the banner. So they ruined it with the shitty PDF viewer instead.
Remember the era when people couldn't figure out how to make footers work with infinite scroll? Even Facebook messed it up: try to click something, and the next page would load and whisk the footer halfway down the window.
I was once asked what career I'd pursue other than programming.
I replied that I would open a restaurant focused on stir-fries built from a limited number of finely chopped ingredients simmering at predefined locations on a large grill. Besides being tasty, this arrangement would allow serving customers quickly with little overhead.
I always preferred immutable sorted trees (such as found in Haskell or Scala) to concurrent hash tables when dealing with concurrent sets or maps.
You can add or remove a node in such a tree without modifying the original, creating only O(log n) new nodes (branches that don't change are shared rather than copied). So "modifying" a tree does not invalidate previous versions, which can still be processed by concurrent threads as long as they hold a reference to the old root node. Tree branches that are no longer referenced are then discarded by the GC.
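For example, with Scala's immutable TreeMap:

```scala
import scala.collection.immutable.TreeMap

val v1 = TreeMap(1 -> "a", 2 -> "b", 3 -> "c")
val v2 = v1 + (4 -> "d") // path copying: only O(log n) nodes are rebuilt

// A concurrent reader still holding v1 sees exactly {1, 2, 3}; v2 shares
// all unchanged branches with v1, and branches unreachable from every
// version are eventually reclaimed by the GC.
assert(v1.keySet == Set(1, 2, 3))
assert(v2.keySet == Set(1, 2, 3, 4))
```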
With a concurrent hash table, you have to hold a lock on the whole data structure if you don't want it to be modified while you're processing it, and that makes the whole thing harder to reason about.
I also find it easier to implement a correct and fast comparison function (required by tree structures) than an equivalent hash function. Trees can also give you range search, and keep your items sorted, which are useful in some cases.
Trees are now my go-to data structures for maps and sets unless maximum speed and minimal memory use are absolutely required. They might not be as efficient, but they make for more reliable software.
Not if you don't want the hash table to change while doing some read-only processing on it. Immutability ensures that the reference you hold always points to an object in a consistent state.
That is not the point. The point is that the hash table can be updated in a way (e.g., atomically) that is invisible to other readers, and therefore they always see a consistent version of the hash table.
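Concretely (a sketch, not something from the article): publish each new version through an AtomicReference, so a reader pins one consistent snapshot with a single volatile read.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.collection.immutable.TreeMap

val shared = new AtomicReference(TreeMap.empty[Int, String])

// Writer: swap in a whole new version; readers never see a half-update.
shared.updateAndGet(m => m + (1 -> "one"))

// Reader: one volatile read fixes the snapshot for the entire scan.
val snapshot = shared.get()
snapshot.foreach { case (k, v) => println(s"$k -> $v") }
```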
Is this covered in the article? I don't see this linked, but it was just a blog post, so maybe not relevant here, and they describe the same algorithms anyway.
The benefits are visible if you search for the word "TSX". It says:
"Replacing the insert and update functions of our specialized growing hash table with Intel TSX variants increases the throughput of our hash table by up to 7%"
Well, yeah, TSX appears to enable a side-channel attack exploitable by a malicious hyperthread. But that's not particularly close to TSX itself being buggy on Haswell.
If you only run trusted code, Zombieload V2 seems irrelevant (at first glance; I didn't look deeper to see whether there are ways to exploit it from a bytecode interpreter or the like).
If you have a GC, HAMTs are ideal for lock-free transactional semantics. In fact, all pure data structures can be generalized to give lock-free optimistic transactions with a single CAS; HAMTs just have the nicest properties for minimizing conflicts.
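A minimal sketch of that single-CAS pattern, using Scala's immutable.HashMap (which is a HAMT, CHAMP in 2.13); the TxRef name is mine:

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec
import scala.collection.immutable.HashMap

final class TxRef[K, V](init: HashMap[K, V] = HashMap.empty[K, V]) {
  private val root = new AtomicReference(init)

  // Optimistic transaction: read a root, build a new pure version, and
  // retry the single CAS if another writer published first.
  @tailrec def transact(f: HashMap[K, V] => HashMap[K, V]): HashMap[K, V] = {
    val before = root.get()
    val after  = f(before)
    if (root.compareAndSet(before, after)) after else transact(f)
  }
}

val ref = new TxRef[String, Int]()
ref.transact(m => m.updated("hits", m.getOrElse("hits", 0) + 1))
```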