Message passing has allocation pressure and cache-coherency pressure that a shared message location doesn't. And as the amount of memory involved goes up, the performance benefit of the shared location grows.
Sure, for something silly like writing to an int, a shared location is a net loss, but when you start talking about a dictionary with 1 million entries, the shared location wins big compared to all the copying and allocating you'd have to do to move the same data with message passing.
For some data structures it's the optimal way to move data around. For example, the LMAX Disruptor is about the fastest way to pass messages precisely because it runs over a pre-allocated shared buffer with carefully tuned synchronization.
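For concreteness, here's a minimal single-producer/single-consumer ring buffer sketch in C++. This is not the Disruptor itself (which adds batching and sequencing on top), just the core idea of passing messages through pre-allocated shared slots so nothing is allocated per message:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Slots are allocated once up front and reused, so pushing a message never
// allocates; producer and consumer only touch two atomic indices.
template <typename T, size_t Capacity>
class SpscRing {
    std::array<T, Capacity> slots_{};
    std::atomic<size_t> head_{0};  // next slot the consumer will read
    std::atomic<size_t> tail_{0};  // next slot the producer will write

public:
    bool try_push(const T& value) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;                      // ring is full
        slots_[tail] = value;                  // write into the shared slot
        tail_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> try_pop() {
        size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return std::nullopt;               // ring is empty
        T value = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }
};
```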
You are talking about writes to a data structure such as a list or a dictionary from multiple threads. Nobody uses advanced message-passing techniques for that. A list is basically the poster child for avoiding mutexes: each thread writes to its own sublist, with no mutexes at all, and at the end of processing every thread's sublist is merged together. Merging a linked list takes O(1) time by manipulating a few pointers. I don't know why you are talking about copying or allocating: a list requires no copying to be merged, and the allocations still happen across multiple threads, so there's no change in allocation pressure. If your allocator is bad, there could be internal mutexes inside the allocator, but that's beside the point.
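Sketched in C++ with `std::list` (the per-thread work here is just a placeholder for whatever the threads actually produce):

```cpp
#include <list>
#include <thread>
#include <vector>

// Each worker fills its own std::list with no locking; the main thread then
// splices them together. The whole-list splice() just relinks a few pointers,
// so the merge is O(1) per sublist and copies no elements.
std::list<int> parallel_build(int num_threads) {
    std::vector<std::list<int>> sublists(num_threads);
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&sublists, t] {
            // Placeholder work: each thread produces some values.
            for (int i = 0; i < 1000; ++i)
                sublists[t].push_back(t * 1000 + i);
        });
    }
    for (auto& w : workers) w.join();

    std::list<int> result;
    for (auto& sub : sublists)
        result.splice(result.end(), sub);  // O(1): steals sub's nodes
    return result;
}
```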
With dictionaries, you may have to do a bit of resizing and rehashing at the end. But in my benchmarks, it is still worthwhile.
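Roughly, with C++17 `std::unordered_map` (types are arbitrary, this is just to show where the rehash cost sits and how reserving bounds it):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Merge per-thread maps into one. The inserts at the end are where the
// rehashing cost lives; reserving the combined size keeps it to one rehash.
std::unordered_map<std::string, int>
merge_maps(std::vector<std::unordered_map<std::string, int>>& per_thread) {
    size_t total = 0;
    for (const auto& m : per_thread) total += m.size();

    std::unordered_map<std::string, int> merged;
    merged.reserve(total);   // avoid repeated rehashing during the merge
    for (auto& m : per_thread)
        merged.merge(m);     // C++17: moves nodes, no copies for non-colliding keys
    return merged;
}
```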
Depends on the list implementation. I assume we're talking about C++ `std::list`? In that case, yeah, I can see how you'd instead `splice` the sublists together.
I was thinking more in terms of something like a `vector`, in which case merging the sublists means copying values from each sub-vector into the destination vector. That (especially if done naively) can mean reallocating more than once just to fit the results in.
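Something like this, as a sketch; reserving the combined size up front is the "done right" version, and skipping it is where the extra reallocations come from:

```cpp
#include <vector>

// Merging sub-vectors copies every element. Done naively (no reserve), the
// destination can reallocate several times as it grows; reserving the total
// size up front keeps it to a single allocation, but the copies remain.
std::vector<int> merge_vectors(const std::vector<std::vector<int>>& parts) {
    size_t total = 0;
    for (const auto& p : parts) total += p.size();

    std::vector<int> merged;
    merged.reserve(total);  // one allocation instead of repeated regrowth
    for (const auto& p : parts)
        merged.insert(merged.end(), p.begin(), p.end());  // element copies
    return merged;
}
```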
But, for the record, there are lock-free algorithms that don't require a mutex to append values to a list. You basically CAS the tail pointer to your new node.
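A sketch of what I mean, more or less the enqueue half of a Michael-Scott queue (memory reclamation is deliberately left out):

```cpp
#include <atomic>

// Lock-free append: link the new node after the current tail with a CAS,
// then swing the tail pointer forward.
struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
    explicit Node(int v = 0) : value(v) {}
};

class LockFreeList {
    Node dummy_;                        // sentinel so the list is never empty
    std::atomic<Node*> tail_{&dummy_};

public:
    void append(Node* node) {
        node->next.store(nullptr, std::memory_order_relaxed);
        for (;;) {
            Node* last = tail_.load(std::memory_order_acquire);
            Node* next = last->next.load(std::memory_order_acquire);
            if (next == nullptr) {
                // Try to hook the new node onto the current tail.
                if (last->next.compare_exchange_weak(
                        next, node,
                        std::memory_order_release,
                        std::memory_order_relaxed)) {
                    // Best effort: advance tail; if this fails, another
                    // thread already moved it for us.
                    tail_.compare_exchange_strong(
                        last, node,
                        std::memory_order_release,
                        std::memory_order_relaxed);
                    return;
                }
            } else {
                // Tail was lagging behind; help it forward and retry.
                tail_.compare_exchange_strong(
                    last, next,
                    std::memory_order_release,
                    std::memory_order_relaxed);
            }
        }
    }
};
```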
That being said, I concede that for a list a mutex is probably the wrong choice. It's a better choice in Java, which has constraints that prevent (easily) splicing two lists together.
For the dictionary, implementation is key. A good segmented dictionary is going to be hard to beat.
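For illustration, a bare-bones lock-striped map in C++: each segment owns its own mutex, so writers to different segments never contend. Segment count and key/value types here are arbitrary choices, not a specific library's layout:

```cpp
#include <array>
#include <mutex>
#include <unordered_map>

// Minimal segmented (lock-striped) dictionary: keys hash to one of N
// independent segments, each guarded by its own mutex.
template <typename K, typename V, size_t Segments = 16>
class SegmentedMap {
    struct Segment {
        std::mutex mtx;
        std::unordered_map<K, V> map;
    };
    std::array<Segment, Segments> segments_;

    Segment& segment_for(const K& key) {
        return segments_[std::hash<K>{}(key) % Segments];
    }

public:
    void put(const K& key, const V& value) {
        Segment& s = segment_for(key);
        std::lock_guard<std::mutex> lock(s.mtx);
        s.map[key] = value;
    }

    bool get(const K& key, V& out) {
        Segment& s = segment_for(key);
        std::lock_guard<std::mutex> lock(s.mtx);
        auto it = s.map.find(key);
        if (it == s.map.end()) return false;
        out = it->second;
        return true;
    }
};
```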