Message passing has allocation pressure and cache-coherency pressure that a shared message location doesn't. And as the amount of memory involved goes up, the performance benefit of the shared location grows.
Sure, for something silly like writing to an int, a shared location is a net loss, but when you start talking about a dictionary with 1 million entries, the shared location wins big compared to all the copying and allocating you'd have to do to move the same data with message passing.
For some data structures it's the optimal way to move data around. For example, the LMAX Disruptor is about the fastest way to pass messages precisely because it runs over a pre-allocated shared buffer with carefully tuned synchronization.
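For concreteness, here's a minimal single-producer/single-consumer ring buffer sketch in C++. This is not the Disruptor itself (which adds batching and sequencing on top), just the core idea of passing messages through pre-allocated shared slots so nothing is allocated per message:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Slots are allocated once up front and reused, so pushing a message never
// allocates; producer and consumer only touch two atomic indices.
template <typename T, size_t Capacity>
class SpscRing {
    std::array<T, Capacity> slots_{};
    std::atomic<size_t> head_{0};  // next slot the consumer will read
    std::atomic<size_t> tail_{0};  // next slot the producer will write

public:
    bool try_push(const T& value) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;                      // ring is full
        slots_[tail] = value;                  // write into the shared slot
        tail_.store(next, std::memory_order_release);
        return true;
    }

    std::optional<T> try_pop() {
        size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return std::nullopt;               // ring is empty
        T value = slots_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return value;
    }
};
```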
You are talking about writes to a data structure such as a list or a dictionary from multiple threads. Nobody uses advanced message-passing techniques for that. A list is basically the poster child for avoiding mutexes: each thread writes to its own sublist, with no mutexes at all, and at the end of processing every thread's sublist is merged together. Merging a linked list takes O(1) time by manipulating a few pointers. I don't know why you are talking about copying or allocating: a list requires no copying to be merged, and the allocations still happen across multiple threads, so there's no change in allocation pressure. If your allocator is bad, there could be internal mutexes inside the allocator, but that's beside the point.
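Sketched in C++ with `std::list` (the per-thread work here is just a placeholder for whatever the threads actually produce):

```cpp
#include <list>
#include <thread>
#include <vector>

// Each worker fills its own std::list with no locking; the main thread then
// splices them together. The whole-list splice() just relinks a few pointers,
// so the merge is O(1) per sublist and copies no elements.
std::list<int> parallel_build(int num_threads) {
    std::vector<std::list<int>> sublists(num_threads);
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&sublists, t] {
            // Placeholder work: each thread produces some values.
            for (int i = 0; i < 1000; ++i)
                sublists[t].push_back(t * 1000 + i);
        });
    }
    for (auto& w : workers) w.join();

    std::list<int> result;
    for (auto& sub : sublists)
        result.splice(result.end(), sub);  // O(1): steals sub's nodes
    return result;
}
```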
With dictionaries, you may have to do a bit of resizing and rehashing at the end. But in my benchmarks, it is still worthwhile.
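Roughly, with C++17 `std::unordered_map` (types are arbitrary, this is just to show where the rehash cost sits and how reserving bounds it):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Merge per-thread maps into one. The inserts at the end are where the
// rehashing cost lives; reserving the combined size keeps it to one rehash.
std::unordered_map<std::string, int>
merge_maps(std::vector<std::unordered_map<std::string, int>>& per_thread) {
    size_t total = 0;
    for (const auto& m : per_thread) total += m.size();

    std::unordered_map<std::string, int> merged;
    merged.reserve(total);   // avoid repeated rehashing during the merge
    for (auto& m : per_thread)
        merged.merge(m);     // C++17: moves nodes, no copies for non-colliding keys
    return merged;
}
```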
Depends on the list implementation. I assume we're talking about C++ `std::list`? In that case, yeah, I can see how you'd instead `splice` the sublists together.
I was thinking more in terms of something like a `vector`, in which case merging the sublists means copying values from each sub-vector into the destination vector. That (especially if done naively) can mean reallocating more than once just to fit the results in.
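Something like this, as a sketch; reserving the combined size up front is the "done right" version, and skipping it is where the extra reallocations come from:

```cpp
#include <vector>

// Merging sub-vectors copies every element. Done naively (no reserve), the
// destination can reallocate several times as it grows; reserving the total
// size up front keeps it to a single allocation, but the copies remain.
std::vector<int> merge_vectors(const std::vector<std::vector<int>>& parts) {
    size_t total = 0;
    for (const auto& p : parts) total += p.size();

    std::vector<int> merged;
    merged.reserve(total);  // one allocation instead of repeated regrowth
    for (const auto& p : parts)
        merged.insert(merged.end(), p.begin(), p.end());  // element copies
    return merged;
}
```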
But, for the record, there are lock-free algorithms that don't require a mutex to append values to a list. You basically CAS the tail pointer to your new node.
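A sketch of what I mean, more or less the enqueue half of a Michael-Scott queue (memory reclamation is deliberately left out):

```cpp
#include <atomic>

// Lock-free append: link the new node after the current tail with a CAS,
// then swing the tail pointer forward.
struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
    explicit Node(int v = 0) : value(v) {}
};

class LockFreeList {
    Node dummy_;                        // sentinel so the list is never empty
    std::atomic<Node*> tail_{&dummy_};

public:
    void append(Node* node) {
        node->next.store(nullptr, std::memory_order_relaxed);
        for (;;) {
            Node* last = tail_.load(std::memory_order_acquire);
            Node* next = last->next.load(std::memory_order_acquire);
            if (next == nullptr) {
                // Try to hook the new node onto the current tail.
                if (last->next.compare_exchange_weak(
                        next, node,
                        std::memory_order_release,
                        std::memory_order_relaxed)) {
                    // Best effort: advance tail; if this fails, another
                    // thread already moved it for us.
                    tail_.compare_exchange_strong(
                        last, node,
                        std::memory_order_release,
                        std::memory_order_relaxed);
                    return;
                }
            } else {
                // Tail was lagging behind; help it forward and retry.
                tail_.compare_exchange_strong(
                    last, next,
                    std::memory_order_release,
                    std::memory_order_relaxed);
            }
        }
    }
};
```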
That being said, I concede that for a list a mutex is probably the wrong choice. It's a better choice in Java, which has constraints that prevent (easily) splicing two lists together.
For the dictionary, implementation is key. A good segmented dictionary is going to be hard to beat.
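For illustration, a bare-bones lock-striped map in C++: each segment owns its own mutex, so writers to different segments never contend. Segment count and key/value types here are arbitrary choices, not a specific library's layout:

```cpp
#include <array>
#include <mutex>
#include <unordered_map>

// Minimal segmented (lock-striped) dictionary: keys hash to one of N
// independent segments, each guarded by its own mutex.
template <typename K, typename V, size_t Segments = 16>
class SegmentedMap {
    struct Segment {
        std::mutex mtx;
        std::unordered_map<K, V> map;
    };
    std::array<Segment, Segments> segments_;

    Segment& segment_for(const K& key) {
        return segments_[std::hash<K>{}(key) % Segments];
    }

public:
    void put(const K& key, const V& value) {
        Segment& s = segment_for(key);
        std::lock_guard<std::mutex> lock(s.mtx);
        s.map[key] = value;
    }

    bool get(const K& key, V& out) {
        Segment& s = segment_for(key);
        std::lock_guard<std::mutex> lock(s.mtx);
        auto it = s.map.find(key);
        if (it == s.map.end()) return false;
        out = it->second;
        return true;
    }
};
```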