A concurrent Gen1 collector seems fine. You still have to worry about stop-the-world (STW) Gen2 collections, but it's much better.
FWIW, no one serious about allocations in native code uses naive malloc()/free(). My favorite trick is in game programming where you have a current frame pool that you just reset every frame.
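For anyone who hasn't seen the frame pool trick, here is a minimal sketch of the idea in Go. Everything in it (`FramePool`, `Particle`, the fixed capacity) is a made-up name or assumption for illustration, not anyone's actual engine code: allocation bumps an index into one preallocated slab, and "freeing" is a single reset at the end of the frame.

```go
package main

import "fmt"

// Particle is a hypothetical per-frame object used only for illustration.
type Particle struct {
	X, Y, VX, VY float32
}

// FramePool hands out objects from one preallocated slab.
// The slab never grows, so pointers stay inside it until Reset.
type FramePool[T any] struct {
	slab []T
	used int
}

// NewFramePool allocates the whole slab up front.
func NewFramePool[T any](capacity int) *FramePool[T] {
	return &FramePool[T]{slab: make([]T, capacity)}
}

// Alloc returns a zeroed slot, or nil when the slab is exhausted.
func (p *FramePool[T]) Alloc() *T {
	if p.used == len(p.slab) {
		return nil
	}
	obj := &p.slab[p.used]
	var zero T
	*obj = zero // clear whatever the previous frame left behind
	p.used++
	return obj
}

// Reset releases every object at once; no per-object free exists.
func (p *FramePool[T]) Reset() { p.used = 0 }

// Used reports how many slots the current frame has consumed.
func (p *FramePool[T]) Used() int { return p.used }

func main() {
	pool := NewFramePool[Particle](1024)
	for frame := 0; frame < 3; frame++ {
		for i := 0; i < 100; i++ {
			part := pool.Alloc()
			part.X = float32(i)
		}
		fmt.Println("frame", frame, "used", pool.Used())
		pool.Reset() // end of frame: everything allocated above is gone
	}
}
```

The fixed capacity is a deliberate choice in this sketch: growing the slab would move it, and then nothing could rely on pointers staying inside the pool.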
The thing I like about pools is that they seem to actually reduce the cognitive load required to consistently get memory management "right" in a large application.
With manual memory management you have to worry about the ownership convention of each chunk of code. With GC you have to worry about architecting strong/weak references so as not to inadvertently retain everything (not to mention latency issues). With reference counting you have to worry about ownership cycles. In practice I haven't come up with a better strategy than enforcing some sort of top-down hierarchy, which effectively erases most of the differences in cognitive load between manual/refcounting/GC. GC has slightly less upfront busywork, but in practice the tooling tends to be poor, so it's a wash (if that). With GC and refcounting it's easy for one inexperienced/tired/sloppy individual to create a massive leak completely out of proportion to the footprint of their immediate code.
Pools, in contrast, allow the same top-down approach with two very significant benefits: I don't have to think about the hierarchy at a finer level than the pool itself (which I do have to for the three other approaches), and memory management mistakes don't typically lead to an entire connected component of the dependency graph sticking around indefinitely.
One big problem with pools is that you have to deallocate the pool sometime (or else your program's working set will continually grow over time), and when you do, all the pointers into that pool become dangling.
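To make the dangling-pointer hazard concrete, here is a small Go sketch. `Pool` and `Node` are hypothetical names, and the pool is deliberately tiny. In Go the stale pointer stays memory-safe and merely aliases whatever reuses the slot; in a native pool the same pattern would be a use-after-free.

```go
package main

import "fmt"

// Node is a hypothetical pooled object.
type Node struct{ Value int }

// Pool is a deliberately tiny fixed-capacity pool for illustration only.
type Pool struct {
	slots []Node
	used  int
}

// Alloc hands out the next slot. It panics if the pool is exhausted,
// which is acceptable for a sketch but not for real code.
func (p *Pool) Alloc(v int) *Node {
	n := &p.slots[p.used]
	n.Value = v
	p.used++
	return n
}

// Reset releases every slot without touching outstanding pointers.
func (p *Pool) Reset() { p.used = 0 }

func main() {
	pool := &Pool{slots: make([]Node, 8)}

	old := pool.Alloc(1) // pointer taken before the reset
	pool.Reset()         // pool considers every slot free again
	fresh := pool.Alloc(2)

	// old was never updated, yet it now refers to the new object.
	fmt.Println(old == fresh, old.Value)
}
```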
Another problem with pools is that you can't deallocate individual objects inside a pool. This is bad for long-lived, highly mutable data stores (think constantly mutating DOMs, in which objects appear and disappear all the time, or what have you).
Another big downside to pools is that a lot of GC implementations will scan only 'live' objects. Large object pools unnaturally increase the number of objects that need to be scanned during GC, negating a (sometimes) useful GC efficiency trick.
Precisely. This is an especially important point in response to the oft-repeated "just use pools if you want to avoid the GC in a GC'd language" fallacy.
If your GC is triggered after the allocated memory increases by X% (which is fairly common), then pooling is effective, since it lowers the allocation rate.
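A back-of-the-envelope version of that argument, as a Go sketch. This is a simplified model and not any runtime's real pacer (Go's GOGC setting is one example of a percentage trigger, but its actual heuristics are more involved). The function name and all the numbers are assumptions picked for illustration.

```go
package main

import "fmt"

// gcCyclesPerSecond models a collector that starts a cycle once the heap has
// grown by growthPercent over the live heap left by the previous cycle.
// It ignores everything else a real pacer considers.
func gcCyclesPerSecond(liveBytes, allocBytesPerSec, growthPercent float64) float64 {
	headroom := liveBytes * growthPercent / 100 // bytes that can be allocated between cycles
	return allocBytesPerSec / headroom
}

func main() {
	const mib = 1 << 20
	live := 100.0 * mib // assumed live heap

	// Assumed allocation rates: 200 MiB/s without pooling, 20 MiB/s with it.
	fmt.Printf("without pooling: %.1f cycles/s\n", gcCyclesPerSecond(live, 200*mib, 100))
	fmt.Printf("with pooling:    %.1f cycles/s\n", gcCyclesPerSecond(live, 20*mib, 100))
}
```

Under this model the cycle count scales linearly with allocation rate, so cutting allocations tenfold cuts collections tenfold. It says nothing about how long each cycle takes, which is where the scanning cost of pooled objects comes back in.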
Also, Go doesn't scan arrays that are marked as containing no pointers, so representing an index as a massive array of values has proven quite effective for me.
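Roughly what that looks like, as a sketch. The `entry` layout, the `Index` type, and the idea of storing offsets into some external buffer are my assumptions for illustration, not a description of the poster's actual index. The point is only that the element type has no pointer fields, so the backing array gives the collector nothing to trace.

```go
package main

import (
	"fmt"
	"sort"
)

// entry has no pointer fields, so a []entry backing array is pointer-free
// memory from the collector's point of view.
type entry struct {
	Key    uint64
	Offset uint32 // hypothetical position of the record in some external buffer
	Length uint32
}

// Index is one flat slice sorted by Key, instead of a map of pointers.
type Index struct {
	entries []entry
}

// NewIndex copies the input and sorts the copy by key.
func NewIndex(entries []entry) *Index {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	return &Index{entries: sorted}
}

// Lookup binary-searches the sorted slice for key.
func (ix *Index) Lookup(key uint64) (offset, length uint32, ok bool) {
	i := sort.Search(len(ix.entries), func(i int) bool { return ix.entries[i].Key >= key })
	if i < len(ix.entries) && ix.entries[i].Key == key {
		e := ix.entries[i]
		return e.Offset, e.Length, true
	}
	return 0, 0, false
}

func main() {
	ix := NewIndex([]entry{
		{Key: 42, Offset: 128, Length: 16},
		{Key: 7, Offset: 0, Length: 128},
	})
	fmt.Println(ix.Lookup(42))
	fmt.Println(ix.Lookup(99))
}
```

The trade-off is that lookups become a binary search and updates mean re-sorting or rebuilding, so this suits indexes that are read far more often than they change.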
Fair enough. I suppose there are a significant number of applications where there isn't an obvious way to perform coarse-grained partitioning of object lifetimes. If you are a language designer looking to force a memory management scheme on all of your users, pools would be a bad choice.