The only limitation of "Dynamic Storage Allocation: A Survey and Critical Review" is that it doesn't discuss multithreaded allocator design, because the paper predates the point when people really started caring about it. I'm not aware of a survey paper that covers that aspect of memory allocation.
(By the way: yours is a good comment! There's no need to mention you may get downvotes. We ask people not to do that, as it can tend to have an "I dare you" effect: https://news.ycombinator.com/newsguidelines.html)
Oops. Sorry, I will be careful next time. Thanks for pointing that out.
> does not include a discussion on multithreaded allocator design
This is very true. tcmalloc seems to have been the earliest design with thread-local pools. jemalloc didn't originally use this design [1], but over time many allocators adopted it, including SuperMalloc and others.
Actually, thread-local pools predate tcmalloc by quite a few years. Cribbing from the related work section of a paper I co-authored in 2006 (http://www.scott-a-s.com/files/ismm06.pdf):
"Streamflow uses segregated object allocation in thread-private heaps, as in several other thread-safe allocators including Hoard [3], Maged Michael’s lock-free memory allocator [18], Tcmalloc from Google’s performance tools [10], LKmalloc [15], ptmalloc [9], and Vee and Hsu’s allocator [25]. In particular, Streamflow uses strictly thread-local object allocation, both thread-local and remote deallocation and mechanisms for recycling free page blocks to avoid false sharing and memory blowup [3, 18]."
[3] E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 117–128, Cambridge, MA, November 2000.
[15] P. Larson and M. Krishnan. Memory Allocation for Long-Running Server Applications. In Proceedings of the First International Symposium on Memory Management, pages 176–185, Vancouver, BC, October 1998.
[18] M. Michael. Scalable Lock-free Dynamic Memory Allocation. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 35–46, Washington, DC, June 2004.
[25] V. Vee and W. Hsu. A Scalable and Efficient Storage Allocator on Shared Memory Multiprocessors. In Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks, pages 230–235, Perth, Australia, June 1999.
The earliest appears to be Larson and Krishnan from 1998. In the late '90s and early 2000s, the focus seems to have been on SMP servers. Then in the early to mid 2000s, people (including my advisor) started realizing this whole "multicore" thing was for real, and that system software would have to change.
I wasn't sure where it appeared first, either! I had to dig out that old related work section. There may be work that predates the '98 reference that didn't get much attention. (I had assumed Hoard would be the first in the literature, but that's from 2000.) I think when the idea shows up has more to do with the hardware available at the time and what people were doing with it. It's not a huge stretch to imagine thread-local pools, but I don't think enough people were paying attention to the problem before then.
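As a rough illustration of how simple the basic idea is, here is a minimal Python sketch of thread-private heaps with segregated size classes plus remote deallocation, loosely in the spirit of the Streamflow description quoted above. Everything in it is hypothetical: `ThreadHeap`, `Block`, `malloc`, `free`, `my_heap`, `SIZE_CLASSES` and `BLOCKS_PER_REFILL` are illustrative names, and Python objects stand in for real memory. It is not how Streamflow, Hoard, or tcmalloc are actually implemented.

```python
import queue
import threading

# Hypothetical size classes (bytes) and refill batch size, chosen for illustration only.
SIZE_CLASSES = (16, 32, 64, 128)
BLOCKS_PER_REFILL = 8


def size_class(n):
    """Round a request up to the smallest size class that fits it."""
    for c in SIZE_CLASSES:
        if n <= c:
            return c
    raise ValueError(f"{n} bytes exceeds the largest size class")


class Block:
    """Hypothetical stand-in for a chunk of memory; remembers which heap carved it."""
    __slots__ = ("owner", "size", "ident")

    def __init__(self, owner, size, ident):
        self.owner = owner
        self.size = size
        self.ident = ident


class ThreadHeap:
    """A thread-private heap: one free list per size class, plus a remote-free queue."""

    def __init__(self):
        # Only the owning thread ever touches these lists, so they need no lock.
        self.free_lists = {c: [] for c in SIZE_CLASSES}
        # Other threads hand blocks back through this thread-safe queue.
        self.remote = queue.SimpleQueue()
        self.carved = 0  # number of fresh blocks created so far

    def _refill(self, c):
        # First reclaim blocks that other threads freed on our behalf...
        while True:
            try:
                blk = self.remote.get_nowait()
            except queue.Empty:
                break
            self.free_lists[blk.size].append(blk)
        # ...and only carve new blocks if that did not help this size class.
        if not self.free_lists[c]:
            for _ in range(BLOCKS_PER_REFILL):
                self.free_lists[c].append(Block(self, c, self.carved))
                self.carved += 1

    def malloc(self, n):
        c = size_class(n)
        if not self.free_lists[c]:
            self._refill(c)
        return self.free_lists[c].pop()


_local = threading.local()


def my_heap():
    """Return the calling thread's heap, creating it on first use."""
    heap = getattr(_local, "heap", None)
    if heap is None:
        heap = _local.heap = ThreadHeap()
    return heap


def malloc(n):
    # Allocation always comes from the calling thread's own heap.
    return my_heap().malloc(n)


def free(blk):
    heap = my_heap()
    if blk.owner is heap:
        # Local free: push straight back onto our own free list.
        heap.free_lists[blk.size].append(blk)
    else:
        # Remote free: return the block to the heap that owns it.
        blk.owner.remote.put(blk)
```

A block freed by the thread that allocated it goes straight back onto that thread's free list with no synchronization; a block freed by another thread is pushed onto the owner's `queue.SimpleQueue`, and the owner drains that queue the next time one of its free lists runs dry. A real allocator would use something much cheaper than a Python queue for the remote path (an atomic, lock-free list, for example) and would also have to handle heaps whose owning thread has exited, which this sketch ignores.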