The only limitation of "Dynamic Storage Allocation A Survey and Critical Review...

eatbitseveryday · on Aug 26, 2016

> There's no need to mention you may get downvotes. We ask people not to do that, as it can tend to have an "I dare you" effect: https://news.ycombinator.com/newsguidelines.html

Oops. Sorry, I will be careful next time. Thanks for pointing that out.

> does not include a discussion on multithreaded allocator design

This is very true. tcmalloc seems to have been the earliest design with thread-local pools. jemalloc didn't originally have this design[1], and over time many allocators just adopted it, including SuperMalloc and others.

[1] https://www.facebook.com/notes/facebook-engineering/scalable... (search for tcmalloc)

scott_s · on Aug 26, 2016

Actually, thread-local pools predate tcmalloc by quite a few years. Cribbing from the related work section of a 2006 paper I co-authored (http://www.scott-a-s.com/files/ismm06.pdf):

"Streamflow uses segregated object allocation in thread-private heaps, as in several other thread-safe allocators including Hoard [3], Maged Michael’s lock-free memory allocator [18], Tcmalloc from Google’s performance tools [10], LKmalloc [15], ptmalloc [9], and Vee and Hsu’s allocator [25]. In particular, Streamflow uses strictly thread-local object allocation, both thread-local and remote deallocation and mechanisms for recycling free page blocks to avoid false sharing and memory blowup [3, 18]."

[3] E. Berger, K. McKinley, R. Blumofe, and P. Wilson. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proc. of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 117–128, Cambridge, MA, November 2000.

[9] Wolfram Gloger. Dynamic Memory Allocator Implementations in Linux System Libraries. http://www.dent.med.unimuenchen.de/wmglo/malloc-slides.html.

[10] Google. Google Performance Tools. http://goog-perftools.sourceforge.net/.

[15] P. Larson and M. Krishnan. Memory Allocation for Long-Running Server Applications. In Proceedings of the First International Symposium on Memory Management, pages 176–185, Vancouver, BC, October 1998.

[18] M. Michael. Scalable Lock-free Dynamic Memory Allocation. In Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation, pages 35–46, Washington, DC, June 2004.

[25] V. Vee and W. Hsu. A Scalable and Efficient Storage Allocator on Shared Memory Multiprocessors. In Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks, pages 230–235, Perth, Australia, June 1999.
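To make the quoted description concrete, here is a minimal sketch of thread-private heaps with local allocation and remote deallocation. It is not Streamflow's code or any of the cited allocators' code: the names `ThreadHeap`, `Block`, `allocate`, and `release` are hypothetical, it handles a single size class, and it uses a plain lock for remote frees where real allocators often use lock-free lists.

```python
import threading
from collections import deque

class Block:
    """A fixed-size chunk that remembers which heap it came from."""
    def __init__(self, heap, size):
        self.heap = heap
        self.data = bytearray(size)

class ThreadHeap:
    """One heap per thread (hypothetical name, single size class)."""
    def __init__(self, block_size=64, blocks_per_refill=4):
        self.block_size = block_size
        self.blocks_per_refill = blocks_per_refill
        self.local_free = []            # touched only by the owning thread
        self.remote_free = deque()      # blocks freed by other threads
        self.remote_lock = threading.Lock()

    def refill(self):
        # Stand-in for fetching a fresh page block from a global pool.
        for _ in range(self.blocks_per_refill):
            self.local_free.append(Block(self, self.block_size))

_tls = threading.local()

def _my_heap():
    heap = getattr(_tls, "heap", None)
    if heap is None:
        heap = _tls.heap = ThreadHeap()
    return heap

def allocate():
    """Allocate strictly from the calling thread's own heap."""
    heap = _my_heap()
    if not heap.local_free:
        # Reclaim blocks that other threads handed back to this heap.
        with heap.remote_lock:
            heap.local_free.extend(heap.remote_free)
            heap.remote_free.clear()
    if not heap.local_free:
        heap.refill()
    return heap.local_free.pop()

def release(block):
    """Free locally without locking, or return the block to its owner."""
    heap = block.heap
    if heap is getattr(_tls, "heap", None):
        heap.local_free.append(block)       # local deallocation
    else:
        with heap.remote_lock:              # remote deallocation
            heap.remote_free.append(block)
```

The common path (allocate and free on the same thread) takes no lock; only a free from a different thread touches the owner's shared queue. Returning a block to its owning heap, rather than keeping it in the freeing thread's heap, is what keeps a producer/consumer pattern from growing memory without bound.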

The earliest appears to be Larson and Krishnan from 1998. In the late '90s and early 2000s, the work appears to have been SMP-focused and aimed at servers. Then in the early to mid 2000s, people (including my advisor) started realizing this whole "multicore" thing was for real, and system software would have to change.

eatbitseveryday · on Aug 26, 2016

I had read your paper :) but did not dig enough to find whether tcmalloc had introduced the concept or not. Thanks for pointing those out!

scott_s · on Aug 26, 2016

I wasn't sure where it appeared first, either! I had to dig out that old related work section. There may be work that predates the '98 reference, but it may not have gotten much attention. (I had assumed Hoard would be the first in the literature, but that's from 2000.) I think when it showed up has more to do with the hardware available at the time, and what people were doing with it. It's not a huge stretch to imagine thread-local pools, but I don't think enough people were paying attention to the problem before then.