Abandoning segmented stacks in Rust (mail.mozilla.org)
134 points by dbaupp on Nov 6, 2013 | 32 comments



As noted in the link, Go is also moving to contiguous stacks[1]. This is supposed to be implemented in 1.3.

[1] https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzW...


I thought the segmented stack was the key to having hundreds of thousands of userland threads. It seems pretty far-fetched (though not impossible) to realloc the entire thread stack whenever it gets too big.


I think Go is primarily designed for servers, and servers are overwhelmingly 64-bit these days. In a 64-bit virtual address space, you can have millions of contiguous stacks, each gigabytes in size.


Not really. Assuming a 4 KB page size, and given that on most systems the minimum amount of memory that can be committed is one page, you will exhaust all 4 GB of your server's physical RAM with only about a million threads (4 GiB / 4 KiB = 1,048,576).


I don't think many real servers have 4 GB these days. The dev box under my desk has 32 GB. Linux servers often have 1 TB of physical memory. So based on your calculation that's roughly 256 million threads. Sounds like enough :)
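A quick sanity check on both figures (assuming one committed 4 KiB page per stack):

    package main

    import "fmt"

    func main() {
        const page = 4 << 10          // minimum commit per stack: one 4 KiB page
        fmt.Println((4 << 30) / page) // 4 GiB of RAM: 1048576 (~1 million) stacks
        fmt.Println((1 << 40) / page) // 1 TiB of RAM: 268435456 (~256 million) stacks
    }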


The whole point of cheap concurrency like goroutines is the ability to avoid callback-based async IO (i.e. the Go standard library performs select/epoll/... under the hood, but your code behaves as if it were blocking on IO).

You might have better uses for your memory (caching, in-memory databases) than wasting it on even a fraction of a million goroutines that are just waiting for something to happen.
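For illustration, this is the kind of code cheap goroutines buy you: one goroutine per connection, written as plain blocking IO, while the runtime multiplexes everything over epoll/kqueue underneath (a minimal echo-server sketch, not production code):

    package main

    import (
        "bufio"
        "log"
        "net"
    )

    func handle(conn net.Conn) {
        defer conn.Close()
        r := bufio.NewReader(conn)
        for {
            line, err := r.ReadString('\n') // looks blocking; the scheduler just parks the goroutine
            if err != nil {
                return
            }
            if _, err := conn.Write([]byte(line)); err != nil {
                return
            }
        }
    }

    func main() {
        ln, err := net.Listen("tcp", "localhost:8080")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go handle(conn) // cheap: each goroutine starts with a small stack
        }
    }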


Yes, but if I buy a server with e.g. 128 GB of RAM, I would rather spend it caching 100 GB of live data and searchable indices, and not worry about spawning so many threads that I start thrashing the swap.


That's true, but the majority of those 64-bit servers only have a 48-bit address space, so millions of gigabyte-sized stacks would still be a tight fit.


Obviously it would be a "virtual" gigabyte-sized stack, i.e. it would be mmapped, but the threads would not actually be using all of their stacks. Physical memory is not allocated, only virtual addresses are.
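Something like this, roughly (a sketch using Go's syscall package, assuming Linux for MAP_NORESERVE; the real runtimes manage this themselves):

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // Reserve 1 GiB of virtual address space for a stack. No physical
        // pages are committed until the memory is actually touched.
        stack, err := syscall.Mmap(-1, 0, 1<<30,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_PRIVATE|syscall.MAP_ANON|syscall.MAP_NORESERVE)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(stack)

        stack[0] = 1 // touching a page is what finally commits physical memory
        fmt.Println("reserved", len(stack), "bytes of virtual address space")
    }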


One can only fit 2^18 = 262144 stacks of size 1 GiB into the 48-bit x86-64 address space, since 2^48 / 2^30 = 2^18 (i.e. 48 bits is the total virtual address size).


Isn't a 1 GiB stack quite huge? I would have expected them to be an order of magnitude smaller or so.


Yes, it's huge. C programs on Linux normally default to an 8 or 10 MB stack per thread, and even that is considered huge by many.


Not entirely, as I understand it. Segmented stacks are just one solution. Another is to start with a very small stack and grow it contiguously (copying to a new location if necessary) when an overflow is imminent. You still get the benefit of having small stacks, which lets you create those hundreds of thousands of userland threads, and as long as they don't blow up their stack usage (the same requirement as for segmented stacks), they'll be fine.
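A toy version of that grow-by-copy idea, just to make it concrete (a real runtime operates on the machine stack and has to fix up any pointers into it, which this ignores):

    package main

    import "fmt"

    // grow returns a stack with room for at least `need` more bytes above
    // sp, copying the live contents into a larger allocation if required.
    func grow(stack []byte, sp, need int) []byte {
        if sp+need <= len(stack) {
            return stack // enough headroom, keep the current allocation
        }
        bigger := make([]byte, 2*len(stack)) // double the reservation
        copy(bigger, stack[:sp])             // move the live portion over
        return bigger
    }

    func main() {
        stack := make([]byte, 4096)  // start tiny, like a fresh task
        sp := 4000
        stack = grow(stack, sp, 512) // overflow is imminent: reallocate and copy
        fmt.Println("stack is now", len(stack), "bytes")
    }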


I played with the idea of something like this a while back, but never finished it:

https://github.com/roboprog/buzzard/blob/master/test/src/mai...

It performed reasonably well in a demo doing a bunch of string concatenation, but I never followed up on it.

https://github.com/roboprog/mp_bench/blob/master/thd_frk.bzr...


Further down the thread, several alternatives to segmented stacks are brought up.


I don't like the name contiguous stack. Yes, they are contiguous, but their defining characteristic is that they move. That's why I prefer the term copying stacks.


I'm a big fan of Mozilla and the Go team, but I do recognize that this is something Dlang's Walter Bright criticized from the beginning. As I assume all parties know much more than I do, I wonder why both languages chose this route...


Graydon Hoare (Rust's original designer) was well aware of Walter's opinion regarding segmented stacks. In fact, Graydon heartily agreed with that opinion when it came to 64-bit architectures. On 32-bit, however, segmented stacks do theoretically allow many many many more concurrent tasks than you'd achieve merely with multithreading. Therefore, Graydon's opinion was that segmented stacks would remain relevant as long as 32-bit architectures were.

Now, for context, remember that Graydon began working on this language (in his own free time) in 2006, when the architecture landscape was much different. And then later in 2009/2010, after Mozilla picked up the language in order to write a web browser in it, one of the primary goals was to be able to target phones, which lag fairly far behind desktops in 64-bit adoption. But now that Rust has a stable release target of mid-2014, tailoring core language features to the diminishing 32-bit demographic is just no longer tenable.


I'd rather someone try segmented stacks and determine they aren't a good fit than just discard the idea based on something someone said. This way future researchers have more data points, whereas dismissing it out of hand leaves a rather big question mark.


Well, I agree I'd rather someone try it for another data point. However, this isn't just a random internet comment; it's a man who has probably written more compiler code than just about anyone in recent times.


That probably is true, but lots of great men have been wrong about things in their own domain - see Einstein and the cosmological constant.


The hot split problem is an example of a poor grow/shrink algorithm, and the contiguous stack would exhibit the same behaviour if the growth and shrink thresholds are the same.

How about avoiding the hot split by changing the deallocation algorithm to keep empty segments around, and seeing what happens?


> The hot split problem is an example of a poor grow/shrink algorithm

Not really. With segmented stacks, when you hit the end of a stack, you must (1) allocate a new segment to extend the stack and (2) copy the current function frame to the new segment and resume execution. Then, when the function returns, you have to (3) deallocate the new segment and shrink the stack, and (4) resume execution of the last function frame on the previous stack segment.

Now, steps (1) and (3) are optional, but steps (2) and (4) are not, and they incur a significant overhead (compared with a non-segmented, contiguous stack, e.g. the C stack).
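To make the overhead concrete, every call effectively pays for a check like this in its prologue, and the hot split is what happens when the slow path is taken on every iteration of a loop (a simplified model, not the actual compiler-generated code):

    package main

    import "fmt"

    // A simplified model of the segmented-stack check: real implementations
    // compare the stack pointer against the segment limit in the compiled
    // prologue; here a byte counter stands in for the stack pointer.
    type segStack struct {
        remaining int // bytes left in the current segment
    }

    func (s *segStack) call(frameSize int, f func()) {
        if frameSize > s.remaining {
            // "morestack": allocate a new segment, run the callee on it,
            // then free the segment when the callee returns.
            fmt.Println("allocate segment")
            f()
            fmt.Println("free segment")
            return
        }
        s.remaining -= frameSize
        f()
        s.remaining += frameSize
    }

    func main() {
        s := &segStack{remaining: 100}
        for i := 0; i < 3; i++ {
            // The frame doesn't fit, so every single call in this loop
            // allocates and frees a segment: the hot split.
            s.call(128, func() {})
        }
    }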


You could add step (5): recognize that you have a hot split, and amend step (2) to copy all of the hot frames to the new segment.

Their point is that you have done something wrong if you have to dynamically turn on profiling to keep the stack from catching on fire.


That's a possible solution, but it requires that the stack be relocatable (i.e. that you can find and rewrite all pointers to data on the stack), which is a significant complication on top of a segmented-stacks implementation. If you have a relocatable stack, you might as well use the copy-the-whole-stack strategy with contiguous stacks.


Segmented stacks are better even if you have a relocatable stack. GHC switched from monolithic copy-the-whole-thing-to-grow-it to segmented stacks a while ago, and it was a big win. Not just because we waste less space, but because the GC knows when individual stack segments are dirty so that it doesn't need to traverse the whole stack. To avoid the thrashing issue we copy the top 1KB of the previous stack chunk into the new stack chunk when we allocate a new chunk.


The idea is to leave the growing to the MMU. And a contiguous stack does not have to shrink immediately when its size goes down, whereas a segmented stack does (or at least it has to do something that causes it to set its stack pointer back to the initial smaller stack segment).


Still, there is the problem of finite virtual address space (though it's not so significant on 64-bit architectures).


"[...] I expect that on 64-bit platforms the number of concurrent tasks will be comparable to using segmented stacks. On 32-bit platforms, with their limited address space, the situation will not be as good, but this is a calculated risk that we can live without the same amount of concurrency there."

I thought Rust was meant to aim at being a "C replacement", but doesn't the above quote then raise some concerns for ARM and more embedded targets (e.g. AVRs, ...)? Or maybe my hopes were misguided from the beginning, I'm starting to think, given multithreading etc.? I'm confused now.


Embedded targets normally don't want segmented stacks anyway, and running hundreds of thousands of threads/tasks (which is what segmented stacks allow) is unlikely to be particularly useful there(?).

In any case, this ML post is basically saying "Rust is reverting to C-like stack behaviour", with some checks to handle stack overflow safely (i.e. memory safety & security); the current implementation just aborts the whole process on stack overflow.


Disclaimer: not a language designer/implementer (yet).

Couldn't segmented stacks for async/await-style IO be simulated by swapping the stack out at every await? I mean copying out the registers and the raw stack to some heap location, then restoring them by reversing the process later.

(One problem I see is that it'd be slow when lots of IO operations are involved.)


Doesn't work if threads can have pointers into each other's stacks (which we'd like to support someday for fork/join parallelism).



