Just to inject a comment here that isn't complaining about the post, I found it an interesting read! It's pretty rare for anyone working in these sorts of situations to explain anything at all and I'm grateful the author bothers to write them up.
I inferred from context that "stackgap" was some sort of random memory space added to a process, but in case anyone else isn't familiar with it, I found this presentation with a picture (the previous slide explains a bit more too):
http://www.openbsd.org/papers/auug04/mgp00005.html
From that, it looks like it's normally a random gap in the ~256KB range. The change mentioned in the post increased the range to up to 2MB, which explains why that, coupled with a 2MB stack limit, caused such problems.
I'm a little surprised this didn't trigger problems earlier, since it would randomly cause processes to have as little as half as much stack as they used to. I guess most processes don't use anywhere near 2MB of stack (that's the 4MB limit minus a potential 2MB of gap)?
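To make the arithmetic concrete, here's a toy model in Python. It's only a sketch under my own assumptions, not OpenBSD's actual code: the gap is a page-aligned value picked uniformly from [0, max_gap), the page size is 4KB, and the whole gap is charged against the stack limit. It counts what fraction of possible gaps would leave a process with less stack than it needs.

```python
# Toy model of the stackgap arithmetic. Assumptions (not OpenBSD's real code):
# the gap is page-aligned and uniform in [0, max_gap), and the whole gap is
# charged against the stack limit.

KB = 1024
MB = 1024 * KB
PAGE = 4 * KB  # assumed page size


def usable_stack(limit: int, gap: int) -> int:
    """Bytes of stack left for the program after the gap is subtracted."""
    return max(0, limit - gap)


def failure_fraction(limit: int, max_gap: int, need: int) -> float:
    """Fraction of possible gaps that leave less than `need` bytes of stack."""
    gaps = range(0, max_gap, PAGE)
    too_small = sum(1 for gap in gaps if usable_stack(limit, gap) < need)
    return too_small / len(gaps)


if __name__ == "__main__":
    # Old ~256KB gap vs. new 2MB gap, for a process that needs 1MB of stack.
    for limit in (2 * MB, 4 * MB):
        for max_gap in (256 * KB, 2 * MB):
            frac = failure_fraction(limit, max_gap, need=1 * MB)
            print(f"limit={limit // MB}MB max_gap={max_gap // KB}KB -> {frac:.1%} of gaps too big")
```

Under these assumptions a process needing 1MB never runs short with the 4MB limit, but with a 2MB limit and a 2MB gap just under half of the possible gaps leave it too little stack, which fits the "randomly fails" behaviour.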
This method implementation feels a bit dubious - the same effect, making it harder to exploit buffer overflows, can be achieved by moving the whole stack around randomly in the process' address space while preserving its size.
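Here's a rough sketch of the two approaches side by side, again in Python and again hypothetical: the top-of-stack address, the 4KB page size and the function names are all made up for illustration. Both randomize where the stack starts; only the gap version eats into the usable stack. The trade-off in the shifting version is that it needs that much free address space below the stack to move into.

```python
import random

PAGE = 4096  # assumed page size
MB = 1024 * 1024


def place_with_gap(top: int, limit: int, max_gap: int, rng: random.Random) -> tuple[int, int]:
    """Gap approach: the stack top moves down by a random gap, and the gap is
    charged against the limit. Returns (stack_top, usable_bytes)."""
    gap = rng.randrange(0, max_gap, PAGE)
    return top - gap, limit - gap


def place_by_shifting(top: int, limit: int, max_shift: int, rng: random.Random) -> tuple[int, int]:
    """Relocation approach: the whole stack moves down by a random amount,
    but keeps its full size. Returns (stack_top, usable_bytes)."""
    shift = rng.randrange(0, max_shift, PAGE)
    return top - shift, limit


if __name__ == "__main__":
    rng = random.Random(1)
    TOP = 0x7F7F_FFFF_F000  # hypothetical top-of-stack address
    for place in (place_with_gap, place_by_shifting):
        top, usable = place(TOP, 4 * MB, 2 * MB, rng)
        print(f"{place.__name__}: top={top:#x} usable={usable // 1024}KB")
```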
I don't know for sure, but I'm guessing that because it runs from a ramdisk, it's probably more memory-constrained than a regular kernel, so they configure it to use as little memory as possible. That doesn't matter much on modern systems with gigabytes of RAM, but it matters more on older systems.
Well, technically the website was named after the sort of problem discussed. In this case, though, the stack problems aren't due to application programmers making mistakes, as they usually are. They're due to the OS programmers randomizing where the stack starts for security reasons, then counting that gap against the amount of memory applications can use, which results in the error.
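For anyone wondering about the post's one-line explanation (execve fails because there's no room left in the new process to copy out argv), here's a toy model in Python. Everything in it is an assumption for illustration, not the kernel's real layout: 64-bit pointers, envp and the auxiliary vector ignored, and usable stack taken as simply the limit minus the gap. The point is just that once the gap swallows most of the limit, even a tiny argv no longer fits.

```python
# Toy model of "no room left in the new process to copyout argv".
# Assumptions (hypothetical): 64-bit pointers, envp/auxv ignored,
# usable stack = limit - gap.

KB = 1024
MB = 1024 * KB
POINTER_SIZE = 8


def argv_bytes(argv: list[str]) -> int:
    """Bytes needed for the argument strings (each NUL-terminated) plus a
    NULL-terminated array of pointers to them."""
    strings = sum(len(arg.encode()) + 1 for arg in argv)
    pointers = (len(argv) + 1) * POINTER_SIZE
    return strings + pointers


def can_copyout_argv(argv: list[str], limit: int, gap: int) -> bool:
    """True if argv fits in whatever stack is left after the gap."""
    usable = max(0, limit - gap)
    return argv_bytes(argv) <= usable


if __name__ == "__main__":
    argv = ["/bin/ksh", "-c", "echo hello"]
    print(can_copyout_argv(argv, limit=2 * MB, gap=256 * KB))  # plenty of room
    print(can_copyout_argv(argv, limit=2 * MB, gap=2 * MB))    # nothing left
```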
An obviously smart guy who doesn't know how to tell a story OR provide enough technical detail to the curious.
Who's the audience for this post?
Script programmers who never had to turn a computer off and on because they popped the wrong number of arguments off the stack before returning from a function?
Kernel hackers who didn't already see the checkins?
People who Google a problem after their kernel upgrade fails with obscure messages, who then receive no information on how to solve the problem?
Context is everything. From the title on down, this has none. If you're not going to take the time to do it right, let the commit messages tell the story.
I love learning about systems and I love a good "hunt down an obscure bug" story but this post offers neither.
I thought it was interesting? It's not even like the author himself came here and put it in front of you. I can send you some links with much more comprehensive displays of vacuousness on the internet if you need some solid targets.
And it's not like the author has a comment section so I can tell him there.
Call me a dick if you like, but I'd love the web to contain a little middle ground between mindless listicles and sentences like "Technically, the errors above come from ksh, when execve fails because there’s no room left in the new process to copyout argv."
I know what those words mean, I'm doubly-sure Ted does, and I'm sure it'd be a far better post if he spent another thirty words on the issue.
Call me a dick if you like, but I'd love the web to contain a little middle ground between mindless listicles and sentences like...
And I'd love it if people on the web could offer criticism without overwhelming amounts of sarcasm and put-downs. We can't always get what we want.
Your post could easily have said something along the lines of: "Interesting subject matter, but I would appreciate the article more if it either contained more technical details on the issue or provided more context around the issue itself and its implications."
Feel free to email him tedu@tedunangst.com or tweet @tedunangst
Just because there's no comment section doesn't mean you can't comment. You can read James Hague's thoughts on the matter here: http://prog21.dadgum.com/57.html
Or I can choose to leave a comment on HN and raise the broader issue of lack of context, something that plagues a lot of tech bloggers.
If the first words I see when I arrive on a site are "guest" and "flak," then there's a misleading headline, and the seventh through ninth words of the post itself are "OpenBSD install kernels" with no further explanation of what those are, then I reserve the right to make both a local and a global observation about the quality of the information as presented.
I'm amused that this is downvote central, but turning a valid observation into a slightly lighter shade of beige doesn't refute my argument.
I truly find this fascinating. That site has what appears to be a pokemon character wearing a drum kit, adds the line "Home: :Add Story: :Archives: :About: :Create Account: :Login:" and proceeds to quote the opening of the article while providing absolutely no additional information.
CmdrTaco used to provide a transitional sentence or two, and that's what made sites that look just like OpenBSD Journal actually have appeal and usefulness. Has the internet forgotten this lesson too?
This bug doesn't have security implications. The term 'stack overflow' is slightly ambiguous, as it could either mean buffer overrun in the stack (which can often lead to arbitrary code execution) or the program simply running out of stack space (like in the article).