Go 1.16 will make system calls through Libc on OpenBSD (utcc.utoronto.ca)
298 points by lladnar on Feb 2, 2021 | 189 comments



Whenever the topic of Go bypassing the libc was brought up before, the concerns were usually dismissed by the Go devs, saying that it was technically impractical due to some of Go's requirements not matching the libc's semantics.

Does this mean that the Go developers managed to work around this problem or that it was just a flimsy post-hoc justification for chronic NIH syndrome?

TFA itself links to another blog post discussing "Some reasons for Go to not make system calls through the standard C library"[1] but as far as I can tell it doesn't explain why it suddenly stopped being a problem on OpenBSD.

[1] https://utcc.utoronto.ca/~cks/space/blog/programming/GoCLibr...


> Does this mean that the Go developers managed to work around this problem or that it was just a flimsy post-hoc justification for chronic NIH syndrome?

No, there is almost certainly a bunch of headaches in having to do system calls through libc for Go. It didn't stop being a problem, there just isn't another option in OpenBSD's case. Unix systems are tricky for anyone who doesn't want to use libc since they typically define that as the official interface to the system. Linux is the outlier here by keeping its syscall ABI very stable.


I understand that but if they're going to go down that route for OpenBSD why not bring it to all platforms? While I can believe that going the raw syscall route can be easier than dealing with libc idiosyncrasies, it seems to me that maintaining both interfaces depending on the target OS would be trickier than just cutting your losses and going the libc route everywhere.

After all, as far as I can tell, it's not just an OpenBSD problem, since famously they got breakage on macOS as well.

I must admit that I haven't taken the time to analyze in depth the pros and cons here, but Go's history of NIH coupled with the fact that basically every other mainstream language manages to work by binding the libc leaves me very perplexed.

In particular some of the points raised in the blogpost I linked seem fishy to me. For instance errno being a global: this is in no way a Go-specific problem (multithreaded C couldn't run concurrent syscalls if it were true). In practice errno is thread-local instead of being a true global. It's explicitly documented in the man page:

> errno is defined by the ISO C standard to be a modifiable lvalue of type int, and must not be explicitly declared; errno may be a macro. errno is thread-local; setting it in one thread does not affect its value in any other thread.

I can believe that there are other issues I fail to consider, but again it works for everybody else, what makes Go so special here?
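For context, when Go bypasses libc it doesn't touch errno at all: the raw syscall interface hands the error number back as an ordinary return value. A minimal sketch (assuming linux/amd64 and the legacy syscall package):

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // Write to an invalid file descriptor; the error number comes back as a
        // value, so there is no global (or thread-local) errno to race on.
        _, _, errno := syscall.Syscall(syscall.SYS_WRITE, ^uintptr(0), 0, 0)
        if errno != 0 {
            fmt.Println(errno) // prints "bad file descriptor" (EBADF)
        }
    }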


> I understand that but if they're going to go down that route for OpenBSD why not bring it to all platforms?

Because they really don’t want to, so they will avoid doing it until forced to, as previously happened with e.g. macOS.

> I must admit that I haven't taken the time to analyze in depth the pros and cons here, but Go's history of NIH coupled with the fact that basically every other mainstream language manages to work by binding the libc leaves me very perplexed.

One of the issues Go has is it uses its own non-C stack. Libcs generally assume a C stack and don’t understand growable, movable stacks, so they can get very cross when called with an unexpectedly small stack (IIRC Go defaults to 2k, while the smallest C stack I know of is 64k on old OpenBSD, then macOS’s 512k for non-main threads).


Obviously I'm missing something here, because I was under the impression that, regardless of how big the stack is, the pages are going to be mapped (in the data structure pointed to by the CR3 register) when accessed and no earlier. Unless the allocation of the stack makes sure that it is mapped in its entirety upfront and stack access cannot incur minor page faults, which may not be so outlandish come to think of it. Can you please provide some clarification here? Thanks!


I'm not sure what clarification I could provide given I don't know what you're missing.

You're correct (AFAIK anyway) that stack pages are mapped lazily. That doesn't change the fact that C stacks are large allocations, while Go will only allocate a very small stack (2k last I checked) per goroutine.


Goroutines don't map directly to OS threads though, do they? So in practice it only matters for "hard" threads. I have no idea about what the Go scheduler looks like though, so I can't say if there's a direct relation between the two (coroutine stack vs. thread stack).

Also the default pthread stack size is (usually) 2MB which is indeed non negligible if you have a ton of threads, but with pthread_attr_setstacksize you can lower it. I can't seem to find the actual minimal size at the moment but I have a vague memory that you can reduce it to 16kB portably.

I'm currently working on a multithreaded Rust program with a bunch of threads and strong memory constraints and I use the runtime to reduce the stack to 64kB, it seems to work just fine.

And again, given that the language is garbage collected I can't help but find it a bit amusing that they're being stingy with a few MB of VMEM per thread. I guess that frees virtual memory for use in the ballasts!


> Goroutines don't map directly to OS threads though, do they? So in practice it only matters for "hard" threads.

It matters for goroutines because that means you can't straight call into foreign code from a goroutine's stack, which is why cgo and friends are so problematic.

> Also the default pthread stack size is (usually) 2MB which is indeed non negligible if you have a ton of threads

That's not relevant because it's only vmem, is the point. The actual resident size of a 2MB stack is 4k unless you start dirtying more pages. Unless you're running on 32b, VMEM doesn't really matter.


Yes, sorry, let me be more clear: I don't understand why Go should allocate a small 2k stack (you are correct, the Go stack size has fluctuated through the years; nowadays it's 2k) and, if it grows, go to the trouble of copying it to a larger structure? Instead of just allocating a big stack that will grow lazily anyway?

Obviously the folks that created Go aren't stupid, far from it, so there must be a real and valid reason. But I can't easily imagine what it is.


Go is very, very profligate with threads.

When you have 100,000 threads, big stacks add up.
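A rough way to see the scale involved: park a lot of goroutines and look at how much stack memory the runtime reports (a sketch; figures are ballpark and depend on the Go version):

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        var wg sync.WaitGroup
        block := make(chan struct{})
        for i := 0; i < 100000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                <-block // park the goroutine so its stack stays live
            }()
        }
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        // With ~2 KiB initial stacks, 100k parked goroutines cost a few hundred
        // MiB of stack memory, not 100k times a C thread's default stack.
        fmt.Printf("goroutines: %d, stack memory: %d MiB\n",
            runtime.NumGoroutine(), m.StackSys>>20)
        close(block)
        wg.Wait()
    }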


It's only vmem though is the point. It doesn't really matter that you have 100000 stacks of 8MB when all of that is vmem and you've got a page committed on each.


Each allocation also needs an entry in the page table (actually a virtual memory tree). On x86/amd64 the fanout is about 1:512.

If you have 8MiB stacks, then a minimally allocated stack uses a 4KiB data page, but also 2MiB of address range uses up a full 4KiB bottom level page table, and 8MiB range takes up 4/512 x 4KB of 2nd level page table and so on. So you use about 8.03 KiB RAM if you never touch more than 4KB and your 8MB reservations are mostly grouped. Some architectures have bigger pages, increasing fanout but also the minimum allocation.

Contrast to 2KiB stacks without reservations/overcommit - you use about 2KiB of usable RAM + (1/2) x (1/512) of 4KiB 1st level page table + .. , assuming allocation are again mostly grouped. Hence, for up to 2KiB of stack memory you need about¹ 2.005 KiB of RAM. Works the same for 16 KiB and even 64 KiB page sizes.

100000 * (8MiB reserved, 1..4KiB used stack) needs ~784MiB RAM.

100000 * (2KiB reserved, 1..2KiB used stack) needs ~200MiB RAM.

Note that, if you actually touch your reserved stack, even once, your allocation can balloon to possibly tens or hundreds of GiB (100k * 8MiB = ~800GiB), unless you do complicated cleanup, while a segmented stack can keep the allocations within reasonable efficiency, freeing any excessive stack allocations in userspace.

¹ ignoring bookkeeping overhead in both cases, to keep the calculation clear. Hopefully it isn't more than a dozen bytes or so.
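A quick sketch of the same arithmetic, under the assumptions above (4 KiB pages, 512-entry tables, mostly-grouped allocations):

    package main

    import "fmt"

    func main() {
        const (
            page    = 4096.0 // bytes per page and per page-table level
            fanout  = 512.0  // entries per table level on x86-64
            threads = 100000.0
        )
        // 8 MiB reservation, one touched page: the data page, a full bottom-level
        // table for its 2 MiB region, plus 4/512 of the next level up.
        big := page + page + 4/fanout*page // ~8.03 KiB per thread
        // 2 KiB stacks packed side by side: the stack itself plus a half-page
        // share of one bottom-level table entry.
        small := 2048 + 0.5/fanout*page // ~2.005 KiB per thread
        // Prints roughly ~784 MiB and ~196 MiB, in line with the figures above.
        fmt.Printf("8 MiB reservations: ~%.0f MiB\n", big*threads/(1<<20))
        fmt.Printf("2 KiB stacks:       ~%.0f MiB\n", small*threads/(1<<20))
    }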


> 100000 * (8MiB reserved, 1..4KiB used stack) needs ~784MiB RAM.

Yeah so nothing really relevant, that's 1/20th of a relatively basic dev laptop's memory for 100k threads.

And of course that's an insane worst-case scenario of 8MB stacks, which the Linux devs picked because they wanted to add a limit to the stack but didn't really care for having one; Windows uses 1MB stacks and macOS uses 512k off-main stacks, so you don't need anywhere near 8MB to get C-compatible stack sizes.


> It's only vmem though is the point

With 8M/4M/2M/1M/512k stack size, 4K page size, you get about 800M/800M/800M/600M/500M RAM usage, of which only 400M is usable, rest is overhead. At 16K page size, it's ~1600M. Compared to 200M if using 2K side-by-side stacks, in any configuration.

Yes, probably not too excessive, even though it is noticeable when you spawn threads for anything and everything, and run more than a single application on a non-SV-developer PC.

I think main problems start when you actually touch more than the base allocation of the stack (or just use 16K pages). Maybe segmented stacks grow from 200M to 300M, 500M or whatever you actually use at a given time (with say 20-60% efficiency), but your C-compatible stacks might go from 500M to 3G¹ if you on average touch just 32K of stack per thread with some unlucky function, even though at any given time only the same ~100M of memory actually stores useful information.

¹ or more, no idea how high it typically goes, but that in itself is a nasty gotcha, and a likely reason you won't find "lightweight" threading in combination with native per-thread stacks


> With 8M/4M/2M/1M/512k stack size, 4K page size, you get about 800M/800M/800M/600M/500M RAM usage

Simulating this (by creating 100k maps of the relevant sizes) there is no difference in RES between 8M mappings and 512K mappings: it was ~495M for both, only the VMEM varied (respectively ~780G and ~50G). Touching the second page increased the RES of both to ~816M, which is about what you'd expect.

This is on a more or less stock x64 Mint.
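For anyone who wants to reproduce that kind of simulation, a sketch along the same lines (Linux assumed; note that 100k separate mappings exceed the default vm.max_map_count):

    package main

    import (
        "fmt"
        "os"
        "syscall"
        "time"
    )

    func main() {
        const (
            n    = 100000
            size = 8 << 20 // reserve 8 MiB per fake "stack"
        )
        for i := 0; i < n; i++ {
            b, err := syscall.Mmap(-1, 0, size,
                syscall.PROT_READ|syscall.PROT_WRITE,
                syscall.MAP_PRIVATE|syscall.MAP_ANON|syscall.MAP_NORESERVE)
            if err != nil {
                fmt.Fprintln(os.Stderr, "mmap:", err) // likely vm.max_map_count
                return
            }
            b[0] = 1 // dirty only the first page of each reservation
        }
        fmt.Printf("pid %d: %d x %d MiB reserved; compare VmRSS and VmPTE in /proc/%d/status\n",
            os.Getpid(), n, size>>20, os.Getpid())
        time.Sleep(time.Hour) // park so the numbers can be inspected
    }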

> I think main problems start when you actually touch more than the base allocation of the stack

The thing is you're unlikely to do that in all of your 100k routines, most of them will not grow beyond their first page, and maybe their second… at which point the routine's stack would have grown to 8k anyway.

> maybe segmented stacks grow from 200M to 300M, 500M or whatever you actually use at a given time (with say 20-60% efficiency)

Go has not used segmented stacks since 1.3.

> your C-compatible stacks might go from 500M to 3G¹ if you on average touch just 32K of stack per thread with some unlucky function

Your not-C-compatible stacks will do the exact same thing. Since 1.3, stacks are realloc'd and double in size on every overflow.

The Go runtime will also shrink stacks if able (halving them) during a GC run, but you can do essentially the same thing on your C stack using madvise(2), and without the need to copy stack data around.
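A sketch of that madvise approach, using an anonymous mapping to stand in for a thread stack (Linux assumed):

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // Map 2 MiB, dirty all of it, then hand the pages back with
        // MADV_DONTNEED: the mapping stays valid, but its resident size drops
        // back to zero until the pages are touched again. No data is copied.
        b, err := syscall.Mmap(-1, 0, 2<<20,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_PRIVATE|syscall.MAP_ANON)
        if err != nil {
            panic(err)
        }
        for i := range b {
            b[i] = 1
        }
        if err := syscall.Madvise(b, syscall.MADV_DONTNEED); err != nil {
            panic(err)
        }
        fmt.Println("resident pages released without unmapping")
    }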

So what gain there is, is really only for goroutines which never grow beyond their initial size (and only since the default stack was decreased from 8k to 2k in 1.4), at the cost of all the C incompatibility mess. And it assumes these 2k stacks are allocated from a reusable pool (which is probably a fair assumption though I certainly did not check it) otherwise they'd be on different pages anyway.
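As a rough illustration of the copying behaviour: take the address of a stack variable before and after forcing a few stack growths, and the deep frame ends up on a different, larger stack (a sketch; exact sizes and addresses depend on the runtime version):

    package main

    import (
        "fmt"
        "unsafe"
    )

    //go:noinline
    func grow(depth int, _ *[128]byte) uintptr {
        var local [128]byte // each frame keeps a live 128-byte array
        if depth == 0 {
            return uintptr(unsafe.Pointer(&local))
        }
        return grow(depth-1, &local)
    }

    func main() {
        var v [128]byte
        before := uintptr(unsafe.Pointer(&v))
        after := grow(1000, &v)
        // The small initial stack overflows during the recursion; the runtime
        // allocates a bigger one and copies the frames over, so the deep frame
        // lives at a very different address than the shallow one.
        fmt.Printf("shallow frame: %#x\ndeep frame:    %#x\n", before, after)
    }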


> Simulating this .. there is no difference in RES between 8M mappings and 512K mappings

Linux memory management has notoriously complicated reporting. Your program only has 495M of usable memory mapped (RSS; not sure where the extra 100M is coming from), but RSS does not count page tables.

You cannot actually use the 400M of sparsely allocated memory (4K at 2+MB intervals) without another 400M of page tables. I'd suggest you try allocating and using >50% of RAM, or just compare how much you can use sparsely vs. densely before your program hits OOM. Note, you may need to enable overcommit and increase the maximum map count if you are mapping regions separately, and preferably disable swap to avoid thrashing.

  sysctl vm.overcommit_memory=1
  sysctl vm.max_map_count=10000000
  swapoff -a
You can also watch the PageTables total in /proc/meminfo, which should show the difference.

That being said, I don't use go, I was merely pointing out that virtual memory management is not magical, and it has real memory costs when doing sparse allocations (also has quite significant other costs).

> The Go runtime will also shrink stacks if able (halving them) during a GC run, but you can do essentially the same thing on your C stack using madvise(2), and without the need to copy stack data around.

While you could do that, I believe it requires a syscall per-stack, which might be more expensive than a bit of copying, and I'm not completely sure how easy it would be to determine whether the memory has in fact been allocated and needs freeing.


This will mean a virtual memory allocation of 800 GB. Such large allocations can fail, even if virtual, depending on over commit settings.


> Such large allocations can fail, even if virtual, depending on over commit settings.

Given how Go has no issue being prescriptive as hell on other things, I don't see why they couldn't just go "set vm.overcommit_memory to 1 and fuck off", that's exactly what e.g. Redis tells you.


Because that is very OS specific.


AFAIK BSDs don't have a concept of overcommit because you can't even turn it off.

And again it would hardly be the first time Go makes OS-specific decisions.


You aren't thinking with big numbers. Small amounts of admin work add up. Virtual pages aren't free. Go has had to think very hard about what to pare down to avoid dragging to a halt with orders of magnitude fewer threads.


My point exactly, thanks.


Go wants low overhead multithreading and static linkage. Glibc is inherently dynamically linked, and sets up threads in ways Go doesn't want.

Errno doesn't work for everyone else. It's a hack; you wouldn't do it if it weren't for backwards compatibility.


But why does it want to not dynamically link against a platform's libc?

What is the advantage in that? Including that into a binary seems about as senseless to me as including the entire kernel in it.


Not all platforms have a "libc". Linux is an example: there is no standard C library, and while glibc is very common, there are several distributions that use other C libraries like uClibc and musl. For example, a gaming handheld I have which is running Linux uses uClibc.

Another example is Windows: the platform API does not provide a C library (even MSVC has its own). While there is an MSVCRT.DLL, it is not recommended to link against it, as it is there only because some other software relies on it and its semantics are from around Visual C++ 4 (IIRC).


Windows used to be an example like any other non-POSIX OS, however as of Windows 10, the C standard library is part of the OS.

https://devblogs.microsoft.com/cppblog/introducing-the-unive...


Ah yes, I've forgotten about it. I wonder how reliable that is, as I noticed a bunch of applications I have ship their own copy of ucrtbase.dll.


But Linux is not a platform in general.

It is already next to impossible to write software that requires “Linux” and nothing more with all the kernel functionality that can be enabled or disabled.

Linux is a component of many different platforms, which indeed do provide different libcs, but also different t.l.s. libraries, different c.p.u. architectures, different Linux configurations and whatever else.

As far as I know with respect to Windows, it only provides stable interfaces viā C libraries, and does not have a stable binary interface to the kernel directly.


Right, Windows has a stable C API, but the comment i replied to was about a platform libc.


For containers Go static binaries are great. Used to build Docker containers for Kubernetes using an 8 MB Go binary. That was all you needed. No libraries. No "minimum OS" Alpine or Ubuntu image.


That seems to be a sensible use, yes, but those aren't really a targeted and supported platform.


Not every libc is a glibc.


True, but errno is still a hack.


> > [errno]

> I can believe that there are other issues I fail to consider, but again it works for everybody else, what makes Go so special here?

errno being thread-local doesn't really help all that much with a M:N threading model -- the runtime is going to have to be extremely careful to not stomp all over it. (Whenever calling into anything which uses errno. Of course system calls do that as well, but it's a much smaller surface area than "most of libc".)


It's OpenBSD's plan to eventually shut down any other option via system call origin verification. So golang has to bite the bullet at some point.

"The eventual goal would be to disallow system calls from anywhere but the region mapped for libc"

https://lwn.net/Articles/806776/


One reason not to go through libc is the possibility of creating a pure Go userland, like it's done with the u-root project that is apparently in use at Google and resembles Busybox-based systems, but with a memory-safe language.


Unless you plan to never run anything but Go on your system you're going to need a libc at one point or another. If anything this decision reinforces the NIH hypothesis for me. Almost like a prideful "NO C ALLOWED" stance which is not really pragmatic IMO.

The only counterargument I can think of would be to remove the libc footprint for ultra-small, Go-only distributions, but I highly doubt that it's significant on any modern system (even an embedded one), and if you care so much about reducing the runtime footprint why would you go with a GC language in the first place?


>> Unless you plan to never run anything but Go on your system

That's actually exactly the use case for a lot of containerized applications written in Go.

edit: I've worked on multiple applications like this and there's the odd service where you need some other dependency, and then that service has to start running on Alpine Linux or something. But if the majority of your many "microservices" are pulling in bits of userland from a distro, they stop being "micro" very quickly which is a big deal when there's a lot of them. It's far preferable in that architecture if the entirety of your userland is your own binary.


>Alpine Linux

From their website[0]:
> "Alpine Linux is built around musl libc and busybox"

I could be off-base here, but it seems to me the better analogy would be some sort of "Go-Linux" that's like a Docker linux OS written entirely in Go.

[0]: https://alpinelinux.org/about/


I wasn't making an analogy - I literally use Alpine Linux when I need some common *nix tools but I want something lighter-weight than the Docker images from more mainstream distros. musl libc and glibc aren't 2 fundamentally different things - the former is just a more permissively licensed and lighter-weight implementation.


In containers it’s entirely reasonable and desirable that you only ship your app with minimal dependencies. I suspect that the people who are grumpy about upstarts doing away with libc are also inclined to argue that containers are the work of the devil or some such.


>> containers are the work of the devil or some such

And ironically it was the BSDs and Solarises of the world that had more fully developed similar concepts long before they were mainstream on Linux. I'm not terribly surprised by OpenBSD's stance here, though, and even though I'm more in the Go / Linux ecosystem I can't really argue - it's an unfortunate collision of very different philosophies.


I've actually wondered if there would be demand for a Busybox / Toybox clone built in Go for this reason. I have zero need for it personally, but I think it would be fun to make.


> as far as I can tell it doesn't explain why it suddenly stopped being a problem on OpenBSD.

It doesn't. It's just that as previously with macOS, Illumos, or Windows, the Go project ended up with its back against the wall: in this case, ultimately only the OpenBSD libc will be allowed to make syscalls so their choices are "use libc" and "no Go on OpenBSD".

And in all honesty the followup https://utcc.utoronto.ca/~cks/space/blog/unix/CLibraryAPIReq... is much more convincing as to why it's a hassle to go through libc.


Normally I agree with anti-NIH sentiments, but why are we all supposed to just have to be happy with a language ABI that's been around since the 1970s?


You would probably find yourself asking similar questions a lot in OpenBSD land - they're very committed to stability and security even if it means forfeiting many modern features. As I say in another comment, wanting to run Go on OpenBSD the way it does on Linux is a collision of very different philosophies. Hard to say one is wrong, but if they want to coexist one of them would eventually have to compromise, and it was never going to be OpenBSD in the end.


If it ain't broke...

Having a common entrypoint into the kernel we can hook into is fairly valuable IMO, especially when said interface is for the most part stable between all *nix (and to a somewhat lesser extent even on Windows).

Having this lowest common denominator ABI makes interop relatively painless. But clearly Go wants to eat the world and doesn't seem to care very deeply about this. I suppose it's a valid approach, even if I don't necessarily agree with it.


  If it ain't broke...
user supplied read addresses

getpwent() - actually really the whole notion of users

file permissions

async storage event completion (all asynchrony is pretty broken)

system metadata control (sysctl, /proc, interface socket messages...)

signals

ttys

errno

affinity interfaces ...


Could you give any pointers to what is broken about these, and how a better way to do it would be? (I'm thinking about implementing a small system call library interface and I'd like to not repeat the mistakes of the past)


user supplied read addresses - when a process does a write() it makes sense for it to supply the data as it filled it in. but it almost never makes the same sense for a read(), and using the user address requires a copyout. i'm convinced that having the read() results show up in a kernel-allocated buffer in userspace is a better idea, but this is somewhat subjective

getpwent(), the whole notion of users - we dont use computers the same way we used to in the 70s. talk, write, finger, wall - they aren't very fun anymore since its either just me on my laptop, or one of the 100s of virtual machines floating around. more importantly, the attempts to glue unix system user identity to distributed identities (PAM) have really turned out to be a mess

filesystem permissions - these are clearly insufficient given the number of system-specific addons here.

signals are so riddled with constraints and incompatibilities that they are basically useless - except you have to fiddle with them for things like SIGPIPE

ttys were already kind of broken when they were relevant,

errno is actually a property of the libc, but the status interface is pretty broken - have you even grepped the kernel code to find out what might be issuing an EINVAL?

why dont you mail me at yuri tenuki org? i love these kinds of chats


The NIH one, as always with Go


It's a clash of two NIH worlds: Go and OpenBSD. :-)


I mean OpenBSD is hardly the first let alone only system which mandates going through libc (or equivalent) to interact with the OS.


Don't get me wrong, it makes perfect sense. And Go seems to aim more towards pragmatism than OpenBSD's elegance and correctness.

OpenBSD is deciding to invent their own paintbrush without even looking at what other people are painting with, while Go took the closest bucket of paint and threw it on the floor, not realizing they were standing in a corner while doing so.

And I say that as a happy user of both.


> Go devs, saying that it was technically impractical due to some of Go's requirements not matching the libc's semantics.

I've always considered claims like this to be bullshit. I've worked a lot with both libc level system calls and (for crash report generation) direct system calls bypassing libc. They're the same interface. There's nothing you can do with a raw system call interface that you can't do through raw libc --- with rare exceptions for things like libc not having system call wrappers for some new system call or legacy ill-advised emulation like in fallocate. None of these exceptions is a justification for bypassing libc for calling open(2) or write(2).
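As a concrete illustration that the two paths land in the same place, here's the same getpid reached both ways from Go (a sketch assuming linux/amd64 with cgo enabled):

    package main

    /*
    #include <unistd.h>
    */
    import "C"

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // Raw trap into the kernel (what Go normally does on Linux) versus the
        // libc wrapper reached through cgo: same kernel service, same result.
        raw, _, _ := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
        viaLibc := C.getpid()
        fmt.Println(raw, int(viaLibc)) // both print the same pid
    }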

As far as I can tell, the real reason Go eschewed libc system call interfaces is that the Go developers want to think of themselves as "not C".


Go has a different calling convention and stack layout from C, and wanted completely static binaries which you cannot always get when linking to the standard libc implementations. What is "bullshit" about these compatibility concerns? It's not like _Ken Thompson_ was unaware of how to use libc.


There are many languages and runtimes that use their own calling conventions - not the least of which is C++ - and even stack layouts. The problem with Go is specifically the spaghetti stack that they need for coroutines. But that is an inherent design issue - a systems programming language is supposed to play well with the system, and that means being able to use the standard OS APIs properly. Which on basically all platforms other than Linux means going via libc or equivalent (e.g. Win32). Go designers just unilaterally decided that the rules don't apply to them, because reasons. And the end result was stuff like this:

https://github.com/golang/go/issues/16606

So now they're gradually backtracking on all platforms - macOS switched to libc a while ago.


Yet Go can call libc functions just fine on many OSes. Why can't it on Linux? Every single other managed runtime --- Mono, Java, Python --- can call through libc just fine. There is zero technical case for Go not doing the same thing.

The bullshit lies in these FUDlike insinuations that using libc would limit Go in some way. These insinuations are never backed up with technical specifics. I don't care that famous names are involved with Go: the presence of these people doesn't make Go's behavior correct or necessary.

There is zero technical case for Go doing what it does on Linux. You can make a "fully static binary" (which is a terrible idea anyway) with libc. Nobody should be making static binaries.


Those sound like assumptions. Can you back it up with examples of what cases Go developers saw as problematic and point out what they should have done instead (and that this indeed does not represent a problem)?


The driving force for this was the syscall origin check, but syscall ABIs change too. There's a speculative execution bug in ARM CPUs which requires barriers. This also requires patching.

https://marc.info/?l=openbsd-ports-cvs&m=158083696719245&w=2


For anyone wondering what the syscall origin check is about or how it's implemented: https://lwn.net/Articles/806776/

Seems like this is a good idea for multiple reasons. Another benefit is that it would seem to make something like "Wine in reverse" possible/easier to implement.


Can you please elaborate on this "Wine in reverse" concept?


I'm talking about a userspace-only Linux compatibility layer for Windows.

As far as I understand, Wine (a userspace-only Win32 emulator) is possible because Windows applications always go through the standard library for system calls, which therefore can be hooked without kernel support.

The same is not generally true for Linux binaries – these usually do go through glibc, but direct syscalls are possible (through raising the appropriate interrupt or instruction).

I don't think there's an easy way to trap these without kernel support (which is what WSL 1 has been doing).


> The same is not generally true for Linux binaries – these usually do go through glibc, but direct syscalls are possible (through raising the appropriate interrupt or instruction).

OTOH, with Linux having a well-defined set of syscalls, you can very convincingly fake being a Linux kernel. That's what WSL 1 did. That's also what SmartOS does, which allows it to mix native and Linux (LX) zones.


Definitely, iSH does the same thing on iOS and it works really well.

But I don't see why we can't have both (a stable syscall interface and requiring all syscalls to go through a standard library).


> But I don't see why we can't have both (a stable syscall interface and requiring all syscalls to go through a standard library).

What's the point of having a stable syscall interface when you can require that only libc perform syscalls?


> I don't think there's an easy way to trap these without kernel support (which is what WSL 1 has been doing).

Thanks for your explanation. So is WSL 2 different in this regard?


vDSO is a good way to mitigate issues like these. It's also a better stable ABI than libc, and easier to maintain than a pure kernel ABI (like Linux has), because nothing forbids the kernel from injecting different vDSOs into different binaries, if a need arises.

vDSO is also a language independent construct, so there's no special treatment for any favorite language, be it C (OpenBSD), C++ (Windows) or Oberon OS (Oberon).

vDSO: https://en.wikipedia.org/wiki/VDSO


vDSO functions are still allowed to use arbitrary amounts of stack though, which means Go might still have problems with them unless it allocates much larger stacks than it normally would.


In fact that is an issue Go hit in the past: https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/


That would be up to a specific OS to decide which guarantees on stack size limit would it like to provide. But I agree that it's a valid concern, and not all possible vDSO implementations are reasonable.


> All except some of the most recent arm64 processors have a speculative execution flaw that occurs across a syscall boundary, which cannot be mitigated in the kernel.

Big Ouch. I wonder how the Linux developers will approach this bug since they don't force syscalls to go through glibc.


It's not enforced, but I'd dare say by and large almost everything will just use glibc. I'd assume if you're playing with fire enough to be calling syscalls yourself, you can mitigate the bugs yourself.

I don't think it'll be a big problem, anyway. In my experience not very much calls syscalls directly. Go is a big exception, though...


> It's not enforced, but I'd dare say by and large almost everything will just use glibc.

What about musl? uClibc?

Linux is well known for the fact that it guarantees its syscall interface as primary contract to userspace.


Okay, so I'm referring mainly to the typical "GNU/Linux" desktop/server OS vs embedded or "container Linux" which is where the majority of alternative libc use will happen.

In either case though there is a libc that most software will use, and the mitigations can be applied there. Even though direct use of syscalls is legal on Linux, the fact that it's stable is primarily relevant and interesting to said libc developers.

The fact that syscalls aren't guaranteed on other systems is usually of little consequence since the libc is developed in tandem with the kernels of those systems. Linux's situation as a fully decoupled kernel means it does things differently in that sense. The developers are fully separated, so there needs to be a strong "contract" that syscalls will be stable.

Doesn't mean (IMO) it's a good idea for end-users e.g. software developers to use syscalls except in exceptional circumstances. Which is usually the case!


On Linux, libc developers are in the exact same group as regular developers.

libc is not intended to be the official entry point in any way or form, and kernel vulnerabilities and workarounds are not meant to be handled by a libc implementation.

That other OSs make libc their official interface is primarily because it's the simplest thing to do when kernel, libc and the rest of userspace is co-developed, as it allows for breaking kernel changes and other fun things that are not allowed under Linux ABI guarantees anyways.

It is not because it is the most secure choice, or that dealing with syscalls is hard (syscalls are easy and safe to work with). It's just that stable ABIs are a lot of work to develop, and this structure is just the simplest for smaller OS communities to develop.


> libc is not intended to be the official entry point in any way or form, and kernel vulnerabilities and workarounds are not meant to be handled by a libc implementation.

Depending on how you define "workarounds", glibc is full of those. For example stat(2) is very much not the same syscall now as it was back in the 1990s. In some cases glibc will do a runtime test to see which syscall variants are supported by the kernel and implement workarounds (I even saw a case where this caused a bug in some programs).


This is indicative of glibc's bad design more than anything else.

stat(2) is not a single syscall. The changes are exposed as new, isolated syscalls (sys_stat, sys_newstat, sys_stat64), with glibc switching internally between them as it sees fit, surprising developers in the process.

This makes stat a great example of the syscall being easier to work with, more stable and more reliable than the glibc wrapper.


Libc is absolutely the official entry point: it's libc that implements POSIX interfaces, not the kernel. If POSIX isn't official, what is?

Making libc the stable support boundary has all sorts of advantages to an operating system and basically zero downside. Only vanity argues for doing it the Linux way.


There are several popular libc variants on Linux. Which of them is official?

Also, Linux is not POSIX compliant and doesn't necessarily care about being. There are several important IO options on Linux that have nothing to do with POSIX.


Not sure what you are advocating here.

If someone successfully injects code that performs a direct syscall they can successfully use this info leak despite a safe and patched (g)libc.


> If someone successfully injects code that performs a direct syscall they can successfully use this info leak despite a safe and patched (g)libc.

As Raymond Chen wrote, that's the other side of the airtight hatch (https://devblogs.microsoft.com/oldnewthing/20060508-22/?p=31...).

If someone is capable of directly running their own code that ignores libc (or other existing mitigations, such as may exist in javascript runtimes / go compiler) then cool, they can use spectre to perform a timing attack against their own code that they're running. Or they could just read their own memory.

Spectre's main risk was for reading other program's memory or for doing so remotely with javascript. If you can already make the process you're attacking run arbitrary syscalls instead of use glibc, then you've already won and no amount of protection will help.


> If you can already make the process you're attacking run arbitrary syscalls instead of use glibc, then you've already won and no amount of protection will help

You're basically arguing that in a post-spectre world, native processes can fundamentally never be a security boundary again, right?

I'm wondering if this is necessarily true. For the concrete example at hand, Linux could offer some opt-in mechanism, e.g. an argument to exec(), that restricts syscalls to glibc only. A sandboxing mechanism could then require all executed processes to go through glibc and instantiate them only using that option.


Android is not GNU/Linux. There are an awful lot of Android computers out there. Embedded Linux is almost never GNU/Linux. There's an awful lot of embedded Linux out there.

Desktop Linux is hardly the majority of Linux installations. How many containers in the cloud are using something like Alpine Linux?

I'd dare say by and large glibc is in the minority.


> will approach this bug

Note that the mail is from 2020. So it probably already is fixed. It might be this one

https://patchwork.kernel.org/project/linux-arm-kernel/patch/...


It was very relevant to me a couple of days ago, so I can confirm it's not the case that Go programs that look up names dynamically link to libc by default, just for whatever that's worth to anyone.

(Our company, Fly.io, runs container images for customers on Firecracker microVMs around the world, and I had to build a DNS-dependent service, in Go, that runs directly from our (Rust) init and can't assume a libc exists).


The downside to Go shipping its own implementation of DNS resolution is that on systems that support a far richer set of DNS options (such as split DNS based upon hostnames in macOS), those options don't work if the binary is built with the pure Go implementation.

That leaves users annoyed, because sending all DNS into a VPN tunnel is not always an option.


If cgo is enabled (the usual case), Go uses libc for DNS if the configuration looks exotic enough that the pure-Go code wouldn't give the same results.

https://golang.org/pkg/net/#hdr-Name_Resolution
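For reference, the resolver choice can also be pinned from code or at run time; a small sketch:

    package main

    import (
        "context"
        "fmt"
        "net"
    )

    func main() {
        // Force the pure Go resolver regardless of what cgo would decide; the
        // same switch can be flipped at run time with GODEBUG=netdns=go (or
        // netdns=cgo to force the libc path when cgo is available).
        r := &net.Resolver{PreferGo: true}
        addrs, err := r.LookupHost(context.Background(), "example.com")
        if err != nil {
            fmt.Println("lookup failed:", err)
            return
        }
        fmt.Println(addrs)
    }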


This is a long standing issue in various open source projects that use Go, where they want to ship static binaries.

Most recently, this happened with Concourse and its Fly binary: https://github.com/concourse/concourse/issues/3691

The developers don't want to use CGO, or when they do use CGO they disable the net part... and now stuff doesn't work.

Projects have to specifically build with cgo enabled on macOS, or else it fails.


Is your Rust init private? What does it do?


Mostly just init stuff. We're transforming Docker containers (mostly) into standalone VMs, so it's doing all the scut work of taking a completely stripped-bare booted kernel and getting it to a state where you can run an arbitrary Linux program on it.

I wouldn't want to take the thread off on a huge tangent, it's just funny that this was just recently super relevant to me (it would have been problematic if DNS-dependent Go programs depended on libc, because right now I can't assume there's a libc binary to be dynamically linked to).

Bringing it back to Go and its (sometimes libc-dependent) DNS libraries: it is very annoying how fiddly it is to get a Go program to use an alternative DNS server.
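For what it's worth, one way to point a Go program at a specific DNS server without touching libc or resolv.conf is a custom Dial hook on net.Resolver; a sketch (the server address is a placeholder):

    package main

    import (
        "context"
        "fmt"
        "net"
        "time"
    )

    func main() {
        const dnsServer = "192.0.2.53:53" // placeholder; use a resolver you control

        r := &net.Resolver{
            PreferGo: true, // the Dial hook is only consulted by the Go resolver
            Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
                d := net.Dialer{Timeout: 2 * time.Second}
                return d.DialContext(ctx, network, dnsServer)
            },
        }
        addrs, err := r.LookupHost(context.Background(), "example.com")
        fmt.Println(addrs, err)
    }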


Had a similar problem a couple years ago where I needed to use alternative DNS libraries to troubleshoot issues in a company's infrastructure.

Golang's rules for what implementation to use are found here: https://golang.org/pkg/net/#hdr-Name_Resolution

A really solid alternative DNS client implementation can be found here: https://github.com/miekg/dns. Real easy to read and vet compared to a few other libraries I ran into when working on this problem.


I'm currently in the process, right now, of writing a minimal init in Rust to do exactly the same thing. Is yours something you'd consider sharing?

The sum total of what I want to do: bring up loopback, bring up the one and only Ethernet interface, set up its IP and basic routing, run another program, and do some basic log reporting.


I hack on our init, but I didn't write it. So it's not my place to share it. And there's some us-specific stuff in it that wouldn't be super helpful to you. But you could definitely ask Jerome on our team; he's a super helpful guy, even if he's super quiet here (he's like the anti-me). One way or the other I'm sure we can help you get where you're going!


I reached out to Jerome. Thanks!


Feel free to follow up with Kurt, me or Michael if you don't get a quick response; Jerome is in Montreal and he may be buried under a mountain of snow and brown gravy.


If you can control which DNS servers it talks to, implementing the DNS spec directly is a pretty trivial exercise to get full control. (If you can't, the main complexity is dealing with implementation quirks.)
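Indeed, a bare-bones A query is only a handful of bytes; a sketch against a placeholder server (no EDNS, TCP fallback, or retries):

    package main

    import (
        "encoding/binary"
        "fmt"
        "net"
        "strings"
        "time"
    )

    // buildQuery hand-rolls a minimal RFC 1035 query for an A record.
    func buildQuery(name string) []byte {
        q := []byte{
            0x12, 0x34, // ID
            0x01, 0x00, // flags: recursion desired
            0x00, 0x01, // QDCOUNT = 1
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // AN/NS/AR counts
        }
        for _, label := range strings.Split(name, ".") {
            q = append(q, byte(len(label)))
            q = append(q, label...)
        }
        q = append(q, 0x00)       // root label
        q = append(q, 0x00, 0x01) // QTYPE  = A
        q = append(q, 0x00, 0x01) // QCLASS = IN
        return q
    }

    func main() {
        // 192.0.2.53 is a placeholder; point this at a resolver you control.
        conn, err := net.DialTimeout("udp", "192.0.2.53:53", 2*time.Second)
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        conn.SetDeadline(time.Now().Add(2 * time.Second))

        if _, err := conn.Write(buildQuery("example.com")); err != nil {
            panic(err)
        }
        resp := make([]byte, 512)
        if _, err := conn.Read(resp); err != nil {
            panic(err)
        }
        // ANCOUNT sits at bytes 6-7 of the response header.
        fmt.Println("answers:", binary.BigEndian.Uint16(resp[6:8]))
    }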


https://lwn.net/Articles/806776/ says system-call-origin verification is for mitigation of ROP. ROP is (always?) a result of stack buffer overflows, which are a result of bad programming in asm/C/C++, but usually not a thing at all with modern languages like Go, Rust, Swift, Haskell, etc., while C/C++ programs usually use the libc already. This forces programs written in such languages to use another layer, written in C, that is not really necessary but might introduce new vulnerabilities. Am I missing something?


The system call interface is usually written in an unsafe language such as asm anyways. Going through the libc is very unlikely to actually introduce vulnerabilities, especially if going through the lowest level function that directly wrap the syscall.

Not using the libc was always a risky proposition on BSDs anyways. They don't have a stable kernel ABI the same way the Linux kernel does. From OpenBSD's perspective, the stable ABI is the libc, and anything using the kernel ABI directly is liable for breakage with each update.


> Going through the libc is very unlikely to actually introduce vulnerabilities

Citation needed. glibc has a long history of security bugs.

https://www.cvedetails.com/vulnerability-list/vendor_id-72/p...


Geez man, way to take things out of context. Let me fix your quote:

> Going through the libc is very unlikely to actually introduce vulnerabilities, especially if going through the lowest level function that directly wrap the syscall.

Sure, glibc has a bunch of bugs. But the lowest level of functions, that just wrap the syscalls, are very unlikely to have bugs. Here's the `read` implementation, for instance:

https://github.com/bminor/glibc/blob/21c3f4b5368686ade28d90d...

All it does is delegate to the low-level syscall, with some extra handling around to handle async calls (which can be removed when compiling glibc yourself, but you're probably not doing this).

Here's clone:

https://github.com/bminor/glibc/blob/21c3f4b5368686ade28d90d...

This one's written in asm, and you can't really simplify it all that much more.

All the functions that wrap the low-level syscalls are very hard to get wrong, really. Where the glibc bugs come from are the high-level functions, like pthread. But those can trivially be bypassed if necessary.


And what is the advantage then of having a `call syscall_wrapper` over `mov rax, syscall_number; syscall` if the ROP was able to return just before the call?


I wasn't making any claim on whether this mitigation is actually useful, I was simply stating it's unlikely to introduce more vulnerabilities than the status quo.

But since you're asking, I can think of two upsides:

1. It restricts the kernel attack surface available to only those syscalls that are exposed through the wrapper. (This is obviously dubious if the libc provides a generic syscall function, I don't know if OpenBSD libc has one).

2. It forces the ROP to either find the libc ASLR base, or to find a gadget that calls into the target libc function. This makes ROPs a lot harder to write.

As with any mitigations, they're mostly meant to make the attacker's life miserable. They're not full protections, and can often be bypassed. The point is to increase the cost of the attack.


To quote the grandparent post:

> especially if going through the lowest level function that directly wrap the syscall.

I think the argument is that security bugs are unlikely to occur in those low-level wrappers.


Other libc implementations exist that do not have many.

Musl[1] for one.

[1] https://www.cvedetails.com/product/39652/Musl-libc-Musl.html...


The number of CVEs is for the most part a function of how often someone looks at the code and only for a tiny part a function of safety. If I remember correctly, some time ago a lot of CVEs appeared for OpenBSD because someone seems to have started looking for vulnerabilities and found a lot of quite trivial but severe ones. Therefore I am somewhat concerned about the OpenBSD libc, but I have neither looked at the code nor am I an expert.

I see the point that the wrappers are probably safe, but still the full libc is loaded and available for malicious code. To be fair: the libc is able to handle some syscalls without actually invoking a syscall (either by caching or by implementing the functionality itself), and such a library allows the syscall API to remain unstable and might allow other mitigation tactics.

As a compromise: why not multiple libraries (libc, libgo?, librs?) that are allowed to do syscalls? Yes, a larger code base to maintain, but IMHO enforcing a single library is worse.


At least for rust, a librs would be hard to create due to the lack of a stable ABI. So even if you did a pure rust libstd (like the now defunct Steed[0]), you'd still need to either manually link that libc (breaking the mitigation), or rebuild your program each time the distro updates either libstd or the rust compiler.

I'm not sure what the situation for Go is. Assuming Go has proper support for dynamic loading and a stable ABI, it would be doable.

[0]: https://github.com/japaric/steed


You could expose the C ABI from your Rust code, which is, of course, stable. You would want to do that for a libc, even if a Rust ABI were available.


Guess they finally saw sense and realized that when platforms say the syscall ABI is unstable, it really is unstable…


OpenBSD has never really cared about backwards compatibility with binaries. It often works, but then suddenly they break. The OpenBSD developers really prefer fixing bugs instead of working around them.


It's also the advantage you have when you officially maintain a complete operating system, not just a kernel.


As long as you can (cross) compile go source easily, I don't see it as a problem.


There's no good reason for a Unix system to have an unstable system call ABI. I'm sorry but companies like Apple (who broke every Go binary a few years ago) don't get to copy syscall definitions from Bell System V and then declare it their own internal API.

Usually the only time the SYSCALL ABI breaks, it's because kernel authors intentionally chose to do it for no apparent reason. For example, OpenBSD at some point changed how mmap() was defined so that it takes seven arguments:

    void *sys_mmap(void *addr, size_t len, int prot, 
                   int flags, int fd, long pad, off_t pos);
The sixth argument doesn't do anything. It just breaks binary compatibility. It's also noncompliant with the System V ABI specification, which says system calls have six arguments max.

I work on a project called Cosmopolitan Libc which lets you create static binaries that just work on Linux + Mac + Windows + FreeBSD + OpenBSD. It was only possible to do this because Unix systems generally agree on definitions. I worked really hard to support OpenBSD since I believe in the project. I just hope they keep the ABI stable going forward.

For example, I'm really happy that this restriction only applies to dynamic binaries. It seems perfectly reasonable that they'd want to make the assumption that if a program chooses to link OpenBSD's Libc that it intends to use it. That's fine just so long as we continue having the ability to build static binaries with an alternative cross-platform Libc like Cosmopolitan.

Speaking of which, I think I might actually implement some of OpenBSD's ideas in Cosmopolitan. I could probably track down all the functions that need raw SYSCALL and use __section__ so they're all linked to the same part of the binary and then call the msyscall() function to limit it just to that page. That way as a guest libc author I'm upholding the spirit of the intent. When in Rome do as the Romans.


> Apple (who broke every Go binary a few years ago)

That's a really weird way to phrase it. Apple says that interface is unstable. Go uses it anyway. The unstable interface turns out to be unstable.

It wasn't Apple that broke Go binaries. It was Go.


Just because they don't maintain it stably doesn't mean they shouldn't do so.


Wishing Apple did something differently does not make it so. Either you accept the stable interface for what it is or you accept your binaries may break. And you've only yourself to "blame" in the latter case.


I don't have to blame anyone because I don't own any Apple products. What I'm getting at is that Apple's "Minimal Documentation, force through our blessed channels" way of going about things is kind of remarkable given that Microsoft has been absolutely slammed for similar actions in the past.


BSDs traditionally don't have a stable syscall ABI.

Windows never had a stable syscall ABI either.

Almost only Linux does it... because the kernel and libc are maintained as separate projects in the Linux world.


FreeBSD does have a stable syscall ABI. But it's not _the_ stable ABI people should be using; the advertised stable ABI is the libc.


> Almost only Linux does it... because the kernel and libc are maintained as separate projects in the Linux world.

That's because Linux is just that: a kernel. And people are free to build their userspace on top of it.


I don't really do OS development so I was mostly using this as a microcosm of the wider practice.

Practically as long as it's trivially callable from C I'm not bothered.


Systemd should also be the libc provider.


There has to be a line drawn somewhere to demarcate where the interface of a system is defined. Just as Linux does not attempt to ensure that using /dev/mem to manipulate kernel data structures works the same between versions, many operating systems don’t make promises about the syscall interface.


I must say I'm amused that my own personal opinion on software is this downvote-worthy.


> For example, OpenBSD at some point changed how mmap() was defined so that it takes seven arguments:

It seems like that was the case from day one, when a copy of the NetBSD code was imported to later become OpenBSD[1]. And if my reasoning is correct, saying that OpenBSD “changed it at some point” is not quite correct.

https://github.com/openbsd/src/blob/df930be708d50e9715f173ca...


But, that link shows mmap taking 6 arguments?


The parent comment talked about the syscall, not the library function.

   return((caddr_t)(long)__syscall((quad_t)SYS_mmap, addr, len, prot,
       flags, fd, 0, offset));
The first argument is the syscall number, the rest are the seven arguments in question.


You've brought this up before, but I still don't have insight into why copying system call definitions from Bell matters here (if there was ever such a thing; POSIX standardized above that level for a reason…). macOS, Windows, the BSDs all declare their stable ABI to be above the syscall layer. Why must it be defined lower?


Windows does it because the vast majority of functionality is not implemented in the kernel per se, but rather in user space by preference for stability. Even things like display drivers are often primarily implemented in userspace.

The main advantage of not having a stable kernel ABI is that you have the freedom to change it for security/performance reasons. This means the OS can change how they implement things over time without breaking things as easily. MS is notorious for this and for using this fact to allow them to 'emulate' older ways of doing things even when the real implementation has long since moved on.

The main advantage of having a stable kernel ABI is that the userspace can be whatever the developer wants it to be realistically. If they want it to be just a single GO program and literally nothing else they can do that (routers are a good example of devices that do this).


No they don't. Only Windows forbids developers from linking static binaries. That's because they change the SYSCALL ordinals every few months. Which means that in order to load a program on Windows you have to give up control of the virtual memory address space to the operating system so that it can load DLLs at arbitrary addresses.

Unix systems have never had that restriction. Because if you use the official libc and pass -static to gcc then that means the "stable api" creates a binary that depends on the kernel abi. If the kernel authors break the abi then it means you need to recompile all your software in order to upgrade.


> No they don't.

Yes they do.

> Only Windows forbids developers from linking static binaries. That's because they change the SYSCALL ordinals every few months.

That Windows somewhat actively precludes raw syscalls doesn’t mean they are supported elsewhere. It’s always been BSD (and especially macOS) policy that syscalls are an implementation detail.

> Because if you use the official libc and pass -static to gcc then that means the "stable api" creates a binary that depends on the kernel abi.

Try doing that on macOS, you’ll find out that there is no static version of libSystem, or crt0.


Plenty of OS allow for static linking, including Windows.


You should perhaps look into the history of the mmap syscall arguments.


4.2BSD System Manual specifies it as mmap(addr, len, prot, share, fd, pos). There are six arguments. My best guess is something went horribly wrong with off_t on big endian systems and that it somehow leaked into x86_64 system call abi. Do you know?


off_t needs 64bit alignment so registers get copied to the stack argument structure correctly on kernel entry. This is ancient voodoo.


> The sixth argument doesn't do anything. It just breaks binary compatibility.

As far as I can tell it might be used to align the memory layout of the following 64 bit arguments to 64 bits. Or at least ensure that there is no auto generated padding that might contain random values.

Edit: The link provided by ainar-g 3 mentions that the way gcc handled padding of the offset field changed between gcc 1 and 2.


There's no reason for an operating system to have a stable system call ABI. A stable ABI means that the kernel support boundary grows forever and that userspace shims are impossible in the general case. And what's the point? Calling through libc or ntdll or whatever is appropriate on a given system is no great burden. Go's libc avoidance is just hubris.


For Linux it makes sense because there is no project running the entire operating system. The kernel's system call interface is stable because there is no other "Linux" system interface. For an OS that maintains a libc they can make a different choice.


So we should endure technical mediocrity forever because we couldn't get our act together socially?

There is a way out. I've previously proposed on LKML that the Linux kernel team provide an official userspace system call library that sits below libc and that all libc implementations would share. We'd forbid new system calls being called except through this library. Optionally, we'd enforce this constraint on all system calls.

This is how Fuchsia works, by the way: all Fuchsia system calls must go through one giant vDSO.


I agree this is a better architecture, but it creates more work for the kernel team and doesn't free them from ABI stability constraints anywhere in the near term. It would take 10-15 years to pay off I think.


> Go's libc avoidance is just hubris.

Maybe my impression was wrong, but it was my understanding that Go originally preferred direct syscalls because of stack management headaches. You can't know how much stack space a libc syscall wrapper requires, which even for seemingly simple syscalls can be quite complex--e.g. glibc has to emulate POSIX thread semantics. OTOH, treating such libc wrappers like regular C FFI functions would obliterate the design and implementation assumptions around goroutine stack management. Considering that Linux was originally the first (and, let's be honest, only real target), it made perfect sense to rely on Linux syscall ABI promises.

Fast-forward a few years: 1) Go has a more mature binary format and dynamic linking capabilities, shrinking the gap between Go's internal ABI and the native libc ABI. 2) Goroutine stacks switched from split-stacks to movable stacks, and the minimum stack size became larger. 3) Demand and motivation for supporting libc wrappers (i.e. for Windows) grows. Result: Go surmounts one of its original simplifying design compromises. Though, I would assume that libc wrapper support still incurs ongoing maintenance costs on each platform; namely, managing the minimum stack requirement for each particular call, which could change overtime, while it's important not to be too pessimistic so that a syscall doesn't force an unnecessary stack resize.


Even if you use the system libc, if you pass the -static flag to gcc then you end up with a binary that depends on the syscall abi. If the kernel interface breaks, then all your programs need to be recompiled from scratch in order for them to work again. Are you opposed to static linking?


What, do you believe, are the advantages of static linking against a libc?


The same as for static linking any other library: it stops the library semantics from being changed out from under you by an 'update'.


The difference is that other libraries are actual libraries that factor out common patterns, whereas libc, despite its name, is more of an interface, especially its system call wrappers.

It sits so close to the kernel that the concerns of changing semantics apply as easily to the kernel as they do to the system call wrappers.


Yes, I am opposed to fully static linking. What's the point of static linking? Windows has no static linking (all system calls go through ntdll) and it has a compatibility story better than any Unix. Static linking to libc is unnecessary for long term ABI support.

Dynamically link against libc and statically link the rest for all I care, but there's no reason not to talk to libc.

Also: the vDSO is itself a form of dynamic linking. Are you opposed to the vDSO?


I distribute binaries. My binaries work on six different operating systems. In order to do that I had to roll my own C library. I'm happy I did that since it's so much better than being required to use six different ones.

I'm not opposed to vDSO but I disagree with how Linux maps it into memory by default. Linux should not be putting anything into the address space that the executable does not specify. MMUs are designed to give each process its own address space. Operating systems that violate that assumption are leaky abstractions imposing requirements where they shouldn't.

The main thing dynamic shared objects accomplish is giving upstream dependencies leverage over your software. They have a long history of being mandated by legal requirements such as LGPL and Microsoft EULAs. It's nice to have the freedom to not link the things.


> My binaries work on six different operating systems. In order to do that I had to roll my own C library.

Other people have made software for decades without writing program-specific libc instances. Tell me you at least started with something decent like musl instead of literally writing your own libc from printf on up.

> Linux should not be putting anything into the address space that the executable does not specify

Execution has to start somewhere, and kernels have often reserved parts of the system address space for themselves.

> The main thing dynamic shared objects accomplish is giving upstream dependencies leverage over your software.

Loose binding in interfaces allows systems on both sides of the interface to evolve. If you want 100% complete control over your system for some reason instead of writing programs that play well with others, just ship your thing as a VM image and be done with it.


I used lots of code from Musl, OpenBSD, and FreeBSD. I used Marco Paland's printf. I used Doug Lea's malloc. I used LLVM compiler_rt. I used David Gay's floating point conversion utilities. The list goes on. Then I stitched it all together so it goes faster and runs on all operating systems rather than just Linux. See https://justine.lol/cosmopolitan/index.html and https://github.com/jart/cosmopolitan

Trapping (SYSCALL/INT) is a loose binding. The kernel can evolve all it wants internally. It can introduce new ABIs. Processes are also a loose binding. I think communicating with other tools via pipes and sockets is a fantastic model of cooperation. Same goes for vendoring with static linking. Does that mean I'm going to voluntarily load Linux distro DSOs into my address space? Never again. Programs that do that, won't have a future outside Docker containers.

Also, my executables are VM images. They can boot on metal too. Read https://justine.lol/ape.html and https://github.com/jart/cosmopolitan/issues/20#issuecomment-... Except unlike a Docker distro container, my exes are more on the order of 16kb in size. That's how fat an executable needs to be, in order to run on six different operating systems and boot from bios too.


> Same goes for vendoring with static linking. Does that mean I'm going to voluntarily load Linux distro DSOs into my address space? Programs that do that, won't have a future outside Docker containers.

Strong claim. Wrong, but strong claim.

The completely-statically-linked model you're proposing might be acceptable on servers, but on mobile and embedded devices like Android, it's a showstopper: without zygote pre-initialization and DSO page-sharing, Android apps would each be at least 3MB heavier than they are today and take about 1000ms longer to start --- and a typical Android device has a lot of these processes running.

More broadly, yes, in most contexts I see a general trend away from elaborate code-sharing schemes and towards "island universe" programs that vendor everything. But these universes need to interact with their host system using a stable ABI somehow, and I believe that SYSCALL is fundamentally the wrong layer for this interaction, as it's not flexible enough. For example, the Linux gettimeofday() optimization couldn't have been done without the ability to give Linux programs userspace code to run pre-kernel via the vDSO. How do you propose the kernel do things like vDSO gettimeofday optimizations?


If you think I'm wrong then why don't you tell me what requirements you've faced as a software developer distributing binaries? 99% of developers have never needed to deal with the long tail of corner cases.

Doesn't everything on Android start off with the JVM as a dependency? In that case the freedom to not use DSOs is something that Google has already taken away from you. That's not a platform I'd choose to develop for unless I was being paid to do it.

On x86 RDTSC returns invariant timestamps so you technically don't need shared memory to get nanosecond precision timestamps. XNU does the same thing and they don't call it a DSO. Because that's just shared memory. I have nothing against shared memory.


Can someone please ELI5 the significance of this for me?

Thanks


In Linux, the system call interface is stable (on any given architecture, but not across them. See https://stackoverflow.com/questions/10281567/why-are-the-sys...): the way you call any kernel function is guaranteed to stay the same forever.

That means that programs that directly make system calls will keep working on newer OSes.

On many (most? https://unix.stackexchange.com/questions/473137/do-other-uni...) other operating systems, that's not the case; the OS ships with a library that provides a stable interface, and system call numbers, arguments, or calling conventions can change (in theory, the interface could even change across reboots or process runs). On OpenBSD, that library is libc (which, unfortunately, is a lot larger than just the OS interface; IMO, in an ideal world, it would be split into two parts, the OS interface and a C library).

Go wants to produce statically linked executables. It can't do both that and link against libc. It has now changed to dynamically link against libc on OpenBSD, guaranteeing that what you compile today will run as well on next year's OpenBSD as it does on today's.

On top of that, OpenBSD has a security feature where it verifies that system calls are made via libc. That feature wasn't implemented as thoroughly as possible because Go made direct system calls. This change allows OpenBSD to tighten that feature.


I'm sorry, but OpenBSD also cranks libc versions whenever needed. And whether a binary will run on an old kernel depends on what changed. Since it links against a fixed libc version, keeping it running involves recompiling or some trickery...

So basically executables are mostly only ever good for one stable release. So don't throw away your source code; you're gonna need it in six months.


Thank you! That was really informative and I will check out those links to learn more.

I've been developing with Go for a few years now but never strayed too low-level, so I was unsure how this change in Go 1.16 would affect me. I doubt my code will run on OpenBSD in the near future, but I'm happy to know that if it does, it will be supported going forward.


Go apps are and will be easy to run on OpenBSD. It's one of the languages where it's really as easy as "go get github.com/..."

Edit: It's just that the Go mantra of "compile once, run forever" doesn't really work on OpenBSD. You do have to recompile under certain conditions.


So does OpenBSD's libc differ a lot from, say, Linux's libc? Meaning: suppose you want to write an app that can talk to both Linux's libc and OpenBSD's libc using some sort of "common language" that is compatible with both, would that be possible?


That common language is called POSIX.

Edit: All libcs basically adhere to it, except for some extensions that are not in POSIX but might be in glibc/musl/FreeBSD's libc, or some extensions that are OpenBSD-specific.


POSIX and C standards (IIRC, the two conflict in some minor places). GNU libc also tries to adhere to Berkeley UNIX (see https://www.gnu.org/prep/standards/html_node/Compatibility.h.... Trigger warning: contains a remark on text editors)


Same for macOS and iOS since Go 1.11, it all goes through libSystem now.


I always thought Go made syscalls via libc. If it doesn't, am I correct in assuming no LD_PRELOAD magic will work with Go? As in no custom memory allocators, no kernel bypass for the TCP stack, etc.


Yes, you are correct. That also means no proxychains.
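
A minimal way to see why (just a sketch; Linux assumed, pure-Go build without the cgo resolver, and example.com is only a placeholder host): the net package below bottoms out in a direct connect(2) trap via the syscall package, so an LD_PRELOAD hook on libc's connect() (which is how proxychains works) never runs.

    package main

    import (
        "fmt"
        "net"
    )

    func main() {
        // In a C program, proxychains' preloaded connect() would intercept
        // this. A pure-Go binary issues the connect system call directly,
        // so the libc hook is silently bypassed.
        conn, err := net.Dial("tcp", "example.com:80")
        if err != nil {
            fmt.Println("dial failed:", err)
            return
        }
        defer conn.Close()
        fmt.Println("connected to", conn.RemoteAddr())
    }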


At one point, building Go and running its test suite on OpenBSD would replace /dev/null with a standard flat file. After some time, the disk would fill up because everything piped to /dev/null was now being stored on disk. Not to mention the additional I/O happening.

Go already uses libc by default on many platforms. But there are issues: sometimes the libc behaves differently than Go's documented APIs, though this is primarily a documentation issue. Conversely, sometimes Go's native APIs misbehave on certain systems due to platform-specific implementation bugs.

I think this is a good move overall.


Now do this on FreeBSD too.


Why? As far as I know, FreeBSD's system call ABI is supposed to be (relatively) stable. It's a requirement anyway if you want to run jails of a different version.


Personally: because it makes porting to new CPU architectures hell. (The Go assembler is the worst!) I've abandoned the FreeBSD/arm64 port and two other people had to pick it up to finish it.

But also: while it's stable, it's not public. IIRC, Go developers were told about this, but decided to ignore it.

And of course: stop breaking LD_PRELOAD hooks :P


Does this have performance implications?


It will cost a cycle or three extra, but given the great cost of making syscalls in the first place (especially if you have to do flushes and mitigations for the 'recent' security bugs on x86) it probably won't matter much. 1000 vs 1003 or 10000 vs 10003 would be hard to measure in a real-world program.

Also, one of the many things libc does (which Go programs need to do themselves) is wrap calls like malloc/calloc/realloc around internal buckets and, as seldom as possible, call out to the kernel to get one or ten pages of RAM in one go, then hand out suballocations from them for each "obj = malloc(8);", so that you minimize the number of syscalls made, regardless of whether you make them directly or via libc.

Since syscalls have always been expensive (you need to save registers, check user ID permissions, flip to kernel mode, do the work asked for with or without SMP locking protections, give the uid permission to use the resource returned, perhaps check whether it's time to deliver signals or switch to another process, and if not, flip out of kernel mode, restore registers, and return to the process again), a lot of the calls made by a C program are kept inside libc if possible.

For example, gettimeofday() springs to mind, where a lot of trickery is done to give all programs a read-only page with the current time mapped into their address space, so the call doesn't have to go via the kernel but instead becomes a memory read, since programs tend to call it thousands of times.
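
A crude way to see the difference from Go (just a sketch, Linux assumed; numbers will vary wildly by machine and mitigations): time.Now() uses the vDSO fast path, while an explicit SYS_CLOCK_GETTIME traps into the kernel every time.

    package main

    import (
        "fmt"
        "syscall"
        "time"
        "unsafe"
    )

    func main() {
        const n = 1000000

        start := time.Now()
        for i := 0; i < n; i++ {
            _ = time.Now() // vDSO clock_gettime, no kernel entry
        }
        fmt.Println("vDSO path:  ", time.Since(start))

        var ts syscall.Timespec
        start = time.Now()
        for i := 0; i < n; i++ {
            // 1 == CLOCK_MONOTONIC; a real trap into the kernel each time
            syscall.Syscall(syscall.SYS_CLOCK_GETTIME, 1,
                uintptr(unsafe.Pointer(&ts)), 0)
        }
        fmt.Println("raw syscall:", time.Since(start))
    }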


Of course, on Linux the kernel API extends to the vDSO, so Go can certainly rely on its existence. But reimplementing a libc for no good reason is typically a fool's errand.


> Of course, on Linux the kernel API extends to the vDSO

The kernel ABI, however, does not. The vDSO is a shared object exposing a C ABI, and oddball compilation settings have broken Go's vDSO calls in the past: https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/


OpenBSD often trades off performance for security.


So does that mean no statically linked binaries since glibc can't be statically linked?


OpenBSD does not use glibc.


But if even Go can't produce static binaries there, does that mean that static binaries are not supported on those systems?


"static binaries" are dynamically linked to the kernel. Only the OS ISOs are fully statically linked.


I would think so. Can anyone ELI5 why statically linking binaries is a big deal? Even lightweight, container-oriented Linux distributions like Alpine ship with musl. In which scenarios would it find its use cases?


Massively simplifies the build process, especially for cross-compilation. You can just build the binary, copy it over and run it, without having to ever touch containers, or having to worry about sysroots.

You can go ahead and do the following on your Mac:

    GOARCH=riscv64 GOOS=linux go build
to build a binary that just runs on Linux on RISC-V.


One advantage is that you need to worry less about versions; if I build a dynamically linked binary that uses a feature from GNU libc 2.30 and someone tries to run it with GNU libc 2.23, they will get an error.

These versions are not chosen at random: I ran into this issue with people trying to run my binary on Ubuntu 16.04 (an LTS release), which was solved by linking it statically.

Also, people may use musl libc, and while it has some compatibility with GNU libc, it's far from complete.

So in short, linking statically means it will work for the largest number of people with a minimum of fuss, for both the person building the binaries and the people running them.

As people have mentioned, these issues are less present on non-Linux systems.


> which was solved by linking it statically

You could also have linked against glibc 2.23 or 2.22 or older instead of bloating your binary.


But that's a lot more effort, and who knows if someone is still using a CentOS or whatnot with an even older version. And it still won't work for other libc implementations.

Adding an extra megabyte or so is a reasonable trade-off, with no real other downsides. It's not that large – smaller than many websites.


Some have the viewpoint that static linking allows for what is essentially dependency pinning of binaries.


Not glibc in the macOS/OpenBSD case, but yeah, you can't have fully static Go binaries on those.


glibc is Linux only.


glibc also supports Hurd.


Did Debian not also port most of glibc to support FreeBSD's kernel?



Yep. Is it already at 1.0, finally?


I agree with OpenBSD's philosophy of introducing breaking ABI changes between versions, if security can be improved.


This got downvoted?! If you want thirty years of backwards compatibility, go use Windows (you saw how well that worked for security...).

OpenBSD is an OS that people choose to use when they want security prioritized as a trade-off against other things such as performance and binary compatibility (there's always trade-offs). Many firewalls use it, for example.

If you value other things more than sheer data security, then there's other (beautiful) choices. (Analogously, there's no single best vehicle for everybody.)

Anyone with a modicum of actual OS kernel development experience would acknowledge these mere facts as universal truths, instead of "unpopular opinions."



