Except there are some platforms where you need to go through libc because the direct syscall interface is considered private or subject to change. OpenBSD is like this, and I believe macOS is too.
While Linux, unlike other OSes, does have a stable syscall interface, you still want to go through glibc, at least for NSS; otherwise your app could break.
Golang has CGO_ENABLED=1 as the default for this reason.
Worth mentioning that the golang.org/x/sys/unix package has better support for syscalls than the og syscall package nowadays, especially for some of the newer ones like cachestat[0], which was added to the kernel in 6.5. AFAIK the original syscall package was 'frozen' a while back to preserve backward compatibility, and at one point there was even a bit of drama[1] around it being marked as deprecated instead of frozen.
Didn't they go back to Glibc in 2017 after a syscall silently corrupted several of their tightly packed tiny Go stacks? The page you link to seems to refer to a proposal from 2014 as "new".
> Didn't they go back to Glibc in 2017 after a syscall silently corrupted several of their tightly packed tiny Go stacks?
You must be thinking of https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/ which was about the vDSO (a virtual dynamically linked library which the kernel automatically maps into every process), not system calls. You normally call into the vDSO instead of doing direct system calls, because the vDSO can do some things (like reading the clock) in an optimized way without entering the kernel, but you can always bypass it and do the system calls directly; doing so will not use any of the userspace stack (the CPU immediately switches to a kernel stack and does all the work there).
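To make that concrete, here's a minimal C sketch (assuming Linux with glibc): the same clock read once through the libc wrapper, which normally resolves to the vDSO and never enters the kernel, and once through syscall(2), which always traps into the kernel.

```c
/* Minimal sketch: clock_gettime() via the libc/vDSO fast path vs. a forced
 * kernel entry through syscall(2).  Assumes Linux with glibc. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    struct timespec a, b;

    /* Usually resolved to the vDSO: reads the time from a shared page,
     * no mode switch into the kernel on most configurations. */
    clock_gettime(CLOCK_MONOTONIC, &a);

    /* Bypass the vDSO entirely and trap into the kernel. */
    syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &b);

    printf("via libc/vDSO:   %lld.%09ld\n", (long long)a.tv_sec, a.tv_nsec);
    printf("via raw syscall: %lld.%09ld\n", (long long)b.tv_sec, b.tv_nsec);
    return 0;
}
```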
IIRC that was specifically on macOS and other BSDs which don't have a stable syscall interface. They still use raw syscalls on Linux, which guarantees syscall stability on pain of Linus Torvalds yelling at you if you break it.
Linus ships a kernel; where would his stable interface live, if not the syscall ABI? The *BSD and macOS folks ship operating systems, where they have the option of defining their ABI at a higher level of abstraction.
Linux could have made their own libc and mandated use of it. But they didn't. They chose a language agnostic binary interface that's documented at the instruction set level.
As a result of that brilliant design choice, every single language can make Linux system calls natively. It should be simple for JIT compilers to generate Linux system call code. No need to pull in some huge C library just for this. AOT compilers could have a linux_system_call builtin that just generates the required instructions. I actually posted this proposal to the GCC mailing list.
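As an illustration of what such a builtin might lower to (my own sketch, not the actual GCC proposal), here's the x86-64 version in C with inline assembly; the hypothetical linux_system_call3 wrapper relies only on the documented register conventions and the __NR_* numbers from the kernel headers.

```c
/* Sketch: a hand-rolled Linux system call on x86-64.  The ABI puts the call
 * number in rax and the first three arguments in rdi, rsi, rdx; the `syscall`
 * instruction clobbers rcx and r11 and returns the result in rax. */
#include <asm/unistd.h>   /* __NR_* system call numbers */

static long linux_system_call3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* a negative value is -errno */
}

int main(void)   /* still linked against libc here, purely for convenience */
{
    static const char msg[] = "hello from a raw syscall\n";
    linux_system_call3(__NR_write, 1, (long)msg, sizeof msg - 1);
    return 0;
}
```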
That's not what Linux is, though. It's a kernel. libc is a userspace library. The Linux developers could also make their own libpng and put their stable interface in there, but that's not in scope for their project.
> As a result of that brilliant design choice, every single language can make Linux system calls natively.
That is like saying it's a brilliant design choice for an artist to paint the sky blue on a sunny day. If Linux is a kernel, and if a kernel's interface with userspace is syscalls, and if Linux wants to avoid breaking userspace with kernel updates, then it needs a stable syscall interface.
> No need to pull in some huge C library just for this.
Again, I'm not sure why the Linux project would invent this "huge C library" to use as their stable kernel interface.
They could but they didn't. At some point, Linux almost got its own klibc. The developers realized such a thing wasn't needed in the kernel. Greg Kroah-Hartman told me about it when I asked on his AMA.
The importance of this design should not be understated. It's not really an obvious thing to realize. If it was, every other operating system and kernel out there would be doing it as well. They aren't. They all make people link against some library.
So Linux is actually pretty special. It's the only system where you actually can trash the entire userspace and rewrite the world in Rust. You don't need to link against any "core" system libraries. People usually do, but it's not forced upon them.
> if Linux wants to avoid breaking userspace with kernel updates, then it needs a stable syscall interface
Every kernel and operating system wants to maximize backwards compatibility and minimize user space breakage. Most of them simply stabilize the system libraries instead. The core libraries are stable, the kernel interfaces used by those core libraries are not.
So it doesn't follow that it needs a stable syscall interface. They could have solved it via user space impositions. The fact they chose a better solution is one of many things that makes Linux special.
> The importance of this design should not be understated. It's not really an obvious thing to realize. If it was, every other operating system and kernel out there would be doing it as well.
No, they would not. I can say this with confidence because at any point in the last several decades, any OS vendor could have started to do so, and they have not. They have uniformly decided that having a userspace library as their stable kernel interface is easier to maintain, so that's what they do. The idea that the rest of the world hasn't "realized" that, in addition to maintaining binary compatibility in their libc, they could also maintain binary-compatible syscalls, is nonsensical.
The Linux kernel, on the other hand, doesn't ship a userspace. If they wanted their stable interface to be a userspace library, they'd need to invent one! And that would be more work than providing stable syscalls.
> So Linux is actually pretty special. It's the only system where you actually can trash the entire userspace and rewrite the world in Rust.
That's not rewriting the world; that's just a new userspace for the Linux kernel. You're still calling down into C; there's just one less layer of indirection along the way.
> So it doesn't follow that it needs a stable syscall interface. They could have solved it via user space impositions.
They could have, but as Greg Kroah-Hartman pointed out, that would have just shifted the complexity around. Stability at the syscall level is the simplest solution to the problem that the Linux project has, so that's what they do.
It would be pretty funny if the kernel's stability strategy was in service of allowing userspace to avoid linking a C library, considering it's been 30+ years and the Linux userspace is almost entirely C and C++ anyway.
And for what it's worth, the reason Windows has such strong binary backwards compatibility (Win98 programs can often run on Windows 11) is that it has this extra abstraction layer.
I'm aware of this, but I really don't see the benefits of this approach; it causes issues on e.g. OpenBSD, where you can only make syscalls from libc, and it seems like they're trying to outsmart the OS developers. I just don't see an advantage.
> 1. No overhead from libc; minimizes syscall cost
The few nanoseconds of a straight function call are absolutely irrelevant next to the cost of the syscall itself (rough timing sketch below). You also lose out on any optimizations a libc has that you might not have thought about (like the memoization of getpid()), and you take on keeping up with syscall evolution and best practices, which a libc generally has a good handle on.
> No dependency on libc and C language ABI/toolchains
This obviously doesn't apply to a C syscall header, though, as is the case in the OP :)
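To put rough numbers on the overhead point above, here's a hedged micro-benchmark sketch (assuming Linux with glibc; absolute numbers vary a lot with hardware and mitigations). The expectation is that both loops are dominated by the kernel entry, so the extra function call through the libc wrapper disappears into the noise.

```c
/* Rough timing sketch, not a benchmark suite: time N calls to getpid()
 * through the libc wrapper and N calls through the generic syscall(2)
 * wrapper.  Both enter the kernel each time (glibc >= 2.25 no longer
 * caches the PID), so the function-call overhead is lost in the noise. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

static double ns_per_call(struct timespec s, struct timespec e, int n)
{
    return ((e.tv_sec - s.tv_sec) * 1e9 + (e.tv_nsec - s.tv_nsec)) / n;
}

int main(void)
{
    enum { N = 1000000 };
    struct timespec s, e;

    clock_gettime(CLOCK_MONOTONIC, &s);
    for (int i = 0; i < N; i++)
        getpid();                       /* libc wrapper */
    clock_gettime(CLOCK_MONOTONIC, &e);
    printf("getpid():            %.0f ns/call\n", ns_per_call(s, e, N));

    clock_gettime(CLOCK_MONOTONIC, &s);
    for (int i = 0; i < N; i++)
        syscall(SYS_getpid);            /* generic syscall(2) wrapper */
    clock_gettime(CLOCK_MONOTONIC, &e);
    printf("syscall(SYS_getpid): %.0f ns/call\n", ns_per_call(s, e, N));
    return 0;
}
```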
From the getpid(2) man page:

> Because of the aforementioned problems, since glibc 2.25, the PID cache is removed: calls to getpid() always invoke the actual system call, rather than returning a cached value.
Get rid of libc and you gain the ability to have zero global state in exchange. Freestanding C actually makes sense and is a very fun language to program in. No legacy nonsense to worry about. Even errno is gone.
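As a concrete (if minimal) illustration of that, here's a freestanding sketch for Linux/x86-64, assuming a typical `cc -static -nostdlib -ffreestanding` build: no libc, no crt startup, and errors come back as plain negative return values rather than through a global errno. It reuses the same style of raw-syscall wrapper sketched earlier in the thread.

```c
/* Freestanding sketch: no libc, no crt0, no errno.  _start is the real entry
 * point; everything goes through raw syscalls and exits explicitly. */
#include <asm/unistd.h>

static long sys3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;   /* errors are just negative return values, no errno */
}

void _start(void)
{
    static const char msg[] = "no libc here\n";
    sys3(__NR_write, 1, (long)msg, sizeof msg - 1);
    sys3(__NR_exit, 0, 0, 0);
}
```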
> Get rid of libc and you gain the ability to have zero global state in exchange.
No you don't; you still have the global state the kernel is maintaining on your behalf: open FDs, memory mappings, process and thread scheduling state, limits, the command line arguments, environment variables, and so on. There's a shitload of global state in /proc/self/.
Also, the external connections to the process (e.g. stdin/stdout/stderr) are still inherently global, however your runtime pretends to treat them.
And it's not like you've even reduced duplicated state, since every memory allocator will still track the regions it received from the kernel and recycle them.
> Freestanding C actually makes sense [..] No legacy nonsense to worry about
This is a big one. Linking against libc on many platforms also means making your binaries relocatable. It's a lot of unnecessary, incidental complexity.
You can still randomize heap allocations (though with less entropy), since the heap segment is usually quite large. But you don't get randomization of, e.g., the code.
ASLR is a weak defense. It's akin to randomizing which of the kitchen drawers you'll put your jewelry in. Not the same level of security as, say, a locked safe.
Attacks are increasingly sophisticated, composed of multiple exploits in a chain, one of which is some form of ASLR bypass. It's usually one of the easiest links in the chain.
> On the other hand all of that comes back to bone you if you’re trying to benefit from vDSO without going through a libc.
At least the vDSO functions really don't need much in the way of stack space: generally there's nothing much there but clock_gettime() and gettimeofday(), which just read some values from the vvar area.
The bigger pain, of course, is actually looking up the symbols in the vDSO, which takes at least a minimal ELF parser.
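For the curious, a deliberately naive sketch of that lookup on x86-64 Linux: find the vDSO base through the auxiliary vector, walk its program headers to the dynamic section, and linearly scan the dynamic symbol table. A real implementation (e.g. the kernel's parse_vdso.c example) uses the ELF hash table and checks symbol versions; "__vdso_clock_gettime" is the usual x86-64 export name.

```c
/* Naive vDSO symbol lookup sketch (x86-64 Linux).  Not production code: it
 * skips the ELF hash table and symbol versioning that a real lookup uses. */
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>
#include <time.h>

static void *vdso_sym(uintptr_t base, const char *name)
{
    Elf64_Ehdr *ehdr = (Elf64_Ehdr *)base;
    Elf64_Phdr *phdr = (Elf64_Phdr *)(base + ehdr->e_phoff);
    Elf64_Dyn  *dyn  = NULL;
    uintptr_t   load_off = 0;
    int         found_load = 0;

    /* Compute the load offset and locate the dynamic section. */
    for (int i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !found_load) {
            load_off = base + phdr[i].p_offset - phdr[i].p_vaddr;
            found_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (Elf64_Dyn *)(base + phdr[i].p_offset);
        }
    }
    if (!dyn || !found_load)
        return NULL;

    const char *strtab = NULL;
    Elf64_Sym  *symtab = NULL;
    Elf64_Word *hash   = NULL;
    for (Elf64_Dyn *d = dyn; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_STRTAB) strtab = (const char *)(d->d_un.d_ptr + load_off);
        if (d->d_tag == DT_SYMTAB) symtab = (Elf64_Sym  *)(d->d_un.d_ptr + load_off);
        if (d->d_tag == DT_HASH)   hash   = (Elf64_Word *)(d->d_un.d_ptr + load_off);
    }
    if (!strtab || !symtab || !hash)
        return NULL;

    /* hash[1] is the chain count, i.e. the number of dynamic symbols. */
    for (Elf64_Word i = 0; i < hash[1]; i++)
        if (strcmp(strtab + symtab[i].st_name, name) == 0)
            return (void *)(symtab[i].st_value + load_off);
    return NULL;
}

int main(void)
{
    uintptr_t base = getauxval(AT_SYSINFO_EHDR);   /* vDSO base address */
    int (*vclock)(clockid_t, struct timespec *) =
        (int (*)(clockid_t, struct timespec *))vdso_sym(base, "__vdso_clock_gettime");
    struct timespec ts;
    if (vclock && vclock(CLOCK_MONOTONIC, &ts) == 0)
        printf("vDSO says: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
    return 0;
}
```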
GNU aren't the OS developers of the Linux kernel. Think of the Go standard library on Linux as another libc-level library. On the BSDs there is a single libc that's part of the OS; on Linux there are several libc options.
> OpenBSD allows making syscalls from static binaries as well.
Do you have a source for this? My Google searches and personal recollections say that OpenBSD does not have a stable syscall ABI in the way that Linux does and the proper/supported way to make syscalls on OpenBSD is through dynamically linked libc; statically linking libc, or invoking the syscall mechanism it uses directly, results in binaries that can be broken on future OpenBSD versions.
I upvoted for the great links, but I still don't think a static binary that can break in the future meets the expectations many have of static linking.
> we here at OpenBSD are the kings of ABI-instability
> Program to the API rather than the ABI. When we see benefits, we change the ABI more often than the API.
> I have altered the ABI. Pray I do not alter it further.
The term ABI here though is a little imprecise. I believe it just refers to the syscall ABI. So, it should be possible to make an "almost static" binary by statically linking everything except libc, and that binary should continue to work in future versions of OpenBSD.
It's a lisp interpreter with a built-in system-call primitive. The plan is to implement everything else from inside the language. Completely freestanding, no libc needed. In the future I expect to be able to boot Linux directly into this thing.
The only major feature still needed for kernel support is a binary structure parser for the C structures. I've already implemented and tested the primitives for it. I even added support for unaligned memory accesses.
Iteration is the only major language feature that's still missing. I'm working on implementing continuations in the interpreter so that I can have elegant Ruby style iteration. This is taking longer than expected.
This interpreter can make the Linux kernel load lisp modules before its own code even runs. I invented a self-contained ELF loading scheme that embeds arbitrary data into a loadable ELF segment which the kernel automatically maps into memory; then it's just a matter of reaching it via the auxiliary vector. The interpreter uses this to run code automatically, allowing it to become a freestanding lisp executable.
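I don't know the exact mechanism used here, but to illustrate the auxiliary-vector side of the trick: a process can find its own program headers via AT_PHDR/AT_PHNUM and walk them to see every segment the kernel has already mapped. A hedged C sketch (the payload "magic" marker is my own placeholder, not this project's format):

```c
/* Sketch: walk our own program headers, as handed to us by the kernel in the
 * auxiliary vector, and look at each loadable segment.  An embedded payload
 * would be recognized by something at the start of its segment (the magic
 * value below is a made-up placeholder). */
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

#define PAYLOAD_MAGIC 0x4c495350u   /* hypothetical marker */

int main(void)
{
    Elf64_Phdr *phdr = (Elf64_Phdr *)getauxval(AT_PHDR);
    unsigned long phnum = getauxval(AT_PHNUM);

    for (unsigned long i = 0; i < phnum; i++) {
        if (phdr[i].p_type != PT_LOAD)
            continue;
        printf("PT_LOAD segment: vaddr=0x%lx size=0x%lx\n",
               (unsigned long)phdr[i].p_vaddr, (unsigned long)phdr[i].p_memsz);
        /* A freestanding loader would compute the segment's runtime address
         * (p_vaddr plus the load bias for PIE binaries) and check whether it
         * starts with PAYLOAD_MAGIC before interpreting its contents. */
    }
    return 0;
}
```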
Maybe bootstrapping a new language with no dependencies.