The coverage and the attention to detail here is incredible. Hats off to Justine.
I would love to know how this table is generated and what the test process is. What's the best way to run tests like this across such a wide variety of OSes? Maybe vagrant images?
Justine here. I have a fleet of VMs for various OSes running the runitd.com daemon. I write unit tests in Emacs, and when I press CTRL-C CTRL-_ it builds the test, gzips it, and uses runit.com to deploy the executable to every VM, run it, and report back the output within milliseconds. Here's a screenshot of me testing the sendfile() system call: https://justine.lol/cosmopolitan/sendfile-testing.png

When I run `make test` it deploys and runs all 465 test executables on all seven test VMs currently in the fleet, and it takes about 10 seconds. Here's a video: https://storage.googleapis.com/justine/sizetricks/runit.mp4

All runit.com and runitd.com do (their sources are in tool/build/ of the cosmo github repo) is basically `scp program host: && ssh host ./program`, which is what I used to do, except SSH was unacceptably slow, so I wrote runit, which uses PSK TLS and DEFLATE to transfer files over the network.
If you don't mind an unrelated question - what do you mean when you write "WIN32 lacks consistency" for things like chdir() and unlinkat()? If you mean the inability to pass HANDLEs instead of paths, you might want to use the native NT APIs (like NtOpenFile and NtDeleteFile), which let you use handles directly.
I used to work with someone who was an expert in writing cryptography libraries. He insisted on never using any system calls for maximum portability (he also wrote his own memory allocators and optimized for small binary size). It seemed quirky to me, but he was proven right multiple times: our ultraportable library let us book some big deals that would otherwise have been infeasible without a big rewrite.
If you know how much memory you need, you can just have a section of zero-initialized-data for that, which the allocator then handles.
The allocator also serves as a proxy: only the allocator needs to know how to request heap space.
Portable in that case probably means that you have little to no code change required for a port.
Something else to consider is that there may not even be an operating system the program was run on.
> there may not even be an operating system the program was run on.
Sure, in this case you can just implement malloc the same way a kernel implements it. My failure of understanding is when there _is_ a kernel that your program needs to interact with.
In the executable file you can specify that the operating system should give you a writable space of a given size, initialized with whatever you want. I think that's frequently used to initialize static variables, so you might get the desired result by just telling the compiler you want a large static array.
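For example (just a sketch; a bump allocator like the one in the sibling comments would then hand out pieces of this array):

    /* A large zero-initialized static array lands in the .bss section, so the
       executable itself asks the OS for that much writable, zeroed memory at
       load time; no allocation call happens while the program runs. */
    static unsigned char arena[64 * 1024 * 1024];   /* 64 MiB reserved up front */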
You don't need to avoid _all_ system calls. Just call sbrk/VirtualAlloc/vm_allocate once and then dole out smaller chunks yourself.
That way, if you need to add a new platform, it's only one place; you're shielded against weird platform bugs (Apple goes one way, Microsoft the other), so the app behaves the same; and you can tweak the system to your particular use.
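A minimal sketch of that pattern (assuming mmap; VirtualAlloc or sbrk would fill the same role on other platforms, and a real allocator would of course also track frees):

    #include <stddef.h>
    #include <sys/mman.h>

    #define POOL_SIZE (16 * 1024 * 1024)

    static unsigned char *pool;   /* one big region, requested lazily */
    static size_t used;

    static void *my_alloc(size_t n) {
        if (!pool) {              /* the only system call in this allocator */
            pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (pool == MAP_FAILED) return NULL;
        }
        n = (n + 15) & ~(size_t)15;             /* keep 16-byte alignment */
        if (used + n > POOL_SIZE) return NULL;  /* pool exhausted */
        void *p = pool + used;
        used += n;
        return p;
    }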
I suppose if you're willing to malloc big chunks and then potentially never have them released until the process terminates (as free often doesn't return memory to the OS), one might call that writing one's own memory allocator despite not using mmap / VirtualAlloc.
https://www.hboehm.info/gc/ is a windows+linux+unix allocator implementation - it uses VirtualAlloc on Windows and mmap or sbrk on Unix.
For performance reasons (avoiding a syscall) almost all malloc implementations only use sbrk / mmap occasionally and keep the malloc calls in userspace.
Syscalls are provided by the operating system; it doesn't matter which C library you use. The C library usually just provides a nice API around the syscall. And if it doesn't, you can still call it directly via syscall(2).
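For example, on Linux with glibc (a sketch; gettid(2) is a handy example because glibc didn't grow a dedicated wrapper for it until 2.30):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>   /* SYS_gettid */
    #include <unistd.h>        /* syscall() */

    int main(void) {
        long tid = syscall(SYS_gettid);   /* trap to the kernel via the generic wrapper */
        printf("thread id: %ld\n", tid);
        return 0;
    }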
Yes but that's not what this table is describing. The table appears to be describing cosmopolitan libc's wrappers. Certainly most of these syscalls do not exist on Windows, so the "Windows" column couldn't possibly be describing raw syscalls.
Not long ago golang switched from doing direct system calls to using libc on *BSD systems. The explanation, I believe, was that the BSD systems fix a lot of kernel problems and compatibility issues in libc, and it was just easier to use libc than to try to rewrite all of that in the golang standard libraries. On Linux it uses direct system calls for everything.
I believe the impetus for Golang was macOS and Windows, rather than the BSDs.
It’s not that libc is used to fix kernel problems/compatibility issues, but rather, the kernel ABI is not a supported, stable public API.
On macOS, there are cases where using the syscall ABI directly will result in your code breaking when interacting with any other code that does correctly use libc, due to out-of-sync userspace state maintained by libc.
For example, the fork(2) implementation in macOS’ libSystem will invalidate the cached copy of the current process’ pid used by getpid(2).
If you fork(2) by directly trapping to a syscall, the cached pid won’t be invalidated; any future calls to getpid(2) through libSystem will return the stale pid.
Yes, I believe what language implementation would want to do is to use the lowest-level API that’s supported and stable. On Linux, that happens to be syscalls, on BSD and macOS, that’s libc, and on Windows, that’s the so-called “Win32” API (kernel32.dll and such).
Meanwhile Windows makes no effort at all to keep syscall numbers stable, because you aren't supposed to use them directly. For example NtCompleteConnectPort has changed id five times in Windows 10 alone.
Meanwhile OpenBSD was discussing ideas to extend their system-call-origin verification to only allow syscalls from libc.
If you're only using syscalls on macOS, you're fine too, as far as libSystem is concerned. You're not fine if newer kernels change the syscall ABI, though.
I'm not sure it was libc in particular. In general, the Go compiler's initial design was very enthusiastic about the idea of fully statically linked binaries and easy cross-compilation. Avoiding system libs does kind of flow naturally from those goals.
Why is it that every time the topic of system calls comes up, all these people come out of the woodwork to say we're not allowed to not link a platform's libc? That's an impossible position to compromise with, because Cosmopolitan Libc is a libc, so it can't very well depend on six other c libraries. It would destroy the project. Also, where is this even written? The maintainers of systems like FreeBSD and NetBSD have done SO MUCH to help our project. I don't think they're anything like the anti-competitive userspace control freaks people on forums suggest they are. I think the folks making kernels are great people and outstanding engineers who want to cultivate an environment where programs can do what they want.
"Apple does not support statically linked binaries on Mac OS X. A statically linked binary assumes binary compatibility at the kernel system call interface, and we do not make any guarantees on that front. Rather, we strive to ensure binary compatibility in each dynamically linked system library and framework."
More to the point, platform vendors decide what their ABI boundaries are. Historically Unix vendors (back in the late 80s and early 90s) had no ABI boundaries... the expectation was every new OS release would require a recompile of all software. Obviously Linus had very different ideas about stability than the other Unix-like systems of the time (which was a good thing), and focused on syscall stability. That made a lot of sense since he only wanted to maintain a kernel, not a full OS distribution.
When modern macOS and Windows developed their ABIs they both were relatively mature OS distributions including a dynamic linker and default runtime libraries, and the ABI boundary chosen was well above the kernel as that is an easier place to define and maintain it.
One reason (may not be their main one) could be that system calls have more overhead than function calls, and some things (like futexes) make more sense to implement on the userspace side. As another example, imagine a function for measuring time. You don't want that to be a system call if it's meant to be efficient, and when a better mechanism comes along, changing the implementation on the user space side is a lot easier and potentially more efficient if the kernel interface can be changed without breaking compatibility.
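For instance, on Linux the call below looks like a system call but is normally serviced entirely in userspace by the vDSO, so no kernel trap occurs (sketch, assuming glibc):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)   /* usually a vDSO call, not a trap */
            printf("%lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
        return 0;
    }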
Quip: The higher up the rug, the more crap that there's room to sweep under it.
Or: Life can be easier for the kernel team if they're allowed to make breaking changes. Then task the lib team with writing shims / wrappers / etc. to fix all the problems which that causes. Then the manager of the lib team may have a perfect reason to boost his headcount. Then...
It makes it possible to change the API of system calls or completely remove them if better ideas come along for implementing their functionality.
Without it, you end up maintaining system calls that nobody should use.
Now, you could argue that moves the mess to the C library, which would still have deprecated functions that used to call old system calls but are now built on top of better ones, but there's more flexibility there. Application programmers can, one by one, move to newer C libraries that remove that cruft.
In the 80s and 90s, you could often use your old compiled binaries for decades. No recompile after an OS release. For example, I had binaries compiled in the mid-80s on a DEC running Digital UNIX that worked through all the upgrades to the systems to bring them to Tru64/TruCluster. That was a big part of the value of the system.
I also had a Mathematica binary (statically compiled except for libc) that ran on Linux from 1998 to 2010, including X windows (at some point, somebody moved the X files to a different location, so I had to set an env var).
Having a stable syscall interface is one thing... not supporting static linking is another... Even on linux, statically linking libc, GTK, ... is not a great idea.
I get you're carrying the weight of a thousand such conversations, but this poster didn't really come out railing against your choice or your libc. They just explained a consideration around this topic that others made. They didn't say that you or anyone else shouldn't have done it, and they used fairly gentle language like "I believe" at that.
You’re clearly an extremely talented dev, but this is an immature attitude to have. It has nothing to do with being “anti-competitive”. It’s about how a particular system is designed. Linux decided for many reasons, including organizational and political reasons, to have the stable surface be the kernel ABI. Other systems have different considerations and design philosophies and chose differently. These are all valid choices (and even if you disagree, you’re not going to convince them to change). It’s the job of a developer targeting a platform to deal with that reality.
That's why some people prefer OpenBSD (security, correctness, balance between functionality and minimalism), others NetBSD (portability for small/legacy/older systems, a barebones BSD) or FreeBSD (desktop/server performance focused, SMP and network/BUS I/O takes preference over security).
Cosmopolitan Libc is a libc, so it can't very well depend on six other c libraries
Is that really the case? Given it’s specifically intended as a highly portable libc, in some cases maybe it could be implemented by calling through to the blessed system libc, rather than kernel calls.
I mean, that likely wouldn’t fit with your goals for the project, and it would likely need some horrible linker hackery, but in principle it seems technically possible. (I’ll believe you if you say it’s completely impossible, though!)
Let me crawl out of the woodwork to ask a question about the consequences: I understood Windows changed its syscall API a few times. How do you deal with that? What happens if they renumber again?
We don't use SYSCALL on Windows. We link KERNEL32 and friends. It adds bloat to our binaries (for instance, hello world could be 4kb rather than 16kb if we didn't need DLLs) but Microsoft leaves us with little other option, since it'd require just as much bloat to use Xed to wrangle the syscall ordinals out of NTDLL. Our policy is to stick with stable supported interfaces whenever possible. Since XNU's SYSCALL interface is allowed, and it's nearly identical to those of Linux and BSDs, it'd be tragic to not use something that fits so perfectly hand in glove with the rest of our system call support. Look here, all we need to support a XNU system call is 16-bits of a word. https://github.com/jart/cosmopolitan/blob/2.1/libc/sysv/sysc...
> Our policy is to stick with stable supported interfaces whenever possible
Except you're not doing that consistently though.
On OpenBSD, mincore(2) was removed 3 years ago, and its UNIMPL syscall 78 was eventually recycled by a different system call: mquery(2), which your library calls expecting mincore and passes bogus arguments to.
I would be very surprised if there aren't more serious mistakes lurking in your library, in fact I know there are.
The macOS syscall interface policy is the same as Windows. In practice it has had less churn than NTDLL, which may make it feel more practical, but it is only more "allowed" in the sense that it is technically feasible, not that it is guaranteed to be stable or supported.
/merge:.rdata=.text cuts out an entire section 0.5K
/nocoffgrpinfo cuts out about 256 bytes from your .text section
/emittoolversioninfo:no is supposed to omit the RICH header, but doesn't seem to work anymore.
/stub:stub.bin will let you replace the default DOS EXE stub with something smaller. Using a 64 byte file will get the EXE header and PE header to fit in the first 512 bytes (when combined with merging the .rdata and .text sections)
Set "Entry Point" to main. This completely bypasses the C standard library and CRT, none of that gets initialized or called.
Then you end up with an EXE containing 3 sections: The EXE header (512 bytes), the .text section (512 bytes, but only 141 bytes actually used in there), and the .idata section (512 bytes, for importing DLLs, only 41 bytes actually used)
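Roughly, the pieces fit together like this (untested sketch: the program only touches KERNEL32, and the hypothetical link line just combines the options above plus /ENTRY and /NODEFAULTLIB, which the "Entry Point" setting implies):

    /* tiny.c -- no CRT is initialized, so only Win32 APIs are safe to call. */
    #include <windows.h>

    int main(void) {                       /* used directly via /ENTRY:main */
        DWORD written;
        WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), "hi\r\n", 4, &written, NULL);
        return 0;
    }

    /* cl /c /O1 tiny.c
       link tiny.obj kernel32.lib /ENTRY:main /NODEFAULTLIB /SUBSYSTEM:CONSOLE
            /MERGE:.rdata=.text /NOCOFFGRPINFO /EMITTOOLVERSIONINFO:NO /STUB:stub.bin */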
Just because I'm a bit curious and don't know the history here: Did they need to renumber, and if so... why?
(I'm not arguing for nor against syscalls-as-kernel-API, I'm honestly just curious. Without knowing too much about it, it seems pretty sensible to support statically linked executables, but I might be missing something. I guess linking against an "as old as you need to support" libc dynamically isn't the end of the world, but it does constrain the build environment somewhat.)
I'm pretty sure Windows just automatically generates syscall numbers in their build process. Since the kernel and the dynamic libraries that are the only blessed way to interact with the kernel are shipped together, that's not a problem from Microsoft's point of view.
For example, look at [1] and search for e.g. 0x01aa, to see how the meaning of that syscall number changes in a pretty systematic way over releases.
> I guess linking against an "as old as you need to support" libc dynamically isn't the end of the world, but it does constrain the build environment somewhat.
I can't speak for Windows, but on macOS, you build for older targets by passing `-(mmacos|ios)-version-min=` to the compiler to specify your deployment target version; this is used to determine symbol visibility, API visibility, toggled #ifdefs, etc.
The provided version is also stored in the Mach-O load commands of your executable, and will be used to select compatible symbols (and enable compatibility shims) when loading your binary's image at runtime.
No need for static linking — or jumping through hoops to build against an "as old as you need to support" set of installed libraries.
They don't per se need to renumber system calls, but for the most part the NT kernel developers saw little reason to enforce syscall number stability.
Use of DLLs is baked VERY deeply into Windows. The kernel is not even a monolithic file; it is an exe like most others, pulling in dlls that implement other functionality. It even has features like API-set DLL redirection, which means the boot loader needs to implement a fair bit of the kernel's PE loader functionality just to load the kernel. (I've no real clue if there is shared code between the two loaders, or if they are two separate loaders that implement similar things. Many of the options/features of the full kernel executable loader are not really needed in the bootloader.)
So it is not much of a surprise that making the numbers stable to support fully statically linked executables was not really something they cared about. From their perspective you could always just pull in the ntdll.dll for your syscalls like they intended (or more likely a higher level win32 dll that uses the syscall).
Many developers seem to have a cultish belief that the system's libc is the Only True Libc, missing the point (or being incapable of understanding) that alternatives are possible.
Is that really any more cultish than the idea that the kernel interface is the only true interface, and everything else should be replaceable? It’s just a different approach.
What’s so crazy about an OS vendor wanting to exercise control over that userspace shim layer, rather than assuming that user programs will be poking directly at the kernel?
You can argue that having multiple libc implementations is good, sure, but it’s not the only way of doing things and it’s not without tradeoffs.
The argument in the first paragraph doesn’t really work, I think: the kernel-user boundary is a security one, while the libc-app one is not (OpenBSD is something of an outlier in this respect, and even its case is a bit murky). So it is a tad more natural to draw the ABI boundary at the former point than at the latter, though of course this is not at all the only design consideration.
Looking at this through a KERNEL32 lens, I can actually somewhat see the alternative point of view: in NT < 4 it was mostly an in-process RPC proxy for the subsystem server in CSRSS, and one can argue that libc on Unix-likes is in the same position (cf the vDSO on Linux). You can build RPC to a trusted server either way: define a stable wire protocol; or require the client to load proxy code into its address space. COM on Windows and hardware-accelerated graphics on Linux both take the second approach.
But then why the hell does the proxy also have opinions about memory allocation, assignment of TLS slots, or floating-point formatting? (KERNEL32 has them on the first two points as well, mind you.) History[1] aside, does this really look like good engineering? (KERNEL32 makes a bit of a point there, but given how much HeapAlloc sucks as an allocator compared to the interoperability benefits it brings, I’m not sure it should be taken too seriously.)
It's a bit of a paradox, isn't it? In theory it seems quite elegant to avoid tightly coupling the memory allocator to the kernel. But in practice, the fact that every pointer has to be freed by the same DLL that allocated it is famous for causing problems for Windows developers. As is, more generally, the inability to pass C and C++ standard library types across DLL boundaries.
I think my ideal system would look like Linux in some ways. Any given Linux system has a 'standard' shared library ecosystem, with a single shared libc, and usually a shared C++ standard library as well. Executables and libraries that participate in that ecosystem can assume they'll use the same standard libraries and so can pass those libraries' types across module boundaries. Alongside them are 'hermit' executables, like Go programs or C programs statically linking to musl, that avoid the standard libc. But those executables typically also avoid all other system shared libraries (or greatly constrain their usage), so the difficulty in using their APIs (due to mismatched standard library types) isn't a big deal. Now, Linux sort of forces this approach by having the dynamic linker itself be part of libc. But that has some downsides, such as forcing the vDSO to work differently from every other library. I think I'd prefer to have a universal dynamic linker, like on Windows, but to still have 'ecosystem shares a libc, while hermits stick to themselves' as a convention.
This is incorrect. On most platforms, the libc is part of the operating system, and the libc API is the supported system call interface; user programs aren't allowed to make system calls directly to the kernel, any more than they're allowed to jump directly into the middle of kernel routines.
As described elsewhere in this thread, that is the policy of MacOS X and Windows.
It is also a policy which OpenBSD is moving towards:
I think by this they mean the policy of the Linux kernel "never breaking user land".
Libc is seen as userland by the Linux kernel; it provides an API for programs that abstracts them away from syscalls. Libc does syscalls for programs.
By "never breaking userland", Linux promises to never break libc by never changing syscall numbers/args/types - new syscalls are added in ways that don't break older syscalls.
Other platforms don't make this explicit promise and reserve the right to shuffle syscalls around, change them, remove them, etc as desired - making maintaining a libc for them a pain in the ass, and programmers are told to only use the libc as syscalls don't come with guarantees.
You can (and I often do) program on windows using only syscalls, but your program tends to need rebuilding across major versions as the syscall interface isn't stable.
Ah ok, now the explicit vs other OSes "implicit" guarantees makes sense to me. I had no idea other OSes changed their system call numbers from time to time. Thanks for your detailed response!
Some BSDs (and Zircon/Fuchsia) block any syscall instructions that don't originate from inside the vDSO. Which means the only way to do them is to go through libc (well, the parts of libc that are provided by the vDSO, but semantics). https://fuchsia.dev/fuchsia-src/concepts/kernel/vdso mentions this, for example.
I'm almost 100% certain that anyone can directly call Fuchsia's vDSO syscalls without linking in any form of libc. It's not at all like the macOS situation.
vDSOs are essentially pre-linked C libraries though. On Linux they are normal shared objects with the C ABI. They can even be found in the file system with names like linux-vdso.so.1.
The vDSO is a normal .so file that gets built alongside the Linux kernel. The .so file is installed on the file system like any other library. When mapped into the process address space, it's just a normal ELF image. The functions all conform to the normal C ABI for the platform.
It's true that we don't need to load the vDSO into the address space of the process since the kernel does it for us. We do need to manually link the vDSO functions though: the kernel uses the auxiliary vector to pass the address of the vDSO ELF image to the process, allowing it to parse the ELF header and find the addresses of the functions. In most cases, the libc will do this while initializing itself prior to calling main.
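For example (sketch, assuming Linux and glibc, which exposes the aux vector via getauxval; walking the ELF headers to resolve individual functions is the part a libc or a hand-rolled loader does next):

    #include <stdio.h>
    #include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */

    int main(void) {
        unsigned long vdso = getauxval(AT_SYSINFO_EHDR);  /* base of the vDSO ELF image */
        printf("vDSO mapped at %#lx\n", vdso);
        return 0;
    }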
> The .so file is installed on the file system like any other library.
Nope, it is built into the kernel image - at least on my system, but there doesn't seem to be a config option to disable that.
> When mapped into the process address space, it's just a normal ELF image. The functions all conform to the normal C ABI for the platform.
That's purely for convenience, because it allows ASLR for the vDSO, and there is no point in making it a new ad-hoc format if every C library already needs an ELF dynamic linker - if the vDSO had been a thing before ELF it would surely have a different format.
> We do need to manually link the vDSO functions though: the kernel uses the auxiliary vector to pass the address of the vDSO ELF image to the process, allowing it to parse the ELF header and find the addresses of the functions. In most cases, the libc will do this while initializing itself prior to calling main.
You need to look up the entry points in the vDSO before you can call them but that is really no different than checking the kernel version before deciding which syscalls to invoke. And no, the vDSO is not loaded like any other .so - it is very much a special case [0] even if the libc ends up reusing some of the normal .so code and structures. For example, it needs to be loaded even for statically linked binaries [1].
Yea, if I remember correctly OpenBSD only allows syscalls to originate from libc so the OpenBSD devs had to create an exception for golang back when it didn’t use the system libc.
Some of the Win32 things mention "millisecond precision only", but this isn't quite correct.
Waitable Timers on Win32 let you request time values in units of 100 nanoseconds. See `CreateWaitableTimer` and `SetWaitableTimer`.
`Sleep` and `SleepEx` can be implemented using a waitable timer, just use `WaitForSingleObjectEx`.
If you need to wait for an object (semaphore, etc) using a time unit other than milliseconds, you can use `WaitForMultipleObjectsEx` with one of them being a waitable timer.
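Putting those together, a sub-millisecond Sleep() substitute looks roughly like this (untested sketch; a negative due time means "relative", in 100-nanosecond units):

    #include <windows.h>

    static void sleep_100ns(LONGLONG units) {
        HANDLE timer = CreateWaitableTimer(NULL, TRUE, NULL);   /* manual-reset, unnamed */
        if (!timer) return;
        LARGE_INTEGER due;
        due.QuadPart = -units;                                  /* negative => relative */
        if (SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE))
            WaitForSingleObjectEx(timer, INFINITE, FALSE);
        CloseHandle(timer);
    }

    /* sleep_100ns(5000);  -- request a 500 microsecond sleep */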
---
The next question is if Win32 can actually deliver those requests for precise times or not. From the testing I did a while ago where I was simulating Sleep, I got actual sleep times rounded to about 4ms. So much for requesting nanosecond level precision.
"Your program will also boot on bare metal too. In other words, you've written a normal textbook C program, and thanks to Cosmopolitan's low-level linker magic, you've effectively created your own operating system which happens to run on all the existing ones as well."
I’m not an expert, but I was under the impression these were library calls on macOS, and the system calls were unstable and not supposed to be used directly?
one of the most cringe things I'll see in "high performance" code is ad hoc syscalls in the middle of loops and such. it should be much more obvious imo in environments like python notebooks that these less scalable operations are happening
Obviously, do whatever you want if you’re just playing around, but please do be aware that the syscall interface on macOS is private, subject to change at any time, and should never be used directly.
That sounds like using a Mac in general:
“Do what you want if you’re just playing around, but be aware that [if you’re not a blessed developer] macOS is private, subject to change at any time, and should never be used [professionally]”
What do alternative libc implementations like musl or dietlib do on those systems? Do their maintainers manually track the unstable syscall interface or are developers forced to use the system provided libc because the alternatives just don't work completely?
Just because no one's done it before doesn't mean we're not allowed to do it. What Cosmopolitan Libc does is no different from any statically-linked binary on BSD, because they too use the SYSCALL instruction directly in a way that can't be changed at will by the operating system maintainers. The BSD operating systems do not forbid static linking. If you look at their syscalls.master files, a lot of the time, particularly with ones like FreeBSD, you'll notice a large number of system calls which are only there to preserve backwards compatibility with statically linked binaries. It's really nice of them to do that.
When people talk about the supposed requirement to depend on the platform libc dso, what they're actually talking about is Apple frowning upon statically linked binaries. https://developer.apple.com/library/archive/qa/qa1118/_index... They don't forbid it though, like Microsoft and Fuchsia do. Fuchsia for instance uses RIP origin detection. Windows does it by changing the RAX ordinals fortnightly. Apple simply asks that we say, hey, if you use Cosmo there's some risk Apple might break our binaries. We take proactive steps to avoid that happening with Cosmo. For example, we don't do some of the things Go did, like reverse engineering the memory layout of Apple's time functions. Cosmo sticks to the APIs that are shared by UNIXes in general, e.g. gettimeofday(), rather than depending on Apple's own internal designs, e.g. Mach system calls. I don't believe Apple can rightfully claim APIs that aren't their own, as being their own implementation detail which they can change at will. We do our best to respect Apple's boundaries, so I believe the risk of breakage with Cosmo on Apple should be minimal.
> Cosmo sticks to the APIs that are shared by UNIXes in general, e.g. gettimeofday(), rather than depending on Apple's own internal designs, e.g. Mach system calls.
Apple doesn't differentiate between "UNIX" and "Apple" when it comes to API; a particular API is either public with stability guarantees, or it is private and unstable.
> I don't believe Apple can rightfully claim APIs that aren't their own, as being their own implementation detail which they can change at will.
Apple absolutely does claim this, and if that means the library ABI has to change to accommodate a change, the dynamic linker and symbol tricks are leveraged to keep things working for code built against the earlier ABI.
If an engineer comes up with a really clever trick to make gettimeofday() just a tiny bit faster, but this requires breaking syscall ABI, they will absolutely do that.
> I believe the risk of breakage with Cosmo on Apple should be minimal.
Using system-private interfaces on Apple platforms means there are no guarantees here. You'll be OK, sometimes, for some releases. It mostly worked for Go, for a while.
Google smashed their own toys; the approach was fundamentally flawed from the start, they tried to make a go of it anyway, it didn't work correctly (which they should have known would happen, and were repeatedly told it would), and they tossed out the idea and reimplemented it correctly.
It's not "two megacorps" disagreeing, it's "supported interface" vs "unsupported, unstable, system-private interface".
You are confusing API and syscall here. Apple tries incredibly hard to keep APIs (as implemented via libSystem) working, and actually has passed UNIX conformance, so generally speaking these work, are stable, and don't get broken.
The underlying syscalls that support them have occasionally changed and broken existing apps that bypassed libSystem. For example, Sierra broke all go apps that called `gettimeofday` (the exact syscall jart used above as an example!) because the go compiler emitted direct syscalls: https://github.com/golang/go/issues/16606
My conflation of C API, C ABI, and syscalls was deliberate, though perhaps ill advised. I understood that Apple has some of the above marked as proprietary, that nevertheless come standard with UNIX. Probably not syscalls since those involves numbers & interrupts, but at least C ABIs.
I'd like to know, did Apple actually break a "private" C ABI when this ABI actually implemented a standard UNIX function? I know they could, but did they?
Perhaps your conflation was deliberate, but it actually feels like you may not be conflating exactly what you think you are. Let me try to be fairly precise about some things:
By standard UNIX function I take it that you mean a function defined via POSIX and part of one of the various specifications used for UNIX certification (for the moment lets ignore the fact there are multiple revisions and optional extensions). It is important to note that the specifications says essentially nothing about:
* Binary formats
* Libraries (static or dynamic)[1]
* What symbols are in what library
It is all written in terms of what source code should compile, and how that compiled code functions. Everything else such as calling conventions, syscall interfaces, what is library code vs a syscall, etc is an implementation detail.
So given the above, I am not entirely sure what you mean by a `"private" C ABI when this ABI actually implemented a standard UNIX function.` Do you mean has Apple ever changed an internal function called by a function specified in POSIX? IOW, if your question is does Apple reserve the right to implement `stat()` as a call to `stat_internal()` and then change the arguments to "stat_internal()" ? Absolutely.
If you mean has Apple ever changed a function that is part of POSIX but that it considers private? Those don't really exist on macOS: if POSIX allows it and it is part of the standard that has passed conformance, it is by definition public, and the C-ABI-level interfaces as exposed by libSystem are stable (which is not to say that all of those interfaces are great, but they are standard and supported). IOW, the standard specifies that `stat()` exists, and it is by definition public.
That is not to say incompatible changes have never had to happen (for example, when UNIX conformance was originally implemented a lot of existing functions required incompatible changes to pass the test suites). All of that is handled via symbol versioning and redirecting new binaries to different symbols than the older binaries used, which maintains both binary compatibility for old binaries and allows new source to compile in the correct (conformant) way. This is why if you inspect libsystem_kernel.dylib you see variants of symbols like:
* _recvmsg
* _recvmsg$NOCANCEL$UNIX2003
* _recvmsg$UNIX2003
The old ones keep working with the existing semantics for older binaries, the headers have magic in them redirect to the newer ones when targeting the appropriate minimum OS version, and the userspace libraries have multiple entry points that provide both sets of semantics (often implemented in the userspace shim, sometimes by dispatching to the kernel with different syscalls).
[1]: Despite that, at this point POSIX does specify some of the semantics of `dlopen()` and `dlsym()`, which is pretty insane when you think about it.
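The header "magic" mentioned above boils down to an asm label on the declaration. A rough sketch of the idea (Apple's real headers do this through the __DARWIN_ALIAS macro family; my_recvmsg is a made-up name here):

    #include <sys/socket.h>

    /* Bind this call site to the UNIX2003-conformant entry point in libSystem. */
    ssize_t my_recvmsg(int sock, struct msghdr *msg, int flags)
        __asm__("_recvmsg$UNIX2003");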
> Just because no one's done it before doesn't mean we're not allowed to do it.
If by "allowed" you mean "is meant to work", then yeah, it means it's not allowed on many of these platforms.
OpenBSD has infrastructure to control which memory regions syscalls are allowed to stem from, specifically designed to block syscalls from anything but libc. See https://lwn.net/Articles/806776/
I do not recall if it was enabled, but when someone does not provide you an ABI it very much means that you are not supposed to try to write code against it. If you do it anyway, you will have to live with the resulting instability.
It significantly simplifies OS development. E.g., syscalls can be modified and improved or entirely deprecated without concern.
If OpenBSD wanted to, they could implement something like io_uring with support for all kernel functionality, port libc to use that and ditch conventional syscalls entirely (simulating blocking where needed), without user-space knowing anything changed.
Under Linux, which is actually rather unusual in providing ABI guarantees, you're stuck maintaining bug-for-bug compatibility for every syscall you've ever written, and cannot change user-space no matter how good that change would be for either or both.
> you're stuck maintaining bug-for-bug compatibility for every syscall you've ever written
But instead you have to maintain bug-for-bug compatibility in libc for every API you've ever written. In the case of macOS, where the kernel and libc/libSystem development are closed and done by the same entity, it makes zero difference (imho).
Also, I understand catiopatio's argument in the sibling comment about doing more in user-space than in kernel in case of a bug, but it breaks down the moment thin wrappers around syscalls exist there. You link to libc/libSystem and use every thin wrapper in existence - now no syscall can be changed (if I understand correctly how macOS works, never used one)
> But instead you have to maintain bug-for-bug compatibility in libc for every API you've ever written.
It is not instead. With a syscall ABI you need to maintain both; without it you only need to maintain libc (a majority of which is dictated by POSIX anyway).
> In case of macOS where the kernel and libc/libSystem developments are closed and done by the same entity is makes zero difference (imho).
As above, maintaining two contracts is harder than one.
Proprietary parts aside, most OS's have their kernel and user-space developed together, and that is exactly what allows them to work this way.
This is used to adopt new features, change or deprecate old features with a brutal efficiency that Linux cannot compete with - e.g., when OpenBSD implemented pledge in both kernel and all relevant tools.
This is one of the reasons that these projects can keep up or in some cases surpass Linux (FreeBSD networking is seen as superior, and you used to get better performance from running Linux binaries on FreeBSD through its compatibility layer) despite having much smaller groups of maintainers and users.
I'd much rather export this complexity to userland code than privileged kernel mode code though. It reduces the potential consequence for a mistake pretty drastically.
The main benefit is that you can implement complex logic, performance optimizations, and compatibility shims on the userspace side of the kernel/userspace divide, where it's generally much easier to do, and bugs won't result in full compromise of the kernel.
There's not really a strong argument for not linking libc. Even if you want to implement your own libc, you can still do so with linker tricks and calling through to the supported syscall wrappers in the real libc.
> There's not really a strong argument for not linking libc.
Sure there is. The libc sucks. Freestanding C actually turned out to be a superior language because there's not as much legacy weighing it down. There's many systems languages out there, nobody should be forced to link to C stuff.
I think Windows solved this much better than libc by separating the concerns: there's user32.dll providing a thin abstraction over syscalls, a couple libraries providing higher-level interfaces like wintrust.dll, and thirdly the C runtime library providing all the stuff mandated by C. You can easily ditch the latter, while still keeping the benefits of user32.dll (which cosmopolitan is doing).
macOS essentially has the same distinction internally (libsystem_kernel.dylib), but that is not directly linkable and is instead re-exported via the libSystem umbrella which also exports libsystem_c.dylib.
While you cannot link directly to libsystem_kernel.dylib, if you choose to ignore everything in libsystem_c.dylib and only use the syscall wrappers reexported from libsystem_kernel.dylib via libSystem it has practically the same effect on macOS (in fact, the resulting binary will be identical to what a hypothetical binary linked to just a single libsystem_kernel.dylib would be except for a single `LC_LOAD_DYLIB` command).
Such a binary would have the system libc initialized and sitting in its address space, but for the rare binary that really wants its own libc that doesn't seem unreasonable.
What's at issue here is linking to the platform libc for syscalls. If you don't want to use the platform libc's qsort implementation, by all means, don't. But if you want to call gettimeofday, you should do that by calling the gettimeofday function in the libc, not by trying to set up a system call on your own. The former is a stable interface, the latter isn't.
It's annoying that libc combines two completely different things - userspace utility functions, system calls - but unfortunately, that's where history has brought us.
> But if you want to call gettimeofday, you should do that by calling the gettimeofday function in the libc, not by trying to set up a system call on your own. The former is a stable interface, the latter isn't.
The system call interface is not only stable but also language agnostic. The correct place for this interface isn't in some C library, it's in the language itself. We could have compilers directly targeting this. GCC could add a system_call keyword that emits code conforming to that ABI. Dynamic languages could have a JIT compiler that does the same thing.
Even on Linux, userland programs are encouraged to perform syscalls through the vDSO, a mini userland library supplied by the kernel. Performing syscalls directly is supported, but the vDSO is preferred. For most syscalls, using vDSO simply allows the kernel to pick the fastest available method for entering the kernel. But for a handful, the vDSO provides an optimized implementation that doesn't enter the kernel at all. One of those, ironically, is gettimeofday.
That said, the vDSO is much smaller than any libc.
Outside unsupported edge cases (e.g. shell code), there's really no material difference between jumping to the address of the syscall wrapper in libSystem.dylib versus trapping to the kernel via a given syscall number, other than the former being a supported interface while the latter explicitly is not.
Edge cases also include using a different libc, working around bugs in the supported libc, trying to minimise how much of your stack is implemented in C.
Alternative libc implementations just aren't a thing outside of Linux.
However, if, for some reason, you really want to re-implement libc, you still can, but your library will have to link against the system-provided libc for the syscall entry points, because that's the only stable interface to the kernel.
There's no reason that should be a problem for a hypothetical libc-reimplementor.
My take: Like with any interface the definer needs to balance api user ergonomics, your developer's ergonomics, and internal developer's needs. If your api can get a little toe-hold on the remote side there's all sorts of advantages (and compromises) you can leverage.
In this case that means the API designer gets a little bit of wiggle room in the process's userspace, where it might be more appropriate to, say, shim some calls so that they're no longer 1:1 with a syscall. The obvious example is malloc(): it can call syscalls for you and is the "default" way to allocate memory you might want to provide, but it's more of its own little runtime.
My favorite example was in my time at LinkedIn. Because the internal Kafka team had thick client libraries, making the company wide change to enable encryption was as simple as pushing out new libraries and deprecating old ones.
Well, isn’t that why this web page exists? To advertise the nice libc implementation so users don’t have to make syscalls directly to the OS? Maybe I’ve misunderstood something…
Given my (rough) understanding of Justine’s political leanings (active in Occupy [1]), I think (and hope) that you are mistaken. The following searches turn up empty [2,3] as well.
It is perfectly possible that you are still right. But I would be darn careful to make statements like you made without backing them up with facts. The Internet is a chilling enough place without even more vague rumours going around.
Edit: Well, apparently your memory is on to something after all [4,5].
I've never heard of that before, though I did hear about the whole Atomwaffen (or really a separate US group going by the same name) O9A infiltration disaster. Is there any public documentation of this going on in general, or is it confined to private Discord rooms and that sort of thing?
Oh please f** off with the speech policing. There is no hate festering in the article or anywhere on her page. Someone had to go very far out of their way to dig up something she said. Until then, it was just a technical post. That's very far from Moldbug, who wrote volumes of rhetoric explicitly designed to push a particular ideology. In fact, if exposure to radical views is the problem, the person who dug up the comments and posted them here did more harm than anyone else. Until then we were all blissfully unaware and just having a technical discussion.
> Someone had to go very far out of their way to dig up something she said.
And every time it happens, more and more people find out. Streisand effect.
When I asked @dang a few years ago why Yarvin was banned, that was his response: in even technical posts related to Yarvin, his politics endlessly get brought up—and HN doesn't exist for that kind of discussion.
You can still see that happen in virtually any Urbit post, despite him not even being involved in it anymore.
If I were @dang, I would have allowed Yarvin links on HN (and Justine, obviously—amazingly talented engineer) and instead have a different HN policy: ask random HN posters to stop bringing up unrelated politics in the comments of technical posts.
Maybe @dang will do just that and we'll both be happy. It happens a lot and would make the site much, much nicer.
“If Stalin had a good writeup on programming, would linking that be dangerous, because some people might read it, start liking HIM, thus start liking communism and the inevitable mass murder that follows it? Is this how little we trust other individuals when it comes to access to information? I personally trust my readers to have the ability to form their own opinions instead of blindly following whatever the person they like says.”
I have many people I dislike on a personal level, but still praise for their technical abilities.
Yes, please! Politics, technical inventions, and the things people do in their free time are different things! Please stop mixing all of those together. This is an ad hominem attack and I can't stand it.