I wish we could do away with (parts of) libc entirely. Many of its design principles and limitations date back to its inception and don't make sense today, even on "embedded" systems.
- function names limited to 6 characters (because of compiler/linker memory limits at the time?)
- non-reentrancy of many functions (strtok(), errno...)
- dangerous string manipulation functions (see the various iterations of strcpy())
- and so much more...
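To make the strtok() point concrete, here's a minimal sketch of the classic nested-tokenizing bug next to the POSIX strtok_r() fix. The function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* expose strtok_r() in <string.h> */
#include <assert.h>
#include <string.h>

/* Counts "key=value" pairs in a comma-separated list such as "a=1,b=2".
 * BROKEN on purpose: the inner strtok() calls overwrite the single hidden
 * static cursor the outer loop depends on, so parsing stops early. */
static int count_pairs_strtok(const char *input)
{
    char buf[128];
    size_t len = strlen(input);
    assert(len < sizeof buf);          /* keep the sketch simple: no truncation */
    memcpy(buf, input, len + 1);       /* strtok() writes into its argument */

    int count = 0;
    for (char *pair = strtok(buf, ","); pair != NULL; pair = strtok(NULL, ",")) {
        char *key = strtok(pair, "="); /* restarts the hidden cursor on pair */
        char *val = strtok(NULL, "=");
        if (key != NULL && val != NULL)
            count++;
    }
    return count;
}

/* Same logic with POSIX strtok_r(): each loop keeps its own cursor in a
 * caller-owned variable, so the nested loops no longer interfere. */
static int count_pairs_strtok_r(const char *input)
{
    char buf[128];
    size_t len = strlen(input);
    assert(len < sizeof buf);
    memcpy(buf, input, len + 1);

    int count = 0;
    char *outer = NULL;
    for (char *pair = strtok_r(buf, ",", &outer); pair != NULL;
         pair = strtok_r(NULL, ",", &outer)) {
        char *inner = NULL;
        char *key = strtok_r(pair, "=", &inner);
        char *val = strtok_r(NULL, "=", &inner);
        if (key != NULL && val != NULL)
            count++;
    }
    return count;
}
```

With the input "a=1,b=2,c=3", the strtok_r() version finds all 3 pairs. The strtok() version stops after the first one. Once the inner value lookup runs off the end of "a=1", the standard says subsequent searches return a null pointer, and the outer loop's next strtok(NULL, ",") is one of those searches.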
Of course, I'm missing the point. Libc continues on because it's standard and backwards-compatible, and lets you leverage 50 years of software ecosystem.
Nevertheless, I'd love to see what a modern clean-slate implementation might look like. I'm sure there'd be much disagreement about what warrants replacement and what doesn't ;)
> I'm sure there'd be much disagreement about what warrants replacement and what doesn't ;)
How many backwards-incompatible variations of C++ std::string have there been, simply in terms of the standard (not extensions)? Two? Three?
glibc has been turning their nose up at strlcpy(3) for literally 20 years, and yet still doesn't provide a comparable one-liner alternative that guarantees to always leave a NUL-terminated string behind (i.e. doesn't leave a loaded gun lying around).
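For reference, the strlcpy-style contract being asked for fits in a few lines. This is a minimal sketch, not glibc's or OpenBSD's code. The name copy_bounded is hypothetical, chosen so it can't clash with a libc that already declares strlcpy. It copies at most size - 1 bytes, always NUL-terminates when size > 0, and returns strlen(src) so truncation can be detected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy-style helper. Copies at most size - 1 bytes of src,
 * always NUL-terminates dst when size > 0, and returns strlen(src), so the
 * caller detects truncation with `ret >= size`. src and dst must not overlap. */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
    assert(src != NULL);
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1; /* leave room for the NUL */
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

Truncation is then a single comparison, e.g. `if (copy_bounded(buf, name, sizeof buf) >= sizeof buf)`. One commonly cited downside of this contract is that it always walks the whole source string to compute the return value, even when only a few bytes get copied.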
A good string interface is impossible to pin down because a "string" is the antithesis of a proper data structure. C dipped its toes in the better string library waters with Annex K, which saw almost no uptake; even the original sponsor, Microsoft, never shipped a complete implementation. C would be wise to avoid chasing the mirage of a better string library. In general if you're thinking in terms of "strings", C is the wrong language. C being an inconvenient language for programming with that mindset is a good thing.
(I would love to see strlcpy added, but the fact it's so divisive is maybe a strong hint that there's no way to placate a substantial minority of people, not even the most pragmatic of users. Anything involving the str* API is a lose-lose at this point.)
What C really needs are some primitives for bounded arrays and slices. There are some very good, very simple proposals out there, but no compiler vendor has actually shipped them and the C committee has been burned several times before by standardizing de novo interfaces.
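For a rough idea of what "bounded arrays and slices" means, here's a library-level sketch: a pointer plus an element count, with checked access and sub-slicing. All names here (int_slice, SLICE_OF, slice_get, slice_sub, slice_sum) are hypothetical. This is not any of the actual proposals, which would make something like this a language feature the compiler can check and optimize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slice type: a pointer plus an element count. */
typedef struct {
    int *ptr;
    size_t len;
} int_slice;

/* Builds a slice over a whole array. Only valid on real arrays, not on
 * pointers, because it relies on sizeof. */
#define SLICE_OF(arr) ((int_slice){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Bounds-checked element access: aborts instead of reading out of bounds. */
static int slice_get(int_slice s, size_t i)
{
    assert(i < s.len);
    return s.ptr[i];
}

/* Sub-slice covering [start, end); it can never reach past its parent. */
static int_slice slice_sub(int_slice s, size_t start, size_t end)
{
    assert(start <= end && end <= s.len);
    return (int_slice){ s.ptr + start, end - start };
}

/* Sums the elements; the loop bound comes from the slice itself. */
static long slice_sum(int_slice s)
{
    long total = 0;
    for (size_t i = 0; i < s.len; i++)
        total += s.ptr[i];
    return total;
}
```

Note that assert() compiles away under NDEBUG, so a real design would have to decide whether bounds checks stay on in release builds.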
> even the original sponsor, Microsoft, never shipped a complete implementation
I guess mostly because they decided that, as far as improving the safety of Windows systems programming goes, they were better off moving everything to C++ and letting others keep bothering with C.
And nowadays they are eyeing C#, C++ (with Core Guidelines analyzers), and Rust.
The exception being Azure Sphere, but I bet they will eventually add Rust there, as using C taints the security story of Azure Sphere.
The resistance among C devs to moving away from micro-optimizing every line of code as they write it is too great for them to ever adopt one of those primitives for bounded arrays and slices.
That leads to solutions like ADI-enabled SPARC Solaris, or the future Android devices that will ship with ARM memory tagging enabled.
> glibc has been turning their nose up at strlcpy(3) for literally 20 years, and yet still doesn't provide a comparable one-liner alternative that guarantees to always leave a NUL-terminated string behind (i.e. doesn't leave a loaded gun lying around).
snprintf(3) can fail with ENOMEM. In fact, at one point (not sure about now), glibc could fail with ENOMEM even for %s.[1]
The size argument is a size_t, but the return value is an int, so the two don't line up: outputs longer than INT_MAX can't be represented, and mixing the return value into size_t arithmetic creates underflow and overflow hazards, though the degree to which this is a problem, and the extent to which it's a feature, vary. Everybody has a different opinion about how truncation should be signaled, if at all.
[1] Because glibc snprintf uses (or at least used) its stdio FILE interfaces for implementing snprintf, and it conditionally uses malloc for initializing a temporary FILE object.
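To show how much checking the snprintf() "one-liner" really needs, here's a minimal sketch of a wrapper that separates the three outcomes. The names format_int and enum fmt_result are hypothetical. The key detail is that the int return value has to be checked for negativity before it's compared against the size_t buffer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for a checked snprintf() wrapper. */
enum fmt_result { FMT_OK, FMT_TRUNCATED, FMT_ERROR };

/* Formats "value=<n>" into buf and classifies the outcome. snprintf()
 * returns, as an int, the length the full output would have had, or a
 * negative value on failure, so both cases must be checked explicitly. */
static enum fmt_result format_int(char *buf, size_t size, int value)
{
    assert(buf != NULL || size == 0);
    int n = snprintf(buf, size, "value=%d", value);
    if (n < 0)
        return FMT_ERROR;      /* e.g. an encoding error, or the ENOMEM case above */
    if ((size_t)n >= size)     /* cast only after ruling out negative values */
        return FMT_TRUNCATED;  /* buf holds a NUL-terminated prefix if size > 0 */
    return FMT_OK;
}
```

Compare that with the single `>= size` check a strlcpy-style function needs. That difference is roughly the argument for wanting something simpler.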
errno is thread-safe as it resides in TLS. Part of libc such as pthread is very hard, and so does the dynamic linker. The latter is supposed to handle TLS data as required by the TLS/ELF spec, which more or less requires a full thread/pthread implementation. As a result, it is rather difficult to break down libc without writing the whole thing in the first place. golang might be an exception (it doesn't link against libc), but AFAIK it doesn't use TLS.
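The "errno lives in TLS" point can be illustrated with a stand-in variable declared with C11 _Thread_local. This is a sketch under assumptions. my_errno, worker, and demo_thread_local are hypothetical names. Real errno is a macro from <errno.h>; on glibc it expands to `(*__errno_location ())`, which returns the calling thread's own slot. The sketch needs pthreads, so older toolchains may require linking with -pthread.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for errno: every thread gets its own copy. */
static _Thread_local int my_errno = 0;

/* Worker thread: sets its own copy and reports what it reads back. */
static void *worker(void *arg)
{
    int *seen = arg;
    my_errno = ERANGE;  /* changes only the worker's copy */
    *seen = my_errno;   /* reads back the worker's own value */
    return NULL;
}

/* Sets the calling thread's copy, runs a worker that sets its own copy,
 * and returns the caller's view afterwards (-1 if thread setup fails). */
static int demo_thread_local(int *worker_saw)
{
    assert(worker_saw != NULL);
    pthread_t t;
    my_errno = EINVAL;
    if (pthread_create(&t, NULL, worker, worker_saw) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0) /* join also makes *worker_saw safe to read */
        return -1;
    return my_errno;                /* still EINVAL: the worker had its own slot */
}
```

The catch mentioned above is that someone (the dynamic linker and the thread library) has to allocate and initialize those per-thread slots. That's why TLS drags so much of libc along with it.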
It's a pretty hard task to tackle libc (not just string functions, obviously), maybe for the same reason there're only a few libc implementations (support full dynamic linking).
Neither pthreads nor a dynamic linker are required for libc. pthreads is part of the POSIX standard, not the C standard, and many libc implementations over the years have not supported dynamic linking.
To give you an idea, here's a chip used by at least one of the products developed by the author:
> Part of libc such as pthread is very hard, and so does the dynamic linker.
What does this mean? I'm having trouble parsing this as English much less making sense of the argument :)
> It's a pretty hard task to tackle libc (not just string functions, obviously), maybe for the same reason there're only a few libc implementations (support full dynamic linking).
Why does libc ship with string functions in the first place? What is special about strings that they are included in libc while other data structures (to my knowledge) are not? Presumably you don't need strings to implement malloc or other parts of libc?
Strings ship with libc because pretty much every language ships with string manipulation routines. You need them to basically do anything that isn’t pure computation.
Granted, but pretty much every language also ships with routines for other data structures, and I would think that you need things like lists/slices more often than you need string routines.
Most of the big chip vendors provide a libc implementation along with the proprietary compilers that they maintain for their chips. On the TI MSP430, I've had some projects that were too big to fit into flash if I used the libc (newlib) that's bundled with the open source MSP430-GCC compiler[1]. By contrast, if I compiled with the proprietary TI compiler[2] and its libc, the binary was small enough to fit. My experience is a bit dated at this point, so newlib and the GCC fork may have improved enough to reach parity by now.
The dominance of GCC and LLVM on x86[-64] machines has conditioned many of us to take great open source build toolchains for granted, but embedded systems are comparatively the wild west - there are still a lot of proprietary build toolchains out there. Obviously this is a negative if you're committed to using open source tools, but one positive is that it's a niche where companies can still make money selling compilers (and therefore employing engineers who work on compilers).
If I were in the market for such a library, I'd still have serious concerns. The GPL license still appears in the root of the repo as if the entire codebase might be subject to GPL; it appears to be a fork of newlib (GPL), and that means it might not qualify as a "clean room implementation" of GPL'd code. The legal team in a corporation is going to insist on a ton of due diligence on Picolibc before they allow it to be used.
> Remove[d] unused code with non-BSD licenses. There's still a pile of unused code hanging around, but all non-BSD licensed bits have been removed to make the licensing situation clear. Picolibc is BSD licensed.
From my understanding, Newlib is a GPL-licensed project that has some GPL-licensed contributions and some BSD-licensed contributions (in the sense that these aren't original code written by contributors, but rather are copy-and-paste inclusions of pieces from various GPL- and BSD-licensed codebases.)
The author created Picolibc by (doing the moral equivalent of) cherry-picking out a small base of the commits to Newlib—all of which were copy-and-pastes from BSD-licensed codebases.
---
In some sense, Newlib is very similar to how a Linux distro works, repackaging upstream code components (of various licenses) together.
Picolibc, then, would be a lot like creating a new Linux distro, derived "from" Ubuntu, that happens to use only the packages in Ubuntu that are themselves directly copied in from upstream Debian (pretending for a moment that Ubuntu does this rather than re-signing packages with their own keys.) Is such a distro really "derived from Ubuntu"? Or is it derived "from" Debian, with an Ubuntu mirror server just serving as a pipe that some Debian packages went through on their trip "from" Debian's hands "into" the hands of the new distro?
---
Either way, worries that the GPL could infect this effort are misplaced—even if it comes to a legal battle, at any point, the same codebase could be recreated (with a bit of a schlep) by just going to all the same BSD-licensed upstream sources that Newlib's contributors pulled from, and doing the same copy-and-pasting that Newlib's contributors did.
Please don't post unsubstantive comments here, especially not nasty ones, regardless of how annoying other commenters are. Otherwise we all go down an annoyance spiral.
There's nothing annoying about understanding how you're allowed to use something created by another person, especially when legal liability is involved.
An interesting non-free alternative is the SEGGER Runtime Library [1], which works with GCC but has smaller code size than newlib-nano and significantly faster floating-point emulation [2] for processors like the Cortex-M3 or RISC-V with no FPU. It would be interesting to see if newlib-nano or picolibc could use these tricks.
Those figures are sufficiently far apart that I don't think we need to quibble about comments. Whatever is causing those huge numbers for picolibc (architecture optimized function variations? per file license preambles?), suffice it to say that it's debatable the extent to which newlib or picolibc is smaller or simpler than musl, if at all.
musl is maybe more ambitious in terms of full and correct POSIX compliance, but the code base is very clean and concise. For example, for locale support they keep things very simple by only supporting UTF-8. Unlike picolibc, musl includes a full ELF runtime dynamic linker, full pthreads library (with correct cancellation semantics!), and up-to-date wrappers for all the esoteric Linux syscalls, yet still clocks in at ~60k lines of C source files at most. It would be fairly trivial to remove those things, precisely because of how well structured and straightforward the code is.
musl also builds using a single, simple Makefile:
musl$ wc -l Makefile
235 Makefile
compared to the supposedly simpler and cleaner meson build:
So, the inverse: could an OS distribution like Alpine profit from using picolibc instead of musl? What would be the tradeoffs (presumably performance for size, but to what degree)?
Along with performance, there are probably other ABI issues and maybe even API extensions to the C library that just aren't present in these tiny implementations. You can work around the ABI stuff by recompiling against the new C library, of course, but that won't work if you've got a closed-source binary or if the software needs something not provided.
It says "A more lively project; still, definitely targets systems with a real Linux kernel." but people are actually using musl for other use cases too. And it is permissively (MIT) licensed, and very high quality.
There were other considerations against musl there. I'd summarize them as: targeting a desktop/server system with a full kernel, or at least a mid-to-high-end embedded system, involves different code-size, memory, and optimization tradeoffs than the smaller embedded systems that picolibc seems to target.