Picolibc – Libc for Embedded Systems (keithp.com)
118 points by homarp on Oct 3, 2019 | 41 comments



I wish we could do away with (parts of) libc entirely. Many of its design principles and limitations date back to its inception and don't make sense today, even on "embedded" systems.

- function names limited to 6 characters (because of compiler/linker memory limits at the time?)

- non-reentrancy of many functions (strtok(), errno...)

- dangerous string manipulation functions (see the various iterations of strcpy())

- and so much more...
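To make the errno point concrete, here is a minimal sketch of the wrapping that strtol() needs before its failures can be trusted. The helper name parse_long is hypothetical; only strtol(), errno and ERANGE come from the standard library.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long and report failure through the
 * return value, so callers never have to inspect the global errno. */
bool parse_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0;                    /* strtol never clears errno on success */
    long value = strtol(text, &end, 10);

    if (end == text)              /* no digits were consumed */
        return false;
    if (*end != '\0')             /* trailing characters after the number */
        return false;
    if (errno == ERANGE)          /* value did not fit in a long */
        return false;

    *out = value;
    return true;
}
```

The caller has to reset errno before the call and read it right after, because any intervening library call may overwrite it.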

Of course, I'm missing the point. Libc continues on because it's standard and backwards-compatible, and lets you leverage 50 years of software ecosystem.

Nevertheless, I'd love to see what a modern clean-slate implementation might look like. I'm sure there'd be much disagreement about what warrants replacement and what doesn't ;)


> I'm sure there'd be much disagreement about what warrants replacement and what doesn't ;)

How many backwards-incompatible variations of C++ std::string have there been, simply in terms of the standard (not extensions)? Two? Three?

glibc has been turning their nose up at strlcpy(3) for literally 20 years, and yet still doesn't provide a comparable one-liner alternative that guarantees to always leave a NUL-terminated string behind (i.e. doesn't leave a loaded gun lying around).
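For reference, a rough sketch of the semantics being asked for. The name bounded_copy is hypothetical (it is not a glibc or BSD function); it only assumes strlcpy-style behaviour: always terminate, and return the source length so truncation can be detected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy-like helper: copies at most cap - 1 bytes of src
 * into dst and always NUL-terminates when cap > 0.
 * Returns strlen(src), so a result >= cap means the copy was truncated. */
size_t bounded_copy(char *dst, const char *src, size_t cap)
{
    assert(dst != NULL && src != NULL);

    size_t src_len = strlen(src);
    if (cap > 0) {
        size_t n = src_len < cap - 1 ? src_len : cap - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Whether returning the full source length is the right way to signal truncation is exactly the kind of detail people disagree on.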

A good string interface is impossible to pin down because a "string" is the antithesis of a proper data structure. C dipped its toes in the better string library waters with Annex K, which saw almost no uptake; even the original sponsor, Microsoft, never shipped a complete implementation. C would be wise to avoid chasing the mirage of a better string library. In general if you're thinking in terms of "strings", C is the wrong language. C being an inconvenient language for programming with that mindset is a good thing.

(I would love to see strlcpy added, but the fact it's so divisive is maybe a strong hint that there's no way to placate a substantial minority of people, not even the most pragmatic of users. Anything involving the str* API is a lose-lose at this point.)

What C really needs are some primitives for bounded arrays and slices. There are some very good, very simple proposals out there, but no compiler vendor has actually shipped them and the C committee has been burned several times before by standardizing de novo interfaces.
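As a library-level approximation only (the proposals are about language support, which this is not), a bounded slice can be sketched as a pointer that carries its length. The names byte_slice, slice_sub and slice_get are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded-slice type: a pointer that carries its length. */
typedef struct {
    const char *ptr;
    size_t len;
} byte_slice;

/* Sub-slice [start, end); returns false instead of going out of bounds. */
bool slice_sub(byte_slice s, size_t start, size_t end, byte_slice *out)
{
    assert(out != NULL);
    if (start > end || end > s.len)
        return false;
    out->ptr = s.ptr + start;
    out->len = end - start;
    return true;
}

/* Checked element read; returns false for an out-of-range index. */
bool slice_get(byte_slice s, size_t index, char *out)
{
    assert(out != NULL);
    if (index >= s.len)
        return false;
    *out = s.ptr[index];
    return true;
}
```

Without compiler support, nothing stops code from bypassing the checks and using ptr directly, which is why this really needs to be a language primitive.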


> even the original sponsor, Microsoft, never shipped a complete implementation

I guess mostly because they decided that, as far as improving Windows systems programming safety goes, they were better off moving everything to C++ and letting others keep bothering with C.

And nowadays they are eyeing C#, C++ (with Core Guidelines analyzers) and Rust.

The exception being Azure Sphere, but I bet they will eventually add Rust there, as using C taints the security story of Azure Sphere.

The resistance among C devs to moving away from micro-optimizing every line of code as they write it is too strong for them to ever adopt one of those primitives for bounded arrays and slices.

This leads to solutions like ADI-enabled SPARC Solaris, or the future Android devices that will ship with ARM memory tagging enabled.



> glibc has been turning their nose up at strlcpy(3) for literally 20 years, and yet still doesn't provide a comparable one-liner alternative that guarantees to always leave a NUL-terminated string behind (i.e. doesn't leave a loaded gun lying around).

  snprintf(destination, sizeof(destination), "%s", source);

?


snprintf(3) can fail with ENOMEM. In fact, at one point (not sure about now), glibc could fail with ENOMEM even for %s.[1]

The size argument is a size_t, but the return value is an int, so there are arithmetic overflow problems once lengths exceed INT_MAX.

That int return value creates underflow and overflow hazards, though the degree to which this is a problem, and the extent to which it's a feature, vary. Everybody has a different opinion about how truncation should be signaled, if at all.

[1] Because glibc snprintf uses (or at least used) its stdio FILE interfaces for implementing snprintf, and it conditionally uses malloc for initializing a temporary FILE object.
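To show what handling those cases involves, here is a minimal sketch of the snprintf idiom with its return value checked. The wrapper name copy_via_snprintf and its 0 / 1 / -1 return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper around the snprintf idiom.
 * Returns 0 on success, 1 if the output was truncated, -1 on error. */
int copy_via_snprintf(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);

    int n = snprintf(dst, cap, "%s", src);
    if (n < 0)
        return -1;            /* snprintf itself reported a failure */
    if ((size_t)n >= cap)
        return 1;             /* dst holds a NUL-terminated prefix of src */
    return 0;
}
```

So the "one-liner" turns into three distinct outcomes that the caller has to tell apart.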


You also missed the part where it's not necessarily self-evident what it does, or that it even fixes the issue ;)


errno is thread-safe as it resides in TLS. Part of libc such as pthread is very hard, and so does the dynamic linker. The latter is supposed to handle TLS data as required by TLS/elf spec, which more or less requires a full thread/pthread implementation. As a result, it is rather difficult to break down libc, without writing the whole thing in the first place. golang might be an exception (no linkage against libc), but AFAIK it doesn't use TLS.

It's a pretty hard task to tackle libc (not just string functions, obviously), maybe for the same reason there're only a few libc implementations (support full dynamic linking).


Neither pthreads nor a dynamic linker is required for libc. pthreads is part of the POSIX standard, not the C standard, and many libc implementations over the years have not supported dynamic linking.

To give you an idea, here's a chip used by at least one of the products developed by the author:

https://www.st.com/en/microcontrollers-microprocessors/stm32...

16 KB of RAM. You're not going to see any linking on this because you're probably running everything in a single address space anyway.


What is TLS?

> Part of libc such as pthread is very hard, and so does the dynamic linker.

What does this mean? I'm having trouble parsing this as English much less making sense of the argument :)

> It's a pretty hard task to tackle libc (not just string functions, obviously), maybe for the same reason there're only a few libc implementations (support full dynamic linking).

Why does libc ship with string functions in the first place? What is special about strings that they are included in libc while other data structures (to my knowledge) are not? Presumably you don't need strings to implement malloc or other parts of libc?


> What is TLS?

Thread-Local Storage:

https://en.wikipedia.org/wiki/Thread-local_storage


Strings ship with libc because pretty much every language ships with string manipulation routines. You need them to basically do anything that isn’t pure computation.


Granted, but pretty much every language also ships with routines for other data structures, and I would think that you need things like lists/slices more often than you need string routines.


libc includes string functions because C requires string manipulation functions. Other languages may or may not make use of C's string functions.


Yes, because they ship with their own string type.


strtok_r() should be available on any halfway decent embedded platform.
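A small sketch of why it matters: nested tokenizing works with strtok_r() because each loop keeps its own state pointer, whereas strtok() would clobber the outer loop's position. The function name count_fields and the ';' / ',' input format are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* strtok_r is POSIX, not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts fields in text such as "a=1,b=2;c=3": records are separated by ';'
 * and fields by ','. The input buffer is modified, as with strtok. */
size_t count_fields(char *text)
{
    assert(text != NULL);

    size_t count = 0;
    char *outer_state = NULL;
    for (char *rec = strtok_r(text, ";", &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state = NULL;  /* separate state keeps the outer loop intact */
        for (char *field = strtok_r(rec, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            count++;
        }
    }
    return count;
}
```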


musl?


musl still has many of the same issues, as it needs to remain compatible with POSIX.


If you missed it, this is from Keith Packard, big-time X Window System developer.

https://en.wikipedia.org/wiki/Keith_Packard


Most of the big chip vendors provide a libc implementation along with the proprietary compilers that they maintain for their chips. On the TI MSP430, I've had some projects that were too big to fit into flash if I used the libc (newlib) that's bundled with the open source MSP430-GCC compiler[1]. By contrast, if I compiled with the proprietary TI compiler[2] and its libc, the binary was small enough to fit. My experience is a bit dated at this point, so newlib and the GCC fork may have improved enough to reach parity by now.

The dominance of GCC and LLVM on x86[-64] machines has conditioned many of us to take great open source build toolchains for granted, but embedded systems are comparatively the wild west - there are still a lot of proprietary build toolchains out there. Obviously this is a negative if you're committed to using open source tools, but one positive is that it's a niche where companies can still make money selling compilers (and therefore employing engineers who work on compilers).

[1] http://www.ti.com/tool/MSP430-GCC-OPENSOURCE

[2] http://www.ti.com/tool/MSP-CGT


The biggest news is probably the fact that it's BSD licensed. Having some GPL parts had driven me away from newlib before.


If I were in the market for such a library, I'd still have serious concerns. The GPL license still appears in the root of the repo, as if the entire codebase might be subject to the GPL. It appears to be a fork of newlib (GPL), and that means it might not qualify as a "clean room implementation" of GPL'd code. The legal team in a corporation is going to insist on a ton of due diligence on Picolibc before they allow it to be used.


Quoting the README:

> Remove[d] unused code with non-BSD licenses. There's still a pile of unused code hanging around, but all non-BSD licensed bits have been removed to make the licensing situation clear. Picolibc is BSD licensed.

From my understanding, Newlib is a GPL-licensed project that has some GPL-licensed contributions and some BSD-licensed contributions (in the sense that these aren't original code written by contributors, but rather are copy-and-paste inclusions of pieces from various GPL- and BSD-licensed codebases.)

The author created Picolibc by (doing the moral equivalent of) cherry-picking out a small base of the commits to Newlib—all of which were copy-and-pastes from BSD-licensed codebases.

---

In some sense, Newlib is very similar to how a Linux distro works, repackaging upstream code components (of various licenses) together.

Picolibc, then, would be a lot like creating a new Linux distro, derived "from" Ubuntu, that happens to use only the packages in Ubuntu that are themselves directly copied in from upstream Debian (pretending for a moment that Ubuntu does this rather than re-signing packages with their own keys.) Is such a distro really "derived from Ubuntu"? Or is it derived "from" Debian, with an Ubuntu mirror server just serving as a pipe that some Debian packages went through on their trip "from" Debian's hands "into" the hands of the new distro?

---

Either way, worries that the GPL could infect this effort are misplaced—even if it comes to a legal battle, at any point, the same codebase could be recreated (with a bit of a schlep) by just going to all the same BSD-licensed upstream sources that Newlib's contributors pulled from, and doing the same copy-and-pasting that Newlib's contributors did.


Or we could stop mindlessly releasing closed source encrypted firmware?


[flagged]


Please don't post unsubstantive comments here, especially not nasty ones, regardless of how annoying other commenters are. Otherwise we all go down an annoyance spiral.

https://news.ycombinator.com/newsguidelines.html


There's nothing annoying about understanding how you're allowed to use something created by another person, especially when legal liability is involved.


This is an excellent libc for embedded systems. I am looking to add it to a self hosted tcc (Bellard's C compiler) on ARM Cortex-M systems.


An interesting non-free alternative is the SEGGER Runtime Library [1], which works with GCC but has smaller code size than newlib-nano and significantly faster floating point emulation [2] for processors like the Cortex-M3 or RISC-V with no FPU. It would be interesting to see if newlib-nano or picolibc could use these tricks.

[1] https://www.segger.com/products/development-tools/runtime-li...

[2] https://blog.segger.com/floating-point-face-off/


I didn't see this addressed in the post, but why not musl?


For what it's worth: musl is much larger than newlib-nano or picolibc. Most of the embedded projects I've worked on could not afford to use musl.


  picolibc$ find . \( -name test -o -name testsuite \) -prune -o -name \*.c -exec cat -- {} + | wc -l
    259227
  picolibc$ find . \( -name test -o -name testsuite \) -prune -o -name \*.h -exec cat -- {} + | wc -l
    60588

  musl$ find . -name \*.c -exec cat -- {} + | wc -l              
    60697
  musl$ find . -name \*.h -exec cat -- {} + | wc -l
    37889

Those figures are sufficiently far apart that I don't think we need to quibble about comments. Whatever is causing those huge numbers for picolibc (architecture optimized function variations? per file license preambles?), suffice it to say that it's debatable the extent to which newlib or picolibc is smaller or simpler than musl, if at all.

musl is maybe more ambitious in terms of full and correct POSIX compliance, but the code base is very clean and concise. For example, for locale support they keep things very simple by only supporting UTF-8. Unlike picolibc, musl includes a full ELF runtime dynamic linker, full pthreads library (with correct cancellation semantics!), and up-to-date wrappers for all the esoteric Linux syscalls, yet still clocks in at ~60k lines of C source files at most. It would be fairly trivial to remove those things, precisely because of how well structured and straightforward the code is.

musl also builds using a single, simple Makefile

  musl$ wc -l Makefile                                           
    235 Makefile

compared to the supposedly simpler and cleaner meson build:

  picolibc$ wc -l meson.build meson_options.txt 
    282 meson.build
    132 meson_options.txt
    414 total

If I were working on an embedded project, I know which build I'd prefer to hack on.


So the inverse: could an OS distribution like Alpine profit from using picolibc instead of musl? What would be the tradeoffs (presumably performance for size, but to what degree)?


Along with performance, there are probably other ABI issues and maybe even API extensions to the C library that just aren't present in these tiny implementations. You can work around the ABI stuff by recompiling with the new C library of course, but that won't work if you've got a closed source binary or if the software needs something not provided.


The original post linked in the opening covers some of the search and reasoning.


It says "A more lively project; still, definitely targets systems with a real Linux kernel." but actually people are using musl for other use cases. And it is BSD licensed, and very high quality.


There were other considerations against musl there. I'd summarize them this way: targeting a desktop/server system with a full kernel, or at least a mid-to-high-end embedded system, involves different code, memory, and optimization tradeoffs than the smaller embedded systems that picolibc seems to target.


Okay, what good is any of that if musl literally does not compile without Linux headers?

This is for systems that are at best a small RTOS, and there is no standard for what those operating systems offer to a libc.


musl upstream only supports Linux, but it is so easy to port that there are a large number of ports to non-Linux environments.


Looks like it can save a good chunk of memory compared to newlib: https://github.com/RIOT-OS/RIOT/pull/12305


Needs to have a strong overlap with WASI.


Why?



