An update on rust/coreutils (ledru.info)
187 points by marcodiego on Jan 29, 2022 | 98 comments



An interesting aspect of this effort is that it undoubtedly makes modifying these tools much easier. While GNU and the FSF are all about legally ensuring the ability to read and modify their code, they have, in practice, made it almost impossible to do so for entirely technical reasons. I’m a pretty good C programmer and whenever I’ve looked at even simple GNU tools like `cat` or god forbid GNU libc’s `printf` code (this one is actually kind of delightful it’s so nuts), I have despaired of understanding, let alone modifying, the code. GNU coreutils are such a morass of preprocessor defines and bonkers C code to support every legacy system GNU has ever run on, that it might as well only be distributed in binary form. I think it would literally be easier to decompile most GNU tools and modify the decompiled platform-specific source than to make the same modification to the source as shipped. Rust has a steep learning curve, but learning Rust is a picnic compared to figuring out how to modify just one GNU tool, and of course they’re all different. Rust puts the onus of portability where it belongs: in the language and compiler, instead of forcing it on every single programmer and application.


Thanks for sharing this perspective. A few years ago, I started learning C with the aim of contributing to GNU coreutils and other utilities that come with a GNU/Linux system. I read most of the K&R book and thought I had a decent grasp of the C language, but when I cloned the coreutils source code, I couldn’t figure it out and gave up, thinking I’d need a lot more experience before I could understand production-quality code. It’s reassuring to find out it wasn’t just me who struggled with the complexity of the code base.

Since then, I’ve discovered (thanks to Hacker News) Decoded: GNU coreutils¹, a “resource for novice programmers exploring the design of command-line utilities” but unfortunately, I no longer have the capacity or free time to spend on coding.

¹ https://www.maizure.org/projects/decoded-gnu-coreutils/index...


This sounds right on the edge of the traditional situation in rewrites where everyone says how awful the old code is, so they rewrite it, but then the new code doesn't handle edge cases (doesn't run on AIX/Illumos/NetBSD, yes(1) turns out to be really slow without that weirdness[0], file semantics have really weird edge cases that you only find after losing data), so the new code is adjusted to actually be as comprehensive as the original, at which point the new code is as ugly as the original. Of course, it depends on why the old code was ugly; if, as suggested below, some of it was being intentionally obtuse to avoid claims of copying, there might be room for actual improvement, and we do have more perspective and better tooling and new algorithms - real improvement is 100% possible, I would just be very cautious in assuming that based on an incomplete comparison.

[0] https://news.ycombinator.com/item?id=14542938


> doesn't run on AIX/Illumos/NetBSD

These OSs distribute their own coreutils.

> yes(1) turns out to be really slow without that weirdness[0]

I'm not sure there is a workflow where yes writing slower than 10.2GB/s is the bottleneck. Being able to go so fast is a nifty engineering feat, but hardly a necessity IMHO.


Yes but scripts are written assuming GNU coreutils behavior and this makes them easier to port. No reason they shouldn’t work everywhere. I think a nontrivial portion of Mac users on HN probably use the GNU utilities on macOS.


> No reason they shouldn’t work everywhere.

No reason except the complexity of maintaining huge amounts of code to support dead processors and operating systems, or ones that are so rare that they might as well be dead.


Macs generally use BSD tools when possible because Apple won't put GPLv3 code on their OS.


Installing the GNU versions is as easy as

    brew install coreutils 
etc., which I suspect is sort of what Spivak was referring to, and not that MacOS ships with them.


> file semantics have really weird edge cases that you only find after losing data), so the new code is adjusted to actually be as comprehensive as the original, at which point the new code is as ugly as the original

I think the GP was getting at the fact that Rust/rustc handles many of these edge cases at the language/compiler/standard-library level, instead of at the application level. Which would mean, at least in theory, the new code would actually not have to be as ugly as the original in many cases.


I have seen it argued many times that the GNU codebases are deliberately written to be unintelligible to avoid claims of being derived from UNIX sources. If they were written in the most straightforward ways, then they would have to show that no-one working on it has seen the original sources. I believe there is even a section in the glibc manual discussing this idea.


Sometimes it's obscenely complicated because the GNU devs like showing off with even the simplest of programs. My favorite example is GNU yes, a coreutil that's so ludicrously fast[0] that it can cause OOM states in mere seconds if you're not careful where you're redirecting output. Other HN users have compared the GNU version[1] with the BSD one[2], and I think it's a pretty good way to grok the difference in style between the two. In any case, both tools work perfectly fine for pretty much every use case.

[0] https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes...

[1] https://github.com/coreutils/coreutils/blob/master/src/yes.c

[2] https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c
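
To make the difference concrete, here's a minimal sketch of the buffering trick in Rust (not the uutils or GNU code, just the general idea): fill a buffer with "y\n" once, then hand the kernel large writes instead of two bytes at a time.

    use std::io::{self, Write};

    fn main() -> io::Result<()> {
        // Build an 8 KiB buffer of "y\n" once, so every write() syscall
        // carries ~4096 lines instead of a single two-byte line.
        let buf = b"y\n".repeat(4096);
        let stdout = io::stdout();
        let mut out = stdout.lock();
        loop {
            out.write_all(&buf)?; // exits with an error once the pipe closes
        }
    }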


One more data point here is that uutils yes can be even faster than GNU yes by taking advantage of Linux's vmsplice syscall: https://github.com/uutils/coreutils/blob/main/src/uu/yes/src...

On my system it manages 20GB/s vs 6 GB/s (piping through pv into /dev/null).
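
For the curious, the core of the vmsplice approach looks roughly like this (a simplified sketch using the libc crate; the real uutils code handles partial writes and errors, and falls back to normal writes when stdout isn't a pipe):

    // Linux-only: splice pages full of "y\n" into the pipe with vmsplice()
    // instead of copying the buffer on every write.
    use libc::{c_void, iovec, vmsplice, STDOUT_FILENO};

    fn main() {
        let buf = b"y\n".repeat(4096); // one big buffer of "y\n" lines
        let iov = iovec {
            iov_base: buf.as_ptr() as *mut c_void,
            iov_len: buf.len(),
        };
        loop {
            // SAFETY: iov points at a buffer that stays alive for the call.
            let n = unsafe { vmsplice(STDOUT_FILENO, &iov, 1, 0) };
            if n < 0 {
                break; // stdout isn't a pipe, or the reader went away
            }
        }
    }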


Awful but plausible.


This is an extremely amazing project because it's actually trying to come up with a replacement that can be used as such. Nobody is going to rewrite scripts to use exa instead of ls for example. The coreutils are also somewhat the "basis" of the shell userspace.

Outside of the test failures and missing features, the only reason one might not want these coreutils is the size.

Right now, cat on my system is 44K large, while the default output size for the cat executable in release mode is 4.4 megabytes. If you enable a bunch of options in Cargo.toml to reduce the default bloaty settings (lto = true, codegen-units = 1, strip = true, debug = false), you get that down to 876K. Which is still 20 times larger than the native cat. For true, it's a similar story: 40K vs 812K.
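
For reference, those settings live in the binary crate's Cargo.toml, something like this (the commented-out lines are further common size reducers that the numbers above don't include):

    [profile.release]
    lto = true
    codegen-units = 1
    strip = true
    debug = false
    # opt-level = "z"
    # panic = "abort"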


Rust executables are statically linked by default. This makes them oftentimes substantially larger than their dynamically linked counterparts. For me it is not a deal breaker.


My guess is people will start "solving" this kind of problem through multi-call binaries a la busybox.
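
The busybox-style dispatch itself is simple enough; a rough sketch (applet names and functions here are made up):

    use std::env;
    use std::path::Path;

    // Hypothetical applet entry points; a real multi-call binary would pull
    // these in from each util's own crate.
    fn cat_main(_args: &[String]) { /* ... */ }
    fn ls_main(_args: &[String]) { /* ... */ }

    fn main() {
        let args: Vec<String> = env::args().collect();
        // argv[0] tells us which symlink or hard link we were invoked through.
        let name = Path::new(&args[0])
            .file_name()
            .and_then(|s| s.to_str())
            .unwrap_or("");
        match name {
            "cat" => cat_main(&args[1..]),
            "ls" => ls_main(&args[1..]),
            _ => eprintln!("unknown applet: {}", name),
        }
    }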


uutils already (optionally) builds that way. The GitHub release has a single 8.4M binary.

That doesn't seem so bad, the GNU coreutils Debian package has an Installed-Size of 17.9MB. Though there might be hidden drawbacks I don't know about.


> The GitHub release has a single 8.4M binary.

That's definitely an acceptable size, thanks for pointing that out. Yeah, a bunch of symlinks should solve the issue. If the binary is kept cached in RAM at least, there should be no overhead in starting it. I think it's pretty rare to have environments that are so memory constrained that they can't hold onto 8 MB of RAM.

Note that not all targets are that small, though. The musl build is indeed 8.3 MiB, but in coreutils-0.0.12-x86_64-unknown-linux-gnu.tar.gz the coreutils executable is 12.6 MiB. On the bright side, on the same target, a local build with the cargo features to reduce bloat enabled gives only 7.1M. All these numbers are acceptable.


Is anything in coreutils setuid? That’s the only possible drawback I have in mind.


Yes, the only reason I say "solved" is that this approach will only help for suites of many small tools that ship together.


Is that a typical ratio for software ported to Rust? Where is all the extra size coming from?


At least 500K comes from clap, the argument parsing library. That's how much true's size drops if you comment it out.

It's a lovely library. Generated help text with colors and line wrapping based on your terminal width, informative error messages that suggest corrections for typos, and a declarative model that lets you express subtle relationships between options. You don't get that from getopt.

But it's also pretty large. https://github.com/rust-cli/argparse-benchmarks-rs has some numbers.

(Disclaimer: I wrote one of the competitors on that page, lexopt. But I also use clap, depending on the project, and I'm happy with it.)
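
For anyone who hasn't used it, a tiny example of the builder style (this assumes clap 3.x; the tool and option are made up):

    use clap::{App, Arg};

    fn main() {
        let matches = App::new("mytool")
            .version("0.1.0")
            .about("Demonstrates what you get for free from clap")
            .arg(
                Arg::new("verbose")
                    .short('v')
                    .long("verbose")
                    .help("Print more detail"),
            )
            .get_matches();

        if matches.is_present("verbose") {
            eprintln!("verbose mode on");
        }
        // --help, --version, wrapped help text and "did you mean"-style
        // error messages all come from the library, not from this code.
    }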


500K to parse options is frankly too much...


This thread inspired me—I found a way to remove 40K: https://github.com/clap-rs/clap/pull/3370

I'm sure it's possible to cut it more. It'll never be tiny, but it can be smaller than it is now.


In most software people will care more about helpful error messages and readable help texts than about half a megabyte of code size. Coreutils with its tiny frequently called binaries is an outlier where the size adds up, but that's not the norm.


Not sure how serious you are with your reply. PCs are a small market, and a solution fitting them hardly represents a valid approach. There is an entire world of small devices that aren't swimming in terabytes of disk space.

One of the most attractive features of Linux is how it suits a large array of hardware. Replacing C solutions with some bloated bandwagon alternative is a bad idea.


Yes, if you write software for devices with megabytes of disk space then don't use Clap for argument parsing in your software. I'm not really sure how that's a criticism of Clap in general, or the rust coreutils specifically. Rust coreutils built as a single binary (as opposed to one binary per tool) is about 8MB. Anyone who is agonizing over 500kB of RAM or disk space is going to use BusyBox instead of either the GNU coreutils or a rust replacement for them. Why limit the implementation to satisfy the needs of somebody who wouldn't choose you anyways?


Out of curiosity, why should each application implement line wrapping for terminals? That seems like a feature the terminal should implement, and from what I've read there is no standard way of getting the terminal size anyway.


The terminal isn't aware that you're, for example, using an indent because the help text of an option should appear indented from the option it's explaining. So the line wrap will go into the next line and ruin the readability.


Ah, that makes sense.


There's the TIOCGWINSZ ioctl which seems pretty standard.
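
Something like this, via the libc crate (a Linux-flavoured sketch; the exact integer type of the request constant varies a bit across platforms):

    use libc::{ioctl, winsize, STDOUT_FILENO, TIOCGWINSZ};

    fn terminal_size() -> Option<(u16, u16)> {
        let mut ws = winsize {
            ws_row: 0,
            ws_col: 0,
            ws_xpixel: 0,
            ws_ypixel: 0,
        };
        // SAFETY: TIOCGWINSZ only writes into the winsize struct we pass in.
        let ret = unsafe { ioctl(STDOUT_FILENO, TIOCGWINSZ, &mut ws) };
        if ret == 0 {
            Some((ws.ws_col, ws.ws_row)) // (columns, rows)
        } else {
            None // not a terminal, or the ioctl isn't supported
        }
    }

    fn main() {
        println!("{:?}", terminal_size());
    }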


The added compile times scared me off of pretty much all the bigger options. Like, six seconds? Wooooooof.


According to cargo bloat, 40.1% of the .text section comes from clap, and 46.7% of the .text section comes from std. clap is quite famous for being extremely bloaty.

To give an answer to your other question: a lot of the additional code size is a one-time cost. That means the relative difference is largest in cases like these, for binaries with little logic of their own. The more complex the program gets, the smaller the relative difference. However, even if you don't account for these one-time costs, Rust creates larger programs than C, due to its heavy use of generics.


Maybe clap should store text strings in compressed form, and then decompress as needed. Could greatly reduce the bloat.


.text is the section of the binary where the code is.


As someone else mentioned, shared libraries. If you built ls and statically linked libc and whatever else it needs, ls would be a lot bigger.


I believe what you are seeing there is the power of shared libraries.


It'd also be a great time to establish a better standard for using these programs with a shell or scripting.

For example, if ls had a --shell option that would write out file information as quoted eval'able variables, or even JSON, or anything that was easily and reliably parsable it would remove a huge portion of scripting headaches and errors.


I added some CI task and replayed the history to graph the evolution of the size: https://github.com/uutils/coreutils-tracking#binary-size-evo...

This will help make sure that we aren't regressing (more) ;)


  Instead of 30 to 60 patches per month, we jumped to 400 to 472
  patches every month. Similarly, we saw an increase in the
  number of contributors (20 to 50 per month from 3 to 8).
More contributors = more hands to fix issues and tune performance. I believe this is an absolute win.


According to the graph, more than half of the tests are still failing. Can the speed be (to some extent) attributed to some missing functionality?


Good question!

Probably not. In general, a piece of functionality enables or disables a behavior when doing an operation. In the code, that most of the time translates to a simple if/else. For example, adding new options usually looks like this PR: https://github.com/uutils/coreutils/pull/2880/files

The performance wins are usually produced by using some fancy Rust features.
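
In other words, the typical shape is something like this (made-up flag, not the code from that PR):

    // The new option only toggles a branch; it doesn't restructure the hot path.
    fn print_entry(name: &str, zero_terminated: bool) {
        if zero_terminated {
            print!("{}\0", name); // e.g. a -z / --zero style option
        } else {
            println!("{}", name);
        }
    }

    fn main() {
        print_entry("file.txt", false);
    }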


That’s only correct if you focus exclusively on optional features enabled by feature flags or environment variables. Some features might be around support for things like Unicode, where a tool could work 90% of the time but that extra 10% of support requires implementing a lot of additional logic that slows the routine down for the other 90% of use cases too.

Also, ‘if’ causes branching, which costs a small amount of CPU overhead. So it’s not a free operation and can quickly add up if it’s needed inside a hot path.


> Some features might be around support for things like Unicode where a tool could work 90% of the time but that extra 10% of support requires implementing a lot of additional logic that slows the routine down for the other 90% of use cases too.

This kind of thing is where Rust really shines. The ecosystem was built post-unicode, so things tend to support it by default. Ripgrep for example has been unicode aware from the beginning, and you have to opt-out if you don't want that.


I’m aware of Rust’s support for Unicode; I was only using that as an example because it’s easy to visualise, since most people who’ve written any kind of text parsing will understand the additional computational overhead correctly supporting Unicode costs. But while the example doesn’t directly apply to Rust, I guarantee you that there will be other edge cases in a similar vein that might cause issues.


For new flags, this is easily solved by generating and compiling (via generic monomorphization, macros or build-time code generation) two versions of all the relevant code (including of course any loops calling the relevant code) and switching the one that is executed depending on the value of the flag.
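
A sketch of that idea with a const generic flag (illustrative only): the compiler emits two specialized copies of the loop, and the flag is checked exactly once, outside the hot path.

    fn process<const FANCY: bool>(data: &[u8]) -> usize {
        let mut count = 0;
        for &b in data {
            if FANCY {
                // extra work compiled only into the FANCY = true copy
                count += (b as usize).wrapping_mul(31);
            } else {
                count += b as usize;
            }
        }
        count
    }

    fn run(data: &[u8], flag: bool) -> usize {
        // One runtime branch selects which monomorphized copy to execute.
        if flag {
            process::<true>(data)
        } else {
            process::<false>(data)
        }
    }

    fn main() {
        println!("{}", run(b"hello", true));
    }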


This is true, but also results in larger binaries. Elsewhere in this thread, it was noted that the coreutils compiled from the Rust code are already quite a bit bigger than the GNU counterparts. On many systems this might not matter much (and I guess it's fine if a software suite explicitly makes binary size a non-goal), but on some systems it does.


Well, they could be doing some simplifications and assumptions that can improve performance.

Also performance is not always the only factor to consider: you can optimize a program for speed, but also for RAM usage, or the size of the binary itself.

In a program suite like coreutils, to me it's more important that the programs are small than anything else. Typically you use a lot of commands in a script to do some trivial operations (the input of the program is usually small), so simpler programs (with less startup time) are usually better.


>In a program suite like coreutils, to me it's more important that the programs are small than anything else

Then use BusyBox/Matchbox....because..well that's the whole point of those projects.

https://busybox.net/about.html


After the first time the binary is executed, it should sit comfortably in the in-memory disk cache. And frankly, on a modern SSD, the sector size is large enough that the only difference is reading 3 sectors instead of 1 to call "ls". Barely matters if it gets batched.


Slightly less than half, but yes. That could certainly be relevant. On the other hand inevitably some tests will be fragile or even outright wrong and so while your solution passes a different (possibly better/ faster) solution fails because the test is bad.

For example suppose you're implementing case-insensitive sort, you write a test and tweak it slightly so that it passes as you expected. I come along and write a slightly faster case-insensitive sort, and mine fails. Upon examining the test I discover it thinks I ought to sort (rat, doG, cat, DOG, dog, DOg) into (cat, doG, dog, DOG, DOg, rat) but I get (cat, doG, DOG, dog, DOg, rat) my answer seems, if anything better and certainly not wrong but it fails your test.
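
For what it's worth, a plain stable case-insensitive sort in Rust produces exactly the second ordering; this is just an illustration of the point, not a claim about what either implementation actually does:

    fn main() {
        let mut words = vec!["rat", "doG", "cat", "DOG", "dog", "DOg"];
        // sort_by_key is a stable sort, so "doG", "DOG", "dog", "DOg"
        // keep their original relative order under the lowercased key.
        words.sort_by_key(|w| w.to_lowercase());
        println!("{:?}", words); // ["cat", "doG", "DOG", "dog", "DOg", "rat"]
    }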


The test is right because it tests for the fact that lower case letters have a smaller representation as an ASCII character position. There is probably a ton of scripting and other software out there that assumes that this is how things are sorted. You also want a stable sort, meaning that if you sort the same sequence twice it comes out sorted the exact same way again. So you need this sort of definition of how things are supposed to be sorted.

So while I agree that without historical context and compatibility for human consumption your way of sorting is probably fine, you could wreak major havoc when trying your sort as a drop in replacement. If you only have a few users change management for something like this is relatively easy. The user base of coreutils? Think twice if you want to try that change management.


> The test is right because it tests for the fact that lower case letters have a smaller representation as an ASCII character position.

It's the other way around, though.


See why having tests is great to define how a system is supposed to work? Much better than memory.


If the test is actually the specification then, sure I guess, but in most cases it isn't.

Notice that the "failed" example exhibits stability which you claimed was desirable, while neither exhibits sorting by case, this is after all a case-insensitive sort, the "successful" example is just swapping some of the list items for whatever reason, maybe it was how their chosen algorithm worked, maybe it's a bug, they wrote the test so they get to fill out a "correct" answer that matches their behaviour.

Now, striving to pass such tests gets you bug-for-bug compatibility which is what you want if you're an emulator, but the GNU project started out deliberately not doing bug-for-bug because it means people accuse you of copying, and so I don't see why this project should be different.


214 passes, 298 fails. That seems rather more than 50% failing, even before the errors and skipped tests.


There are 611 tests, 298 is less than half.


There are tests that are erroring as well. 214 passing out of 611 is less than 50% passing (in fact it's almost only 1/3 passing).


It is possible, but with benchmarks like "head -n 1000000 wikidata.xml" I doubt it. A comment in that PR says "the difference to GNU head is mostly in user time, not in system time. I suspect this is due to GNU head not using SIMD to detect newlines".

Unfortunately I couldn't find a list of failed/successful tests, if that's available I'd be happy if someone linked it
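
If you want to poke at the newline-searching part yourself, the usual way to get SIMD-accelerated byte search in Rust is the memchr crate; here's a sketch of counting lines that way (not necessarily what uutils head actually does):

    use memchr::memchr_iter; // memchr = "2" in Cargo.toml

    fn count_newlines(buf: &[u8]) -> usize {
        // memchr uses SIMD under the hood where available.
        memchr_iter(b'\n', buf).count()
    }

    fn main() {
        assert_eq!(count_newlines(b"one\ntwo\nthree\n"), 3);
    }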


https://github.com/uutils/coreutils/actions (click on a CI job on main). For example: https://github.com/uutils/coreutils/runs/4990891225?check_su...

The "Run GNU tests" is probably what you are looking for.


An old mantra that has served me well: first make it work, then make it fast.

So first get it to 100% compatibility, then, and only then concentrate on performance. Because if you don't do it that way you will end up foregoing compatibility because you'd have to say goodbye to your beautifully tuned code that unfortunately can't be 100% compatible. As long as it does not pass all the tests: it does not work. Even if it works for some selected cases, the devil is in the details, and those details can eat up performance like there is no tomorrow.


I follow this advice with one small modification. Namely, mine goes:

    - Make it once.  
    - Make it right.  
    - Make it fast.  
"Make it right" as the first step can trick an unseasoned developer into never finishing a prototype. I don't mind the first iteration of something being sloppy and then pursuing correctness with an existing solution to the problem in hand.

I'm curious if you've got a sense of the interplay between prototyping and correctness similar to your sense of the interplay between performance tuning and correctness? Any thoughts?


Yes, for greenfield work that is a good scheme. I always joke I must be exceedingly stupid because I have to do everything three times before I get it done.


I've seen this advice many times before.

While I do generally agree, it can easily become an excuse for not thinking things through from the start.

If you have the wrong architecture it might be very difficult or even impossible to optimize performance later on.

Also: If working on a project for a client, performance can be difficult to sell when the feature already works. But that feature might break when it's put under load.

My advice would be to always have performance in mind. But otherwise to stay away from "optimizations" until they are needed.


It's older than me. Thinking things through from the start is harder than it may seem for some classes of problems. Frequently the 'right' solution is several levels of insight removed from the ones that you will be able to achieve if you just sit down and 'think things through'. Only after you build it, benchmark your code against a large body of input and verify correctness will you have the appropriate insight required to really solve the problem.

God level programmers (of which I've met exactly one and know about one other over my whole career) can do this the first time around.


You are absolutely right. It was not my intention to claim that you will get things right the first time if you just think hard enough. Iteration is needed very often.

What I've seen many times is "best practices" like this being thoughtlessly applied.

A react component that rerenders 20 times, but passes the test. A database with no indices, no problem (until the number of records grows).

Things like that can easily be defended with "no need to optimise prematurely" or "I wrote the least amount of code to make the test pass".


Indeed, in that context your comment makes good sense.

What helps in those cases: to have a good idea of how long you think something should take and then to verify that it indeed is within an order of magnitude of that first estimate. If it is much slower you are probably in trouble, if it is much faster than you will have to check if you are really doing all the work that you should be doing.

Colin Wright (also on HN) wrote: "You can't make computers faster, but you can make them do less work". The corollary is that if your program is faster than expected that may be because it is doing less work!


The coreutils consist of many individual binaries ('utils') which are more or less stand-alone[0], so I think this mantra doesn't apply here.

[0] https://github.com/uutils/coreutils/tree/main/src/uu


It absolutely does. Each utility is a program all by itself and the compatibility tests apply to each and every one of them. How fast they are is not relevant until they are 100% drop in replacements.


It might not apply for the whole project, but it certainly does apply for each individual per-command functionality (cp, mv, or even the general cross-command features like the filesystem interface).


It does, as many times as there are binaries.


Now that is something I fully agree with.


Great work! Looking forward to a Linux system where the majority of user land is written in a safe and performant language!


It is not the first effort to do so; hopefully this one has enough wind behind it to see it through.


If you needed to do low-level systems programming in Rust, would you be able to use this as an OS abstraction layer by importing these sorts of things: https://github.com/uutils/coreutils/blob/main/src/uu/chmod/s...

Is this one of the intentions of this team? It sounds like it could make "scripting" in rust very nice if all of the CLI functions you're used to exist as language libraries.


That is a very interesting idea! It is not something we've discussed before as far as I know. For some utils (like chmod) it might be possible, but any util that is focused on outputting information currently does so by printing directly to stdout. So, ls doesn't give you a Vec or Iterator of listed files for instance, but instead prints them to stdout. Nevertheless, it would be a cool experiment to see if we can create something like a "ulib" crate that provides such functions for the utils where it is possible.


I believe I actually had a proof of concept of this functionality in a PR at some point. This would basically have to work like mesabox, where each utility takes some generic input and output and then runs based on them rather than using stdout and friends directly (e.g. to avoid blowing through all your memory by stuffing many gigabytes of output into a Vec on certain commands).
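
Something along these lines, where the entry point is generic over any Write sink; all names here are hypothetical, not an existing uutils API:

    use std::io::{self, Write};
    use std::path::Path;

    // The util writes into any `Write` sink instead of stdout, so callers can
    // capture, stream, or discard output without buffering gigabytes in a Vec.
    fn ls_into<W: Write>(dir: &Path, out: &mut W) -> io::Result<()> {
        for entry in std::fs::read_dir(dir)? {
            writeln!(out, "{}", entry?.file_name().to_string_lossy())?;
        }
        Ok(())
    }

    fn main() -> io::Result<()> {
        // The binary just passes stdout; a library caller could pass a Vec<u8>.
        let stdout = io::stdout();
        ls_into(Path::new("."), &mut stdout.lock())
    }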


That would be much more elegant than something like libxo.


I do wonder, to what extent is this "increase in speed" the result of:

- Refactoring out buggy, convoluted, highly backwards-compatible code in favor of cleaner, more practical code?
- Reimplementing code in a manner which simplifies previously complicated bottlenecks that were there in response to bug reports (and whose simplification potentially risks reintroducing said bugs)?

In all honesty, I would expect that "reimplementing coreutils" as above would have resulted in a speedup even if it was written in C again.

Am I wrong? Is there something about Rust that inherently leads to an increase in speed which one could not ever hope to obtain with clean, performant C code?


I would imagine (I’m no expert in Rust but do know a bit about compilers) that Rust’s ability to guarantee unshared access to memory can sometimes enable optimizations that are hard to coax out of a C compiler.

Many libc functions are also much slower than they could be because of POSIX requirements and being a shared library. For example, libc’s fwrite, fread, etc. functions are threadsafe by default and acquire a globally shared lock even when you aren’t using threads (you can opt out, but it’s quite annoying and non-standard) which makes them horribly slow if you’re doing lots of small reads and writes. Because libc is a shared library, calls to its functions won’t get inlined, which can be a major performance issue as well. By comparison Rust’s read and write primitives don’t need to acquire a lock and can be inlined, so a small read or write (for example) could be just a couple of instructions instead of what a C program will do, which is a function call (maybe even an indirect one through the PLT) and then a lock acquisition, only to write a few bytes into a buffer. That’s a lot of overhead!
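
For comparison, the idiomatic Rust pattern is to take the stdout lock once and buffer on top of it, so each write is just an inlined copy into a buffer; a generic illustration, not any particular coreutil:

    use std::io::{self, BufWriter, Write};

    fn main() -> io::Result<()> {
        // Lock stdout once, then buffer: no per-call lock, far fewer syscalls.
        let stdout = io::stdout();
        let mut out = BufWriter::new(stdout.lock());
        for i in 0..1000 {
            writeln!(out, "line {}", i)?;
        }
        out.flush()
    }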

And finally, Rust’s promise of safe multithreading no doubt encourages programmers to write code that utilizes threads in situations where only the truly courageous would attempt it in C.


Does anybody have stats on which coreutils programs are the most used? I wonder about the impact of the GNU userland on system perf.


I don't have data, but I guess ls, cp, mv, ln, chown, chmod, sort and cut are the most popular. Overall, I don't think it is a key part of the OS's regular work. Most servers and systems are busy with services, DBs, browsers, etc.


Am I missing something, or how does a PR that bumps the version of an existing brew formula translate to "Brew is proposing coreutils"?


How about making symbol processing/retrieval faster for all ELF binaries? The GNU hash table is very much obsolete and should be replaced by Swisstable. Key observations like this will keep being ignored for decades to come.


If one were to replace coreutils in a GNU+Linux system with this, which is MIT licensed, would it be fair to call it MIT+Linux?


Yeah, I see the pun. But leaving the GPL license is a very bad idea in my opinion. If we have to rest on the shoulders of giants, I'd prefer them to be made of concrete, not dust. The coreutils are really a commons and should be protected against appropriation.

Depending on how much of a rewrite it is, GNU's intellectual property may be enforced at some point. But that's a tricky question for a lawyer. If it were me, I'd have asked GNU first.


Um, GNU coreutils is a reimplementation of various AT&T UNIX utilities. It's irresponsible to say GNU has IP rights to this implementation, especially where GNU has done a similar reimplementation(!), and where EFF, etc., has made contrary arguments, as amici, in cases like Oracle v. Google.


Agreed, I was completely off on this one :-/


Nobody is going to steal a coreutils implementation, make a bunch of internal contributions to it, and yield a competitive advantage through that. The whole point of coreutils is that it’s old and stable and coreish.

In the real world, corporate engineering middle managers do not understand the GPL and do not understand how it does and doesn’t restrict them. They avoid it rather than taking the time to learn it.


I also think “GNU+Linux” is a terrible name and a worse marketing move, but you might misunderstand what it’s about. There’s no “GNU license”, and nobody ever proposed calling the system “GPL+Linux”. GNU sees itself as a project to replace the entirety of Unix (hence the name, GNU’s Not Unix). A kernel is one part of the full OS, so if you combine the Linux kernel with the rest of GNU’s OS, according to this logic you get GNU+Linux. (Or maybe it should be GNU-Hurd+Linux, or GNU=~s/Hurd/Linux/... it's terrible.)


Sure, it’s not “MIT+”, but coreutils is one of the main GNU things. If you replace it with a non-GNU version, a non-GNU libc (musl), non-GNU binutils (llvm), etc, you could imagine a usable Linux distribution that would be inaccurate to call “GNU/Linux.”


No need to imagine; Alpine Linux exists today and IMO isn't GNU/Linux (I don't know what their default compiler chain is, but Alpine is musl libc and busybox for coreutils, so you can easily have an Alpine system without GNU components).


For sure; that’s why we don’t call Android GNU/Linux, since it only uses bash and almost no other GNU utils.


Android doesn't use bash, though? (I agree with your point in general, just pointing out that Android uses, IIRC, mksh)


IIRC, most of the GNU coreutils' complexity comes from their locale handling.

They support lots of different legacy encodings.


I can see this project really taking off.

It has two advantages on the GNU version apart from just memory safety with Rust.

1) Ease of portably building. Just type ‘cargo build --release’ and it will just work, even on Windows.

2) MIT License. A company can take these and distribute them as part of a commercial offering without having to worry about GPL compliance.


I'm not sure how point 2 is an advantage for anyone but the company, tbh.

It's also going to be _really_ hard to be more portable than GNU coreutils, when it comes to platforms it's available on.


The whole value of coreutils is that it’s old and stable. I just can’t imagine someone making major contributions to rust coreutils and keeping them internal as some sort of competitive advantage.

I do embedded linux for work and while OpenEmbedded makes it pretty easy to manage license obligations, it’s always a pain in the ass to deal with GPL code in a larger team where people will find any excuse to bikeshed about their misunderstandings of the GPL. For questionable historical reasons, my current job manually whitelists every GPL package we ship, so it being MIT makes things less abrasive to teams like mine.

That’s an advantage for everyone, because it makes me more likely to use it and it makes me more likely to submit bug reports and fixes.



