Uutils: an attempt at writing cross-platform CLI utilities in Rust (github.com/uutils)
243 points by GhettoComputers on Dec 6, 2021 | 314 comments




What is the progress over years?


One fun recent development in 2021: https://sylvestre.ledru.info/blog/2021/03/09/debian-running-...

> tldr: Rust/coreutils ( https://github.com/uutils/coreutils/ ) is now available in Debian, good enough to boot a Debian with GNOME, install the top 1000 packages, build Firefox, the Linux Kernel and LLVM/Clang.


Thanks, I've heard a lot of people saying it's not complete, but I haven't had any issues with it so far. Seeing it's even a Debian package is a pretty huge endorsement and a real sign of maturity.


As a GNU coreutils maintainer, I'm happy to see another implementation. A lot of the coreutils development effort goes into testing, and new implementations should be able to leverage the existing GNU coreutils test suite, as it just uses whatever utilities are in the $PATH

BTW some notes on how the GNU coreutils are tested are at: https://www.pixelbeat.org/docs/coreutils-testing.html


> A lot of the coreutils development effort goes into testing, and new implementations should be able to leverage the existing GNU coreutils test suite, as it just uses whatever utilities are in the $PATH

This makes me really happy.

Separately from this, I've been wondering for a long time if there's a way for standards (and de facto standards) to share test suites for other implementers to re-use. Sort of an npm, but only for test suites. Does such a thing exist? I wrote a TOML parser recently and had to re-derive the test suite from the specs.


I started thinking about this last year and wrote down some thoughts in this draft post: https://ttm.sh/2l3.md

> Could we get the best of both worlds by treating specification and compliance (testing) as a single problem? This hypothetical approach i call specification-driven development, whereby a specification document is intended both for human and machine consumption. In that case, the specification contains a written presentation of concepts, in addition to a machine-readable test suite that follows a certain format to programmatically ensure that the concepts and behavior described in the specification are implemented properly.

I've centered the document on my personal use cases (CLI and sysadmin checks) but I don't see a reason it couldn't be employed for API/ABI checks.
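As a rough illustration of the idea (the format and field names are hypothetical, not from the draft), a spec-embedded test case could just be a record that any implementation can deserialize and run; a Rust sketch:

    // Assumed Cargo deps: serde = { version = "1", features = ["derive"] }, toml = "0.8"
    use serde::Deserialize;

    // One hypothetical machine-readable case embedded in a spec document:
    // the prose explains the concept, this record makes it checkable.
    #[derive(Debug, Deserialize)]
    struct SpecCase {
        section: String,   // section of the written spec this case exercises
        argv: Vec<String>, // command to run against whatever is in $PATH
        stdin: String,     // bytes fed to stdin
        stdout: String,    // expected stdout for a conforming tool
        exit_code: i32,    // expected exit status
    }

    fn main() {
        let case: SpecCase = toml::from_str(
            r#"
            section = "3.2 echo"
            argv = ["echo", "hello"]
            stdin = ""
            stdout = "hello\n"
            exit_code = 0
            "#,
        )
        .expect("valid spec case");
        println!("{case:?}");
    }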


I've been thinking about that when working with recurrence rule libraries. It's a standard about calendaring and there are millions of edge cases: developing your own library for prod purposes is a huge risk, and most of the effort would probably be in tests. If all these projects shared a test suite, they could:

- Fix their own existing bugs

- Help other projects fix their bugs

- Be explicit about what they support and what they don't

- Help writing new libraries (e.g. in a faster language, or for another ecosystem)


OpenGL, for example, has an official test suite as well as one or more open source ones (I know of piglit).


Indeed, I'm picturing sort of a canonical repository for test cases (with a common interface). I would love to have a place I can obtain a test suite for a given standard, execute it, and even potentially publish my conformance. Potentially even have a badge to post on GitHub repos indicating conformance to a certain version of the standard.


Public tests are good, but relying on public tests only encourages kludges to make tools pass the tests. A form of overfitting. That's why some compression tests do not publish their test corpora.


But compression tests only have that issue because there are degrees of success for those tests, even for the same compression algorithm. I don't think they're hiding the test vectors because they're worried about implementations purposely failing to process valid inputs to achieve better metrics, but about overly specific heuristics.

For straightforward corner case acceptance tests (which I would assume covers most of the coreutils test suite) there's not really a danger of overfitting unless the developers are literally writing if statements that match a single input from the test and provide the correct output.


That feels like an issue to be solved by the tests. If a program conforms to the public test suite, either the program "works" or the tests aren't covering the specification.


For compression tests it's a little different, since the problem is often underdefined even for a fixed algorithm and different implementations may produce encodings with different efficiencies (space, compression time, decompression time) for different inputs. Compression implementations can overfit on some inputs and produce subpar results on average even if they produce valid outputs for all inputs.

However I doubt this applies to coreutils' tests, which I suspect are more about conformance.


That makes sense if the public tests are "compress these bytes, expect this output" but I'd expect instead to have a lot of specific components with their own individual tests.


Do you think this applies if a corpus contains both affirmative and negative tests? As in, including not just conforming JSON but also a set of JSON documents that should be rejected as non-conforming? I agree it could be tricky for, say, compression, where the definition of 'wrong' is harder to pin down. I'm just wondering if this idea has legs, and I appreciate your thoughts.


I can't answer with certainty, but I think any fixed set of tests can be cheated with overfitting.


I do think that's a material risk. However, broadly, do you think that such a scheme would make the ecosystem better or worse? If you made a hypothetical JSON implementation in your language of choice, would you use it?


> I do think that's a material risk. However, broadly, do you think that such a scheme would make the ecosystem better or worse?

I don't think it would be easy to cheat if:

  - the tested implementation is open source: it would make cheating too obvious,

  - the tests are constantly updated: it would make cheating too cumbersome, and

  - the tests include randomization: cheating would not always work.
So, satisfying these points would drastically increase trust in the test corpus and the tested program.
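As a sketch of how that third point could look in practice (the binary paths and the input generator here are made up for illustration), a randomized differential test just feeds the same arguments to a reference binary and the candidate, then compares outputs:

    use std::process::Command;
    use std::time::{SystemTime, UNIX_EPOCH};

    // Run a binary from an explicit path and capture its stdout.
    fn run(bin: &str, args: &[String]) -> Vec<u8> {
        Command::new(bin).args(args).output().expect("spawn failed").stdout
    }

    fn main() {
        // Cheap per-run "randomness" so a fixed corpus can't be overfitted.
        let seed = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().subsec_nanos() as u64;
        let args: Vec<String> = (0..(seed % 5 + 1))
            .map(|i| format!("arg{}", seed.wrapping_mul(i + 1) % 100))
            .collect();

        // Same input to the reference and the implementation under test;
        // any divergence is a bug in one of them.
        let reference = run("/bin/echo", &args);           // assumed reference path
        let candidate = run("./target/debug/echo", &args); // assumed candidate path
        assert_eq!(reference, candidate, "output diverged for {args:?}");
        println!("ok: {args:?}");
    }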

> If you made a hypothetical JSON implementation in your language of choice, would you use it?

On my machine? I use my own hacked kernel on my machine! In production? Only if tests indicate my implementation is as good as the best ones available.


Gave it a whirl with `cp`:

    > ./target/debug/cp /dev/null /dev/zero
    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }', src/uu/cp/src/cp.rs:1295:54
Good advice is "don't panic." Trying `mv`:

    > ./target/debug/mv . .
    ./target/debug/mv: cannot stat '.': No such file or directory
oof

Anyway, the point is that there's lots of legitimate institutional knowledge baked into coreutils, and a naive RiiR effort will re-introduce previously fixed bugs.


Both are quite naive errors, I must say. The good thing about Rust is that it doesn't hide the complexities of the file system from you; it requires you to explicitly handle the edge cases. Or, you know, you could just write a cp that panics when there is a PermissionDenied because you decided that never happens and happily unwrapped the Result.

The last one is a different kind of bug. Rust's std lib indeed cannot stat ".", which means they would have to translate "." into the coreutils meaning manually. Their tests should have caught that one, though.


> Rust's std lib indeed cannot stat "."

But the kernel can! I mean, stat() isn't a standard library function, it's a system call with (reasonably) well-defined semantics. And "." is a valid path, which returns a valid struct stat block representing a guaranteed-valid (cwd is always live; you're always "somewhere" even if the filesystem has removed the directory) directory.

Coming at this from the perspective of "translate . into the coreutils meaning" is almost certainly the wrong way to think about it. It's rust that has the broken picture of the filesystem environment, coreutils is just doing unix.


On Linux.

This isn't a Linux project, it is a cross-platform project.


No, stat() of "." predates linux by almost two decades. And coreutils is a set of unix tools. You can build them on other systems, but only via emulation layers that are responsible for handling the friction (and those layers would likewise be subject to this same logic).

Or I guess what you're saying is that uutils is a cross platform project? Which is true enough, but it's still responsible for faithfully representing the behavior of the underlying system. And when running uutils on linux, stating "." (in this case to detect that src and dst are the same file) is legal and valid.

It's true there are other ways to solve the same problem than emulating a linux syscall layer, but you have to pick one. You don't get to use "the standard library can't stat ." or "we have to run on windows" as an excuse.


Nobody is saying this is not a bug. I am saying that "It's rust that has the broken picture of the filesystem environment" is not true in any way. Rust implements a cross-platform filesystem API. The fact that it does not work the same way as the Linux kernel does not make it "broken".


That's a little out of context. The situation being described[1] is that "rust cannot stat the path '.'". A core requirement of coreutils is to work with a unix filesystem where "." is a valid path. Ergo, from the point of view of the requirements at hand, the rust filesystem layer is "broken". It doesn't do what uutils needs it to do.

[1] And I'll admit that I don't have any understanding of the actual low level issue here or why native pathnames are being interpreted by the runtime and not passed through to the OS layer.


Because if you just pass them through, you will not have consistent behaviour across platforms. That was the original point: Rust implements a consistent, cross-platform filesystem API. It does not necessarily match whatever your platform happens to do.


But uutils needs to necessarily match whatever your platform happens to do, or else it breaks[1] in silly ways like this when it turns out that "." stops being a path.

[1] Maybe this is all just a semantic argument about this use of the word "broken"? This is a long-standing usage. It doesn't mean that Rust's standard library isn't useful for anything, it means that it isn't useful here because of design choices that don't match the problem at hand. If it were something that can be fixed, it would just be "buggy". But it can't (for the reasons you mention!). Therefore it's "broken", not "buggy".


What do you mean by Rust's std lib being unable to stat "."? I wasn't able to find anything related to this on the web.


I'm not sure what they mean either, this seems to succeed: https://play.rust-lang.org/?version=stable&mode=debug&editio...
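The link above is truncated, but presumably the snippet is something along these lines (my reconstruction), which does succeed on Linux:

    use std::fs;

    fn main() {
        // stat(".") through the standard library: this succeeds on Linux,
        // so "cannot stat ." isn't a blanket std limitation.
        let meta = fs::metadata(".").expect("stat of . failed");
        println!("is_dir: {}", meta.is_dir());
    }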


Note that in the readme `cp` is listed as semi-done.


Wtf is "semi-done"?

There is "done" and there is "not done"...


I imagine "semi-done" means "works enough that it can be dogfooded by the devs and tried out by the curious among us, but does not exactly mirror behaviour of GNU cp in terms of corner cases and error output".

I think we need to be more flexible than classifying things as done/not-done. Even GNU coreutils has open issues[0] - is it "done" or "not done"?

[0] = https://debbugs.gnu.org/cgi/pkgreport.cgi?pkg=coreutils


If that's how you categorize software, then no software has ever been "done". There are always new cases, bugs, features, and performance improvements. You will never run out of any one of those things.

In this case, semi-done probably means what it usually means: some common cases are handled, while others aren't yet.


If you believe that, there is not a single piece of done software in the world, besides perhaps TeX.


"semi-done" means:

1. Not done. :-(

2. A lot has been done already, i.e. not just-started.

3. Not almost-done, i.e. there's significant work still to be done


In that case at least one of "done" and "not done" needs to be a spectrum; semi-done is a point on one of those spectra


You think that's confusing? Wait until you hear about half "A" presses.


Portions of the code could be completed and others not? See "semi-finished".


WIP? Dude, chill… you’re not paying for it


Could not get mv to fail as you did on the Mac or Linux. Would you re-run with the latest bits and let me know what your platform is? I am going to fix these two bugs, if they still exist.


Are you running an irregular version of libc? Something else unusual?


Same with cp. Could not reproduce on the Mac or Linux.


So it doesn't work for cases where it shouldn't work. And it errors out but doesn't do anything stupid

I don't see the problem for something that was just released as a very preliminary version

(Though I don't know where cp is getting "PermissionDenied" from)


> I don't see the problem for something that was just released as a very preliminary version

Fwiw the project is half a decade old.

> (Though I don't know where cp is getting "PermissionDenied" from)

Have not looked at the problematic code but, e.g., copy_file_range(2) says:

> EPERM fd_out refers to an immutable file.


While that is a good point, it is quite unsightly to panic from unwrapping. Instead, it should propagate that error back to the top level, where a nicer error message can be generated.
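Something like this sketch (illustrative, not the actual uutils code): with `?`, the error bubbles up to a place that can format it properly instead of panicking:

    use std::{fs, io, process};

    // Roughly what an .unwrap() deep inside the copy path does:
    // any error aborts with a panic and a backtrace.
    fn copy_panicking(src: &str, dst: &str) {
        fs::copy(src, dst).unwrap();
    }

    // With `?` the error bubbles up instead, so the top level can
    // print a proper "cp: ..." message and set the exit status.
    fn copy_propagating(src: &str, dst: &str) -> io::Result<u64> {
        let n = fs::copy(src, dst)?;
        Ok(n)
    }

    fn main() {
        let _ = copy_panicking; // unused in this sketch
        if let Err(e) = copy_propagating("/dev/null", "/dev/zero") {
            eprintln!("cp: cannot copy '/dev/null' to '/dev/zero': {e}");
            process::exit(1);
        }
    }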


More importantly, cp should continue copying the other inputs even if one of the source files fails to be readable. Handling errors by panicking doesn't suggest that will be the case.


I don’t know if that is _more_ important, but I agree that it is also important :)

The reason I say this is that while users will expect this behavior, doing it well (so that it can be parallelized, for example) requires careful thinking about how errors should flow, how lines are printed to stdout, etc.
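For illustration, the basic sequential continue-on-error shape is small (a sketch that assumes the destination is a directory; real cp handles many more cases). The hard part described above is keeping this behavior once the work is parallelized:

    use std::{env, fs, path::Path, process};

    fn main() {
        let args: Vec<String> = env::args().skip(1).collect();
        let (dest, sources) = args.split_last().expect("usage: cp SOURCE... DEST");
        let mut failed = false;

        for src in sources {
            let name = Path::new(src).file_name().expect("source has a file name");
            let target = Path::new(dest).join(name);
            // Report the failure and keep going with the remaining
            // sources, rather than panicking on the first error.
            if let Err(e) = fs::copy(src, &target) {
                eprintln!("cp: cannot copy '{src}': {e}");
                failed = true;
            }
        }
        // Nonzero exit if anything failed, like the real tool.
        process::exit(if failed { 1 } else { 0 });
    }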


There is much to love about this!

GNU coreutils is one of the pillars of our civilization. Re-implementing it in several new languages can only benefit everybody, as it will lead to a higher understanding of all the implementations, including the original one.

So far, the tests of this Rust version seem to be used to compare against GNU coreutils, with a goal to attain feature parity. I hope in the near future the careful memory management of Rust will motivate new tests that are passed by the Rust version but failed by the C one (e.g., some memory leak, or an input that triggers a segfault in the C version). Then you will see a new curve in the graph tracking the number of tests failed by the C version, which at some point will cross the decreasing curve of the Rust version! The coverage of the Rust implementation is still a bit sparse for that, but hopefully it will improve.

There is one feature, though, that is unfortunately missing in this Rust re-implementation: the fact that users of coreutils will always be able to examine the source code of the programs that they run. This is guaranteed by the original copyleft license. But this Rust version uses a "stealable" license that allows distributors of these programs to strip users of that right. This is really sad.


I'm writing a book "Rust From the Ground Up" which teaches Rust by rewriting a coreutil per chapter, from the original BSD sources. The first 3 chapters (true/false, yes, and head) are released. wc is next, then cat, cut, rev, and uniq.

https://rftgu.rs/


The book looks very interesting. By when do you plan to complete the next set of chapters?


Thanks! I've written the whole book - now I'm going through and editing and typesetting each chapter and trying to publish one a month.


That's wonderful. Do you have twitter? I just want to be notified once the book is complete.


Yes I’m posting updates here: https://mobile.twitter.com/rustftgu

Thanks for the interest!


Nixpkgs has a PR for replacing the build environment of packages with uutils-coreutils[0]. Testing by building various packages surfaced over a dozen bugs, which were reported upstream, but after upstream fixes a lot of software does build successfully with uutils[1][2], including chromium, vlc, nix, rustc, emacs, vim, and uutils itself.

It's an interesting experiment to replace such a fundamental dependency with a rewrite. A lot of the build failures are due to uutils not implementing GNU extensions, some of which can get pretty involved.

[0] https://github.com/NixOS/nixpkgs/pull/116274

[1] https://github.com/NixOS/nixpkgs/pull/116274#issuecomment-85...

[2] https://github.com/NixOS/nixpkgs/pull/116274#issuecomment-86...


I'm not quite sure I follow; is this a proposal to actually use this in nix, or just a PoC that it could work?


The latter! If merged, the package set that uses uutils for stdenv would be accessible via an overlay (similar to the cross-compilation infra we have), so one could build pkgsUutils.hello to build GNU Hello, for instance.


The comments complaining about the choice of license reminded me that an FSF licensing intern did the same, and demanded it be changed due to a false claim about it being a derivative work[1]. Let me remind everyone that there was probably a good reason for using a permissive license, and no one has the right to demand changes based on their own definition of freedom unless they stepped up and contributed.

If this truly injures you, you can either do your own rewrite or, like Muse Group, outright buy the rights to the software and relicense.

[1] https://github.com/uutils/coreutils/issues/834


There's also this issue by FSF director Ian Kelling:

https://github.com/uutils/coreutils/issues/1781

Honestly, it sounds like they're deeply scared of long-standing FSF-backed software having to compete (or even be replaced) with alternatives that just happen not to be GPL (e.g. they have similar thoughts on LLVM/clang), to the point where they decide to troll the issue trackers. It feels really childish.


> with alternatives that just happen not to be GPL (e.g. they have similar thoughts on LLVM/clang)

With LLVM/clang it's worse than just the GPL: as far as I understand, the design of GCC is intentionally horribly bad in order to make it painful to integrate with a proprietary toolchain. Of course, this means that things like refactoring support end up being basically impossible to build on GCC, so people move to clang out of necessity.


It's both a technical issue and a political issue; RMS himself has repeatedly advocated against interfaces that would expose internal representations that GCC uses because they could be used to build proprietary add-ons.

These days GCC does have a plugin interface, but they came up with a really funny hack to try to stop people from using proprietary plugins. Programs compiled with gcc can require linking with libgcc, and the libgcc license is "GPL, except it doesn't apply to software compiled with GCC and all-Free plugins". So their idea is that if you add a proprietary plug-in to GCC, that makes all code compiled like that have to be GPLed instead.

It is, of course, questionable whether this hack would hold up in court. It is also rather useless, because it's not hard to reimplement enough of libgcc to make it work. For example, the Linux kernel only links with libgcc for a few architectures.


Something something those who give up software freedom for software's usefulness deserve neither something something.


> Honestly, it sounds like they're deeply scared of long-standing FSF-backed software having to compete (or even be replaced) with alternatives that just happen not to be GPL

They are, for good reason. Big companies hate open source, despite their claims to the contrary. Look at Apple's proprietary forks of all the BSD stuff, and handset manufacturers' proprietary forks of Android. The FSF should absolutely be doing everything in their power to minimize how often this happens.


That's an argument you can certainly make. Unfortunately for them, as it turns out, deciding licenses for other projects is not in their power, and trolling other projects' issue trackers is counter-productive to the goal of convincing them to change their license.


> Look at Apple's proprietary forks of all the BSD stuff

What forks of what stuff?


OS X and Darwin, off the top of my head.


https://opensource.apple.com/release/macos-115.html

Here's the kernel, including all the BSD stuff:

https://opensource.apple.com/source/xnu/xnu-7195.141.2/

You can compile it yourself and run macOS with your own kernel build. Here's a blog by Apple's head of XNU development explaining how that works:

https://kernelshaman.blogspot.com/2021/02/building-xnu-for-m...


Yeah, and the open source people gave up. It's like giving someone a free recipe for a beef Wellington in a world where they control the food. Nothing uses Darwin besides Apple.


And? The source is there. There is no requirement to do extra work to make it useful for anyone but yourself. They did not even need to release the source, but they did.

You made a claim about "proprietary forks of all the BSD stuff". The kernel is open source. The rest of the OS is written from scratch by Apple, and originally by NeXT, and is not a fork of anything.

Are you standing by your original claim?


I never claimed that.



If anyone else wants to do this as a learning exercise I found https://www.maizure.org/projects/decoded-gnu-coreutils/ which goes into design and code of most of the utilities from the original coreutils.


Should be "uutils is an *attempt* at rewriting coreutils in Rust"

Not to shit on the project, just that the title made it seem like something ready to use... (most of the tests are still failing)


Title changed to that from "Coreutils Rewritten in Rust". Thanks for the heads-up!


This will probably sound completely insane, not least of all to people who prefer non-GNU coreutils, but I kinda hope that as uutils matures, some shells will offer an option to compile in the uutils rewrites as builtins. Especially for newer shells and shells that want to target Windows in a first-class way, that could be a real portability win.


Why is WSL[1] insufficient for Windows portability?

1. https://docs.microsoft.com/en-us/windows/wsl/about


WSL is great if you want/need a full Linux environment. It is quite heavy and probably overkill if all you want is to use some coreutil commands from the Windows terminal.


Gitbash is a relatively nice solution for this.


WSL1 is very light in my experience.


I wasn't primarily concerned with Windows portability, but with the portability of scripts altogether, and no longer having to concern oneself with variations in coreutils versions for at least some scripts, even if they run on multiple platforms.

But to answer your question about Windows:

Because requiring an emulation layer is a huge step up in complexity from being able to distribute your shell as a single static binary which is just a few MB. Right now Elvish and Nushell, for example, can be deployed on Windows as binaries that don't require any external dependencies, any installation procedure, or any configuration. If they could link against uutils statically, they could also come with basic utilities without having an environment to manage.

WSL (along with MSYS2 and Cygwin) is deeply stateful. Now to make sure everything is working you have to check the compatibility of your whole system, and the versions of coreutils can still vary in addition to the version of the shell interpreter.

WSL2 also requires virtualization, which can complicate using it under virtual machines.

You may not want everything you run under your shell to depend on WSL, which is also the case with WSL2.

Even with WSL1, a script that requires a chroot environment to deploy and run it is way less portable than one that doesn't.


Because WSL is not Windows. It's a fantastic tool but it's not actual native Windows support. I use Windows to develop software on a daily basis, and I don't have WSL installed at all.


More interesting than rewriting the GNU coreutils are the projects that are reinventing them. Ripgrep (which built on the example of ack), for instance, is a better grep because it discards backwards compatibility. Or fd, which abandons find's arcane syntax. We've learned a thing or two since the '70s, when most of these tools were first invented.


Agreed, it would be an interesting idea to make a new coreutils with everything modernised/cleaned up to be easier to use and more consistent.



Tried it. Too bad no shell compares to fish for ease of use. Every new shell gets asked "can it do the thing fish does?" and the answer is always "kind of" or "no". Zsh did not function as well even with the configs, and it seemed much slower.


I was reminded yet again how terribly non-intuitive the core file-system commands are for the end user on the command line. Maybe having Rust implementations readily available would allow people to fork them and improve the interface in some new shell+utils combo package.

(rm and cp require -r, mv doesn't; there's no auto-directory creation for mv. If you don't believe me, look at git/hg, which don't copy the semantics. Plus I’d like a reliable dry run and, by default, confirmation of every destructive action.)


> rm, cp require -r, mv doesn’t, no auto-directory creation for mv

These are all like that on purpose.

mv doesn't have -r because -r is a safety feature for rm and cp: deleting or copying an entire directory is expensive, so you should have to specify it explicitly. Moving a directory is really just renaming it, which isn't expensive when it stays on the same filesystem (and it usually does).

The most common case for the destination directory not existing when moving something is a typo in the path. Then you'd end up creating a directory you didn't intend to, possibly on a filesystem you didn't intend to, and moving all the other arguments over there. It could be useful to make it possible to explicitly opt in, similar to mkdir -p, but you could add that to the existing mv without breaking anything.
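A sketch of what such an explicit opt-in could look like (the flag is hypothetical; mv has no such option today):

    use std::{fs, io, path::Path};

    // Only create missing destination directories when the caller
    // explicitly asked for it, so a typo'd path still fails loudly
    // by default.
    fn mv(src: &str, dst: &str, create_parents: bool) -> io::Result<()> {
        if create_parents {
            if let Some(parent) = Path::new(dst).parent() {
                fs::create_dir_all(parent)?; // the mkdir -p analogue
            }
        }
        // Plain rename: cheap on the same filesystem, and it fails
        // with NotFound instead of silently inventing a directory.
        fs::rename(src, dst)
    }

    fn main() -> io::Result<()> {
        fs::write("demo.txt", "hello")?;
        mv("demo.txt", "nested/dir/demo.txt", true)
    }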


An inspectable and reliable dry run option and/or an interactive confirmation would solve some of these problems.


Not really. Dry run is useful for complex commands whose effects you're not sure of. These options are for when you intend to type:

rm /home/john/some/file.txt

But you accidentally type:

rm /home/john some/file.txt

It says:

rm: cannot remove '/home/john': Is a directory

Instead of john not having a home directory anymore.

And people aren't going to use a dry run option or interactive confirmation every time they run a simple command like that.


I was thinking more of

- listing how many files/directories would be moved

- listing how many files/directories would be overwritten

- telling whether the transfer will cross drives (especially for mv)

Reasonably speaking, this should not be included in the base POSIX cp and mv, but it could maybe be provided as intrinsics in bash, for example.
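The cross-drive check, at least, is cheap to implement on unix by comparing device IDs; a minimal sketch (unix-only, paths are just examples):

    use std::{fs, io};
    use std::os::unix::fs::MetadataExt;

    // Would `mv src dst_dir/...` be a cheap rename or a cross-device
    // copy? On unix, two paths live on the same filesystem iff their
    // st_dev values match.
    fn crosses_devices(src: &str, dst_dir: &str) -> io::Result<bool> {
        Ok(fs::metadata(src)?.dev() != fs::metadata(dst_dir)?.dev())
    }

    fn main() -> io::Result<()> {
        println!("crosses devices: {}", crosses_devices("/etc/hostname", "/tmp")?);
        Ok(())
    }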


The expensiveness argument is a bit naïve nowadays. Same for assuming typos by default.


Deleting entire directories by accident is expensive in more than computing resources.

And copying them is still actually expensive. Data expands to consume all available space. Accidentally copy a 16TB directory structure and you're going to max out the I/O on your machine for hours and maybe run it out of space. It's not a big ask to type two characters to confirm that you want to do that.


It’s just inconsistent. There are infinite possible typos that range from expensive to destructive.


> rm, cp require -r, mv doesn’t, no auto-directory creation for mv

These make total sense for me.


Why the MIT license? That is bound to be a turnoff for a lot of folks.

Also, a minor nit: there is still a lot of work [1] that needs to be done, and imho it is a bit premature to title this article the way it is presented here.

[1] https://github.com/uutils/coreutils#utilities


Who is MIT a turnoff for? It’s strictly more permissive on the consumer side than GPL is.

Which isn’t to say anything about how a project ought to be licensed; just that MIT enjoys overwhelming popularity with newer projects.


> Who is MIT a turnoff for?

For me, for example. I personally prefer my foundations on xGPL (preferably V3 and later), because some company or set of companies just can't fork and run away with it.

I personally consider computing utilities and compilers essential infrastructure, and their sustainability, combined with complete transparency, is critical for me.


I'd argue the opposite. xGPL makes it easier for the founding company to just run away with it. We saw it with MongoDB, where being AGPL meant one company could control and unilaterally relicense it. Other examples of the GPL doing little to nothing to prevent such shenanigans are the Oracle-owned GPLed properties: Java, MySQL, and VirtualBox all have user-hostile projects and misfeatures added even under the GPL. Conversely, permissive Free projects like LLVM and Postgres have had a much harder time with one company controlling them, because their non-copyleft nature means that everyone has a fair footing in controlling the direction.


The problem is not xGPL, it's the copyright transfer. If you don't force copyright transfer on the patches you accept, you can't relicense the code overnight.

All "xGPL to shenanigans" incidents have underlying copyright transfer problems. Recently, an emulator had gone the same way. They asked for copyright transfer to be able to relicense from GPL, and some folks here have used derogatory adjectives for people who didn't want to transfer their copyrights.


Java -> OpenJDK

MySQL -> MariaDB

OpenOffice -> LibreOffice

Chrome -> Chromium

VSCode -> VSCodium

and so on.

GPL protects my rights.


Ah, that is why clang is now losing the race to C++20 compliance, while two of its major founders are more than happy with C++17 for their OS stacks and the main languages on those stacks.

So where are the others stepping in to fill the void, reaching C++20 compliance and catching up to GCC and VC++?


But I work for a company, and I want to run off with these things.


Why is obeying the GPL and sharing your improvements bad?

Red Hat built a company on that model. Provide value and the GPL won't threaten your business model.


It’s not bad, it’s just that I’m not allowed to. I don’t like to inflict that on others.

Seeing something useful and then having to do it myself anyway because I cannot use it due to the license is painful.


What I know is that when projects use permissive licenses, the software end users get is usually proprietary. Copyleft is designed to prevent that from happening.


It doesn't matter if some other project is proprietary. The proprietary company's proprietary product was always going to be proprietary.

What matters is whether they contribute back anything to open source. And it's a hell of a lot easier to get Legal to sign off on contributing back bug fixes and enhancements on a piecemeal basis rather than adding a recurring obligation to the books.


> The proprietary company's proprietary product was always going to be proprietary.

The point of copyleft is to make this not true. Most companies don't have the resources to reimplement Linux from scratch, so they won't be allowed to make their drivers/kernel modifications proprietary.


I know that's the theory, but I don't see that happening in practice. See, e.g., TiVo, Android, NVIDIA drivers...


> The proprietary company's proprietary product was always going to be proprietary.

A lot of companies would have preferred to have proprietary operating systems, but the Linux license prevents them.


Plenty of companies aren't hindered by the Linux license, as they are not required to give you the source most of the time; they just point you to their SoC vendor. Or worse, they just don't use Linux and use BSD or Windows Embedded instead.

I don't see this as a valid point.


> Who is MIT a turnoff for?

For me. I want my software to be as GPLv3 as possible. Not trolling. Also not arguing; it's just the way I roll.


I would honestly like to understand the downvotes. I am giving a blatantly honest answer about the case of people like me who would be turned off by a licence. I don't expect anyone to necessarily agree, but I don't get why my opinion is being downvoted. Is it insulting, or irrelevant to the issue, or what?


I think it is a valid answer to a question that has been posed. I think the reason for the down-votes is that you did not elaborate on why MIT is a turnoff for you, and/or why you would prefer GPLv3 over MIT.

...or perhaps they did not like the last part of your comment, the "it's just the way I roll" one.


Maybe you're right. I thought it was understood why pro-GPLv3 people like the licence.

Just to elaborate, then: I personally want my software to use GPLv3 because I side with the ideological/ethical aspects of free software. I want to support projects/teams/orgs that build this future. An MIT licence would allow a company to capitalise on the community's effort without giving anything back. Or even worse, make modifications and deliver closed-source blobs to users. I personally do not like that. Thus, I wouldn't support/use this library.

To be clear, I am not running 100% gpl software atm, but I see it as a journey going there, slowly transitioning.


> Who is MIT a turnoff for? It’s strictly more permissive on the consumer side than GPL is.

Well, ever wondered why router vendors include a GPL license paper in the box? This [1] is why...

[1] https://www.crn.com/news/applications-os/205100091/busybox-s...


Didn’t they just switch to toybox? https://en.wikipedia.org/wiki/Toybox


Apple, Sony and Nintendo are quite happy with it.

FreeBSD and clang communities, maybe not so much.


MIT is a turnoff for potential contributors, not for users. A lot of people don't want to see their work that was done without compensation end up in some company's proprietary fork.


Is it much better that they just credit them in some obscure notes nobody reads?


The major issue is that the companies don't want to, in CC terms, "share alike". Attribution is a lesser concern.


Wouldn't you run into the same problem with a company that runs linux computers and never contributes back?


> It’s strictly more permissive on the consumer side than GPL is.

You have been misinformed. The GPL is a much better license for users. MIT is arguably better for the authors because they can deny certain freedoms to their users (the ability to change the code on a system, for instance).


Again, not interested in arguing the actual ethics of the licenses. But I think most people would consider a license that allows modification without demanding redistribution to be more individually permissive than one that requires redistribution. Companies love MIT and its ilk because it's more permissive of their (arguably poor) behavior.


It's more permissive, true, but it's not more free. As an analogy, consider that taking away people's permission to own slaves didn't make society less free.


The fact that you need to add a specifier for that 'individually' permissive says everything about the difference in the licenses' goals.


GPL3 or GPL2? After what GitHub did with Copilot, and with Chinese companies that don’t release source code, or western companies that do the same, I don’t see any way to enforce them; you just need to trust the authors or the users. Has there been a landmark case that solidifies any of this? Chinese companies don’t follow US IP either, so I find it hard to believe they can do anything.


How does MIT prevent users changing the code? They have explicit permission to take the code and change it however they want.


Not if you don't get the source with the binary.


It doesn't prevent that directly. What it does is allow proprietary forks, and those proprietary forks prevent that.


MIT vs GPL brought up again?

Flame war! Flame war! Flame war!


Ok, let's go!

You're stupid and probably ugly for wanting a flamewar!


No, he's right. We're rubber and you're glue!


Is it? I thought it was as standard as and on pretty much equal footing with BSD and Apache licenses. Is it substantially different?


Who, exactly?

I think most don’t care, and big corps love the MIT license


Don’t they love them all? Free software in general is nice. The PS3 ran Linux, was based on BSD, and I’m sure they used MIT software somewhere in there too.


Big corps love permissive licenses; they do not love all open source licenses, such as AGPL.


I've never heard of it, and I've never seen it being used. What's a big project that uses it?


Mastodon.

>Big corps love permissive licenses; they do not love all open source licenses, such as AGPL.

Doesn't Microsoft love using Linux for Azure, and contribute to it heavily in code and money? Using binary blobs is how they can get around others using their contributions, and it seems like Copilot has made a mockery of all the licenses anyway.

>Mastodon

Oh interesting. It looks like it was forked by corporations, though, including Trump's Truth Media and Gab, and Truth Media just shows the boilerplate GitHub source code. To me it still seems they love all free code.


What would be a better license? (If you want GPL you can keep using GNU...)


Obviously a few of these need work, but I've been in the unfortunate position of recovering a Linux system with a busted .so that broke almost all of coreutils, but cargo worked and so did the coreutils alternatives. Static linking is an absolute godsend.


In the old days, /sbin held statically linked binaries, to be used before dynamic linking was available. That would surely help in this situation. I don't know exactly why that changed.

EDIT: Just checked, it is still there! And... it doesn't contain sudo. Can't see a good reason why.


Use busybox instead.


Doesn't help when `sudo` is broken ...


How do these utils help when sudo is broken?


What I wound up doing was getting a working shared library and editing LD_PRELOAD, then getting sudo back to use the normal coreutils.


I thought you would just end up doing everything as root lol


sudo and coreutils being broken are separate things though.


Glad that this is MIT. Copyleft is great but it's important to have non-viral alternatives.


"non-viral"? Sounds like late 90s Balmer talking.

(I understand being worried about the license of libs you're going to deploy, but the license of cli utils isn't something you need to worry about "infecting" your proprietary code.)


MS contributed to Linux because their code was "infected" by GPL code from Linux.


Why is it important?


This is very cool.

The problem with POSIX is that, while it's possible to implement the bare minimum, it's hard to not have a few extensions. There are some truly braindead ideas that GNU coreutils have absolutely improved upon.

Sadly, a new coreutils collection simply brings further incompatibility. There's already significant switching logic out there to handle BSD vs GNU coreutils (have you ever used sed -i?), adding another flavor makes this sort of thing dead on arrival, at least to me. I'm not retrofitting my scripts to support these implementations too.

Aside, Nix is the only real solution to this problem, as it can replace these tools wholesale, hermetically. Being able to depend on a specific implementation in a portable manner is rather compelling.


> Many GNU, Linux and other utilities are useful, and obviously some effort has been spent in the past to port them to Windows. However, those projects are written in platform-specific C, a language considered unsafe compared to Rust, and have other issues.

I wonder why some people in the Rust community seem to remind us every few sentences how Rust is superior to C because it's safer.

/s And now it's C, yesterday it was C++... what's next, ASM ?


Because they're justifying a project whose sole difference from the other implementation is the language it's written in.


So it's more like a "Look, it's possible to do this in Rust!"

than

"We did it in Rust, and we believe it's better because ..."

?


It literally says why it is better and you even quoted it:

> However, those projects are written in platform-specific C, a language considered unsafe compared to Rust

Or to paraphrase it using your wording: "We did it in Rust and we believe it is better because it does not have the memory unsafety issues that are the #1 reason for security issues every year"


So without playing dumb or anything, you say they say:

"We have security issues every year because we program in unsafe languages like C, and to solve this, we should start rewriting things in Rust because it's 'memory safe'."

Amazing!


If you fancy living on the edge, there's already an AUR package, 'coreutils-hybrid', that uses stable uutils (with unprefixed names) where possible, falling back to GNU coreutils:

https://aur.archlinux.org/packages/coreutils-hybrid


I wonder if distros (on the maintainers' side) really want to move to more Rust in such core pieces? But maybe, as the readme here suggests, this is more for non-Linux platforms?

I know that in GNU Guix the growing use of Rust has led to struggles in packaging and building from source for non-x86-64 architectures (demands of building the Rust toolchain, bootstrapping), e.g. [0]. With something like librsvg [1] being a very low-level dependency now, Rust has become rather integral to many parts, with any changes to these libraries requiring massive rebuilds for a source-based reproducible distro like GNU Guix.

[0] https://lists.gnu.org/r/guix-devel/2021-11/msg00197.html

[1] https://gitlab.gnome.org/GNOME/librsvg


I've contributed a few small bits to uutils and in the testing I did, on those few small bits, the performance case is pretty exciting. Much easier to do concurrency with rayon, etc.
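The pattern is often just swapping one iterator call. A generic sketch, not actual uutils code (assumes rayon as a Cargo dependency):

    use rayon::prelude::*; // rayon = "1" in Cargo.toml

    fn main() {
        let lines: Vec<String> = (0..1_000_000).map(|i| format!("line {i}")).collect();

        // The sequential version would be lines.iter().map(...).sum().
        // par_iter() runs the same pipeline on rayon's thread pool, and
        // the borrow checker guarantees the closure is safe to share.
        let total: usize = lines.par_iter().map(|l| l.len()).sum();
        println!("{total}");
    }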


I’ve spoken to my friend about this and why it’s so much faster; he said it could be done in C as well, but Rust is much easier to write. Pretty impressed with Rust. I just heard many people insulting it at first, throwing around “memory safe”, and it seemed like Rust programmers were trying to ingress into established programs, so there was a great deal of hatred for it.

I fell in love with the performance of Rust CLI tools and mainly see its utility in producing fast binaries; memory safety is a bonus, but if I had known about its performance earlier I would have ignored anti-Rust propaganda like quoting Linus saying “Nothing better than C”.


Yeah, the hate for this project is ridiculous. I'm not a professional programmer. I did it just for fun, because I wanted to learn Rust. Some of these tools need someone to fill out some basic functionality, and you probably use some of these tools everyday, so you already know the spec. (FWIW I'd recommend contributing to anyone who is looking for a basic Rust project. It has a very welcoming project leadership.)

And, yes, so-called "free concurrency" ended up being an incredible performance story, although, full disclosure, I/we were just able to use rayon in many places because Rust is just that composable.


Sorry if that came off as "hate" or being dismissive; I did not intend it that way! Just genuinely curious, given the difficulties I'm seeing on the distro side around Rust packaging, at least for Guix, which builds from source in a reproducible/bootstrappable way (which seems very difficult beyond x86-64).


Oh, not at all! I'm sorry I gave that impression. I was referring to uutils, and, more particularly, some folks (GPL advocates) who are disappointed about the choice of license (MIT).

See, for example: https://github.com/uutils/coreutils/issues/2757


> he said it could be done in c as well but rust is much easier to write

This article is a good example

http://dtrace.org/blogs/bmc/2018/09/28/the-relative-performa...


Thank you so much for this article. I knew benchmarks were usually nonsense for real-world performance, but I hadn't seen Rust measured in real workloads before, and these real-world implementations show that it really works well.


I believe that coreutils should be rewritten in Go. Much better option.


I've seen a few attempts. This is just one example: https://github.com/u-root/u-root/tree/master/cmds/core

EDIT: You may also be interested in https://github.com/mvdan/sh


Why would Go be a better option than Rust here?

Go and Rust are optimized for different types of environments. There are a lot of situations where selecting Go over Rust would be the more pragmatic choice. I don't see how coreutils is one of them.


It seems like a reasonable opinion to me. Looking at the list of what's in coreutils, I don't see anything where performance would be a huge issue other than perhaps dd, so any performance edge Rust might provide seems moot. And I would guess that Golang would be less code and faster to write/implement.

I would feel differently if it were something else, like gawk, for example. Rust would be a better fit there.



The last commit was in 2015; is it functional?


I have filed an official issue with the uutils repository, asking the developers/maintainers to consider switching to GPL:

https://news.ycombinator.com/item?id=29456115

I've tried making non-confrontational arguments in favor of the switch. Consider adding to them if you agree (or counter-arguing if you oppose, I guess) - but please keep the tone there less argumentative than here.


If we replace GNU coreutils[1] with this, could we finally call it "Linux" instead of "GNU/Linux"?

[1] And gcc with clang when bootstrapping


Is getting rid of GNU that important?


I’m not sure if this answers the “how important” question, but: a large part of the GNU ecosystem is functionally unmaintained. The various tools under binutils receive so few patches that they’ve been the standard punching bag for academic fuzzing research for the last 20 years, and the coreutils aren’t much better.


I think the point of this project is less about getting rid of GNU, and more about promoting Rust.

There's nothing wrong with GNU coreutils.


https://www.cvedetails.com/vulnerability-list.php?vendor_id=.... 4 vulns in the past decade that Rust wouldn't have had.


4 vulnerabilities in 10 years sounds like a fantastic record, considering that the coreutils as a singular package has been around for 20 years (and the original tools for far, far longer). That's twice as long as Rust has been around.

Not to mention that memory safety is just one aspect of security. Rust's ownership model might have prevented those 4 vulnerabilities, but that doesn't mean that a whole host of others couldn't have slipped through.


If 4 vulns in 10 years for 1 package is acceptable, you are accepting that your OS will constantly have bugs. Sure, there are security problems other than memory safety, but among exploitable bugs, buffer overflows of some sort account for a majority.


My OS does constantly have bugs. It would be pretty unreasonable to say that my OS should never have bugs because of the practical impossibility of delivering perfectly bug-free software. So if people, after 10 years of trying, only found 4 critical vulnerabilities in all of coreutils, then that seems pretty good to me.

Rust itself has had 9 CVEs relating to memory safety in 2021 alone[0], which you can justify because Rust's development is highly active.

https://www.cvedetails.com/vulnerability-list/vendor_id-1902...


Recently there's been some folks "backfilling" CVEs. For example, the last one on that list was filed in 2021, but was fixed in 2015. The second to last one was filed in 2021, but was fixed in 2020.


What would motivate someone to do that? Sounds misleading to file a bunch of CVEs years after the fact.


The idea is, a lot of tooling relies on the CVE system to determine if a system is vulnerable, and so making sure that there is an actual CVE filed for every security bug is a good idea for the robustness of said tooling.

At least, that's my understanding.


It would be nice to have a totally memory-safe userland, but it's at least worth noting that most of coreutils is generally not going to be working with attacker-controlled data in a way that really matters.

Again, would be nice, just not where I'd start. But kudos to anyone doing the work.


From a performance standpoint I love trying them and eagerly seeing if I can replace the GNU tools. It's like using musl over glibc.


What kind of performance issues are you running into with GNU tools?


I installed the Rust implementations and they’re all just much faster.


About as important as keeping GNU, empirically.


Yes :) In fact, if you're into that, I'd argue that Alpine is already not GNU/Linux, since it uses busybox and musl.


You’d have to call it “Rust/Linux”


Most distros still use GNU libc but I guess you could build a distro with Musl and alt coreutils.


I've done this in Go. It's fun to do, but the real reason I did it was memory safety issues in C. We need to implement as much code as we can in memory-safe languages.


This is so much better than the previous, buggy, unsafe and dangerous C implementation.


It's unfortunate that FSF director Ian Kelling doesn't see it the same way, and thinks this project has little merit other than to work around the GPL license of coreutils...

https://github.com/uutils/coreutils/issues/1781

I'm curious as to what extent GNU coreutils governance is tied to the FSF these days.


Isn't that a little rich, considering a lot of GNU software itself started as a reimplementation of proprietary UNIX applications under a different license?


If you have a proprietary program, reimplementing it under a FOSS license is a good thing. If you have a copyleft program, reimplementing it under a license that allows proprietary modifications is a bad thing.


From the OpenBSD Copyright Policy (https://www.openbsd.org/policy.html) just for a different perspective from another FOSS project:

"While this may superficially look like a noble strategy, it is a condition that is typically unacceptable for commercial use of software. So in practice, it usually ends up hindering free sharing and reuse of code and ideas rather than encouraging it..."


I find this claim hard to believe since Linux is GPL, and it's used commercially way more than OpenBSD is.


Linux is something of a special case, as the vast majority of users can "use it" without thinking about the implications of the GPL, because, for example, there's a specific exception for software that uses Linux syscalls, and glibc is licensed LGPL.

That's not true for most uses of other GPL licensed libraries and software, where the license makes your code a "derivative work".


If you're just using Linux on your own servers, or writing userland software for it, then sure, the GPL is easy to comply with. But if you distribute hardware with Linux preinstalled on it, then you do have to provide all the source code. How is this meaningfully different from basic userland tools?


Not that I'm agreeing with them, but many companies are wary of sharing their code because it opens up a path to replicate their services without paying for them. Vendors that distribute devices with Linux, though, are able to avoid that because the hardware is often still proprietary. Binary blobs still exist despite the Linux GPL license too. Even in projects that seem lauded by the "open source" community, like the Rpi.

I just don't think Linux is a great example to explain to a company why they should be fine with licensing their software under the GPL. It's not a good direct comparison in most cases.


Of course. Linux is a special case, KHTML/WebKit/Blink is a special case, Android is a special case, etc. Blame special cases.


KHTML/Webkit/Blink are all LGPL and/or BSD, which is specifically not restrictive for linking to.

Many people assume all of Linux is GPL. I say "special case" because not everyone knows the most-linked-to stuff isn't.


? Android is, AFAIK, completely permissively licensed except for the kernel, and KHTML/WebKit/Blink are BSD/LGPL; what would be special about them in this context?


> used commercially way more

Depends on the metric and the definition of "commercially". I'm not sure this is the case for (physical) devices sold with software pre-installed.

I mainly see BSD licensed binaries on network appliances (routers, firewalls and other security appliances), apple hardware, automotive headunits, ...


There are some that are Linux-based. F5 load balancers are a good example. I don't know what strategy they use to avoid it, but as far as I know, you can't replicate your own F5 with whatever GPL-compliance code releases they do.

But, for some manufacturers, you can. Synology is a good example...see Xpenology.


  > it is a condition that is typically unacceptable for commercial use of software
What's the reason for that?


Because the GPL, specifically v3, is viral. The point being made in this statement is that the requirement to give back is potentially onerous for many commercial entities, and that this hinders open sharing rather than promoting it. It's a differing point of view. Notice that, unlike the majority of pro-GPL comments, it doesn't use pejoratives or appeals to emotion to make its point.


“Viral” is a term with a fairly pejorative emotional connotation and is not a very accurate way of describing copyleft.

https://en.m.wikipedia.org/wiki/Viral_license#Criticism_of_t...


The original statement doesn’t use pejoratives. Viral is the best way that I can describe it; I suppose insidious is better, but that too could be considered a pejorative. I’m struggling to find a word because I view it negatively.


Maybe "self-propagating", or "self-preserving"? The term "insidious" would not be appropriate for sure, as it means "dangerous".


Self-preserving is the best. Thank you. I personally don't like self-propagating, when the truth is that it's free, but with a catch. Forcing people to do the right thing, even if the intent is good, is still forcing someone to do something they don't want to do, or even can't.


I like the viral word personally; I see it positively, as meaning popular. Free software is pretty popular, so popular that even non-free software gets turned into "free beer" free software! ;) If it's free and open source, free to use (legally or not), people will use it virally, like Windows was virally pirated because it was free, and Linux is virally used in servers since people don't have to pay per core to Microsoft.


I don't see it as pejorative; it's used positively on social media to mean popular. Making GPL licenses popular is good.


How is that assertion not a double standard?


Not really. It has to be possible in practice to make proprietary modifications, get them accepted into common use, somehow prevent other people from just cloning them, and steal the momentum of the open source project.

If someone makes proprietary extensions to "ls", they are just going to be stuck maintaining a fork. There is no realistic threat here.


"good" according to whom?


Why? You're allowed to build whatever you want, however you want. Why police people's licenses? Are they misrepresenting something, misleading people?


It's thanks to open source that you are allowed to build whatever you want. You don't have to give back if you don't want to.


Besides the core aspect of your comment, you are aware that this entire subthread is about:

* GNU Coreutils: GPL licensed

* uutils: MIT licensed

right?

The MIT license is not only an Open Source license, but it's even a Free Software license according to the FSF itself.

So this is nitpicking of the 100th degree at this point.


Not if you dust off your history books :-)

"Were all Unix commands re-written in Linux?"' https://unix.stackexchange.com/questions/85189/were-all-unix...

"Is the GNU Coreutils copied from Unix?" https://unix.stackexchange.com/questions/81302/is-the-gnu-co...


Perhaps I'm misunderstanding, but both of your links seem to disagree with you.


The point is they re-implemented Posix, an open standard. Not copied a proprietary system.


Yeah, but POSIX was just codifying the proprietary UNIX standards. UNIX got bootlegged in an era where license enforcement wasn't strict because people didn't even know what licenses were. These days UNIX would have been a strictly proprietary AT&T product, from day 1.

Heck, POSIX stands for Portable Operating System (POS) + IX, because X is a cool letter and IX because POSIX is cooler than plain POS, plus, you know, UNIX.


MQTT was created by IBM

https://mqtt.org/

If I implement it in Rust now, I don't think I will be accused of copying IBM?


Would Ecmascript[1] be another example?

[1]: https://en.wikipedia.org/wiki/ECMAScript#History


The GNU project chose a much better license, which lent itself towards free use and continued public development. The Rust re-implementation uses what many, including the FSF and myself, consider to be an inferior license with respect to the public interest. So it's going back a step from the direction of the GNU project.

Personally, most of the software I write is under a 3-clause BSD or MIT license, unfortunately. I wish I were in a position to write more GPL code.


I really don't get why you're being downvoted. Your explanation is perfectly rational and demonstrates a political stance. People can disagree with it, but its expression is most definitely legitimate.

GNU coreutils are protected against a hypothetical proprietary ownership and I value that protection a lot.


The issue isn't the FSF's opinion on this matter, it's that it is incredibly childish of them to go voice that opinion on other projects' issue trackers. You don't go to someone else's house to tell them their beliefs are wrong.


they are rewriting a GNU project under a less free license. i find it pretty funny that they did that :)


Yeah, stop with that. MIT/BSD/ISC is definitely more free than the GPL...you know, in the normal world "less restrictions = more freedom"; only someone from the FSF would think otherwise.


More free for developers, not users. MIT code can be used in a closed source system a lot easier than a GPL one can. Developers love to talk about the freedoms they have with licenses, while ignoring that the whole point of the GPL family is disregarding that in favor of the user's rights.


> More free for developers, not users.

I protest this dichotomy. But even beyond the dichotomy, you're atomizing in the way you discuss people. A community or large group of users have among them both developers and resources to recruit developers to further their collective needs. When MIT/BSD-licensed software ends up being developed beyond the original freely-accessible base within commercial entities, as in:

> MIT code can be used in a closed source system a lot easier than a GPL one can.

then those communities and groups are stuck with a thorn in their side they can't remove, which warps technical fields to accommodate it and is utterly frustrating.

This has been my personal experience in more than one field. As an example, it's like that with the CUDA ecosystem for GPU use: much of it is closed and hidden while employing technologies like LLVM, and likely a lot of other FOSS and free academic publications regarding effective computing. I really wish their choice were either much crappier closed-source commercial stuff or, say, a GPL'ed LLVM, with their drivers and user-mode toolchains open. If they were to choose the latter, then great; if they were to choose the former, then they would be at a competitive disadvantage and hopefully others (e.g. AMD) would use that to gain more traction with GPL'ed software. I'm not saying that this would _necessarily_ happen, but it might very well.

> Developers love to talk about the freedoms they have with licenses, while ignoring that the whole point of the GPL family is disregarding that in favor of the user's rights.

I think you meant MIT and BSD? Anyway, the point of the GPL and Free Software (using FSF/RMS terms) is to get enough software under GPL-style licenses that it becomes unreasonable to license software any other way, with the end result being that essentially all released software will be free for use, dissemination, and modification, and we will be able to forget about software licenses altogether, since people will stop trying to limit that and what is now the GPL will essentially be the legal default.


Being a user of BSD, I feel more free too. Know why? Because I can do nearly everything with it, and I don't even need a lawyer to read and understand the license.


That makes no sense. You can still, as a user, do whatever you want with GPL software. The GPL is long because it has to be in order to protect the user's rights; it's an actual contract between the source provider and anyone who uses it. It also doesn't require a lawyer to read. Just sit down for 10 minutes and read it yourself; it's pretty straightforward.


Even as a user, you cannot do "whatever you want" with GPL software. Specifically, you can't distribute GPL'd software unless you also adhere to the GPL's requirements for distribution. (It's even worse for the LGPL variant, because that imposes requirements on how the software is built in the first place.)

What counts as a "distribution"? Well, that's a contract interpretation question, and if you are not asking that question of a lawyer, then you have a fool for a client. I suspect that there are actually a lot of technical GPL violations going on, but since the open source community is not litigious in general, there's little awareness of those violations and even less concern that they are going on.

Personally, I am not a fan of the GPL licenses for this reason. If you are trying to use legal contracts to enforce norms, it is disingenuous to argue that you don't need lawyers to be involved. Using social contracts and pressure instead would truly allow everybody to avoid lawyers and achieve much the same goals.


>It's pretty straightforward.

Really?

https://www.gnu.org/licenses/gpl-3.0.en.html

>It's an actual contract between the source provider and anyone who uses it.

I don't want a contract, especially NOT with the FSF and their shady GPLv3 introduction...no thanks. If I want a contract I'll go to Oracle.


Why can't a company just use GPL code? If I started a company that used Linux servers, I could contribute zero code and get all the benefits.


:) it is less free in the sense that, unlike MIT, the GPL ensures the continuation of free software. for me, if someone really cares about free software then choosing MIT is like being a hippy handing out flowers hoping that the world is nice


>for me, if someone really cares about free software then choosing MIT is like being a hippy handing out flowers hoping that the world is nice.

I like that, and I like free flowers. What I don't like are lawyers and stuff like the GPLv3, AGPL, and all the licenses that try to push some political view on me.

Look, it's like that: if you fork and close down a BSD/MIT code base, you lose all those devs (no free flowers from hippies anymore); instead you have to pay your own devs. Juniper can sing a song about that problem. So without any pressure, just by the logic of economics, you try to stay as near as possible to the free code base with your product (and integrate your changes); Netflix can sing a song about that too.

Continuation has nothing to do with the license but with the interest of developers; see Hurd.


what the FSF hopes for is that if there is enough GPL-type licensed critical software which self-perpetuates free software, then free software will become by far the most dominant paradigm. in this sense it is really not hard to see why the license change in the rust rewrite of GNU coreutils is going to become the most dominant issue for the FSF. screaming about technical merits makes no sense

>Continuation has nothing to do with the license but with the interest of developers; see Hurd.

i think developer interest has far more to do with the technical viability of a project than license choice. besides, as in all software, some projects fail, some succeed, and some are just beginning


>why the license change in the rust rewrite of GNU coreutils is going to become the most dominant issue for the FSF

Why is that a problem? There were already toybox (BSD), musl (MIT), and LLVM (UIUC, a BSD-style license). The dominant issue the FSF has is that they are no longer technically interested, only politically, and so no one needs them, no one wants them.

And listening to Stallman is really boring; the last time he was kind of self-aware was with the Religion of Emacs. Since then it's just been repetition, being rude to interviewers, and trying to trick devs (Linus and the GPLv3).

BTW: I am NOT talking about the GNU project, just the FSF


>Why is that a problem?

simple. imagine: you value free software; you invest a lot of effort to make a really important critical piece of software; you license it under MIT; some company uses your software as a critical component, develops a lot of software on top of it, keeps the source closed, makes massive profit for itself, dominates the market and becomes a monopoly, all thanks to your critical component.

now the question is: are you pissed off? if you are not, choose MIT. if you are pissed off choose a self-perpetuating copyleft license


>all thanks to your critical component

Like to give a real world example for that? Or is it just theory?


sure. i know hedge funds and banks that critically rely on a lot of permissively licensed software and develop heavily on that code base. conversely, they wouldn't touch copyleft software with a 10 ft pole

one of the other posters also mentioned CUDA's reliance on LLVM. i think that is a very good example also


So you think they would choose GPL code because that one component is so immensely good that you could take over the world with it?


take the example of nvidia and their use of llvm in nvcc[0]. if llvm used a copyleft license, nvidia would either need to invest in an effort to replace llvm (a huge undertaking), or have cuda released under a free license. given the importance of cuda in parallel data computation, as well as nvidia's monopoly in the machine learning community, i think a copyleft license would have made a world of difference

[0] https://en.wikipedia.org/wiki/Nvidia_CUDA_Compiler


>nvidia would either need to invest in an effort to replace llvm

No, they would buy a license for one of the hundreds of proprietary compilers, and you would eat their dogfood like you do today with cuda itself.

>or have cuda released under a free license.

Dreamer...it's nvidia....you know the ones with the "driver" that runs with the GPLv2 kernel.


> Dreamer ...

i thought you said you are a hippy

anyway at this point everything becomes hypothetical. however, what is certain is that copyleft licenses could make life much more difficult for monopoly companies than permissive licenses, and i am ok with that, just as i am ok with the FSF being aggressive about matters pertaining to software freedom. could they have a better and more effective approach? i don't know, maybe. i am not involved in any shape or form with them, but i guess that right now they have my support


I said I like free flowers...flowers without a contract attached to them.

The FSF can be as aggressive as they want; they will not change anything...quite the opposite, actually.


> they will not change anything

they have existed for over 35 years now. are you sure they haven't achieved anything? :)


MIT is not a less free license than GPL. They're both equally completely free.


yeah, i wasn't being very serious by saying it is "less free". but it is funny that so many people are so touchy about this

the MIT license basically allows you to do whatever you want. however, it should be noted that if you really believe in software freedom and write your code for the sake of furthering free software, choosing MIT is a naive choice, since you let any derivatives of your work be closed source


How can you be so certain they're not interested in knowing what the consequences of their chosen license are? By your logic, it would be childish to ring your bell and point out you have a water leak. I'm glad my neighbors don't think like that.


I think the insinuation is more that the GNU project probably was also breaking license terms in order to do the rewrites. (Not commenting on the veracity of that claim though)


I don't think anyone is accusing either project of breaking license terms. In fact, I find the GP's argument tenuous, since Kelling says "you're making less free coreutils than their predecessors" and the GP says "that's rich, considering that the original coreutils were more free than their predecessors".


That would be quite a feat, given that the proprietary unices didn't have source available for their utils. Where would the GNU project get it?


While I don't for a moment believe that they were ripping off code or otherwise violating copyright, unix source code wasn't that hard to get if you wanted it; sure, it wasn't freely available on the net, but plenty of companies, research groups, and even schools were working with it.


The FSF director, per the post you linked to, made no comments on the merit of the project. Instead, he commented on the project's choice of license.

As anybody can also see from the reference you provided, the way the discussion was closed, without a hint of an effort to reply, already raises a small red flag. Had the project replied along the lines of "It's the one we chose for the project", even without a rationale attached, it would have been less of the interesting event I think it is.


The issue title alludes to the project having little merit other than its non-copyleft license.

I think closing the discussion like that was entirely the right choice; this project already had to deal with a prior issue from FSF-affiliated individuals complaining about the license choice. It isn't the FSF's job to go around telling people their license choice is wrong on their issue trackers. This is just a waste of everyone's time.


To quote a famous event involving the group ABBA: when they started singing, they realized too late that there was already a canned tuna fish company with the same name. The story says the owner told them: "I won't make you any problems as long as you do not plan to sell fish."

As the project is called coreutils... It's not for the FSF to tell others what license to use, but it's reasonable to ask for clarification when the project uses a different license.

As we seem to be trying to save time for everybody... Do you know if the project has a public reference with the rationale for the current choice of license? (Note: no criticism implied concerning their current choice.)


https://en.wikipedia.org/wiki/ABBA reports something different:

> Fred Bronson reported for Billboard that Fältskog told him in a 1988 interview that "[ABBA] had to ask permission and the factory said, 'O.K., as long as you don't make us feel ashamed for what you're doing'


I am fairly certain I saw Benny Andersson telling the story like this in a documentary many eons ago, but my ABBA-fu fails me now and I can't find anything to back me up.

This is an interesting take on the story, but I have to get back to work now... https://youtu.be/gC09BnFmQ30?t=118


Here's what I found using a Google Books search.

https://books.google.com/books?id=CILUDgAAQBAJ&pg=PT297&dq=%... has:

> Stig phoned up the managing director who enthusiastically approved, the only condition being that the group wouldn't discredit his company.

https://books.google.com/books?id=6lu_BgAAQBAJ&pg=PT57&dq=%2... says:

> There was just one tiny problem with formalising the anagram; ABBA was also the name of Sweden's premier brand of canned fish. Fortunately, after some brief negotiations, Stig was able to placate the fish firm and they gave ABBA, the pop group, their blessing.

https://books.google.com/books?id=-hkbqflhfKoC&pg=PT7&dq=%22...

> ... the name was also that of a Swedish company which sold canned fish; whose directors were worried lest ABBA the quartet should bring the name into disrepute, but after Stig Anderson had assured them that ABBA were aiming to publicise the name in a very positive way, the concern abated.


All this stuff happened before Google came into existence. I recall a somewhat humorous conversation with a friend about an exotic Portuguese rock band. He insisted it could not have existed, because he could not find anything about them via a Google search!

The page on what became of the ABBA Seafood brand has a take on the history:

https://www.orkla.se/brands/abba/

"The Abba brand and the pop group ABBA have been the subject of some confusion over the years. During the 1970s, Abbas' switchboard usually received calls from Swedish and foreign fans who, if they were not allowed to talk to any of the group's members, could settle for a signed idol card.

Before the group Abba broke through in 1974, they actually called from the record company Polar and asked for the green light to use the name. Per Brolund, then HR manager at Abba, gave his consent to a reservation. "That the young people behaved and did not damage Abbas' good reputation."


I don't understand your point about this being "before Google came into existence."

If your memory predates Google's ~1998 search services offering, it's surely possible that you've misremembered what Benny said?

And the first reference I gave was to a copy of "Bright Lights, Dark Shadows: The Real Story of ABBA" from 2001, only a few years after Google. I don't see how Google search per se is relevant.

The Orkla link you gave is in Swedish(!). Good thing there's Google Translate. It essentially supports the point that Abba was concerned about behavior, not jocular opposition to becoming fishmongers as you suggested.

I do find it interesting that the first source I gave says "managing director" while the one you gave says "HR director". I expected "managing director" to mean CEO.

Ha! There's an ABBA fandom entry about it (because of course there is), at https://abba.fandom.com/wiki/Abba_seafood , which references the same page and translates the term to "staff manager".

Perhaps there are organizational differences between Sweden and US/UK which make direct translation difficult?


I think the response was incredibly muted. It gave it exactly the amount of attention it deserves.


it is the FSF's role to comment on software licences first, and on the technical merits second (even that is a maybe). the FSF is about software ethics, and in that case Ian's comment is appropriate, since all he asks is that at the very least the license differences be acknowledged


It's not just the FSF's role to "comment" on software licenses, but also to advocate for the use of free software licenses. That involves convincing developers to use them. To go into a project and dictate to them that you don't find their work notable, while still demanding respect for your own project, isn't likely to spark good-will towards the FSF amongst developers. If I were a director at FSF, I would cringe at this kind of interaction, and lean much more towards the "honey" approach for catching flies.


>To go into a project and dictate to them that you don't find their work notable

a key person from the FSF commented on the project. he didn't say it is not notable

on the other hand, this is a rewrite of GNU coreutils, and as such the FSF has every right to comment on the project


> this is a rewrite of GNU coreutils, and as such the FSF has every right to comment on the project

Sure, and FSF also has every right to tell anyone who writes proprietary software that they hate their guts, if that was how they felt. My point is that this is not a particularly effective strategy for advertising the GPL to developers.


>My point is that this is not a particularly effective strategy for advertising the GPL to developers

maybe the strategy sucks, maybe not. but maybe speak for yourself. i actually like their quirkiness and lack of shyness


> he didn't say it is not notable

He did say that the most notable thing about the work was the license, which effectively dismisses the technical work as not notable.


>which effectively dismisses the technical work as not notable

that being your interpretation and deduction. i am a bit more empirical and judge only what he actually said. moreover i am also not that much of a rust-lang fanboy, at least not to the extent that i see every rust rewrite of a c-lang code base as a technological marvel


I am sure the authors of the uutils utilities did actually look at the GPL source code. If that is the case, shouldn't this project be GPL as well?

disclaimer: I did actually make the first commit on one of these tools, but I didn't look at the source of the original C code.


Looking at source code does not make everything you subsequently write a derivative work of that code.

Not looking is a strong legal defense for whatever you wrote not being a derivative work. Looking just means you can't use that defense, it doesn't mean you actually copied code.

Given the significantly different paradigms of Rust and C, and that most coreutils don't really do that much algorithmically, it would be pretty easy to avoid accidental copying with this kind of project.


No, that's not how copyright works.


Then why are people who look at leaked Windows source code not allowed to contribute to Wine or ReactOS?


Because by rejecting work from people who've seen the leaked code, you can be sure that your project doesn't become a derived work. That doesn't imply the reverse: that if you accept code from people who've looked, your project must be a derived work.

All cows are animals, but not all animals are cows.



Pretty sure they did. Does it require proving? How would one prove such a thing if so?


You would have to prove that the Rust version is substantially similar to the C version, to an extent that wouldn't happen if someone who hadn't looked at the C code had written the Rust version purely from the spec/behavior.

For example, significantly identical code flow, variable naming, or function structure would be red flags, especially if it's not the "obvious" implementation.
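
For instance (a contrived sketch of my own, not code from either project): the "obvious" implementation of something like yes(1) is a bare infinite output loop, which two people working independently from the man page would land on almost verbatim, so similarity there proves very little:

    #include <stdio.h>

    /* The "obvious" no-argument yes(1): anyone implementing it
       from the man page would plausibly write exactly this, so
       matching it is weak evidence of copying. */
    int main(void)
    {
        for (;;)
            puts("y");
    }

By contrast, matching a distinctive creative choice, say an unusual hand-rolled output-buffering scheme with the same buffer size, helper names, and loop structure as the original, is exactly the kind of red flag meant above.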


I'm reasonably sure that idiomatic Rust code can't really infringe C copyright. I'm not saying that it's impossible, especially as you say, regarding overall code flow or variable naming, but the languages are quite different in how they approach things.

Unlike, for example, Java 1.5 and C# 2. I think at the time you could almost copy-paste Java code into a C# file and have it compile after tinkering with it for two minutes.


Copyright doesn't just protect the single implementation, but also any derivative works. If I take some C code and copy it (with a rewrite) into C# (or whatever), the C version's copyright still applies because mine is a derivative work. That's where the "clean room reverse engineering" idea comes from: Implementing a spec isn't infringing, so just have the person reversing and the person implementing communicate using a spec.

In this case, the "spec" of the coreutils would probably be the man pages.

Changes to the C code to make it idiomatic Rust do throw a wrench into the whole thing, and it would have to be sorted out by a court, but it's not as simple as people like to think.

As usual: IANAL


> If I take some C code and copy it (with a rewrite) into C# (or whatever), the C version's copyright still applies because mine is a derivative work.

Copyright protects creative output. That means if you copy creative aspects of the code - things like the structure, naming, algorithms down to small details of the implementation - then you are creating a derivative work (this is what happens if you translate code from one language to another verbatim - not trying to be idiomatic - most of the time). If you merely read the original code to understand what it does and create an implementation that produces the same behavior, but is otherwise not substantially related to the original, then you have not copied any creative aspects and you have not created a derivative work.

"Clean room reverse engineering" (which, done properly, is extremely rare and the vast majority of open source projects related to reverse engineering do not do it to a proper standard) is a strong legal defense to show that creative aspects could not have possibly been copied. However, it is neither water-tight (the spec could've conveyed unnecessary creative aspects accidentally), nor is it required to show non-infringement. It's merely a defense; not doing clean-room RE doesn't mean you are doing anything infringing.

> Changes to the C code to be idiomatic Rust does throw a wrench into the whole thing, and it would have to be sorted out by a court, but it's not as simple as people like to think.

Practically speaking, given the significant differences between what is idiomatic in both languages, and the relative simplicity of what most coreutils actually do, there's a good chance that even a careless "translation" into idiomatic Rust would erase most creative aspects and put you well on the way to being able to claim it's not a derivative work. Effectively, the translation would involve a round-trip through an abstraction to the level of a spec, simply because what's idiomatic is so different. This is, of course, not a hard guarantee, nor would I bet the legality of my project on this aspect alone, but rather just an observation of what is likely to happen in many cases. In other words, you'd have to be really careless to end up creating a derivative work of coreutils here; the language difference works significantly in your favor.

IANAL, but I've been doing reverse engineering projects for over 15 years and my main job these days is leading an open source project to support an undocumented platform via reverse engineering. So I have a bit of experience with this matter :-)


TIL. Thank you for correcting me! BTW, your work on Asahi Linux is incredible!


I don't read it as him thinking the project has little merit other than changing the license.


The issue title literally says the most notable thing about the project is the license. That reads to me like a clear dismissal of any technical merits; if the most important thing about your project is its license, it's not a very interesting project.


I see your point. I also think the license is the most important thing, and probably even more for someone working on a GNU project at the FSF, but the project is still very interesting in many ways.


The stance is probably not surprising, given the point of the FSF. Whether you agree with them or not, the FSF has always been very clear that the most important thing about software is its license. From their point of view, a free program is always superior to a non-free one, regardless of the technical merits.


[flagged]


> Giving up your original rights seems to be the worst kind of payment

That's only an issue if payment is more important to you. The FSF are fighting for freedom. Money is secondary.

Now, you can argue about what is the best option to achieve freedom. FSF has chosen one (which I happen to agree with). There are others.


I'm just using FSF terminology when describing a FSF point of view. I myself don't agree with the FSF stance on that issue, but the point still stands - the FSF is an advocacy organization (not a tech organization) that is at its very core about promoting any free (per their definition) software as superior.


Are you really free without the right to enslave other people?


I release all my code under the MIT license. Do I still have the right to enslave other people? If yes, could you please describe the details? I'd be rather interested to learn how I can force whoever uses my code to use the MIT license as well. Thank you.


The point is that the people who use your software have that awkward right, which you have carefully granted them (as that is the only right removed by the GPL). This is frankly sufficiently obvious that I can only imagine you are arguing in bad faith :/. Like, I definitely have seen good-faith arguments against the GPL, but this is not one of them.


So I can't force other people myself, but I am still a (potential) accomplice in a completely different kind of "forcing other people" stuff because I didn't explicitly prohibit others from doing it. I... understand this argument, thank you.

P.S. "Carefully granted"? The words "to deal in the Software without restriction, including without limitation the rights to [rather long list of verbs]" are anything but "carefully granted"; it's a blanket permission.


By taking away one freedom (the right to relicense software under other licenses or exploit the software, analogous to slavery), you grant more people more freedom (the freedom to modify and use *their* software as they please, analogous to abolishing slavery).


And yet including the additional restriction "this software may not be used to enslave people" makes the software no longer "free". It's odd how often slavery is mentioned in examples about how free copyleft software is, while the first of the so-called four freedoms insists that using software to literally enslave people is a freedom that must be protected.


Maybe this should be read as "I consider the license a no-go, so any technical merits the project has are not relevant (for me)"? If you already dismiss the project because of its license, you probably wouldn't let technical merits redeem that.


since the FSF is about software licence ethics, they will probably find technical merits irrelevant if the project uses a licence that is less appealing by their standards than the already existing solution. i don't see why that should raise your eyebrows


I raise my eyebrows at them deciding to voice these concerns on a project's issue tracker. Regardless of how strong their political stance is, it is ultimately none of their business what a given project they are not affiliated with chooses as its license.


It's a rewrite of one of their main projects. Of course they can have some comments about it.


We detached this subthread from https://news.ycombinator.com/item?id=29457121.


i don't see anywhere in that link that Ian Kelling said anything about the technical merits (or lack thereof) of this project


MIT license is a big no. Folks still don't understand why copyleft is so important nowadays.


The MIT license is better than any proprietary software license, yet MIT-licensed software attracts so many more negative license-related comments than proprietary software.

Seems a bit counterproductive.


I think it's because those who license software under MIT where the GPL could be used are considered "traitors", and as some people believe, "traitors are worse than enemies".


False equivalence. I'm comparing permissive licenses with copyleft licenses. I'm not comparing the MIT license with any proprietary licenses. And if uutils were proprietary, you wouldn't see this discussion happen at all.


Or maybe they do understand the licence and simply decided they did not agree with its ethics.


And you're not educating them.


> Folks still don't understand why copyleft is so important nowadays.

True, it's incredible how badly the FSF has done in terms of actually marketing their licensing structure. At this point I doubt most developers care much.


>Many GNU, Linux and other utilities are useful, and obviously some effort has been spent in the past to port them to Windows. However, those projects are written in platform-specific C, a language considered unsafe compared to Rust, and have other issues.

>Rust provides a good, platform-agnostic way of writing systems utilities that are easy to compile anywhere, and this is as good a way as any to try and learn it.

If you want some software to be platform agnostic, you can write it in C in a way that is not platform specific, so rewriting some software in Rust just to be platform agnostic doesn't seem like a good reason for a rewrite. Also, even if the C software to be rewritten isn't platform-agnostic, at least some part of it is, so by sticking with C you don't have to rewrite everything.
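
For instance (a hedged sketch of my own, not code from any of the projects being discussed): a minimal cat-like filter written against nothing but the hosted ISO C standard library builds unchanged on Linux, macOS, and Windows:

    #include <stdio.h>

    /* Copy each named file (or stdin if none) to stdout.
       Only hosted ISO C is used, so this builds on any
       conforming compiler without platform #ifdefs. */
    static int copy(FILE *in)
    {
        int c;
        while ((c = getc(in)) != EOF)
            if (putc(c, stdout) == EOF)
                return 1;
        return ferror(in) ? 1 : 0;
    }

    int main(int argc, char **argv)
    {
        int status = 0;
        if (argc < 2)
            return copy(stdin);
        for (int i = 1; i < argc; i++) {
            FILE *f = fopen(argv[i], "rb");
            if (!f) { perror(argv[i]); status = 1; continue; }
            status |= copy(f);
            fclose(f);
        }
        return status;
    }

Opening files in "rb" sidesteps newline-translation differences between platforms.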

I'm sure Rust has a lot of valid use cases, but I don't agree with the trend of "Hey, let's rewrite X in Rust just because...".



