Thank you, thank you. I haven't reviewed this, and I may or may not use it, but wow was freebsd-update from 13.0 to 14.0 slow, especially since I run my freebsd systems off spinning drives, and most of them have src.
I did some looking at freebsd-update, and it's quite the shell script, which makes it daunting to consider changing. All the forking and whatever doesn't seem great; using data structures should be a lot simpler!
But my systems running freebsd-update spend an awful lot of time on I/O, and not so much time on cpu, as I recall. I think the general flow for a replaced file is gunzip to the update dir, then install to the destination. This means all the files are written twice, and I thought there was a third write that I can't remember at the moment. Install doesn't have an option to decompress, but it'd be super handy if it did --- then you could write the file only once to the destination directory and move it into place.
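For what it's worth, here is a rough sketch of the "write it only once" idea. This is not how freebsd-update or rustdate actually work; `install_once` and the `.tmp-install` suffix are names made up for illustration, and the decompressor is assumed to be anything that implements `Read` (for example a gzip decoder from a third-party crate, which is not included here).

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Stream already-decompressed bytes from `src` into a temporary file that
/// lives next to `dest`, then rename it into place. The payload hits the
/// disk once; the rename stays on the same filesystem.
/// (Hypothetical helper, not part of any real tool.)
pub fn install_once<R: Read>(mut src: R, dest: &Path) -> io::Result<u64> {
    // Build "<dest file name>.tmp-install" in the destination directory.
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "dest has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp-install");
    let tmp = dest.with_file_name(tmp_name);

    // Write and flush the temporary file.
    let result = (|| -> io::Result<u64> {
        let mut out = File::create(&tmp)?;
        let written = io::copy(&mut src, &mut out)?;
        out.sync_all()?;
        Ok(written)
    })();

    match result {
        Ok(written) => {
            // Move the finished file into place.
            fs::rename(&tmp, dest)?;
            Ok(written)
        }
        Err(e) => {
            // Best-effort cleanup of the partial temporary file.
            let _ = fs::remove_file(&tmp);
            Err(e)
        }
    }
}
```

Permissions, ownership and flags (the things install normally handles) are left out of this sketch entirely.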
I'm never a fan of quite as many files in a directory as freebsd-update gets up to either. Given that everything is already named by hashes, it's not too hard to have 16 or 256 directories named 0-f or 00-ff that get a slice of the files. Large directories have gotten better over time, but they're still slow.
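A minimal sketch of the 00-ff split, assuming the file names start with hex digits of the hash. `shard_path` is a hypothetical helper, not something freebsd-update or rustdate provides.

```rust
use std::path::{Path, PathBuf};

/// Map a hash-named file to `root/<first two hex digits>/<full name>`,
/// giving up to 256 subdirectories. Returns `None` if the name is shorter
/// than two characters or does not begin with two hex digits.
pub fn shard_path(root: &Path, name: &str) -> Option<PathBuf> {
    let prefix = name.get(..2)?;
    if !prefix.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Lowercase the prefix so "3F..." and "3f..." land in the same directory.
    Some(root.join(prefix.to_ascii_lowercase()).join(name))
}
```

Taking one hex digit instead of two would give the 16-directory variant.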
There's some ability to exploit the embarrassing parallelism available too, but that's hard in a complex shell script, so some other environment makes sense. I didn't look, but hopefully that's a knob in rustdate ... Sometimes you'd want it and other times one thing at a time is better, even when it's slow.
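To illustrate what such a knob could look like (I have not checked whether rustdate has one), here is a sketch using only the standard library. `for_each_parallel` and the `jobs` parameter are invented for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Apply `work` to every item using `jobs` worker threads.
/// `jobs == 1` gives one-thing-at-a-time behaviour; 0 is treated as 1.
/// (Hypothetical helper; the `jobs` knob is an assumption.)
pub fn for_each_parallel<T, F>(items: &[T], jobs: usize, work: F)
where
    T: Sync,
    F: Fn(&T) + Sync,
{
    let jobs = jobs.max(1);
    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..jobs {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                match items.get(i) {
                    Some(item) => work(item),
                    None => break, // nothing left to claim
                }
            });
        }
        // All workers are joined when the scope ends.
    });
}
```

On spinning disks a low `jobs` value may well be the faster setting, which is the reason to make it configurable rather than always using every core.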
One feature I know update can do that I want to look at more but haven't had time is the zfs boot environment stuff --- seems like with the right arguments you could do the update on an inactive environment and reboot to switch, instead of the default way which does the install on the live environment but you could reboot to switch back. I'd be much more patient if the live environment continued working and I could reboot at my convenience once the process finished. Especially if I might be able to run pkg upgrade in the boot environment before the reboot. If rustdate supports that too, that'd be neat.
> I did some looking at freebsd-update, and it's quite the shell script, which makes it daunting to consider changing. All the forking and whatever doesn't seem great; using data structures should be a lot simpler!
It's fairly well broken down, and pretty well organized. I think the most difficult part is that so much of it relies on shell nice-isms that are a pain to re-implement in something like C (although, I guess you could just call fetch and the like directly via 'exec' or 'system'; rather than using their libraries to manage things directly).
At the very least, the uutils project seems to be making good progress on becoming a "true" drop-in replacement for the GNU coreutils that will pass the GNU test suites and treat compatibility issues as bugs.
Honestly though, as time goes on, I'm not sure if rewriting UNIX utilities to be memory-safe actually gets you that much. I mean, having multiple implementations of something as important as the coreutils is good: nobody complains about the co-existence of e.g. BSD and GNU utilities, and IMO there's really no reason to feel any different about RiiR rewrites. That said, I can't really figure out what attack model would make memory safety a particular priority. In most cases you would be isolating things that require these sorts of utilities using namespaces/jails/etc, and I wonder if memory bugs in the coreutils are actually a serious attack vector.
Of course in this case (FreeBSD-rustdate) it's literally for performance reasons rather than safety, so everything sort of makes sense (not necessarily using Rust for safety per-se, but for robust threading. That does make sense.)
I think it's worthwhile just to get better correctness, regardless of security.
There is some project to write or re-write a cad engine in rust which I think is valuable not for any security reasons but just to get a less buggy cad engine, including easier to keep less buggy as work goes on for years.
I wouldn't have thought freebsd-update was very high on the list of things in desperate need of an overhaul, but I'm willing to grant if the people working on it think it's worth their time, then it probably is.
However the fact that you can make a bad wall out of good bricks, and even a good wall out of bad bricks with enough care and effort and ingenuity, is orthogonal to the fact that good bricks are better than bad bricks.
I mean, yeah, I agree, but is it really worth it? Should I use a Rust rewrite of coreutils that is in its infancy just because it is written in Rust and not C? If it was written in Ada / SPARK and it was formally verified, then yeah, I would definitely go for that rewrite of coreutils, but this is not the case with Rust.
I think libmagic (the file util) might be a good target for this. I think it had security issues in the past and considering how it contains a gazillion random parsers for weird file formats, surely there is a good chance there are some more. But nobody would ever run file on a random file you just downloaded, would you? ;)
Yeah, something like that would probably be quite good. That said, I think even if you do have a nice memory-safe implementation of libmagic/file, it's probably a good idea to still use seccomp/namespacing/etc. to jail it when using it in security critical contexts. Those features don't really incur much cost so it's a free extra layer of security, and you still get the robustness bonus of guaranteed memory safety.
It’s not just memory safety, it’s also hackability. Idk how I would go about adding a feature to ls (both GNU and BSD). I have a concrete idea of how I would go about adding a feature to the Rust implementation.
That doesn't sound very compelling: C is still a much more popular language than Rust and a lot simpler too. I realize the latter is somewhat subjective, but it's blatant enough that I'm willing to assert it confidently.
If you want to know how to add a feature to ls in GNU coreutils, you don't have to look very far. The whole program is in src/ls.c. It's pretty large due to the number of options, but it's pretty simple code all things considered. If you look at a program with fewer switches, like mkdir, it's even easier.
So what about FreeBSD? Well, not only does it have an official GitHub mirror, but it even accepts "simple" pull requests, apparently. And the implementation of ls is a bit more approachable, since BSD utilities typically don't have as much functionality.
I don't think Rust is going to make things significantly more accessible. It does indeed draw people in and maybe make them less fearful versus C, but in practice a lot of the reason why contributing to these large projects is scary is nothing to do with the language and more to do with the arduous requirements of any project that is as complicated and widely used as they are.
For example, does Rust actually make kernel development more accessible? Maybe not:
> Despite more novice developers being attracted by Rust to the kernel community, we have found their commits are mainly for constructing Rust-relevant toolchains as well as Rust crates alone; they do not, however, take part in kernel code development. By contrast, 5 out of 6 investigated drivers (as seen in Table 5) are mainly contributed by authors from the Linux community. This implies a disconnection between the young and the seasoned developers, and that the bar of kernel programming is not lowered by Rust language.
So if you want to add a feature to ls, maybe you shouldn't wait for operating systems to switch to Rust alternatives.
But when you think about it, it is not surprising that adding a feature to ls is not easy. I mean, you really wouldn't want new features being added to utilities like ls without a ton of care and thought put in, and that is true no matter what language the utility is written in. The difficulty of contributing to some projects has very, very little to do with the actual process of writing code.
"Simpler" is part of the problem. The "simplicity" is delivered by keeping more in the programmer's head and not writing it down in the C source code. But if you're not the original programmer (or you have since forgotten) that's much worse.
Trivial example would be the benefits of a richer type system: You have a TCP socket, a file descriptor, and a non-negative counter
In C that's an int, another int, an unsigned int
In Rust that's TcpStream, OwnedFd, usize
The C is simpler, there are only two types and they're both just integers. But of course in reality the more complicated Rust types model what's actually going on, these are not just integers, we shouldn't do arithmetic on an OwnedFd or a TcpStream, and we can't use a usize when we needed a TcpStream, they aren't the same kind of thing. In C that knowledge lives in your head as the programmer instead.
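A small sketch of what that looks like in practice on a Unix-like system. `write_via_fd` is a made-up function for illustration; the point is only that the signature says "owned descriptor", so a counter cannot be passed by accident.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::fd::OwnedFd;

/// Write `msg` through an owned file descriptor and return the byte count.
/// The parameter type rules out passing a plain integer.
/// (Hypothetical example function.)
pub fn write_via_fd(fd: OwnedFd, msg: &[u8]) -> io::Result<usize> {
    // Converting takes ownership; the descriptor is closed when `file` drops.
    let mut file = File::from(fd);
    file.write_all(msg)?;
    Ok(msg.len())
}

// Neither of these compiles, which is the point:
//   write_via_fd(3usize, b"x");   // expected `OwnedFd`, found `usize`
//   let n = fd + 1;               // no arithmetic on `OwnedFd`
```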
And I disagree in practice too, as a very experienced (decades) C programmer and a relatively novice (less than 5 years) Rust programmer I am much more confident contributing to a Rust project.
To be honest, Rust having opaque handles is a pretty unimpressive demonstration of its power. If that's the main thing that code had to benefit from Rust, it would absolutely not be worth a massive new toolchain with increased memory requirements and longer compile times.
Obviously, programming language design hasn't stood still in the intervening years since C was created, but C had a lot of momentum and we don't just do massive rewrites for small incremental gains. IMO the only real reason why Rust has been compelling and stands out from most other options is the borrow checker. That's the thing that is truly compelling and that people have been enduring great pains for. However, it's asymmetrical; not all code is going to have very much to gain from the borrow checker, at which point any modern systems programming language and some older ones are similarly compelling. (Note that I'm not suggesting there aren't still benefits to having the guarantees of the borrow checker, especially if you really need memory safety guarantees... but many people don't necessarily, and some that do need even stronger guarantees like formal proofs.)
As far as opaque handles go, you could easily accomplish that with a subset of C++, or even with C (with an asterisk, but one that would not hinder a project like coreutils from doing so.) When talking about stuff like this we're not even really talking about C, but more coding standards that were prevalent with old UNIX and UNIX-like code. And sure, improving that is net gain, but it has little to do with the unique merits of Rust.
Outside of handles, the integer type situation in C is worse than Rust or really any modern programming language, but at least most modern code will cling fairly strongly to <stdint.h> types instead.
P.S.: I speak with relatively similar experience on both fronts, but I can't say the same about being particularly more confident contributing to Rust projects. It's not that it's particularly low, but 1. I don't find too many opportunities and 2. I have never found contributing to C/C++ projects to be that terrible, and I still do so occasionally.
Borrow checking makes the type system actually work, so this matters everywhere. Take Rust's Mutex<T>. You can (and people do) make such a type in C++. But in C++ the type's key feature - that you can't "forget" to take locks and release them - is destroyed by the lack of type safety, they have to ask you to please be careful not to break it, whereas in safe Rust you can't break it.
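A minimal sketch of the Mutex<T> point, with a made-up `Counter` type. The protected value lives inside the mutex, so the only way to reach it is through the guard returned by `lock()`, and the lock is released when the guard goes out of scope.

```rust
use std::sync::Mutex;

/// A counter whose value can only be touched while holding the lock.
/// (Illustrative type, not from any project discussed here.)
pub struct Counter {
    hits: Mutex<u64>,
}

impl Counter {
    pub fn new() -> Self {
        Counter { hits: Mutex::new(0) }
    }

    /// Increment and return the new value.
    pub fn record(&self) -> u64 {
        // There is no field access to the u64 that bypasses `lock()`.
        let mut guard = self.hits.lock().unwrap();
        *guard += 1;
        *guard
    } // `guard` is dropped here, which unlocks the mutex
}
```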
Google has an analogue of OwnedFd for C++ and again the language doesn't preserve type safety so they need to nag you to please not do all the so-easy things which will destroy type safety and make the type worthless. OwnedFd doesn't need to remind you about that.
Neither C nor C++ have niches, and so when they do attempt a Maybe type - which is very useful in system programming in my experience - it's significantly heavier than the integer handles programmers were used to instead, and of course this means people won't use it in these applications. Rust's Option<OwnedFd> is the same representation (a 4 byte integer type) as the C integer you'd have used, but with the same ergonomics as any other Maybe type, leading to improved safety despite same performance.
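The size claim is easy to check on a Unix-like target. This sketch just compares the two sizes; `option_fd_is_free` is a name invented for the example.

```rust
use std::mem::size_of;
use std::os::fd::{OwnedFd, RawFd};

/// True when `Option<OwnedFd>` occupies no more space than a raw descriptor.
/// This works because `OwnedFd` can never hold -1, so that bit pattern is
/// available to represent `None`.
pub fn option_fd_is_free() -> bool {
    size_of::<Option<OwnedFd>>() == size_of::<RawFd>()
}
```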
Also, I'd say the characteristic difference isn't technology like the borrow checker, but Rust's Culture, the technology is just enabling that culture.
Well of course it matters everywhere in Rust, the borrow checker is how Rust accomplishes those things. It is novel. It's an approach to memory and data-race safety that does not require a GC, and it has relatively low overhead.
The problem with move semantics in C++ is the lack of borrow checking, but in simple programs (e.g. a lot of the coreutils, realistically) very few transfers of ownership are actually needed, because the programs are just so relatively simple. Meanwhile, GC'd languages can usually ignore this altogether, at the cost of some overhead.
It's not just the borrow checker. One interesting way to see how much lifting is needed is to examine the "Circle C++" compiler. Sean Baxter's language / C++ experiments take the same approach and you'll see Sean needed not only an equivalent to Rust's borrow checker, but its trait system (roughly equivalent to the Concepts proposal in C++ 0x, not the "Concepts Lite" which is modern C++ 20 Concepts) including Rust's auto traits Send and Sync, and a whole bunch of stdlib infrastructure.
If you try to do without Send and Sync you have to choose, either your language has data races and they induce UB (as in, say, Go and several popular garbage collected languages) or your language doesn't have real multi-threading. Rust needs these traits to be able to provide a data race free safe language with multi-threading. Could other approaches exist? Well for one thing after Rust 1.0 there has been considerable work on theory and practice for Java-like loss of sequential consistency without UB. But there isn't today any good "This definitely just works" alternative. For a future Rust-like language, an industrialization of best practices, this is the best we currently know although it's reasonable to expect that one made a decade from now will have better options.
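A short sketch of how Send and Sync show up in ordinary code. The helper names (`assert_send`, `assert_sync`, `check_markers`, `shared_sum`) are made up for the example; the traits and types are from the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Compile-time probes: these only type-check if the bound holds.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

pub fn check_markers() {
    assert_send::<Arc<Mutex<u64>>>();
    assert_sync::<Arc<Mutex<u64>>>();
    // assert_send::<std::rc::Rc<u64>>(); // rejected: `Rc` is not `Send`
}

/// Add up `values` with one thread per value, sharing a locked total.
/// `thread::spawn` requires its closure to be `Send`, which is why the
/// shared state has to be an `Arc<Mutex<_>>` rather than an `Rc<RefCell<_>>`.
pub fn shared_sum(values: Vec<u64>) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = values
        .into_iter()
        .map(|v| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                *total.lock().unwrap() += v;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let sum = *total.lock().unwrap();
    sum
}
```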
> Despite more novice developers being attracted by Rust to the kernel community, we have found their commits are mainly for constructing Rust-relevant toolchains as well as Rust crates alone;
Is this surprising? The tooling is what needs fixing right now. Rust has been in the kernel for what, a year? What are you expecting?
Oh sure, but anyone hoping that a glut of new Linux contributors is just around the bend once the Rust tooling is in place might be suffering from unwarranted optimism. Judging by mailing list discussions, it's looking like Rust in the Linux kernel is just as complicated as C in the Linux kernel, and it won't really be a revolution so much as an incremental step. I believe that many kernel developers will probably adopt Rust because it offers them better facilities, but I don't think most Rust developers will suddenly take up kernel programming.
Impressive. The current state doesn't really bother me given the frequency of system updates but I do sometimes wonder what on earth it's even doing that's taking so long.
Copyright 2024 Matthew D. Fuller <fullermd@over-yonder.net>
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
I think this may be a person who is concerned about being tainted by a license. Like if the code was source-available only and they saw it and then were accused of copying some functionality into another open source project.
Now seriously, I'm not particularly interested in this, but if it helps here are additional points of reference to at least be able to verify some integrity in case you want to try it.
Doesn't prove the absence of any malicious intent, but at least it should help prove that nothing is changing between requests (i.e. that the file I got is the same as the file you got).