The Linux Kernel Prepares for Rust 1.77 Upgrade

kramerger · 2024-02-18T14:31:03 1708266663

The LKML post mentions binary size improvements.

One issue I have had with Rust applications is their huge binary size (yes, I know this has improved a bit lately). Is there a good comparison between kernel C and kernel Rust code in this regard?

bluejekyll · 2024-02-18T15:19:45 1708269585

This is a good guide on building small Rust binaries: https://github.com/johnthagen/min-sized-rust

This one goes to extreme lengths to make the smallest Rust binary possible (400 bytes when it was written): https://darkcoding.net/software/a-very-small-rust-binary-ind...

The thing is, you lose a lot of nice features when you do this, like panic unwinding, debug symbols, and the stdlib. For kernel and some embedded development it’s definitely important, but for most use cases, does it matter?

ericbarrett · 2024-02-18T20:11:05 1708287065

> debug symbols

In ye olden days it was common to distribute a binary without debug symbols, but to keep a copy of them for every released build¹. If an application crashed (panicked, signalled, etc.) you got a core dump that you could debug using the stripped binary together with the symbol file. This gave you both smaller binary sizes and full debugging capability at the cost of some extra administration. I'm not sure if this is possible with "stock" Rust, but if you need lean binaries but want to do forensic investigation it's something to look into.

1. https://sourceware.org/gdb/current/onlinedocs/gdb.html/Separ...

wongarsu · 2024-02-18T21:19:16 1708291156

It's possible, but Rust follows platform conventions and only does this by default on Windows. However, it is now easy to configure by setting split-debuginfo in your Cargo.toml [1]

1: https://doc.rust-lang.org/cargo/reference/profiles.html#spli...

makapuf · 2024-02-18T14:51:04 1708267864

From (1) I guess most of the binary size comes from the stdlib, which the kernel does not use, so I guess those are two different problems.

(1) https://github.com/johnthagen/min-sized-rust

vlovich123 · 2024-02-18T19:12:21 1708283541

That wouldn’t explain where the size improvements from the upgrade come from, since the kernel has always been built as no_std.

surajrmal · 2024-02-18T14:51:21 1708267881

Most size issues come from not using release builds, using too many dependencies, or overusing generics. The Rust std lib being statically linked also contributes. The kernel shouldn't suffer from any of these problems. Plenty of embedded projects use Rust in highly constrained environments without size issues compared to C.

wongarsu · 2024-02-18T16:16:36 1708272996

If you do release builds, strip debug symbols and turn LTO from thin to full, do dependencies and the static stdlib still matter? You should be only paying for code that's called at that point.

At that point I suspect the biggest culprits are overuse of monomorphisation, and often just more stuff happening compared to equivalent C++ code because the language makes larger code bases more maintainable. I'd also count some niceties in that category like better string formatting or panic handling, which is an insignificant cost in any larger software but appears big in tiny hello-world type programs.

surajrmal · 2024-02-18T16:35:44 1708274144

Overuse of monomorphisation by the community is typically the right choice, given that the average use case is servers, where this need not make a meaningful difference. As with any ecosystem, choices folks make may not be suitable for everyone. Those differing requirements lead to fracturing into smaller ecosystems which share common requirements. Ultimately, that's what's happening with Rust, and it's very healthy to see in my opinion. You can't force the overall ecosystem to optimize for a minority of users.

This is also true for everything in general. Having the one best thing for foo isn't as helpful as an array of choices, each with different tradeoffs. You simply choose the one best for your needs, whether it's cheese at the supermarket, a web server framework, operating system intrinsics, or command line argument handling. Some things can be standardized and serve as a common base for everyone, but it's challenging to do that without leaving out at least one person's requirements. Standards also always feature creep until someone tries to reset them with a new, less complex standard, but I guess that's a different topic.

tialaramex · 2024-02-18T17:21:06 1708276866

I believe the menu of options is undesirable until you actually know you have requirements you can evaluate them against. As much as possible even if I do have a choice, there should be a default and I needn't be asked. When I make a Rust project, cargo notices I have git and, since I didn't say otherwise, it mints a Git repo for the new project automatically. It doesn't insist on asking if I want one, and then asking if it should have the obvious name, and then asking if it should use my default local git settings, the defaults for all of these are IMNSHO obvious and my tacit approval via not having explicitly turned this off is enough.

Do you want a Doodad, a Gooba or a Wumsy? No idea? Me either. So until I care, I'd rather not be asked to choose. But once I discover that I need something with at least 40% Flounce, I can see that Doodads and Goobas both are rated at 50% Flounce, whereas Wumsy has only 10% Flounce, now we're making an informed choice, it should be easy enough to insist on a Doodad to meet my requirement.

If I measure that monomorphization is out of hand in my codebase, I can use dyn to get it back under control for a fair price, but I think the default here is sound.
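A minimal sketch of the trade-off described above, using hypothetical function names: the generic version is compiled once per concrete type it is called with, while the `dyn` version exists once and dispatches through a vtable. Whether the extra copies actually survive into the final binary depends on inlining and optimization settings, so treat this as an illustration rather than a size measurement.

```rust
use std::fmt::Display;

// Generic version: the compiler emits one copy of this function
// per concrete `T` it is instantiated with (monomorphization).
pub fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {value}")
}

// `dyn` version: a single copy of the function exists, and the
// formatting call goes through the vtable carried by `&dyn Display`.
pub fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {value}")
}
```

Calling `describe_generic(42)` and `describe_generic("hi")` instantiates two copies; `describe_dyn(&42)` and `describe_dyn(&"hi")` share one, at the cost of an indirect call.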

fl0ki · 2024-02-18T18:43:24 1708281804

I agree, but I have yet to see a single real-world example of a Rust project meaningfully reducing its binary size by switching from monomorphization to dynamic dispatch in its own code. Many Rust developers boast that they virtually never use `dyn`, but then still appeal to it when arguing that Rust has dynamic dispatch so monomorphization is an avoidable cost.

Sometimes you can provide `T = Arc/Box<dyn Foo>` where `T: Foo` is required, but only if the trait is designed to be object-safe, not simply by default. If you get to design the trait and all of its consumers yourself, you might have this option, but it's very possible that you're using a library that does not make this possible. You can easily be the first person to bother trying the `dyn` for a trait and running into these limitations.
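A sketch of the limitation, with a hypothetical `Shape` trait and `total_area` consumer: `Box<dyn Shape>` only satisfies a `T: Shape` bound if the trait is object-safe *and* a forwarding impl for `Box` exists. If you don't control the trait, you may not be in a position to add that impl yourself.

```rust
// Hypothetical object-safe trait: no generic methods and no
// `Self: Sized` requirements on the methods called through `dyn`.
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);
pub struct Circle(pub f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

// `Box<dyn Shape>` does not implement `Shape` on its own. This
// forwarding impl is what lets `T = Box<dyn Shape>` meet `T: Shape`.
impl<S: Shape + ?Sized> Shape for Box<S> {
    fn area(&self) -> f64 {
        (**self).area()
    }
}

// A generic consumer, as a library might define it.
pub fn total_area<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

With the forwarding impl in place, `total_area` is instantiated once for `Box<dyn Shape>` instead of once per concrete shape type.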

Besides that, you might not even have that much control of the concrete type used. For example, if you are generating large schemas with serde, serde decides how that code is monomorphized, not you. In contrast, for better or worse, the path of least resistance in Go is to use a reflection-based serialization framework which has notable runtime costs (that may or may not matter to a given project) but successfully avoids compile time and binary size costs. (There are other reasons that Go binaries end up even larger than Rust ones; this just isn't one of them.)

Despite Rust's general principle of giving its users informed choices here, I am not aware of any option that does 100% dynamic dispatch for (de)serialization, so in practice this is a largely unavoidable cost in each project that is decided only by how complex the schema is.

It's also only fair to point out that C++ tends to end up in this place too, mitigated only by dynamic linking and not any magical property of the language itself. Even C can head this way because monomorphizing with macros has the same effect, though due to how such code is structured, it's also less likely to be inlined than C++ or Rust.

tialaramex · 2024-02-18T22:36:19 1708295779

That's a fair observation, I know when I was first writing Rust my inclination was to return impl IntoIterator<Item = T> from functions which are going to actually return a Vec<T> because hey, if I change my mind you can still iterate over whatever I give you now instead with no code changes.

But of course that's an anti-pattern because they are in reality likely to forever just return Vec<T> and knowing that helps you. My early choice only makes sense if either I can't tell you anything more specific than impl IntoIterator<Item = T> or I already know I intend to make a change later. So these days I almost always write down what exactly is returned unless either I can't name it or no reasonable person would care.

For serde in particular my guess is that if you need lots of dynamism serde is the wrong approach even though it's popular. It might be interesting to build a different project which focuses on dynamic dispatch for the same work and tries to re-use as much of the serde eco-system as possible. Not work which attracts me though.

Thiez · 2024-02-19T11:36:32 1708342592

Note that `impl Foo` return types don't actually cost anything extra with regards to code-size, the compiler knows what the actual type is and there is no dynamic dispatch. Only actual generics have an impact here, and `impl` in a return position doesn't count.

tialaramex · 2024-02-19T16:03:15 1708358595

The code size cost doesn't live in my code, but in yours.

Because I didn't admit you were getting a Vec, if you actually need a Vec you can't just use the one I gave you. You must jump through hoops to turn whatever I gave you into a Vec, bloating your code.

The implementation is pretty clever: it probably won't meticulously take my Vec to pieces, throw it away and make you a new one, and will instead just give you the same Vec. But this trick is fragile, so it's much better not to need it at all.
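A minimal sketch of the two API shapes being compared, with hypothetical function names. A caller that needs a `Vec` has to re-collect the opaque value. The standard library may reuse the original allocation when collecting a `vec::IntoIter` back into a `Vec`, which is the trick alluded to above, but as far as I know that is an internal optimization rather than a documented guarantee.

```rust
// Hypothetical library function that hides its concrete return type.
pub fn evens_opaque(limit: u32) -> impl IntoIterator<Item = u32> {
    (0..limit).filter(|&n| n % 2 == 0).collect::<Vec<u32>>()
}

// The same function, admitting that it returns a Vec.
pub fn evens_vec(limit: u32) -> Vec<u32> {
    (0..limit).filter(|&n| n % 2 == 0).collect()
}

// A caller that needs a Vec must convert the opaque value back...
pub fn caller_opaque(limit: u32) -> Vec<u32> {
    evens_opaque(limit).into_iter().collect()
}

// ...while the Vec-returning version can be used as-is.
pub fn caller_vec(limit: u32) -> Vec<u32> {
    evens_vec(limit)
}
```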

fl0ki · 2024-02-19T14:08:38 1708351718

Maybe a more specific way to put it is: you only pay for the (combinations of) types you actually use, whether that's in argument position, return position, or even a local binding. So if it's always Vec<T> it's not costing much more in compile time or code size, but if it's sometimes another type then you do now pay for both.

zadokshi · 2024-02-18T15:02:27 1708268547

Saying part of the problem is “using too many dependencies” is not an overly helpful thing if the ecosystem keeps trying to download 3 GB of build dependencies because you tried to use some simple little library. The problem is obvious; it’s the solution that is much more difficult.

surajrmal · 2024-02-18T15:11:06 1708269066

It's not a problem when you compare it to C. You have few available dependencies to choose from with C. If you are equally picky and constrain yourself to parts of the ecosystem which care about binary size, you still have more options and can avoid size issues.

For things like a kernel, it is moot as most deps are simply not possible to use anyway.

When you consider the full ecosystem, you really need to compare it to alternatives in largely managed languages like Java, Go, Node, etc. Those binaries are far larger.

bscphil · 2024-02-18T22:01:22 1708293682

> If you are equally picky and constrain yourself to parts of the ecosystem which care about binary size, you still have more options and can avoid size issues.

What's an example of this for, say, libcurl? On my system it has a tiny number of recursive dependencies, around a dozen. [0] Furthermore if I want to write a C program that uses libcurl I have to download zero bytes of data ... because it's a shared library that is already installed on my system, since so many programs already use it.

I don't really know the appropriate comparison for Rust. reqwest seems roughly comparable, but it's an HTTP client library, and not a general purpose network client like curl. Obviously curl can do a lot more. Even the list of direct dependencies for reqwest is quite long [1], and it's built on top of another http library [2] that has its own long list of dependencies, a list that includes tokio, no small library itself.

In terms of final binary size, the installed size of the curl package on my system, which includes both the command line tool and development dependencies for libcurl, is 1875.03 KiB.

[0] I'm excluding the dependency on the ca-certificates package, since this only provides the certificate chain for TLS and lots of programs rely on it.

[1] https://crates.io/crates/reqwest/0.11.24/dependencies

[2] https://crates.io/crates/hyper/0.14.28/dependencies

heinrich5991 · 2024-02-19T17:33:04 1708363984

Probably ureq[1].

[1]: https://crates.io/crates/ureq/2.9.6/dependencies

pdimitar · 2024-02-19T00:00:32 1708300832

> If you are equally picky and constrain yourself to parts of the ecosystem which care about binary size, you still have more options and can avoid size issues.

The market and your boss do not care about that. They want tasks X and Y done. You have no time to vet 15 alternatives and pick the most frugal one in terms of binary size. Not to mention that for many tasks you have no more than 3-4 alternatives anyway, and none of them prioritize binary size. What are you going to do? Roll your own? Deadline is looming ever closer, I hope you can live without sleep for several days then.

We all know the ideal theory.

Aurornis · 2024-02-18T15:05:47 1708268747

> if the ecosystem keeps on trying to download 3Gb of build dependencies because you tried to use some simple little library.

Downloading 3GB of dependencies is not a thing that happens in the Rust ecosystem. Reality is orders of magnitude smaller than that. Why are you exaggerating so much?

Some people bristle at the thought of external dependencies, but if you want to do common tasks it makes sense to pull in common dependencies. That’s life.

bscphil · 2024-02-18T22:15:12 1708294512

> Downloading 3GB of dependencies is not a thing that happens in the Rust ecosystem. Reality is orders of magnitude smaller than that.

Assuming they're talking about the built size of dependencies that are left lying around after cargo builds a binary, they're really not exaggerating by much. I have no difficulty believing that there are Rust projects that leave 3GB+ of dependency bloat on your file system after you build them.

To take the last Rust project I built, magic-wormhole.rs [1], the source code I downloaded from Github was 1.6 MB. After running `cargo build --release`, the build directory is now 618 MB and there's another 179 MB in ~/.cargo, for a total of 800 MB used.

All this to build a little command line program that sends and receives files over the network over a simple protocol (build size 14 MB). God forbid I build something actually complicated written in Rust, like a text editor.

[1] https://github.com/magic-wormhole/magic-wormhole.rs

pdimitar · 2024-02-19T00:02:32 1708300952

I'm not a fan of this either, but you have to consider that a good part of it is caches.

pjmlp · 2024-02-19T10:02:29 1708336949

This is why Xcode, Android Studio/NDK, VC++ and co. have the huge sizes people complain about: compiled binaries for all major variations of compile flags are part of the download.

It's also why those GNU/Linux repos span multiple DVDs nowadays.

bscphil · 2024-02-19T20:27:26 1708374446

> GNU/Linux repos

I'm not sure I understand your point with these, as of course no one ever installs the complete repository (e.g. all of Debian), because there's a ton of software in it you don't need or want. Assuming you mean the installation media, at the very least Arch Linux is still less than 1 GB.

Moreover, I think the point in comparing the behavior of Rust dependencies with other ecosystems (C, C++, Haskell, Python) is that most of this cruft is left behind in the individual directories used to build the software. I occasionally write programs to solve some problem, or for fun, and usually I have to download nothing at all, because I can rely on the dependencies supplied by my system and already installed on behalf of other programs (yes, I'm well aware that this doesn't cover all use cases). Rust is fundamentally not designed to work that way, and the large build sizes and huge dependency trees have a multiplying effect on that foundational issue.

Narishma · 2024-02-18T17:13:51 1708276431

The download size may not be a big issue, but all those dependencies take up a lot of storage space once they're compiled.

gregors · 2024-02-18T15:54:09 1708271649

Maybe they meant node_modules as the joke goes.

nindalf · 2024-02-18T17:23:41 1708277021

I think it was a false equivalence between node_modules and Rust. As if any language where developers rely on a package manager to pull in libraries will necessarily be 3GB in size.

Aurornis · 2024-02-18T15:03:21 1708268601

> One issues I have had with Rust applications is the huge binary size

Turn off the standard library and your binaries can be incredibly small. This is how it’s used in microcontrollers and the Linux Kernel doesn’t use the full standard library either.

nequo · 2024-02-18T16:06:45 1708272405

Due to dead code elimination, the compiler already omits the parts of the stdlib that your code doesn’t use.

pornel · 2024-02-18T16:18:46 1708273126

Not quite. Every Rust program will have some code path that may panic, and the default panic handler uses debug formatting, which uses dynamic dispatch, which prevents elimination of the rest of the printing machinery.

There’s an unstable panic_immediate_abort setting that makes Rust panics crash as hard as a C segfault, and only with it can you get rid of a good chunk of the stdlib.

bpye · 2024-02-18T21:43:49 1708292629

The printing machinery is quite unfortunate. Beyond being large, dynamic dispatch makes any attempt at stack size analysis much harder.

I’ve used Rust for some embedded side projects and I really wish there was a way to just get some unique identifier that I could translate (using debug symbols) to a filename and line number for a crash. This would sort of be possible if you could get the compiler to put the filenames in a different binary section, as you could then just save the address of the string and strip out the actual strings - but today that’s not possible.

nequo · 2024-02-18T16:49:38 1708274978

Does this mean that only the printing machinery is not eliminated or that other parts of stdlib are present in the binary too even though unused?

fl0ki · 2024-02-18T18:30:02 1708281002

The printing machinery alone is quite large when you consider that it includes the code & raw data for Unicode, whether or not similar facilities were already available on the host libc. Though you're not likely to avoid that in any non-trivial Rust program anyway, as even a pretty barebones CLI will need Unicode-aware string processing.

I generally find Rust binaries to be "a few" megabytes if they don't have an async runtime, and a few more if they do. It has never bothered me on an individual program basis, but I can imagine it adding up over an entire distribution with hundreds of individual binaries. I see the very real concern there, but personally I would still not risk ABI hazards just to save on space.

carterschonwald · 2024-02-18T15:29:57 1708270197

So one issue I can imagine being the culprit with Rust is the specializing, C++-style semantics of Rust generics. C code generics tend to be void*-flavored or point to a struct of function pointers, which generates less code. Not sure how this translates to the kernel setting, though.
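For comparison, the C pattern of a struct of function pointers can be written directly in Rust too. A minimal sketch with a hypothetical `Ops` table (loosely in the spirit of kernel ops structs, not taken from the kernel's Rust bindings): code that dispatches through the table is compiled once, no matter how many tables exist.

```rust
// Hypothetical C-style ops table: plain function pointers, no generics.
pub struct Ops {
    pub read: fn(&mut [u8]) -> usize,
    pub name: fn() -> &'static str,
}

// One concrete "driver": fills the buffer with zeros.
fn zero_read(buf: &mut [u8]) -> usize {
    buf.fill(0);
    buf.len()
}

fn zero_name() -> &'static str {
    "zero"
}

pub static ZERO_OPS: Ops = Ops {
    read: zero_read,
    name: zero_name,
};

// A single compiled copy of this function serves every `Ops` table,
// because the call goes through a function pointer at run time.
pub fn read_via(ops: &Ops, buf: &mut [u8]) -> usize {
    (ops.read)(buf)
}
```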

pornel · 2024-02-18T15:48:35 1708271315

That is true. Rust makes it easy to overuse monomorphisation. There are tools like `cargo-bloat` that find these.

However, most complaints are about size of “Hello World”, which in Rust is due to libstd always having debug info (to be fixed soon), and panic handling code that includes backtrace printing (because print to stdout can fail).

Printing of backtrace is very bloaty, because it parses and decompresses debug info.

mcqueenjordan · 2024-02-18T15:18:14 1708269494

Another thing worth mentioning here is `strip`, which IIRC `cargo build --release` doesn't do by default. I think stripping binaries can reduce binary size by up to 80-85% in some cases (but certainly not all; I just tried it locally on a 1M Rust binary and got a 40% reduction).

FWIW, you can configure this in Cargo.toml:

  [profile.release]
  strip = true

nindalf · 2024-02-18T17:25:54 1708277154

Check out Making Rust binaries smaller by default (https://kobzol.github.io/rust/cargo/2024/01/23/making-rust-b...). Previously discussed a few weeks ago at https://news.ycombinator.com/item?id=39112486.

That change will be live on 21st March, so manual strips won't be required after that.

habibur · 2024-02-18T15:50:03 1708271403

You can strip C binaries too, and that halves their size. The point is that, for example, a hello world Rust binary is 300 KB after stripping while the C one is 15 KB. A difference of 20 times.

pornel · 2024-02-18T16:05:59 1708272359

Such comparison exaggerates the difference, because it’s a one-time constant overhead, not a multiplicative overhead.

i.e. all programs are larger by 275KB, not larger by 20x.

Rust doesn’t have the privilege of having a system-wide shared stdlib to make hello world executables equally small.

The overhead comes from Rust having more complex type-safe printf, and error handling code for when the print fails. C doesn’t handle the print error, and C doesn’t print stack traces on error. Most of that 200KB Rust overhead is a parser for dwarf debug info to print the stack trace.
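A small sketch of the error-handling difference mentioned above, using a hypothetical always-failing writer: `writeln!` hands the write error back as an `io::Result`, whereas `println!` is documented to panic if writing to stdout fails, which is why panic handling code ends up in even a hello world binary.

```rust
use std::io::{self, Write};

// Hypothetical writer that always fails, standing in for a closed
// stdout (for example, a broken pipe).
pub struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "pipe closed"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// `writeln!` returns the failure as an `io::Result` the caller can
// inspect; `println!` would panic instead on a failed stdout write.
pub fn greet(out: &mut impl Write) -> io::Result<()> {
    writeln!(out, "Hello, world!")
}
```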

estebank · 2024-02-18T17:29:10 1708277350

You might be interested in reading https://darkcoding.net/software/a-very-small-rust-binary-ind...

Narishma · 2024-02-18T16:14:34 1708272874

But C doesn't statically link the standard library by default like Rust does.

habibur · 2024-02-18T18:12:57 1708279977

Hello world deps for C:

    linux-vdso.so.1 (0x00007fff25cb8000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fe5f08d9000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fe5f0ae2000)

And for Rust

    linux-vdso.so.1 (0x00007ffc109f9000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8eda404000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f8eda222000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8eda4a8000)

If anything, Rust has one more dynamically linked library than C.

fl0ki · 2024-02-18T18:47:24 1708282044

That might be true for Hello World, but libgcc_s is where a lot of builtins for C itself go, so you'll find it ends up linked into a lot of non-trivial C programs as well. See https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html

Kamq · 2024-02-18T20:46:50 1708289210

You missed the word "statically" in the post you commented on.

Dynamically linked libs rarely contribute heavily to binary bloat

habibur · 2024-02-19T03:14:01 1708312441

The benefit of statically linking becomes moot when it doesn't reduce the number of dynamically linked libraries. That's the point.

Dylan16807 · 2024-02-20T02:14:56 1708395296

That's not why rust statically links the runtime. The main benefit is that they don't have to try to design and maintain a stable ABI for it. Which is not moot.

More generally, you statically link something to avoid distribution hassles of various kinds, not because you care about the specific number.

loeg · 2024-02-18T17:06:08 1708275968

Are you talking about https://github.com/rust-lang/compiler-team/issues/688 ? I think that issue provides a lot of interesting context for this specific improvement.

forrestthewoods · 2024-02-18T19:51:28 1708285888

> is the huge binary size

Can you quantify this? How big is too big? Ideally for a real program, and not an experiment to make the tiniest possible program.

On my Windows machine ripgrep's rg.exe is just 4.2 MB. Making that smaller feels irrelevant.

I’m not convinced that binary size is a real problem. But I’m open to evidence!

cout · 2024-02-18T21:23:50 1708291430

It would be nice if it fit on a standard size 1.44MB floppy, but given that I haven't used a floppy drive in about a decade, yeah I guess it doesn't matter much.

raoulkent · 2024-02-18T22:14:57 1708294497

Just out of interest, what kind of systems do you work on if you've been using floppy discs in the last 25+ years?

cout · 2024-02-18T23:09:34 1708297774

I think my current motherboard does not have pins for a floppy drive, but every motherboard I've owned before that does. I just kept moving the floppy drive from chassis to chassis every time I upgraded just in case I needed it. IIRC the last time I used a floppy was either to archive old data to CD-ROM or boot the computer when I couldn't find a USB thumb drive.

I do still own my first computer, an IBM PS/2 Model 50Z, which still has its original floppy drive. Other parts I upgraded -- the 286 was replaced with a 386 SX/Now!, the 30MB ESDI was upgraded to 100MB, and it now has a full 2MB of RAM. I keep the floppy drive because it reads disks that no other floppy drive has been able to read.

SushiHippie · 2024-02-18T14:21:52 1708266112

Do I understand it correctly that upgrading to a new rust version is mostly implementing new best practices and new features, instead of needing to "fix" your code, as rust is backwards compatible?

I've only used rust nightly for my own projects and didn't give too much thought about rust versions

codetrotter · 2024-02-18T14:25:50 1708266350

For normal projects yes.

And you can use clippy to tell you about changes you should make.

For example, in my projects I run this in the CI pipeline:

  cargo clippy --all-targets --all-features

and

  cargo fmt --all --check

In addition to the regular test and build steps.

This both means that I follow clippy recommendations and cargo fmt in the first place, and also that my CI tells me about any clippy changes if I didn’t notice them myself as well as any formatting I’m not following. In my main IDE I auto format the code of course. But sometimes I make small changes in vim and don’t run the format step myself so it’s nice to have for that reason as well.

For the integration of Rust into the Linux kernel I imagine it’s a bit more convoluted.

estebank · 2024-02-18T17:35:26 1708277726

For Rust on Linux it's a bit more involved because they use nightly features, which can change from day to day. In practice there's an implicit set of tiers, with features that rarely change and features that frequently change in bursts. I would like it if we formalized that distinction a bit. We already mark when a feature is very likely to change (unstable_features), but not when it is very close to being stabilized.

steveklabnik · 2024-02-18T16:15:15 1708272915

Yes, generally. For example, recently David Tolnay shared the amount of burden Meta has when upgrading the compiler: https://old.reddit.com/r/rust/comments/19dtz5b/freebsd_discu...

> I estimate it's about ½ hour per 1 million lines, on average.

That being said, Rust for Linux isn't using stable Rust, so they have a higher burden than projects that do.

pornel · 2024-02-18T15:57:31 1708271851

Rust-for-Linux made it more complicated for themselves, because they chose to enable unstable/experimental features of the compiler without waiting until they’re released, so they don’t get the stability and compatibility guarantees that normal Rust projects get.

bonzini · 2024-02-18T16:41:46 1708274506

The alternative was to do the same and also not merge what they had. Stuff like custom allocator support is not optional.

saghm · 2024-02-18T17:24:58 1708277098

I was confused when reading this because I was pretty sure that using other allocators had been supported for a while in Rust. From refreshing myself on the details, it seems that replacing the default allocator is stable (https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), but the API for arbitrary allocators (which includes things like being able to do "zero-size" allocations) is not yet stable (https://doc.rust-lang.org/std/alloc/trait.Allocator.html). I guess if there were ever a project that needed fine-grained control over how allocators work, it would be the kernel.
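To make the stable half concrete, here is a minimal sketch of replacing the global allocator through `GlobalAlloc`: a hypothetical `CountingAlloc` that forwards to the system allocator and counts allocation calls. This only swaps the single process-wide allocator; choosing an allocator per collection is what the unstable `Allocator` trait is for.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter of allocation calls made through the global allocator.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Forward to the system allocator; it returns null on failure,
        // just like malloc().
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Register it as the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```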

remexre · 2024-02-18T21:14:36 1708290876

There's also https://docs.rs/allocator-api2/latest/allocator_api2/ -- I end up using this more often than I end up using custom allocators, simply because it has a stable version of the currently-unstable functions for constructing uninitialized Vec<MaybeUninit<u8>>s.

charcircuit · 2024-02-18T21:08:27 1708290507

More importantly allocator_api adds try_new which means you can handle allocation errors.

saghm · 2024-02-19T00:25:22 1708302322

Ah, that is important. I didn't even notice that wasn't possible in the GlobalAlloc API, but you're definitely right that it's not.

LegionMammal978 · 2024-02-19T00:42:46 1708303366

GlobalAlloc can be used for fallible allocations: its functions just return a null pointer on failure, as with malloc() and realloc() in C. The main limitations are around the safe heap data structures in the standard library, which don't stably expose any fallible APIs except for Vec::try_reserve().
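A minimal sketch of the fallible path that is available on stable, via `Vec::try_reserve` (the `make_buffer` helper is hypothetical). Asking for `usize::MAX` bytes fails with a capacity-overflow error instead of aborting the process.

```rust
use std::collections::TryReserveError;

// Allocate a zeroed buffer, reporting allocation failure as an error
// instead of aborting.
pub fn make_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(len)?;
    // Capacity is already reserved, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```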

saghm · 2024-02-19T02:15:25 1708308925

Hmm, I'd expect that to mean it's possible to add those APIs today rather than requiring the `Allocator` trait. Is the idea that the Allocator is a parameter (maybe a generic one) when calling `try_new`, so they don't want to stabilize anything now?

LegionMammal978 · 2024-02-19T16:28:45 1708360125

The Allocator type is an unstable parameter on the heap type; Vec<T> is unstably Vec<T, A>, Arc<T> is unstably Arc<T, A>, and so on. (The allocator "A" defaults to Global, which is an Allocator that forwards to the registered GlobalAlloc.) I think the Linux kernel also wants an Allocator trait for other reasons than fallibility, such as allocating different kinds of objects on different heaps.

CUViper · 2024-02-18T15:51:28 1708271488

Rust is backwards compatible when you stick to stable features, but the kernel uses unstable features that can and do incur breaking changes.

https://github.com/Rust-for-Linux/linux/issues/2

surajrmal · 2024-02-18T16:40:13 1708274413

It seems prudent to limit rust usage in the kernel until that list can be burned down to zero. It makes sense that you need to at least get rust in the kernel to find out what missing features you need to have implemented and stabilized, but excessive use will make folks lives painful as they try to track upstream rust releases.

fl0ki · 2024-02-18T18:56:27 1708282587

Please bear in mind that Linux has used non-standard GCC extensions to C for decades as well. The tradeoffs here are their call to make.

Besides, at this stage, it makes perfect sense for Linux to use unstable Rust features. It was one thing to say Rust should be great for writing kernels, it's another to actually get feedback on how it needs to be better, and that's only possible if the potential improvements are motivated by those who need them and incubated without the constraints of backwards compatibility nor the risks of locking in permanent tech debt.

Rust's unstable feature concept was designed for exactly this kind of freeform evolution and it's working exactly as intended. As for the specific tradeoffs being made in Linux, its contributors are in a much better position to weigh those than we are.

estebank · 2024-02-18T17:43:27 1708278207

What you propose is exactly what's been done by the kernel. They are integrating the language in a non-mandatory way, to both exercise the kernel side and the language itself. The unstable features haven't been stabilized because either they have open questions on their implementation (and having a customer using them helps define them) or no-one has cared enough to complete them (and having a customer using them gives them the extra push). Either way what's happening now is exactly the process you are proposing.

The article is about updating the Rust version the kernel targets where a feature they use (offset_of) was stabilized.
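`offset_of!` is what makes C's `container_of` pattern expressible without hand-computed offsets. A minimal sketch with hypothetical `Device`/`ListNode` types (not the kernel's actual bindings); it needs Rust 1.77 or later.

```rust
use std::mem::offset_of;

// Hypothetical intrusive-list types: a node embedded in a larger object.
#[repr(C)]
pub struct ListNode {
    pub next: *const ListNode,
}

#[repr(C)]
pub struct Device {
    pub id: u32,
    pub node: ListNode,
}

/// Hypothetical helper in the spirit of C's `container_of`: recovers the
/// containing `Device` from a pointer to its `node` field.
///
/// # Safety
/// `node` must point to the `node` field of a live `Device`, and must be
/// derived from a pointer to that whole `Device` (not from `&dev.node`).
pub unsafe fn device_from_node(node: *const ListNode) -> *const Device {
    let offset = offset_of!(Device, node); // stable since Rust 1.77
    unsafe { node.cast::<u8>().sub(offset).cast::<Device>() }
}
```

In the usage below, the field pointer is derived from a pointer to the whole `Device` so that it is allowed to reach back to the start of the struct.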

almatabata · 2024-02-18T16:56:38 1708275398

If you ignore dependencies and stick to stable features, yes.

If you include dependencies, it can happen that a dependency relies on unstable features. In that case you might have to upgrade the library (assuming a version supporting the new compiler exists), and by then the library might have changed its API, which would force you to change your code.

Except for the above use case, upgrades to the latest version of the compiler have been painless for me.

paavohtl · 2024-02-18T17:15:01 1708276501

Dependencies are not allowed to use unstable features either with the stable compiler. The only exception is the standard library, which uses numerous unstable features even with a stable distribution of Rust.

tialaramex · 2024-02-18T18:15:46 1708280146

It's worth briefly explaining the rationale for this (the stdlib gets to use unstable features).

Rust's stdlib is maintained with the rest of the language and by the same broad team, so, if you're tweaking unstable feature X, you are also responsible for ensuring the stdlib people using feature X sort that out. I'm not sure if Rust's internal policies mean you shouldn't land a change to the main tree without accompanying stdlib patches, or whether you're only required to give them adequate notice, but either way it's not going out the door in a stable release being incompatible with its own implementation.

This couldn't really work with 3rd party libraries.

jcranmer · 2024-02-18T18:47:22 1708282042

There's a second category of unstable features to mention here.

Some of the features are essentially perma-unstable, because they're exposing some compiler intrinsics for the library to be able to use. This is the equivalent of things like __builtin_* for C compilers.
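As an illustration of that layering (an example chosen here, not one from the thread): C code reaches for `__builtin_clz`, while stable Rust exposes the same operation as `u32::leading_zeros`, a stable method that the standard library implements on top of the perma-unstable `core::intrinsics::ctlz`. Library users only ever touch the stable wrapper. The `highest_bit` function is hypothetical:

```rust
/// Index of the highest set bit (floor(log2(x))), or None for zero.
/// Uses u32::leading_zeros, the stable wrapper over a compiler intrinsic.
fn highest_bit(x: u32) -> Option<u32> {
    if x == 0 {
        // Unlike C's __builtin_clz(0), which is undefined,
        // 0u32.leading_zeros() is defined (it returns 32), but log2(0) isn't.
        None
    } else {
        Some(31 - x.leading_zeros())
    }
}

fn main() {
    for x in [1u32, 8, 1000, 0] {
        println!("{x}: {:?}", highest_bit(x));
    }
}
```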

loeg · 2024-02-18T16:51:35 1708275095

So this is a huge tangent, but couldn't most of the uses of the non_null!() macro in this diff just be (safe!) pointer comparisons or subtractions, without the unsafe{} logic that converts a pointer value to a reference purely for the sake of comparison or subtraction? https://lore.kernel.org/lkml/20240217002717.57507-1-ojeda@ke...
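To make the distinction concrete (a generic sketch, not the kernel's code; `non_null!` itself isn't reproduced): comparing raw pointers, or casting them to integer addresses and subtracting, needs no `unsafe` in Rust because nothing is dereferenced. Turning a pointer into a reference (`&*p`) is the step that requires `unsafe`. The helper names `same_address` and `elem_distance` are hypothetical:

```rust
use core::ptr;

/// Hypothetical helper: true if both raw pointers hold the same address.
/// Comparing raw pointers is safe; nothing is dereferenced.
fn same_address<T>(a: *const T, b: *const T) -> bool {
    ptr::eq(a, b)
}

/// Hypothetical helper: element distance from `start` to `end`, computed on
/// integer addresses. Safe, but only meaningful if both point into the same
/// array. Returns None if `end` precedes `start`.
fn elem_distance<T>(start: *const T, end: *const T) -> Option<usize> {
    let bytes = (end as usize).checked_sub(start as usize)?;
    let size = core::mem::size_of::<T>().max(1); // avoid dividing by zero for zero-sized T
    Some(bytes / size)
}

fn main() {
    let arr = [1u64, 2, 3, 4];
    let first: *const u64 = &arr[0];
    let third: *const u64 = &arr[2];
    println!("{}", same_address(first, arr.as_ptr()));
    println!("{:?}", elem_distance(first, third));
}
```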

saghm · 2024-02-18T17:12:12 1708276332

Although I think I recall that the kernel doesn't use Rust's standard library (which is consistent with the diff you linked), the standard library's documentation on pointer subtraction may point to a concern they share (https://doc.rust-lang.org/std/primitive.pointer.html#method....):

> If any of the following conditions are violated, the result is Undefined Behavior:

> * Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object.

> * The computed offset cannot exceed isize::MAX bytes.

> * The offset being in bounds cannot rely on “wrapping around” the address space. That is, the infinite-precision sum must fit in a usize.

> Most platforms fundamentally can’t even construct such an allocation. For instance, no known 64-bit platform can ever serve a request for 2^63 bytes due to page-table limitations or splitting the address space. However, some 32-bit and 16-bit platforms may successfully serve a request for more than isize::MAX bytes with things like Physical Address Extension. As such, memory acquired directly from allocators or memory mapped files may be too large to handle with this function.

> Consider using wrapping_sub instead if these constraints are difficult to satisfy. The only advantage of this method is that it enables more aggressive compiler optimizations.

If their pointer subtraction uses similar semantics, there might be issues if they want to compare pointers from different allocation objects, are worried about 32-bit or 16-bit platforms, or maybe even consider the performance concerns too worrisome. The Rust I write tends to be a bit higher-level and doesn't require unsafe, so my instinctive reaction to "using unsafe for performance" generally errs on the side of abject terror. But unsafe is a fundamental part of what makes the safe side of the abstractions I use possible, and the kernel is probably one of those places that needs it sometimes, so I'd reluctantly have to admit I'm probably not qualified to judge whether these cases merit unsafety for performance alone. The first two concerns, though, sound like legitimate things the kernel would need to handle.
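A small sketch of the difference the quoted documentation describes (the `walk_back` function is hypothetical): `wrapping_sub` can be called in safe code because merely computing a pointer is fine, whereas `sub` is `unsafe` because the caller promises the result stays inside the same allocation (or one past its end), and that promise is what lets the compiler optimize more aggressively. Dereferencing needs `unsafe` either way.

```rust
/// Walks back from one-past-the-end of `buf`, using both methods.
/// Returns (last element, whether `sub` landed exactly on the start).
/// Panics if `buf` is empty, since there is then no element to read.
fn walk_back(buf: &[u8]) -> (u8, bool) {
    assert!(!buf.is_empty());
    let start = buf.as_ptr();
    // One past the end: valid to form and compare, never to dereference.
    let end = start.wrapping_add(buf.len());

    // Safe to compute: wrapping_sub has no in-bounds precondition.
    let last = end.wrapping_sub(1);

    // SAFETY: end - len is exactly `start`, inside the same allocation.
    let back = unsafe { end.sub(buf.len()) };

    // SAFETY: `last` points at the final element of the non-empty `buf`.
    let last_val = unsafe { *last };
    (last_val, back == start)
}

fn main() {
    let data = [10u8, 20, 30, 40];
    println!("{:?}", walk_back(&data));
}
```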

tialaramex · 2024-02-18T17:56:18 1708278978

> Although I think I recall that the kernel doesn't use Rust's standard library

Rust's standard library has three elements

core has the stuff you get with the Rust language itself; like any use of Rust, Rust for Linux has core. You could technically implement Rust without core, or at least without most of it, but that's not really the Rust language; you've instead made your own weird fork.

[T]::sort_unstable() is a core function which sorts a slice of some type T implementing Ord, but may reorder elements that compare equal, hence the word "unstable".

alloc depends on an allocator. You may not have an allocator, e.g. you're a tiny embedded controller, in which case you likely don't want and can't use this. Rust for Linux re-implements alloc, basically cloning the "official" alloc and fiddling with it.

Vec::try_reserve() is a feature found in alloc: it tries to allocate enough space to ensure your Vec has capacity for a certain number of additional elements beyond its current length, and if it can't, it reports an error rather than panicking.

std further depends on an Operating System, it offers exciting features like knowing what the time is, reading a file, connecting to a remote service over TCP/IP, or making a thread. Rust for Linux does not provide std.

File::create() is a std function which creates files.

The function you were interested in is part of core (although you were looking at its re-export from std) and so yes, it exists in Rust for Linux.
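The layering can be seen from ordinary userspace Rust by naming the crates explicitly. A sketch that uses only items from `core` and `alloc`, so in principle nothing that needs an OS. It assumes the upstream `alloc` crate, not the kernel's modified copy, whose API differs; `sorted_copy` is a hypothetical name:

```rust
// `alloc` must be declared explicitly; `core` is always in scope.
extern crate alloc;

use alloc::collections::TryReserveError;
use alloc::vec::Vec;

/// Copies `input` into a fresh Vec and sorts it, reporting allocation
/// failure as an error instead of panicking or aborting.
fn sorted_copy(input: &[u32]) -> Result<Vec<u32>, TryReserveError> {
    let mut v = Vec::new();
    v.try_reserve(input.len())?; // alloc: fallible allocation
    v.extend_from_slice(input);  // fits in the reserved capacity
    v.sort_unstable();           // core: slice method, needs no allocation
    Ok(v)
}

fn main() {
    match sorted_copy(&[30, 10, 20]) {
        Ok(v) => println!("{v:?}"),
        Err(e) => eprintln!("allocation failed: {e}"),
    }
    // Anything touching files, threads, or clocks (e.g. std::fs::File::create)
    // would pull in `std`, which Rust for Linux does not provide.
}
```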

loeg · 2024-02-18T17:21:16 1708276876

> If their pointer subtraction uses similar semantics, there might be issues if they want to compare pointers from different allocation objects, are worried about 32-bit or 16-bit platforms, or maybe even consider the performance concerns too worrisome.

Any of these types of issues would equally invalidate using unsafe{} to cast the pointer to a reference, which is what non_null! does.

1letterunixname · 2024-02-18T17:12:41 1708276361

My wishlist would include gradually refactoring the core in Rust, plus formal verification à la seL4 to prove correctness. There's no point in churning low-entropy core code from one language religion to another without improved assurance that it's also provably bug-free, race-free, and secure while being as fast or possibly faster.

amelius · 2024-02-18T15:28:22 1708270102

What percentage of the kernel is Rust now, in terms of LoC?

rwmj · 2024-02-18T15:33:54 1708270434

Rust: 20,887 lines

C: 33,351,596 lines

(That's just doing 'wc -l' rather than using any proper code metrics tool)
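The gap between a raw `wc -l` figure and a code-metrics figure comes mostly from blank and comment lines. A deliberately naive sketch of that classification: it only recognises `//` line comments, ignores `/* */` blocks and string literals, and is not how tokei works internally. `Counts` and `classify` are hypothetical names:

```rust
#[derive(Debug, Default, PartialEq)]
struct Counts {
    code: usize,
    comments: usize,
    blanks: usize,
}

/// Naively splits source lines into code, `//` comments, and blanks.
/// For newline-terminated text, the three counts sum to what `wc -l` reports.
fn classify(src: &str) -> Counts {
    let mut c = Counts::default();
    for line in src.lines() {
        let t = line.trim();
        if t.is_empty() {
            c.blanks += 1;
        } else if t.starts_with("//") {
            c.comments += 1; // also catches `///` and `//!` doc comments
        } else {
            c.code += 1;
        }
    }
    c
}

fn main() {
    let src = "//! Module docs\n\n/// A function.\nfn f() {}\n";
    println!("{:?}", classify(src));
}
```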

SushiHippie · 2024-02-18T23:03:10 1708297390

I've run tokei [0] on the Linux git repository on the latest commit [1] and this was the output:

  ===============================================================================
   Language            Files        Lines         Code     Comments       Blanks
  ===============================================================================
   C                   33553     23772322     17694564      2662642      3415116
   C Header            24554      9562920      7395591      1436546       730783
   Device Tree          5041      1512839      1240129        76384       196326
   ReStructuredText     3473       711669       539971            0       171698
   JSON                  788       443098       443096            0            2
   YAML                 3905       433860       352107        16626        65127
   GNU Style Assembly   1317       372131       271873        55613        44645
   Shell                 894       172036       120033        21590        30413
   Plain Text           1739       151033            0       123992        27041
   Makefile             2946        76889        52985        12355        11549
   Python                203        68545        54627         4452         9466
   SVG                    74        49420        48159         1171           90
   Perl                   59        43992        34124         4074         5794
   Happy                  10         6069         5359            0          710
   Assembly                5         3319         3065            0          254
   C++                     5         2138         1860           61          217
   BASH                   59         1943         1318          335          290
   Unreal Script           5          707          445          158          104
   ASN.1                  16          660          445           87          128
   Autoconf                5          429          373           26           30
   LD Script               8          376          288           29           59
   CSS                     3          295          172           69           54
   Gherkin (Cucumber)      1          291          199           58           34
   TeX                     1          236          156           74            6
   XSL                    10          200          122           52           26
   HEX                     2          173          173            0            0
   Module-Definition       2          128          113            0           15
   C++ Header              2          125           59           55           11
   RPM Specfile            1          108           93            1           14
   Objective-C             1           89           72            0           17
   Vim script              1           42           33            6            3
   Markdown                1           36            0           27            9
   Automake                3           31           23            3            5
   Ruby                    1           29           25            0            4
   INI                     2           13            6            5            2
   TOML                    1           12            2            9            1
   Apache Velocity         1           12           12            0            0
   CMake                   2            8            8            0            0
  -------------------------------------------------------------------------------
   Rust                   64        12637         9489         1612         1536
   |- Markdown            55         8243          808         5557         1878
   (Total)                          20880        10297         7169         3414
  -------------------------------------------------------------------------------
   HTML                    2           28           22            3            3
   |- JavaScript           1            7            7            0            0
   (Total)                             35           29            3            3
  ===============================================================================
   Total               78760     37400888     28271191      4418115      4711582
  ===============================================================================

So if we count only code and not comments, it's only 9,489 LoC of Rust, which is about 0.03%; if we take all lines rather than only LoC, it's roughly 0.06%.

[0] https://github.com/XAMPPRocky/tokei

[1] https://github.com/torvalds/linux/commit/b401b621758e46812da...
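The percentages, worked from the tokei totals above: Rust code lines over all code lines, and all Rust lines (including the Markdown in doc comments) over all lines. `share_percent` is a hypothetical helper:

```rust
/// Percentage of `part` in `total`.
fn share_percent(part: u64, total: u64) -> f64 {
    part as f64 / total as f64 * 100.0
}

fn main() {
    // Figures taken from the tokei output in the comment above.
    let code_only = share_percent(9_489, 28_271_191);
    let all_lines = share_percent(20_880, 37_400_888);
    println!("code only: {code_only:.3}%, all lines: {all_lines:.3}%");
}
```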

geertj · 2024-02-18T16:35:52 1708274152

How much of those 20K lines are Rust infrastructure, versus drivers written in Rust?

bonzini · 2024-02-18T16:42:55 1708274575

The latter are basically a rounding error, though there is a rewrite of the Android binder driver.

amelius · 2024-02-18T15:43:36 1708271016

Ok. Does that include device drivers?

izacus · 2024-02-18T16:04:31 1708272271

antoinealb · 2024-02-18T15:36:34 1708270594

According to GitHub's language breakdown for https://github.com/Rust-for-Linux/linux, at least, C is still 98.3% of the repository, and Rust falls within the 0.1% of "others".

sroussey · 2024-02-19T00:59:56 1708304396

Curious, why Rust instead of Zig?

bitwize · 2024-02-19T08:06:16 1708329976

Can Zig guarantee perfect memory safety?

hgs3 · 2024-02-19T16:17:33 1708359453

It doesn't need to. Memory issues are easy to catch with good tests, and Rust's approach to memory safety has its own trade-offs.

sroussey · 2024-02-19T21:45:16 1708379116

About as well as Rust, right?

bsd_source · 2024-02-18T14:36:43 1708267003

Rust is going to kill LFS

horeszko · 2024-02-18T14:53:29 1708268009

LFS is Linux from Scratch right? (https://www.linuxfromscratch.org/index.html)

What are the implications of using Rust for building Linux?

rekado · 2024-02-18T17:10:07 1708276207

The Rust bootstrapping story is ugly.

https://guix.gnu.org/blog/2018/bootstrapping-rust/

charcircuit · 2024-02-18T21:04:29 1708290269

LFS does not need to do all of that. LFS already uses the host computer's C compiler, so it seems just as reasonable to also use the host computer's rust compiler.

MuffinFlavored · 2024-02-19T00:07:31 1708301251

Doesn't part of the Rust compilation chain end up using `cc` anyway eventually, for like, linking or something?

That might not apply at a "system's" level but I'm guessing in the massive Linux compilation job with module support you're making a bunch of object files with exported symbols?

charcircuit · 2024-02-19T01:02:20 1708304540

The Linux kernel's build system uses rustc only for making object files.

legobmw99 · 2024-02-19T01:44:37 1708307077

That honestly doesn’t seem that horrible - particularly compared to some of the nigh-on-impossible tasks like bootstrapping GHC

(For the curious) https://elephly.net/posts/2017-01-09-bootstrapping-haskell-p...

rekado · 2024-02-19T09:39:47 1708335587

That's my blog post!

Yes, the GHC story is also terrible (and just a tad worse than the rust story). The GHC problem is worse largely because the origins of GHC are murky. While it's still possible to get copies of the early GHC versions through the Internet Archive, the code lives firmly in the 1990s and assumes that you have access to long lost Haskell compilers. Turns out that all these Haskell compilers (with the exception of Hugs) have the same kind of problems that GHC has --- only worse because they are even older, depend on binaries of unreleased previous versions, and are really difficult to build with tools from the last two decades.

ta8645 · 2024-02-18T15:08:52 1708268932

Surely LFS will survive? Might need to update a few steps is all.

twosdai · 2024-02-18T15:12:22 1708269142

Change? Impossible. This is software after all. It never changes. /s

1letterunixname · 2024-02-18T17:09:48 1708276188

Explain because I don't see that happening.