Towards Linux Kernel Memory Safety (arxiv.org)
144 points by lainon on Nov 16, 2017 | 95 comments



Source for those interested: https://github.com/torvalds/linux/blob/master/lib/refcount.c

I'm kind of surprised that the authors went with compare-and-exchange for their ref count, rather than get-and-increment. Even if the counter does overflow, it's usually detectable, and reversible. This is true even in the face of concurrent updates. The benefit is that there is only one atomic operation in the fast path.
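
For the curious, a minimal sketch of the two strategies (written in Rust for brevity; refcount.c itself is C, and the saturation threshold and rollback policy here are illustrative, not the kernel's actual policy):

    use std::sync::atomic::{AtomicU32, Ordering};

    const SATURATED: u32 = u32::MAX;

    // refcount.c-style: loop on compare-and-exchange so the counter is
    // never observed past the saturation point.
    fn inc_cmpxchg(c: &AtomicU32) -> bool {
        let mut old = c.load(Ordering::Relaxed);
        loop {
            if old >= SATURATED {
                return false; // saturated; refuse to wrap
            }
            match c.compare_exchange_weak(old, old + 1,
                                          Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(cur) => old = cur, // raced with another update; retry
            }
        }
    }

    // Get-and-increment: a single atomic op on the fast path; overflow
    // is detected after the fact and rolled back.
    fn inc_fetch_add(c: &AtomicU32) -> bool {
        let old = c.fetch_add(1, Ordering::Relaxed);
        if old >= SATURATED - 1 {
            c.fetch_sub(1, Ordering::Relaxed); // undo; still detectable under races
            return false;
        }
        true
    }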


“usually detectable” isn’t good enough. You must assume attackers will try to hit your code zillions of times until they hit the ‘unusual’ case.


The MPX part was a good excuse to look around at the status of MPX support. It seems GCC is the main compiler supporting it, but it's under threat of deprecation because the bootstrap build with MPX has been broken for so long. Also, the lack of AMD buy-in (patents?) seems like a big hindrance compared to more compatible compiler-based approaches to C memory safety, like SoftBound, CCured, etc.


> but it's under threat of deprecation because the bootstrap build with MPX has been broken for so long.

Huh. Even MSVC supports it (currently experimental though): https://blogs.msdn.microsoft.com/vcblog/2016/01/20/visual-st...


The problem with any of these approaches, versus safer languages, is that they only work properly if 100% of the source code is available.


I don't understand your point. If you haven't got all the source code (e.g. random drivers), you're unlikely to have any say in the language that the binary code blobs were written in either.


Languages like Ada just forbid many of the errors that lead to memory corruption in C, by default.

So it would only happen with a binary Ada library in two ways:

- the binary was manipulated and thus does not map to the original source code

- the code was compiled with permission for Unchecked packages enabled

In C, 100% of the source code is unsafe, especially when compiled with -O3.


I was referring to your original comment: 'they only work properly if 100% of the source code is available.'

This project is adding protection to the Linux kernel. Now, this bounds checking could be added to a kernel build with 100% of the source code available, in which case all is good. Or it could be added to a kernel build where some of the drivers are available only in 'binary blob' form, in which case rewriting the kernel in Ada (or whatever) would not help you with the binary blobs, since they are out of your control. So, as I understand it, your point of 'only working properly if 100% of the source code is available' would seem to be just as true for a code rewrite as it is for the paper's approach to the problem.


Which it is for most of internet infrastructure.


But not vulnerable embedded devices, cell phones and IoT hardware.


How many of these even run on the latest Intel CPUs?


Long term, the solution has to be switching to an OS that’s written in a memory safe language.


Or it has to run on a memory safe architecture.

http://phrack.org/issues/64/6.html

"- separated kernel and process address space (both can use the whole address space). Such an implementation, to be efficient, requires a dedicated support from the underlaining architecture. It is the case of the primary and secondary context register used in conjunction with the ASI identifiers on the UltraSPARC (sun4u/sun4v) architecture.

The only limitation to kernel (and processes) virtual space on systems implementing an userland/kerneland separated address space is given by the architecture (UltraSPARC I and II can reference only 44bit of the whole 64bit addressable space. This VA-hole is placed among 0x0000080000000000 and 0xFFFFF7FFFFFFFFFF).

This memory model makes explotation indeed harder, because we can't directly dereference the userspace. The previously cited NULL pointer dereferences are pretty much un-exploitable. Moreover, we can't rely on "valid" userland addresses as a place to store our shellcode (or any other kernel emulation data), neither we can "return to userspace"."


> separated kernel and process address space

This already exists on x86_64 Linux in the form of SMEP and SMAP.

> With SMEP enabled, it’s no longer possible to map exploit payloads in userland, as the CPU will trigger a fault if it attempts to execute those user pages in kernel mode

http://vulnfactory.org/blog/2011/06/05/smep-what-is-it-and-h...

> SMEP can be used to prevent supervisor-mode code from unintentionally executing user-space code. SMAP extends this protection to reads and writes

https://en.wikipedia.org/wiki/Supervisor_Mode_Access_Prevent...

These mitigations aren't overly novel at this point and don't prevent exploits (they just complicate them). Here's a recent CVE PoC containing an SMEP and KASLR bypass: https://github.com/xairy/kernel-exploits/blob/master/CVE-201...

Full memory safety takes this much further, and should be far harder to bypass.


Good to know. I think there is no silver bullet in security so anything making exploitation harder is good.


So Rust it is then, eh? One could work through the kernel module by module to rewrite it in Rust, but it'd be a multi-year process (like Firefox's rewrite has been).


You mean rewriting unsafe C code in Rust by annotating it with the `unsafe` keyword? :-) Try to write something in Rust which touches bare-metal hardware. It would be a PITA, it would be full of unsafe keywords and assembler code, so it would possibly result in a similar number of bugs in the end, just in different parts.


This is empirically not true if you look at the usage of unsafe in pure Rust kernels. Yes, there is assembly needed, and some unsafe Rust code, but it's very possible to wrap up unsafe operations in safe APIs.
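
As a hypothetical sketch of what "wrapping an unsafe operation in a safe API" can look like (the MMIO region type and its methods here are made up for illustration, not Redox code):

    pub struct MmioRegion {
        base: *mut u32,
        len: usize, // number of u32 registers in the mapping
    }

    impl MmioRegion {
        /// The only unsafe entry point: the caller must guarantee that
        /// `base..base + len` is a valid, uniquely owned MMIO mapping.
        pub unsafe fn new(base: *mut u32, len: usize) -> Self {
            MmioRegion { base, len }
        }

        /// Safe for every caller: an out-of-range index becomes `None`
        /// instead of undefined behaviour.
        pub fn read(&self, idx: usize) -> Option<u32> {
            if idx >= self.len {
                return None;
            }
            // SAFETY: idx is in bounds, and `new`'s contract guarantees
            // the mapping stays valid for the lifetime of `self`.
            Some(unsafe { self.base.add(idx).read_volatile() })
        }
    }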

For example, the kernel repository for Redox (a written-in-Rust OS project) appears to have 242 usages of the unsafe keyword (including occurrences in comments, because I was lazy), out of 18,205 lines of Rust source.


It's actually not that much unsafe code.

You can wrap most unsafe operations to be safe with proper checks. I've done it myself, and it's much less than you would expect, since a lot of the memory accesses can be made safe trivially.


I can't come up with a situation where C doesn't need inline assembly while Rust does. Do you have a specific example in mind?


Just to turn this discussion towards the practical aspect: does anyone know if a concrete effort has been made to sound out Torvalds and the other senior kernel devs about accepting Rust into the kernel?

I would imagine that initially, as a PoC, one could start by implementing a simple module... obviously a small portion of the Rust stdlib and runtime would need to be ported, and of course rustc would need to be integrated into the kernel toolchain...

Perhaps someone has done it privately (i.e., without attempting to submit patches back to source)?


In [1] (from 2016) he says

    I don't think you actually solve any of the really hard kernel problems with your choice of programming language.
    The big problems tend to be about hardware support
    (all those drivers, all the odd details about different platforms,
    all the subtleties in memory management and resource accounting)
I think what he is saying is that rewriting C in (e.g.) Rust doesn't make progress on any of the design problems in the Linux kernel.

In [2] (2013) he expands on that, essentially explaining how well C maps to machine instructions, and how that helps optimise the kernel.

Personally, I think Rust maps very well to machine code, and although it doesn't solve design problems, I do think it saves a lot of time debugging the security flaws and bugs that result from memory errors.

[1] https://www.infoworld.com/article/3109150/linux/linux-at-25-... [2] https://www.reddit.com/r/programming/comments/1t0gfy/linus_t...


I tend to agree with his point that you may not solve design and driver problems just by changing language, but can anyone dispute that a memory-safe language can at least help eliminate routine programming pitfalls like off-by-ones, null dereferences, double-frees and so on? And these are the types of innocuous and inadvertent errors that are exploited by black hats, sometimes with costly consequences.

I don't see a strong argument against a memory-safe language, although granted it won't solve architectural and hardware issues. Perhaps the effort to support, say Rust, in the kernel is just not worth the trade-off at this point? Perhaps no one has taken on the mantle for doing so? But I can't see how one can categorically argue against it...
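
For what it's worth, a toy sketch (in Rust, purely illustrative) of how one of those pitfall classes, the off-by-one, becomes a contained failure instead of silent corruption:

    fn main() {
        let a = [10, 20, 30];
        let i = a.len(); // classic off-by-one: one past the end
        // In C this would silently read past the array; here the bounds
        // check turns it into a deterministic panic, not memory corruption.
        println!("{}", a[i]);
    }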


Maybe what's needed is a different approach to hardware programming: some kind of safer, higher-abstraction system language to improve quality and reduce effort?


> Just to turn this discussion towards the practical aspect: does anyone know if a concrete effort has been made to sound out Torvalds and the other senior kernel devs about accepting Rust into the kernel?

This is not how things work. You don't arrive at changes like this by out-talking Linus. Talk is cheap and Linus is opinionated.

You need to demonstrate that what you want to do is worthwhile, i.e. is not only feasible but also provides a payoff that carries the weight of what you're proposing. You also need to be fiercely committed to your idea and to making it happen inside mainline Linux.

In other words, show me the code (and maintain it out-of-tree for some time, tracking upstream changes and gathering support from other developers).


Don’t think it’ll ever happen. More likely something like Redox will get to the point that it’s production-ready and people will just stop using Linux.

I don’t think it’s far off, at least for VMs, tbh.


> at least for VMs

Yeah, that may be a really good upside to VMs: they give new OSes a more level playing field with the established ones by letting the VM layer handle all the messy details of real hardware.


...and of course Rust needs to be ported and have its... "runtime" fixed for all platforms Linux runs on. Otherwise this will be a decent way to kill off a bunch of platforms.


Was going to point this out too... Having the Rust compiler support all the platforms GCC is able to target is going to be a massive task, never mind maintaining them.

That's one of the main advantages of C - it runs on pretty much everything.


Well, at least someone is already trying to write an OS in Rust.

https://github.com/redox-os/redox


Not necessarily. Google's approach has been to forbid kernel access (ChromeOS), to make native code a big pain to use except for specific use cases (Android), or to develop a new microkernel in a mix of C++/Rust/Go/Dart (Fuchsia).


Fuchsia doesn't use Rust.



Those are bindings.


Seriously? What is it with people thinking that Rust is the only choice for such things? Memory safety is nothing new. Languages like D, Nim, and Swift all allow for the possibility of both memory safety and low-level access.


Memory safety without GC and with zero-cost abstractions is the combination Rust enables. Nim and Swift (and Go) strictly require a GC, and D without GC doesn't work without heavy lifting.
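
A minimal sketch of what that combination buys (my illustration, not from the parent):

    // Ownership gives deterministic cleanup with no collector, and the
    // iterator chain compiles down to a plain loop (the "zero-cost" part).
    fn sum_of_squares(n: u64) -> u64 {
        let v: Vec<u64> = (0..n).collect(); // heap allocation owned by `v`
        v.iter().map(|x| x * x).sum()       // safe iteration, no GC, no overhead
    }                                        // `v` freed exactly here, no pause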


> Nim and Swift (and Go) strictly require a GC, and D without GC doesn't work without heavy lifting.

I think Nim belongs with D here. You can freely disable the GC in Nim.

Full disclosure: I'm a Nim core dev.


Refcounting and costly abstractions are in wide use currently - just done in C without the safety.


What's wrong with garbage collectors? Depending on the type, you can get performance characteristics better than or similar to destructor-based management.

Even current operating systems use forms of garbage collection (such as refcounting) to track resources.


The problem with a GC (in the context of embedding it in a larger codebase written in another language) is that the calling conventions are more complicated, and the existence of a GC (especially a mark-and-sweep GC) means that the embedding is not "seamless". Even with a C FFI interface it's still harder than (for example) using Rust or C in such a project.

And if both languages have GC, then you're in even more trouble.


For what language runtime are the calling conventions an issue? Even the most calling-convention-dependent GC I can think of (GHC Haskell's) can handle standard C ABIs, and .NET is a good example of making this whole process relatively seamless. If you can export a C FFI with pinned pointers in both directions, I don't see how it's any harder than Rust or C.

The multiple-GCs-in-a-process issue is usually due to the collector not playing nice with signal/exception handler installation when using safepoints, which is purely an issue of lazy implementations. There are even ways to collect cross-heap cycles when working with multiple GCs (Xamarin on Android does this fairly well).


Most things get moved to the heap.


Only in GC languages that don't offer good language features for value types, which usually isn't the case for the ones targeted at low-level systems development.


"Good language features for value types," while retaining memory safety, means at least one of these things:

* Boxing anything you take a reference directly to

* Forbidding sum types and complicating the GC to allow interior pointers

* Forbidding references to value types

* A borrow checker

Guess which one is most flexible, and novel in Rust.
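
A small, hypothetical illustration of that last option: Rust lets you take a plain reference into an unboxed value, and the borrow checker proves it can't dangle or alias a conflicting mutation:

    struct Point { x: i32, y: i32 }

    fn main() {
        let mut p = Point { x: 1, y: 2 }; // unboxed value on the stack
        let rx = &mut p.x;                // interior reference, no boxing
        *rx += 1;                         // fine: the borrow is still live
        println!("{} {}", p.x, p.y);      // fine: the borrow has ended
    }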


A language like Modula-3 doesn't suffer from those issues.

We only don't have it thanks to Compaq killing the project after acquiring DEC (Modula-3 was a joint DEC/Olivetti effort). Politics as usual.

Rust ideas come from Cyclone and ATS, it is not novel in Rust.

The borrow checker is still pretty much a WIP; the NLL changes will surely improve it, but they do not yet solve all ergonomic problems.


Modula-3 either uses one of those techniques, or is not memory safe. (Looks like primarily the first one- manual deallocation is unsafe and it relies on GC for the rest?)

Cyclone and ATS do not have Rust's combination of ownership and borrowing.


Ada already enabled it.


Ada's model for safe allocations is far more limited than Rust's, however. Particularly in ways that would require more unsafe code when writing a kernel.


I don't see why, given that Unchecked_Deallocation calls should only exist inside data structure implementations and never be called directly from safe code.

Which should just use controlled types, pools or standard containers.

Also there are a few open experiments with Ada,

https://muen.codelabs.ch/

https://github.com/Lucretia/bare_bones

https://marte.unican.es/

Then there are the commercial ones used in the industry, making use of bare metal profiles like Ravenscar.


It's 20 million lines of code.


Right, but you don't have to rewrite it at once. There are certain areas of the kernel that have historically had more issues, are at higher risk of attack, etc. Those will make up far less code than 20 million lines, and you can start there.


You could triage. Zorro bus support can probably wait until someone has a specific itch to scratch.


Firefox is 17 million!


They rewrote 160,000 lines and it took them more than a year.

> It replaces approximately 160,000 lines of C++ with 85,000 lines of Rust.

https://blog.rust-lang.org/2017/11/14/Fearless-Concurrency-I...


The whole Firefox codebase has not been ported to Rust and will not be anytime soon. So your argument doesn't hold true.


Isn't that a point in the rewrite's favor? You don't have to port everything to gain a lot of the benefits.


> You don't have to port everything to gain a lot of the benefits.

I agree with this. But just saying Firefox has X million lines of code doesn't say anything about the possibility of Linux being ported to Rust.


It does say that the 20 million lines statistic isn't sufficient to dismiss the idea. That's not saying much, but I think it's a valid response to someone pointing out that the Linux kernel is 20 million lines.


You're comparing apples and rutabagas. Code line count is a hilariously inaccurate way to measure the complexity required for a portable OS kernel vs. a userspace application.


> code line count is a hilariously inaccurate way to measure the complexity required for a portable OS kernel vs a userspace application

I never said that. My whole premise is that you can't conclude anything like this just by comparing line counts of two very different projects. Maybe you didn't read my comments properly.


What benefits, precisely, though? New bugs? http://insert-script.blogspot.de/2017/11/firefox-settings-co...


What's the relation to Rust? I don't see it mentioned in the post; it affects Firefox before version 57, but not 57 (not that there wasn't any Rust code in Firefox before 57), and it seems to be a bug in DOMParser, which as far as I know is not written in Rust.


Better security. Oh wait ... Chrome still wins although it's written in C++.


Rust is not safe. ATS, Pony, C#, Java, and Lisp come to mind. But maybe even D, Nim, Swift, or Crystal. Singularity/Midori or the Lisp machines were good candidates. And it cannot be a monolithic kernel, of course.



If someone was going to deploy a version of the Linux kernel with SoftBound (which is among other things what this paper suggests), they would have done so by now. It hasn't happened, because (a) contrary to what that paper suggests, performance really is a concern; (b) the entire ecosystem of C is what makes it so popular, not the language itself. Replacing the runtime is a huge amount of work.


C performance is the result of 40 years of taking advantage of UB optimizations.


I think they were talking about SoftBound vs vanilla C performance, not Rust vs C.


It's possible to program in C in a way that is memory safe and to verify it programmatically (even with memory allocations and recursion). It's even possible to do static verification of the absence of runtime errors, arithmetic overflows, inconsistent locking, etc. The best tools are commercial, unfortunately.


Can you elaborate as to how to prevent use-after-free statically in C without forbidding dynamic memory allocation?


A static checker for C code can do checks similar to those the borrow checker in Rust does. Essentially it enforces a variation of a linear type system. Note that many things in Rust are based on ideas from Cyclone [1], which was a research project to provide minimal extensions to the C type system to ensure memory safety.

[1] - https://cyclone.thelanguage.org/
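
A minimal Rust illustration of that linear-type discipline (illustrative only; Cyclone's own syntax and checker differ): a value whose ownership has been consumed simply cannot be touched again, so use-after-free becomes a compile-time error:

    fn free_like_sink(_buf: Vec<u8>) {
        // taking `Vec<u8>` by value consumes ("uses up") the allocation
    }

    fn main() {
        let buf = vec![0u8; 64];
        free_like_sink(buf);        // ownership moves out of `buf` here
        // println!("{}", buf[0]);  // error[E0382]: borrow of moved value
    }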


I read the Cyclone papers when I designed the original Rust borrow checker, and we spoke to Dan Grossman.

Cyclone only works because it uses garbage collection for the heap. There is a uniqueness extension, but it is limited compared to what Rust can do. (See Grossman's "Existential Types for Imperative Languages" presentation for details.) The uniqueness extension was not designed to (and is not flexible enough to) supplant the garbage collector in most real-world projects.

This theoretical static checker for C code would not be compatible with virtually any existing C code in the wild, and so it would be effectively a new language.


Only works if 100% of the code is available, thus not usable by companies selling commercial libraries.


That's not really relevant when talking about the Linux kernel, though, is it? There's tons and tons of open-source code that could just run clang-analyze today and eliminate swathes of memory bugs.


Sure it is, starting with the GPU drivers.


A static checker can in principle work on binary code. Alternatively, one can annotate the API the library provides for the static checker. This is especially feasible with C, where API types and usage are typically rather simple. Clearly, this implies trusting the API to be implemented correctly, or at least using the static checker to prove that memory-safety bugs originate in the library, not in the application code.


By proving that memory references don't live longer than the allocated memory.

The use of dynamic allocation must be carefully restricted, and the programmer may need to provide assertions or modify the code to pass verification.

There are two ways to think about static verification:

1. Language-level verification. All valid programs are verified to be correct with respect to the safety guarantees the language gives.

2. Verification of safety features that can't be proved in general. In most cases it's possible to write a program in a way that it can be verified to be correct, but finding a way to write a provably correct program may take some work.


Now try to use that in the presence of commercial libraries.


Try to use Rust in the presence of commercial libraries.


Easy, just like with any compiler for a memory-safe language.

There might still be logical bugs, but the language semantics allow for higher trust in code quality, assuming the binaries haven't been tampered with.

Every line of C code is unsafe, especially if the binary was compiled with -O3.

Regarding Rust, the only issue is the lack of support for binary libraries in Cargo; one needs to call rustc directly.


Linux doesn't do any of this, and doing so is tantamount to running a borrow checker on the code, which requires more information than is present in vanilla C.


At that point you will have basically reinvented Rust. The resulting language will be so incompatible as to essentially be a new language.


I have been hearing that since about 1990; it never materializes, even on projects where Insure++ was being used.

Not even Frama-C can prevent every misuse, if developers don't care.


I wonder what makes C so different from Ada, where static verifiers are used successfully. Is it because typical C code uses way too many pointers, making full static verification much harder?


Language semantics: most of the C errors that lead to memory corruption are simply forbidden in Ada.

There is no way to corrupt memory in Ada unless you explicitly import one of the Unchecked packages and ask the compiler permission to do naughty stuff.


> Ada, where static verifiers are used successfully.

And yet even Ada's standard library sometimes has bugs: http://blog.adacore.com/formal-verification-of-legacy-code


Note that the article you're citing is about going from Ada (already strong safety checks, but you can still write bugs, just not a whole class of them) to SPARK 2014, i.e. formal proof of the absence of runtime errors first, then automated verification of functional properties. Even nicer, it's in a stdlib, so not a toy example and something to build on... You're showing (thanks!) how to go, with some effort but not that much, from 'so much safer than C already' to 'no bugs, let's move on, and maybe remove all the runtime checks to get C-like performance or better'. I'd say it looks pretty cool.


Sure, every programming language can have bugs.

However, C gets the usual set of logical errors common to any safe programming language, plus memory corruption caused by the lack of checks on memory accesses and implicit type conversions, and 200+ cases of UB when taking advantage of full compiler optimizations.

Reducing the number of possible errors per line of code is already a plus, even if it doesn't remove all of them.


It's probably easier to use a language that has those guarantees already than trying to bolt them onto C. Those tools and coding standards are useful if you're constrained to using C for whatever reason, e.g. if you're on an embedded platform that only has a vendor compiler. For general purpose hardware you have a lot fewer headaches if you start with a language that was designed for safety.


And I think the compiler/interpreter can optimize the code much better when its creators know that such constructs are available in the language itself.


Long term the solution is to switch to an OS that is memory safe and has a safe architecture that simply forbids exploits from working.

Memory safe languages are not panacea, not even close, and especially not in an operating system.


If by long-term you mean at least fifty years, then maybe.


Can anyone here comment on the races in MPX and whether this will prevent deployment?

https://intel-mpx.github.io/microbenchmarks/#multithreading


This is really cool. I would love to hear more about this. Those bounds checking instructions will hopefully make a big difference.


Is this PAX_REFCOUNT or something entirely new?



