RustBelt: Logical Foundations for the Future of Safe Systems Programming

bitL · on Jan 9, 2016

How do you address the inherent unsafeness of hardware? Even "pure" functional languages are hacking around this...

pcwalton · on Jan 9, 2016

> How do you address the inherent unsafeness of hardware? Even "pure" functional languages are hacking around this...

By proving the hardware correct. Then if you're running proven memory safe Rust on formally verified hardware, you're guaranteed by construction that the Rust you're writing is memory safe (for the formally-specified notion of "correct" and "memory-safe" in use, which will probably issues like manufacturing problems and compiler bugs).

A Google search for [hardware formal verification] can get you started, if you're interested. There's been a good deal of work in this area.

johnbender · on Jan 9, 2016

Assuming you mean the weak memory models, the compiler normally enforces the guarantees made by the semantics of the programming language ... assuming you have a semantics.

That's where this project comes in and that is why the C++ memory model definition/formalization was so important. Programmers need to know what guarantees the compiler gives about the behavior of the code after compilation and the semantics is the final word.

bassislife · on Jan 9, 2016

I think the assumption is that the hardware spec is correctly implemented and that it runs ideally well. Otherwise that would require some kind of probabilistic model I guess. But you're right, "purity" is a simplification. Computations are inherently effectful(and even side-effectful depending on the reference frame) in practice.

Also, I seem to understand that the research is merely done in the context of the Rust language. The title is maybe a bit too much :)

pjc50 · on Jan 9, 2016

Elaborate on what you mean please?

GolDDranks · on Jan 9, 2016

What do you mean? Do you mean that hardware is unreliable or might break and thus even a sound program, or a program that doesn't do anything that counts as unspecified behaviour, is not 100% safe? Or that with "unsafe" escape hatches often needed to directly interop with hardware are a problem?

GolDDranks · on Jan 9, 2016

If it's the former, I agree.

If it's the latter, well, in Rust, the norm is to design safe abstractions around unsafe code. For example the vector class of the Rust standard library contains some unsafe code, but with the all parts that are unsafe being private, the public API exposed is essentially safe.

If this project comes up with reasonable semantics what can be considered to be "safe" with unsafe code, we can verify, that the unsafe bits of the standard library (and other libraries) are being used in a safe manner.

Animats · on Jan 9, 2016

They're hiring "postdocs". Do "postdoc" positions in Germany mean, as in the US, paying $30K for people who could make $200K in industry?

samth · on Jan 10, 2016

CS postdocs in the US are certainly not paid $30k.

Source: I've been a postdoc in CS, and I currently employ postdocs.

Animats · on Jan 10, 2016

OK, $44K. [1]

[1] http://www.payscale.com/research/US/Job=Postdoctoral_Researc...

irq-1 · on Jan 9, 2016

Rust 1.5 src has 2121 instances of "unsafe {". Didn't expect it to be so high.

kibwen · on Jan 9, 2016

That's not a particularly useful metric, because that would also be counting all the occurences in documentation (Rust makes extensive use of executable docstrings, so restricting your search to just .rs files doesn't solve the problem), test cases (where you absolutely want lots of tests for unsafe functionality, and Rust often puts tests in the same file as the functions being tested, so excluding certain directories doesn't solve the problem), all the APIs that wrap C interfaces (where unsafe blocks are necessarily unavoidable), and bits of the compiler itself (for e.g. talking to LLVM).

Even then, the standard libraries are exactly where you want your unsafe code to be; safely encapsulating useful unsafe patterns in a highly-visible and well-audited venue is ideal, and reduces the usage of unsafe code in (presumably less well-audited) third-party libraries.

akkartik · on Jan 9, 2016

Here's a concrete data point in rust 1.5.0 to critique:

  $ grep 'unsafe {' src/lib* -r |wc -l
  1529

I'm skipping src/test and src/doc and a few other things, so this is likely a mild under-estimate on that front.

Docstrings seem to always require a comment leader, so let's ignore any lines where the first non-whitespace characters are not '//'.

  $ grep 'unsafe {' src/lib* -rh |grep -v "^\s*//" |wc -l
  1480

Inline tests are created using 'mod test', so let's look:

  $ grep 'mod test {' src/lib* -rl |wc -l
  6
  $ grep 'unsafe {' `grep 'mod test {' src/lib* -rl` |wc -l
  13

So of 1.5k occurrences, 13 lie in modules with inline tests. Let's conservatively ignore them all.

Final count of non-docstring non-test occurrences of 'unsafe {' in the 1.5.0 sources: 1463.

kibwen · on Jan 9, 2016

That's still not a particularly useful metric because the vast majority of the "lib" directories being counted there are for the compiler itself (e.g. libsyntax, librustc_blahblah), yet more test cases (e.g. libcoretest), deprecated (e.g. librand), or miscellaneous (e.g. librustdoc, liballoc_jemalloc).

For the context of the OP, all we care about are unsafe blocks that have effects on the semantics of generated code. This means that all we care about are the unsafe blocks in the standard library.

The standard library itself is composed of a few libraries, which can be determined by looking at the `extern crate` statements in this file: https://github.com/rust-lang/rust/blob/master/src/libstd/lib... . Ignore libtest and liblog because they're conditionally used only in "test" mode. Ignore librand because it's deprecated. That leaves libcore, libcollections, liballoc, librustc_unicode, and liblibc (and then we must also consider libstd itself). Of these, we're going to ignore liblibc because it's just a wrapper over libc, and any `unsafe` in there is going to be uninteresting. We're also going to ignore the specific directory "libstd/sys" because these are wrappers over system-specific low-level APIs and are going to be overwhelmingly uninteresting as well.

You also didn't restrict your search to .rs files, so we'll do that too. Here's the final tally for each lib directory (using the latest Rust checkout), according to the following command:

  git grep -v "^\s*//" -- "*.rs" | grep "unsafe {" | wc -l

libstd (sans sys): 344

libcore: 151

libcollections: 155

liballoc: 49

librustc_unicode: 2

Total: 701

This is still going to be an overestimate because, looking at the lines that are left, many of them are calling compiler intrinsics directly which is as uninteresting as calling C code. You could reduce even that number by writing a wrapper function that encapsulates the intrinsic into a safe function, but that exposes yet another problem with the "counting the number of occurrences of `unsafe`" metric": the precise number of `unsafe` blocks in a chunk of code is subjective.

You could have one big block for many consecutive unsafe operations, or you could have fine-grained unsafe blocks for each individual unsafe operation. It's up to the author (and there are arguments in favor of each). Furthermore, the precise number of actually-unsafe operations within a module that uses `unsafe` is up to the discretion of the module author (I recommend reading http://doc.rust-lang.org/nightly/nomicon/working-with-unsafe... for elaboration).

Furthermore, I reiterate my point from my earlier comment: if you're going to have unsafe code anywhere, the standard libraries are exactly where you want it, and therefore the amount of unsafe code in libstd is unabashedly above average.

EDIT: I should also mention that not all `unsafe` blocks, even the interesting ones, pose the same difficulty to formal verification. In fact, my very-uneducated guess would be that liballoc, despite having relatively few uses of `unsafe`, will be the hardest to verify due to the fact that it contains the tricky `Rc` type.

jpt4 · on Jan 9, 2016

Minor point, (= 1461 (- 1480 6 13)), not 1463, unless I misinterpreted.

akkartik · on Jan 9, 2016

Sorry I meant to say (- 1480 13) = 1467.