Bugs You'll Probably Only Have in Rust (gankro.github.io)
367 points by Gankro on June 14, 2017 | 83 comments



One of the most important tools when writing unsafe Rust is compiletest [1]. It's a tool extracted from the compiler project that lets you write tests that are supposed to fail compilation. Since safe abstractions rely on the type system to make unsafe code safe, it's critical to make sure the compiler is properly rejecting code. I wrote a post about this years ago when I got hit by one of the bugs Gankro wrote about [2].

[1]: https://github.com/laumann/compiletest-rs

[2]: http://erickt.github.io/blog/2015/09/22/if-you-use-unsafe/
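
For example, a compile-fail test for compiletest-rs might look roughly like this (mylib, MyHandle, and the exact error text are hypothetical stand-ins; the file would live under tests/compile-fail/):

  // tests/compile-fail/handle_not_send.rs (hypothetical)
  extern crate mylib;

  use mylib::MyHandle;

  fn assert_send<T: Send>(_: T) {}

  fn main() {
      // This test only passes if the compiler rejects the line below,
      // i.e. if MyHandle really cannot cross a thread boundary.
      let handle = MyHandle::new();
      assert_send(handle); //~ ERROR cannot be sent between threads safely
  }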


> Making unsafe a big scary "all bets are off" button is only compelling if most of our users don't need to use that button. Rust is trying to be a language for writing concurrent applications, so sharing your type between threads requiring unsafe would be really bad.

It would be neat if we could decompose unsafe like so "unsafe[this_feature,that_feature] {}". The unqualified "unsafe" could still refer to a global "free rein", but you could opt in to "only let me violate these specific rules." It would be a hint to maintainers and might help make the std lib and other core libraries be/remain defect-free.

Another interesting "oh shoot" w/unsafe that I'm curious about: when I intentionally/unintentionally alias two variables in my unsafe block, this will invalidate assumptions made elsewhere in safe code. This is my unsafe block's bug, but it seems like something that could take a good while debugging to attribute back to my unsafe block. I don't think there's a good resolution to this one other than perhaps documentation/best practices.


Re: parameterized unsafe -- I think it's been discussed and rejected, I don't remember where. I think it was mostly a matter of "yes this would be more powerful, but the complexity isn't worth it".

Note that we sort of made a "new" kind of unsafe with the UnwindSafe trait: https://doc.rust-lang.org/std/panic/trait.UnwindSafe.html

That's probably how we intend to solve these kinds of problems in the future.
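
A minimal sketch of how UnwindSafe shows up in practice (catch_unwind refuses closures that capture things like &mut unless you explicitly assert unwind safety):

  use std::panic::{catch_unwind, AssertUnwindSafe};

  fn main() {
      let mut counter = 0;

      // A closure capturing `&mut counter` is not UnwindSafe, so this would
      // not compile without the AssertUnwindSafe wrapper: the trait makes
      // you acknowledge that the data may be left in a broken state if the
      // code panics partway through.
      let result = catch_unwind(AssertUnwindSafe(|| {
          counter += 1;
      }));

      assert!(result.is_ok());
      assert_eq!(counter, 1);
  }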

Re: aliasing -- if it's a serious enough problem, one of two things will happen:

* Someone will develop a version of asan/ubsan for Rust.

* The Rust devs will be forced to reduce the extent to which they apply alias analysis by default (possibly with a flag to opt into it). At least temporarily.

The Rust devs have backed off optimizations in the past when they break stuff in the ecosystem (struct layout optimization). But they also work with the affected devs to fix those bugs so they can turn the optimization on.


> Someone will develop a version of asan/ubsan for Rust.

This already happened (japaric [1]). But ASan won't save you from a bug due to optimization-because-I-assumed-these-locations-dont-alias (maybe TSan might?).

[1] https://users.rust-lang.org/t/howto-sanitize-your-rust-code/...


As you say, none of the existing sanitisers catch Rust-specific problems, which is, I assume, what the parent meant by "for Rust". That said, they will likely catch many of the consequences of such violations, just not pinpoint the cause as precisely.


A Rust-specific sanitizer has been proposed, though. See my other reply (which I was in the middle of writing when this thread popped up, so I didn't see it):

https://news.ycombinator.com/item?id=14553679


That isn't yet an existing sanitizer. :)

It is definitely an important missing piece to bridge the gap to ASan/TSan, but it's still just a proposal/work in progress, not least because, AIUI, the precise rules it needs to enforce aren't yet entirely clear.


> This is my unsafe block's bug, but it seems like something that could take a good while debugging to attribute back to my unsafe block.

You are correct: it's possible to write nefarious code inside an 'unsafe' block then only suffer its effects outside of it, and Rust has to document that fact. The Nomicon, mentioned in the blog post a whole bunch, points this out early on:

> 'unsafe' does more than pollute a whole function: it pollutes a whole module. Generally, the only bullet-proof way to limit the scope of unsafe code is at the module boundary with privacy.

[ https://doc.rust-lang.org/nightly/nomicon/working-with-unsaf... ]
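
A rough sketch of what that means in practice (TinyVec is hypothetical): the `unsafe` block below is only sound as long as every function in the module, including the perfectly safe-looking set_len, maintains the invariant it relies on.

  mod tiny_vec {
      pub struct TinyVec {
          buf: Vec<u8>,
          len: usize, // invariant: len <= buf.len()
      }

      impl TinyVec {
          pub fn new(buf: Vec<u8>) -> Self {
              let len = buf.len();
              TinyVec { buf, len }
          }

          // No unsafe code here, yet exposing this breaks `get` below.
          pub fn set_len(&mut self, len: usize) {
              self.len = len;
          }

          pub fn get(&self, i: usize) -> Option<u8> {
              if i < self.len {
                  // Sound only if the invariant holds; privacy of `buf`
                  // and `len` is what keeps this auditable.
                  Some(unsafe { *self.buf.get_unchecked(i) })
              } else {
                  None
              }
          }
      }
  }

  fn main() {
      let mut v = tiny_vec::TinyVec::new(vec![1, 2, 3]);
      v.set_len(10);   // safe code breaking the module's invariant...
      // v.get(5);     // ...which would make this undefined behaviour
      assert_eq!(v.get(1), Some(2));
  }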


> It would be neat if we could decompose unsafe like so "unsafe[this_feature,that_feature] {}"

I sometimes feel the same way, but remember that the `unsafe` keyword only unlocks four additional features:

1. Dereferencing a raw pointer

2. Calling an unsafe function or method

3. Accessing or modifying a mutable static variable (and this might conceivably even be removed entirely someday)

4. Implementing an unsafe trait
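
For illustration, a quick sketch exercising all four (DANGER, Scary, and tricky are made-up names):

  static mut DANGER: i32 = 0;   // a mutable static

  unsafe trait Scary {}
  unsafe impl Scary for i32 {}  // 4. implementing an unsafe trait

  unsafe fn tricky() {}         // an unsafe function

  fn needs_scary<T: Scary>(_t: T) {}

  fn main() {
      needs_scary(1i32);        // using the unsafe trait via a normal bound
      let x = 5;
      let p = &x as *const i32;
      unsafe {
          let _v = *p;          // 1. dereferencing a raw pointer
          tricky();             // 2. calling an unsafe function
          DANGER += 1;          // 3. modifying a mutable static
      }
  }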

It's unclear to me how to make this any more fine-grained such that annotating the "kind" of unsafe you're using would be useful and enforceable by the compiler (which is crucial, because otherwise why not just use a comment?).

In practice I think really the only "distinction" in unsafe Rust that I want is the ability to distinguish unsafe blocks that exist only to call external C code.


> 3. Accessing or modifying a mutable static variable (and this might conceivably even be removed entirely someday)

Mutable static variables removed or the unsafety of accessing them? Didn't Rust, at one early point, not allow mutable global variables?


`static mut` specifically. To elaborate, we obviously can't just go and remove it now due to our backwards-compatibility promise (at least not without a long deprecation period and a breaking major language version bump). Furthermore, even if we wanted to we actually can't completely replace `static mut` just yet: the intended replacement (using normal (non-`mut`) `static` variables that have `UnsafeCell`s in them) isn't completely usable until our constant-evaluation story is fleshed out further. And the unsafety would still be present one way or the other, but this would allow us to make the language simpler and make it a bit easier to explain the `unsafe` keyword (it would be a general extension of our policy to push complexity out of the language and into libraries whenever possible, which we believe makes the implementation easier to audit and results in a safer and more reliable language).
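
For reference, the replacement pattern being described looks roughly like this sketch (RacyCell is a hypothetical wrapper; whether the static initializer is allowed is exactly the constant-evaluation question mentioned above):

  use std::cell::UnsafeCell;

  // We unsafely promise to synchronise access ourselves.
  struct RacyCell<T>(UnsafeCell<T>);
  unsafe impl<T> Sync for RacyCell<T> {}

  static COUNTER: RacyCell<u32> = RacyCell(UnsafeCell::new(0));

  fn bump() {
      // Still unsafe, just like `static mut`: callers must guarantee
      // there is no concurrent access.
      unsafe { *COUNTER.0.get() += 1 }
  }

  fn main() {
      bump();
  }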


One of the compiler team members advocated for removing static mut entirely, but it didn't quite happen before 1.0. It's totally feasible to do so if const fn was stabilized, but it's not, so...


:(


> Another interesting "oh shoot" w/unsafe that I'm curious about: when I intentionally/unintentionally alias two variables in my unsafe block, this will invalidate assumptions made elsewhere in safe code. This is my unsafe block's bug, but it seems like something that could take a good while debugging to attribute back to my unsafe block. I don't think there's a good resolution to this one other than perhaps documentation/best practices.

As part of the unsafe code guidelines effort, there's been talk of adding an 'unsafe checker' mode to rustc, analogous to valgrind or Clang's AddressSanitizer, which would alter code generation to add checks that could catch many classes of incorrect behavior at runtime. (This would have a high performance cost and would be intended as a debugging tool.) One of the things it would probably do is keep a global map of all live references, and complain if references are created that break the rules, e.g. a mutable reference is created to something that already has a (mutable or immutable) reference somewhere else in the program. Thus it could catch the kind of bug you mentioned.

Of course, you would have to remember to run the checker, and as a dynamic rather than static analysis it would only catch errors that are actually exhibited at runtime (so it probably wouldn't catch the MutexGuard example from the original blog post, unless there was some real code that raced on a MutexGuard). Still, in practice it should help a lot with ensuring that unsafe code doesn't break the rules.

Edit: Niko talked about this in a blog post in February. He proposes a somewhat more complex tracking system than the global list of references I mentioned:

http://smallcultfollowing.com/babysteps/blog/2017/02/01/unsa...
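
The kind of code being described is something like this sketch, which compiles today but breaks the aliasing rules, and is exactly what such a checker would flag:

  fn main() {
      let mut x = 0u32;
      let p = &mut x as *mut u32;

      unsafe {
          let a: &mut u32 = &mut *p;
          let b: &mut u32 = &mut *p; // aliases `a`: two live &mut to the same data
          *a += 1;
          *b += 1;                   // undefined behaviour under Rust's aliasing rules
      }
  }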


At Standard Chartered our in-house Haskell dialect did a tiny bit of that: it distinguished between ReadIO and general IO, where ReadIO was meant to be for idempotent operations only.


Unsafe with features sounds like monad transformers in Haskell.

If implemented, it would probably outperform Haskell (given that monad transformers in Haskell have runtime overhead, unfortunately).


So happy that Gankro is back writing things about Rust, and especially delighted to hear that the Rustonomicon is going to be fleshed out more. :)


If you—like me—were interested in Diesel ORM's zero sized types thing, here's a pretty decent explanation: https://np.reddit.com/r/rust/comments/3ur9co/announcing_dies...

edit: Go also has zero-sized types (struct{}), so I wonder if this is also possible? Probably not, I think, since the compiler doesn't see through interfaces.


If you're interested in more of this, Sean, the author of Diesel, is giving a talk @ RustConf in August (http://rustconf.com/program.html#sean) which will probably cover these tricks in more depth :)

[Talks will be recorded, but also tickets are on sale right now if you want to be there in person!]


> Go also has zero-sized types (struct{}), so I wonder if this is also possible?

No. It specifically uses Rust's generics system, and the fact that generics are monomorphized at compile time, whereas Go interfaces are not.
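
To make that concrete, here's a rough sketch of the zero-sized "type state" trick involved (the names are made up, not Diesel's actual API): the marker types exist only at compile time, and monomorphization erases them entirely.

  use std::marker::PhantomData;

  struct Unselected;   // zero-sized marker types
  struct Selected;

  struct Query<State> {
      sql: String,
      _state: PhantomData<State>, // also zero-sized
  }

  impl Query<Unselected> {
      fn new() -> Self {
          Query { sql: String::from("FROM users"), _state: PhantomData }
      }

      fn select(self, cols: &str) -> Query<Selected> {
          Query { sql: format!("SELECT {} {}", cols, self.sql), _state: PhantomData }
      }
  }

  impl Query<Selected> {
      // Only a query that has gone through `select` can be loaded.
      fn load(self) -> String {
          self.sql
      }
  }

  fn main() {
      let q = Query::new().select("id, name");
      println!("{}", q.load());
      // Query::new().load(); // would not compile: wrong type state
  }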

C++ templates can be used in similar ways.


Yes, but differently; C/C++ explicitly prohibit zero-sized types due to object identity.


To clarify: C says it's UB, C++ rounds up to 1.


I have to say, these root-cause analyses (RCAs) of the various bugs are great for getting a better understanding of the internals of the language.

In a lot of ways it makes me trust Rust even more, because there is a deeper understanding of exactly how these guarantees are made.


Question for the rust folks - are there any features that wouldn't have been possible without "unsafe"? That is, if rust never had unsafe, would it have been fundamentally limited in any way? Or is it required for e.g. interoperability with C?


C and other FFI is fundamentally outside the control of the Rust compiler (or any other non-C compiler) and, furthermore, the foreign functions can have arbitrarily complicated preconditions (or plain bugs) that lead to memory safety violations. This means that, whether it is marked or not, these operations are semantically `unsafe`, as in, they risk memory unsafety because the compiler can't guarantee it. This does mean that all languages with FFI have safety holes.

Additionally, not having the facility for low-level/unchecked code just means that things like optimised data structures/memory management/hardware interaction get implemented either in the compiler or in other languages. The former is much harder to reason about and to modify: one is essentially writing code that generates compiler IR, which is more annoying and error prone than both just writing the code directly and just writing the IR directly (one way to think about this is that the compiler is one big `unsafe` block). The latter is unfortunate because it results in impedance mismatches when doing the FFI calls, both semantically and with performance, and it also means that code doesn't get to benefit from the usual Rust safety checks and high-level features (like ADTs) that are all still available inside `unsafe` blocks.


I'll give you the shortest example: in order to build an operating system in Rust for x86, you need to do this:

  let p = 0xb8000 as *mut u8;
VGA drivers use the memory mapped at 0xb8000 to drive the device. This creates a pointer, p, at that address.
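
The access itself is where unsafe comes in; a minimal sketch of a write into that buffer:

  unsafe {
      // Only sound in ring 0 with the VGA text buffer mapped at 0xb8000.
      *p = b'H';           // character byte
      *p.offset(1) = 0x0f; // attribute byte (white on black)
  }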

In order to demonstrate that this is safe (creating p itself is safe; it's the writes and reads through it, like the sketch above, that need unsafe), a language would have to know:

1. That your code is running in kernel mode, that is the entire concept of ring 0 vs ring 3.

2. That the VGA spec specifies that location in memory.

Yeah, in _theory_, you could have a language that does this, but that'd tie your language so, so, so deeply to each platform, that it's not feasible.

This can be extrapolated to all kinds of other low-level things.


> That your code is running in kernel mode, that is the entire concept of ring 0 vs ring 3.

That need not be the case though. You could have a kernel side allocator that sets up the MMU to map that memory to a pointer that you return which lives in the space of the process. The MMU would take care of the required arithmetic to access the memory at its actual location using an offset.

That way you can map resources from real addresses into arbitrary addresses on the user side.

I think the correct term for this mechanism is 'system address translation'.


The language would still have to understand all of that in order to write that kernel side allocator in safe code.


I don't see how that follows. The language can't possibly understand the intricacies of what the MMU is capable of (besides, every MMU is different), and as far as the language is concerned what is returned is simply a valid offset and a length to go with it to indicate where the allocated segment ends.


I think you're strongly agreeing with me. It's not feasible to have in the language.


Can't the prohibitions be modularized?

Like, when you compile for x86 there are a bunch of rules that aren't generally safe, but on that platform they are.


Modularization wouldn't change the fact that you'd still need a module per platform, and that's not feasible; see the other replies to my comment. There's just far, far, far too many details.


Do you mean:

* All unsafe operations don't exist.

* All unsafe operations exist, but the literal unsafe keyword and its machinery doesn't exist

The latter is how most ostensibly safe languages work. See Haskell's unsafePerformIO, Swift's UnsafePointer, and Java's JNI for 3 examples off the top of my head.

The former is just a really gimped language that would have been a pain in the neck to implement libraries for (see other replies for examples).


It also depends on how "deep" you want to get.

A lot of built-in constructs use unsafe under the hood. Vec (Rust's dynamic array) does memory allocation and resizing under the hood, for example, and there's no safe way to do that unless some safer array allocation primitive is exposed.

Also, things like mem::swap(x, y) cannot be implemented at all in safe Rust: in order to perform a swap, you need a temporary variable, and that temporary variable would start out uninitialized, which Rust does not allow.

Note that in C++ swap invokes the copy constructor - http://www.cplusplus.com/reference/algorithm/swap/ - but Rust's mem::swap works for types that do not implement the Copy trait (Rust's rough equivalent of a copy constructor).

The same can be said about the slice splitting functions, which are primarily used to work around Rust's borrow checker.
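
To make the mem::swap point concrete, a rough sketch of the unsafe it hides (not the actual stdlib implementation, just the idea):

  use std::ptr;

  fn my_swap<T>(x: &mut T, y: &mut T) {
      unsafe {
          // Bitwise-copy x out to a temporary, move y into x, then move
          // the temporary into y. No Copy or Clone bound is required.
          let tmp = ptr::read(x);
          ptr::copy_nonoverlapping(y, x, 1);
          ptr::write(y, tmp);
      }
  }

  fn main() {
      let mut a = String::from("a");
      let mut b = String::from("b");
      my_swap(&mut a, &mut b);
      assert_eq!(a, "b");
      assert_eq!(b, "a");
  }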


You pretty much nailed it with your question. FFI is by nature unsafe since C doesn't have lifetime semantics and traits around thread safety.


[flagged]


> (I've complained about this in the past, pointing to places.) There's a Rust fanboy who usually replies, without citing any references, denying everything. The usual excuses are 1) nobody really uses unsafe crate X much, 2) the crates fanboy is using contain less unsafe code than I'm finding.

While I've actually seen you point this out multiple times here, and seen the response you describe, I can't help but feel you are characterizing it with unhelpful flair which is likely to result in nonconstructive responses.


> Here's a typical example, in crate "url": [1] unsafe conversion of a slice from a UTF-8 string by calling "str::from_utf8_unchecked". Minor speedup, but if the slice calculation is off, unsafe, because advancing through a bad (or misaligned slice of a?) UTF-8 string may cause a subscript error.

Simon is a reasonable guy and is quite receptive to bug reports. If you think that that function should be made safe, feel free to file an issue on GitHub.

Note that URL parsing is quite performance-critical in browsers, especially where data: URLs are concerned. The code in question is used in Firefox.


I assume "rust fanboy" is me.

Please stop it. This is the fourth time you're making this unsubstantiated claim.

Yes, I backed up my argument with "the crates I am using" (that is, all the crates in the entire dependency tree of any crate I've used in the last year), which is at least somewhat representative of the crates that folks use, as opposed to your wild claims that have zero backup (or are backed up with one or two mentions of crates, where subsequently the use of unsafe is justified by me or others).

Here's an example of one of these discussions: https://news.ycombinator.com/item?id=14276129 . Here's another: https://news.ycombinator.com/item?id=13280347 . In the second one I actually admit that I was surprised at the number of crates doing unchecked indexing, that's definitely not denial. I still maintained there that it was quite manageable and not some crazy pervasive problem as you make it out to be, but you've literally never provided evidence otherwise. That's not denial.

As I've mentioned before, I'm genuinely interested in finding these cases so that they can be fixed, or replaced with well-audited general-purpose unsafe abstraction libraries -- e.g. a cheap ASCII-within-UTF-8 handling library that lets you do things like munch ASCII bytes while parsing without extra overhead (or whatever the use cases are; I'm not yet clear on the set of use cases out there, which is why I'm interested to hear more). I'm not "denying everything", and other people have even pointed this out (https://news.ycombinator.com/item?id=14277634) in the discussions we've had.

-----

FWIW, IIRC UTF8 stuff has been a perf bottleneck in the url crate. It's not premature (file a bug, though). That said, we might be able to write that code differently to avoid it or have a more general way of doing it with clearer invariants (I'll think about it!).

The ordermap crate does implement a hash map without unsafe code (leveraging vec). The stdlib hashmap uses unsafe because its design (which has different tradeoffs wrt ordermap) needs to be able to have buckets be empty (where whether or not the bucket is empty is based on a nontrivial set of invariants from the robin hood based design) without using something like an Option which has extra memory and CPU overhead. Optimizers are not smart enough to figure out the robin hood invariants, so this is not premature either.

----

I no longer believe you're making these claims in good faith, so I'm not really interested in discussing this further with you. I just want to counter the wild claims here for the benefit of other folks reading this.

If other folks wish to constructively point out cases of unsafe being used this way, I'd love to hear about it, both to get a better idea of the ecosystem, and to help improve it. I do want the community to move towards less unsafe code.


> IIRC UTF8 stuff has been a perf bottleneck in the url crate

Right, and this isn't surprising at all. It's not at all obvious to me that it's premature. It's really easy for from_utf8 to show up in a profile, and I've had occasion either to work around it safely (perhaps by staying in `&[u8]` land) or to resort to `unsafe`. I did the latter in the CSV crate for managing `StringRecord`s.[1] The record is stored contiguously in memory, which means that even if the entire record is valid UTF-8 as a whole, each individual field may not be. Checking UTF-8 validity on each field or on access leads to a slow down. (There are benchmarks.) It's much nicer to code a fast path that checks if the entire record is all ASCII. (Because then you can assume each field is valid UTF-8.)
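
A sketch of that all-ASCII fast path (not the actual rust-csv code; the field boundaries here are made up):

  // If every byte of the record is ASCII, any field slice is trivially
  // valid UTF-8, so the per-field validation can be skipped.
  fn field_as_str(record: &[u8], range: std::ops::Range<usize>) -> &str {
      let field = &record[range];
      if record.is_ascii() {
          // ASCII bytes are always valid UTF-8.
          unsafe { std::str::from_utf8_unchecked(field) }
      } else {
          // Fall back to the checked (slower) conversion.
          std::str::from_utf8(field).expect("invalid UTF-8 in field")
      }
  }

  fn main() {
      let record = b"alice,30,nyc";
      assert_eq!(field_as_str(record, 0..5), "alice");
  }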

[1] - https://github.com/BurntSushi/rust-csv/blob/34e957e63bb92853...


I have found it very interesting to use the safe function when debug_asserts are on and the unsafe one otherwise (or more generally, some abstraction over safe/unsafe variants that compiles away to nothing in the optimised case). This plays really well with fuzzing/quickcheck (etc) too.

My favourite example of doing this personally is: https://github.com/Aatch/ramp/pull/48
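
A minimal sketch of that pattern with slice indexing (a hypothetical helper, not the ramp code itself):

  #[inline]
  fn get_fast<T>(slice: &[T], i: usize) -> &T {
      // Checked in debug/test builds (and therefore during fuzzing),
      // unchecked in release builds.
      debug_assert!(i < slice.len(), "index {} out of bounds", i);
      unsafe { slice.get_unchecked(i) }
  }

  fn main() {
      let v = vec![10, 20, 30];
      assert_eq!(*get_fast(&v, 1), 20);
  }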


That's how you put in a buffer overflow backdoor.


Could you be more specific about what you mean?

The relevant (i.e. when it isn't premature optimisation: benchmarks have been run) alternatives here are

1. always risk memory unsafety, fast

2. crash on conditions that would trigger memory unsafety, slow

3. use 1 in release builds and 2 in debug/testing builds.

This is effectively having an in-code sanitiser. You can definitely argue that using an actual sanitizer might be better/less error prone, but this scheme allows flagging more general invariants than what sanitisers currently understand, and can/should be used in conjunction with sanitisers (more checking of `unsafe` is always better).

Of course, poorly implementing the abstraction might lead to behavioural differences between the two modes, but one has to be careful when writing unsafe code anyway. Additionally, these abstractions can typically be small and even written in ways that make it clear that there's only extra checking and the returned results are the same (i.e. the only code difference in the abstraction's public API is some extra debug_assert calls).


Another alternative: 4. Provide two codepaths, foo() without unsafe and foo_unsafe() for using the unsafe/fast paths.

Then the consumer/caller can decide on the tradeoffs. Documentation should maybe prefer the safe one; a benchmark can show the difference.

Can be used in combination with your current approach of checking in debug builds.

Some test cases should exist to exercise the unsafe checks, and show problems to avoid when using the unsafe variant.


That is another alternative although I am not sure it is a great one in the context of code where the unsafe operations expose a safe interface. That is, there should be no memory unsafety no matter how the API is used and the safety checks are just there to double check the library's internal invariants that the scoped `unsafe` is relying upon to be correct.

The downsides are that it is much more complicated to maintain: either ~2 times as much code, some ugly macros encompassing a lot of code or something along the lines of passing around SafeOps/UnsafeOps types everywhere.

Your suggestion definitely makes sense if foo_unsafe API isn't trying to expose a safe interface (as in, some combinations of parameters may result in memory unsafety), but that's not the case for the specific cases in URL and regex that are under consideration here.


Yep. We could potentially build a set of robust ASCII APIs with an asciichar type (wrapper around u8) that let you cheaply frob the ASCII portions of a utf8 string, or manipulate a pure ASCII string with free conversion to utf8 after the fact. I'm currently unsure what kinds of functionality are necessary here, but I bet it's all doable as a nicely packaged zero cost API.


[flagged]


> Note lack of links to code above.

He provided links to several other threads where he explicitly has made a list. In this one https://news.ycombinator.com/item?id=13280347 he says he looked at ~600 crates he actually uses, and found 70 used unsafe code. Those 70 are given here: https://gist.github.com/Manishearth/6a9367a7d8772e095629e821... with an explanation of how and why they use unsafe. In that comment, he even addresses that he's surprised the number is that high and that work can be done to lower it. You are seriously misrepresenting his contributions to this discussion.


"The bug was a missing annotation, and the result was that users of Rust's stdlib could compile some incorrect programs that violated memory safety."

IIUC, technically, the bug was a missing implementation of a trait and the result was a data race (which I (weirdly, maybe) don't think of as memory safety).

In other words, TL;DR: magic is neat, except that sometimes it really sucks.

I may have misunderstood Ralf's bug. Is it really the case that MutexGuard<T> was seen as Sync if T was Send, rather than Sync? Wouldn't that be a bigger problem than just the case of MutexGuard?


Data races can lead to other more conventional displays of memory unsafety. For instance, a read of a pointer-length pair (a &[T] slice in Rust) while a write is occurring could result in a torn read, where the pointer is the pre-write value but the length is the post-write one. If the original length was shorter than the new one, this can lead to a buffer overflow.

One framing of the MutexGuard problem is that the type wasn't declared in a way that reflected its semantics best, although it is clearly unfortunate that doing this is more complicated than the incorrect way.


> I may have misunderstood Ralf's bug. Is it really the case that MutexGuard<T> was seen as Sync if T was Send, rather that Sync? Wouldn't that be a bigger problem than just the case of MutexGuard?

So T: Sync if &T: Send. MutexGuard internally contains a &Mutex<T> (and Poison, but that's irrelevant here). T was Cell<i32>. If you follow the rabbit hole, you'll net out that T was Send, and therefore MutexGuard was Sync.


My confusion (and I suspect others) is about what it means for &T to be Sync. Cell<T> isn't safe to be shared across threads (so isn't Sync) but it is Send if T:Send. But that means &Cell<T> is Sync? You can share a reference to something across threads but not the thing itself? What does that even mean?

You could imagine an alternate world where MutexGuard is Send, to allow transfer of ownership of a lock to a different thread while keeping the mutex locked. But that would mean &MutexGuard is Sync, WTF?


The syntax was a bit confusing: T: Sync means (&T): Send and also (&T): Sync. T being Send or not doesn't affect the threadsafety of &T (Send is about transferring ownership which cannot happen with a &T).

You are correct to be confused about (&Cell<i32>) possibly being Sync, because the assumptions that were implied were wrong: when Sync talks about sharing a T, that can be entirely thought of as transferring a &T to another thread (aka Sending the &T). In this sense, sharing a &T between threads (as in, (&T): Sync) is the same as transferring a &&T to another thread, but the inner &T can be copied out so the original &T was also transferred (not just shared) between threads; that is to say, (&T): Sync is 100% equivalent to T: Sync.

Anyway, back to the example here, Cell<i32> is not Sync, so neither is &Cell<i32>, but M = Mutex<Cell<i32>> is Sync (this is a major reason Mutex exists in that form: allowing threadsafe shared mutation/manipulation of types that do not automatically support it), and thus &M is Sync too. Since MutexGuard<Cell<i32>> contains &M, it was thus automatically, incorrectly Sync.

For your second confusion, it is okay for &MutexGuard to be Sync, if MutexGuard itself is. The problem here was MutexGuard was Sync incorrectly in some cases. (MutexGuard is semantically just a fancy wrapper around a &mut T, and so should behave the same as that for traits like Send and Sync.)
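
A few compile-time checks sketching those relationships (as they hold with the fixed MutexGuard):

  use std::cell::Cell;
  use std::sync::Mutex;

  fn assert_send<T: Send>() {}
  fn assert_sync<T: Sync>() {}

  fn main() {
      assert_send::<Cell<i32>>();         // Cell<i32> is Send...
      // assert_sync::<Cell<i32>>();      // ...but not Sync (won't compile)
      assert_sync::<Mutex<Cell<i32>>>();  // Mutex makes sharing it safe
      assert_sync::<&Mutex<Cell<i32>>>(); // and so is a reference to it
      // assert_sync::<std::sync::MutexGuard<'_, Cell<i32>>>(); // correctly rejected since the fix
  }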


Ah! I wasn't following the references close enough.


Your "IIUC" is just a restatement of the sentence you quoted.

You understood the bug correctly, but it's not a bigger problem. You probably are lacking context on auto traits, but this blog post contains the context you need if you read it again.


Wait just a minute. Ralf Jung writes,

"This means that the compiler considers a type like MutexGuard<T> to be Sync if all its fields are Sync."

Is that true in general? Is a type thread safe if all its fields are thread safe individually?


Send and Sync are about data races, which lead to memory unsafety, not other forms of thread safety (like deadlock freedom, or maintaining non-unsafe relationships between fields). If there's no unsafe code, then there's no way to have a data race when the individual components are also data race free.


I think it's fair to question a type being auto-derived to be Sync if only individual fields are Sync. It may lead to improper sharing of a reference to this value where thread safety across fields is needed. That would be a bug, yes, but the compiler never made the author pause to consider putting Sync there explicitly (and presumably thinking this through). So that "linting" aspect that's applied to, e.g., *mut T is not present.


I'm not entirely sure what you mean.

The compiler does make the author pause: dangerous types like `*mut T` and `UnsafeCell<T>` are not Sync, so types containing them are also not automatically Sync.

In any case, Rust only guarantees no memory unsafety (requiring no data races). It tries to help with other things, like cleaning up resources via destructors, but these are "best-effort" rather than guarantees. The only way to get a data race and/or memory unsafety with an automatically implemented Sync is with `unsafe` code, and any `unsafe` code near a data type "infects" it and so means the whole type require great care.

Tooling like asan and tsan and, hopefully, Rust-specific sanitizers and static analyzers will make this easier to get right, but fundamentally as soon as `unsafe` comes into the equation the programmer has to be paranoid. Of course, as the MutexGuard problem indicates, humans are error-prone at getting this right, which is why the aforementioned tooling and formal proofs (like the one that found that problem) are important, as is building and using appropriate abstractions (e.g. MutexGuard is semantically designed to be a &mut T, so maybe it could indicate this by using PhantomData, or even just storing that directly: this does require manual work, but pushing conventions like that might help bridge the gap to having great `unsafe` tooling in the future).


I suppose my point was that auto-deriving Sync may not have been such a great idea :). I understand the rationale for it but it does open up traps for people to fall into.


Somewhat tangential, but what ensures memory visibility in Rust? Say I allocate a struct (heap or stack), and then pass an immutable reference to a function that takes T: Sync. Assume the struct itself is Sync (e.g. bunch of integer fields). What ensures that the other thread sees all writes to this struct prior to the handoff?


It is the responsibility of cross-thread communication abstractions to use the right fencing (if they are touting themselves as safe), probably with the various things in std::sync (especially ...::atomics) if it is pure Rust. For instance, spawning a thread, using a channel (std::sync::mpsc) or a mutex all do such things.

Just calling a function taking T: Sync doesn't need to do any of this, since that call happens all on a single thread. The function might do it internally if it needs to, but that is its own explicit implementation decision.


Ok, that's what I figured - thanks.

That does bring up the question, though, whether it's correct to say that a Sync type doesn't permit data races. In the example I gave above, publishing a Sync struct incorrectly can exhibit data race like symptoms on the receiving thread. So even though the type itself is Sync, that's not enough of a guarantee in the face of "unsafe" publication.


In safe Rust, sharing a value of any Sync type between threads can't result in data races. Send and Sync provide thread safety guarantees about types that other safe abstractions can rely upon, and fencing correctly is one of the things those abstractions have to do to be safe.

I guess "Sync types don't have data races" is the abbreviated version of "Sync types don't have data races in any safe code, no matter how weird and wonderful". That said, this qualification doesn't seem very interesting to me: something equivalent is required about pretty much any statement about any guarantee in any language with unsafe code or FFI (e.g. in Python, something along the lines of "pointers don't dangle in any code that doesn't use `ctypes`"), and thus is elided in a lot of discussions about programming languages.

If you consider `unsafe` Rust, then failing to fence correctly is just one way to get a data race.


It is a guarantee -- or else it's a bug (which is the same as every other safe foundation). A type T gets to be `Sync` in one of two ways:

1. It is "auto derived" when all of its constituent types are Sync.

2. It is explicitly implemented using `unsafe impl Sync for T {}`. Note the use of the `unsafe` keyword.


Right, but my question isn't about T itself, but rather how it's published to another thread. The example I gave is of a plain struct with no atomics or any other synchronization types internally. A &T is auto-derived to be Sync. But, if a publisher incorrectly publishes this reference, the other thread may see a partially initialized value.


It's the responsibility of the code that transfers the reference to another thread to ensure that. Deep down, past the abstractions, you can't transfer stuff between threads without using unsafe code. It's the responsibility of this unsafe code to ensure that if the value is visible from another thread, all the writes in the current thread have completed before it's visible. One can do this using memory fences.


There are three ways of sharing data across threads.

One is by sharing the data with the thread when it is spawned via a closure. Spawning will fence. No problem there.

The second is to use a good 'ol Sender/Receiver channel pair. These are effectively a shared ring buffer that you can push to and pop from. They also have a fence somewhere.

Finally, you can stick your data into a mutex shared between threads (and let the other thread wait and read it). This will IIRC fence, or do something equivalent.

You can of course build your own ways to do this, but they will need unsafe code to be built (the three APIs above are also built with unsafe code). It is up to you to ensure you handle the fences right when doing this.

The responsibility here is on the publishing mechanism. Most folks use one of the three ways above using primitives from the stdlib depending on the use case.
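
In code, the three mechanisms look roughly like this (a sketch using only stdlib primitives):

  use std::sync::{mpsc, Arc, Mutex};
  use std::thread;

  fn main() {
      // 1. Hand data to a thread via its closure at spawn time.
      let owned = vec![1, 2, 3];
      let h1 = thread::spawn(move || owned.len());

      // 2. Send it over a channel.
      let (tx, rx) = mpsc::channel();
      let h2 = thread::spawn(move || rx.recv().unwrap());
      tx.send(42).unwrap();

      // 3. Share it behind an Arc<Mutex<_>> and let the other thread lock it.
      let shared = Arc::new(Mutex::new(0));
      let shared2 = Arc::clone(&shared);
      let h3 = thread::spawn(move || *shared2.lock().unwrap() += 1);

      assert_eq!(h1.join().unwrap(), 3);
      assert_eq!(h2.join().unwrap(), 42);
      h3.join().unwrap();
      assert_eq!(*shared.lock().unwrap(), 1);
  }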


Yeah, I understand, and that's what I expected the answer to be. My point is that when people talk about Sync not allowing data races, there's an asterisk attached to that statement. That footnote is that publishing code, which is completely separate from the type itself, needs to uphold its responsibility. Unsafe code is usually discussed in light of raw pointers and more generally raw memory ops, but I rarely see this aspect mentioned.


My point above was subtle but important: the asterisk you're mentioning here is not specific to Sync. This is true of all safety guarantees in Rust. unsafe code must uphold the invariants that safe code will rely on, otherwise it's buggy. For example, if the implementation of `Vec<T>` accidentally got its internal `length` out-of-sync with the data on the heap, then nothing bad in and of itself necessarily happens immediately. The bad thing only happens the next time you try to do something with the `Vec<T>`, which will be in safe code.

The safety of things in Rust is built on abstraction. If an abstraction gets something wrong in its `unsafe` details, then there is a bug there. In other words, the asterisk you're mentioning is "You can trust that safe Rust is free of memory unsafety, unless there are bugs." I feel like that's discussed and acknowledged quite a bit.

> Unsafe code is usually discussed in light of raw pointers and more generally raw memory ops, but I rarely see this aspect mentioned.

I guess I don't see a difference. The safeness of Rust code depends on the correct use of `unsafe`, and this applies to everything, not just Sync.

This idea of `unsafe` being "trusted code" is one of the first things that the Rustonomicon covers: https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html


> I guess I don't see the difference

At a high and general level, yeah, it's all "unsafe". But, most conversations about unsafe don't talk about this aspect. So while what you say is true, I'm merely pointing out that this concurrency aspect doesn't seem to be mentioned much. And while it's implied at a high level, I think it's worth mentioning.

Basically, there's no issue - I just think this should be called out more when concurrency is discussed.


You need raw pointer ops (or, well, dealing with UnsafeCell, which does raw pointer stuff), or syscalls for these too.

Concurrency is not special here. There are all kinds of invariants unsafe code might be required to uphold. So yeah, we could mention concurrency, but then we could also mention UTF8, noalias, initialization, the vector length invariant, the HashMap robin hood invariant, various BTreeMap invariants, etc etc. "Make sure you have fences" is just another semi-specific invariant.

I disagree that "most conversations about unsafe don't talk about this aspect"; compartmentalizing unsafe invariants is a major part of these discussions (it's like the first chapter of the Nomicon, even).


> Concurrency is not special here

I beg to differ. Concurrency comes with its own bag of hazards, as I mentioned in my reply to burntsushi. Comparing its invariants with Vec's length invariant, HashMap's RH invariant, and any other single-threaded/internal invariants misses the point.

Unsafe discussions that I've seen rarely talk about fences - they tend to focus on raw pointer ops, ffi, transmutes, unbound lifetimes, and in general, are single thread focused.


Right, but I can make the same point about other invariants. Each comes with its own bag of hazards. You can write pages about the robin hood invariants.

Concurrency is particularly complex, perhaps. I think one of the reasons you don't see that much discussion of this is that in general folks in Rust don't write that many internally-unsafe concurrent abstractions. There are a bunch of great safe building blocks out there (stdlib ones, rayon, crossbeam) which folks use for concurrency; it's very rare to build your own. So that might be it.

At least with the stuff I work on like 50% of the unsafe Rust discussions have been around thread safety and ordering and fences, but we're in that relatively rare situation where we need to build those abstractions, so perhaps it's just me who sees these discussions happening.

-----

It's also probably just that discussions introducing unsafe will deal with problems people are used to -- and memory safety is a far more "normal" problem than thread safety.


Could you say more about why you see concurrency as a special thing to call out? I'm genuinely curious so that I can adapt how I explain this stuff in the future. It would help to understand the distinction you are seeing that I am not. :-)


Only because I've had several people ask me how Rust handles memory ordering after they've learned of Sync. Sync documentation focuses very narrowly on there being no data races against the type that implements it. There's also obviously a difference between a struct consisting of a single AtomicUInt (for example) and a struct consisting of a plain uint. The former can be published unsafely because it internally already provides the appropriate fences by virtue of the atomic. In the latter, it requires the publishing to provide the necessary fences. Again, I don't mean to say that doesn't make sense or is unexpected (after one thinks about it), but I'd urge a bit more focus on that part as well.

Given there's no official memory model, similar to say Java Memory Model, there's not much to go by (correct me if I'm wrong). The JMM, for instance, spells out what needs to happen for happens-before edges to be created. It also talks about racy publication of values. Granted there's no close analog to Rust's unsafe in Java. But saying "T: Sync means it's free of data races in safe code", while correct, is a slightly vacuous statement since that T interacts with other components. And yes, those components will likely involve unsafe code, and unsafe code has its own caveats, but still, I don't think it hurts to make this more pronounced. Concurrency is a special beast, rife with its own hard-to-debug hazards. Being a bit more verbose and possibly repetitive about the hazards won't hurt :).


I don't think it's vacuous. It's very important for compartmentalizing unsafe. It interacts with other components, and that statement tells you the responsibilities of both T implementing Sync, and abstractions accepting Sync types.

It is vacuous at a big picture level where you're trying to understand the complete thread safety story, but that was never what that statement was trying to convey.

----

I think documentation here can be improved, though. When I get more time one of the things I want to do is do a major revamp of the concurrency docs, including paragraphs on how the memory ordering stuff works. I'll include filling out the Sync docs in this, thanks for the feedback!


The atomic type doesn't provide the necessary fences automatically. Operations on it have various sorts of fencing, but these do not necessarily connect in the way you think they do. The partial initialisation problem of creating a struct and publishing a pointer happens for the atomic types too, as they don't (and should not!) do everything with sequential consistency.


Note that I didn't mention anything about it being automatic. It's merely a case of being more explicit about what may happen to the value.

Sequential consistency is irrelevant to the example I gave of a single valued struct. I wasn't talking about any ordering with respect other loads/stores.


Being explicit isn't relevant: a relaxed store to an atomic followed by publishing a pointer to it won't have the expected happens-before relationship (program order won't be respected).

In any case, the only reason to think about publishing is how loads and stores are ordered, so it isn't irrelevant. The single valued struct can still have the partial initialisation problem, it just appears as the single field being completely uninitialised.
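
A sketch of the kind of publication being talked about, using std's atomics (the publisher/consumer split is hypothetical; Release/Acquire is what would create the happens-before edge that Relaxed doesn't):

  use std::sync::atomic::{AtomicPtr, Ordering};

  static SLOT: AtomicPtr<u32> = AtomicPtr::new(std::ptr::null_mut());

  fn publisher() {
      let data = Box::into_raw(Box::new(42u32));
      // Relaxed: nothing orders the initialising write of 42 before this
      // store becomes visible. Ordering::Release is what's needed here.
      SLOT.store(data, Ordering::Relaxed);
  }

  fn consumer() -> Option<u32> {
      // Relaxed load: even a non-null pointer doesn't guarantee the
      // pointee's contents are visible yet (would need Ordering::Acquire).
      let p = SLOT.load(Ordering::Relaxed);
      if p.is_null() { None } else { Some(unsafe { *p }) }
  }

  fn main() {
      // Run on one thread just to exercise the code; the ordering bug only
      // matters when publisher and consumer run on different threads.
      publisher();
      assert_eq!(consumer(), Some(42));
  }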


That's all true, but I think we're talking past each other a bit. My point isn't about what mechanics would be used or the type of memory order specified (that's the irrelevant part), but merely to highlight that this could use more attention in the docs. The atomic used may be viewed as a "lint" to a reader/user that something interesting may be going on with how this type is used - similar to how *mut isn't automatically Sync; it's an opportunity for people to pause and think things through. If you're looking at a type absent of any atomics or concurrency primitives, it's not apparent that it may be used like that (particularly since Sync is auto-derived and doesn't appear in the source code of the type).

I know one can look at this as being unimportant because only unsafe code that works with Sync types carries the burden of upholding the invariants. But, I still contend that memory ordering ought to be discussed, somewhere, while there's no official memory model documentation (not even sure that's in the plans, although I feel like I may have come across a github issue for that).

Anyway, I don't think I'm saying anything new in this reply over my others. If you feel existing docs are adequate in this regard, that's fine - we can agree to disagree :).


[flagged]


We've banned this account for posting only unsubstantively. We're happy to unban accounts if you email us at hn@ycombinator.com and we believe you'll post civilly and substantively in the future.


And this is why I stick with ASM - I don't have to rely upon everyone else not screwing the pooch when it comes to them developing a language - I just talk straight to the computer, nothing gets lost in translation, my programs are 200x smaller and 400x faster than anything written in Rust.

2D Second Life clone, with full programming capability with built-in database - 2 megabytes. Solid ASM. Rust can't even come close, and never will.


> [..] my programs are 200x smaller and 400x faster than anything written in Rust.

And take 100x longer to develop :)


Nope. Once I start typing the code simply flies.



