
The parent post is correct. Unsafe code must uphold the invariants of safe Rust. If unsafe code is safe only if the caller upholds some invariants not enforced by the compiler, then by definition it's the unsafe code that is wrong.



Unsafe code must uphold the invariants of safe Rust.

Ideally, yes. In practice, maybe. We're probably going to see "unsafe" code that assumes good behavior on the part of the caller. That's a classic problem with APIs.


There's no way to solve that problem without just forbidding unsafe code entirely. Unsafe code can have bugs; that's why you should keep it to the minimum and keep it well-known and audited.

In this case, the byteorder crate would have been more appropriate than handrolling unsafe code.


I think the issue, if there is one, is that arguably the more appropriate thing to do would have been to implement the code without unsafe, using an intrinsically correct algorithm. The type-punning is premature optimization. _That_ was the misstep.

That the byteorder crate exists is irrelevant in as much as this was an example of the urge for premature optimization leading the developer down the wrong path. The same amount of time pondering whether to even bother using a pre-existing library might have been better spent second-guessing the urge to type-pun at all.

Also, looking at the byteorder crate, I wouldn't be surprised if it's even slower than the simpler and correct loop I posted elsethread. read_num_bytes in that crate uses copy_nonoverlapping, which I assume is analogous to memcpy in C. That's a very roundabout and inefficient way to accomplish the task, and likely patterned after similarly bad C code.

To even make it worthwhile, any byteorder library should provide some kind of iterator interface so that it can maintain alignment state while permitting the loop to be unrolled by the compiler. (And it might require a closure or some way of expanding a block of code inline.) That's probably the only way it could outperform the simple, hand-rolled, endianness- and alignment-neutral solution. But it doesn't provide that kind of interface AFAICT.

It's all sort of ironic, which I suppose was the point upthread--this is an example of the irrational urge for premature optimization and of bad programming idioms being hauled into Rust land completely unhindered by Rust's type safety features. And the better, correct, and likely more performant way of accomplishing this task could have been done just as safely from C as it could from Rust.


> Also, looking at the byteorder crate, I wouldn't be surprised if it's even slower than the simpler and correct loop I posted elsethread. read_num_bytes in that crate uses copy_nonoverlapping, which I assume is analogous to memcpy in C. That's a very roundabout and inefficient way to accomplish the task, and likely patterned after similarly bad C code.

It wasn't patterned after any C code. ptr::copy_nonoverlapping doesn't necessarily compile down to memcpy. Namely, concrete sizes are given, so the compiler backend can optimize this down to simple loads and stores on x86, which is probably going to do better than the bit-shifting approach. In particular, loading a little-endian encoded integer on a little-endian architecture should be as simple as a single word-sized load (because the byte swap is unnecessary). It would be interesting to consider whether the safer and more readable bit-shifting approach could be compiled down to the same code, but when I wrote the byteorder crate, this wasn't the case.

This isn't the only place that ptr::copy_nonoverlapping is useful. I used it in my snappy[1] implementation as well, specifically to avoid the overhead of memcpy. To be clear, this wasn't my idea. This is what the C++ Snappy reference implementation does as well. Avoiding memcpys in favor of unaligned loads/stores is a dramatic win. I know this because I tried to write my Snappy implementation without specific unaligned loads/stores, and it performed quite a bit worse. The performance of the Rust implementation is now on par with the C++ implementation. Of course, this is always dealing with raw bytes---there's no type punning here.

ptr::copy_nonoverlapping is a bit generic for this use case. That's why we recently accepted an RFC to add read_unaligned/write_unaligned to the standard library[2]. (Which are implemented via straightforward calls to ptr::copy_nonoverlapping.)

[1] - https://github.com/BurntSushi/rust-snappy/blob/master/src/de...

[2] - https://github.com/rust-lang/rfcs/blob/master/text/1725-unal...


  Namely, concrete sizes are given, so the compiler backend can optimize this down to simple loads and stores, which is going to do better than the bit-shifting approach
It can't optimize it down to simple loads and stores unless it can prove that it's aligned. If it can't optimize it to a simple load, it has to check for alignment. If it has to check for alignment, it's unlikely to be faster than the byte-loading function. The bit-shifting approach can be parallelized by superscalar CPUs if you unroll the loop. Whereas the alignment check cannot be parallelized on CPUs where alignment matters, whether or not it's been unrolled to load in chunks.

FWIW, memcpy can be similarly optimized in C. memcpy -> scalar assignment is an optimization that GCC (and probably clang) performs. But if it can't prove alignment it can't optimize it to a scalar load/store, and alignment typically can't be proven except for small functions where the optimizer can see the definition of the array _and_ can prove any pointer derived from the array is properly aligned. That's generally not the case when juggling user-provided strings because there are too many conditionals between where memcpy is invoked and the origin of the pointer.

Also, as a general rule unaligned loads are slower even on x86, so it often makes sense to check for alignment regardless, especially to optimize the case of loading a long series of integers. And when performance matters, that's precisely what you want to do if you can. You want to batch load the series of integers because doing operations in batches is the key to performance on any modern processor. Indeed, it's the key to SeaHash as well. And that's what I meant by saying effort and code complexity is better spent refactoring the algorithm at a higher level than trying to micro-optimize such a small operation. In addition to often reaping much better gains, you marginalize if not erase any benefit the micro-optimization might have provided. It's beyond dispute that the gains from SeaHash primarily come from how it refactored its inner loop to operate on a 64-bit word instead of 8 8-bit words.


> It can't optimize it down to simple loads and stores unless it can prove that it's aligned. If it can't optimize it to a simple load, it has to check for alignment. If it has to check for alignment, it's unlikely to be faster than the byte-loading function.

I had edited my comment after-the-fact to include the "on x86" qualification.

> And that's what I meant by saying effort and code complexity is better spent refactoring the algorithm at a higher level than trying to micro-optimize such a small operation.

Your advice is overspecified. If you want to make something faster, then build a benchmark that measures the time you care about and iterate on it. If "micro optimizations" make it faster, then there's nothing wrong with that. I once doubled the throughput of a regex implementation by eliminating a single pointer indirection in the inner loop. It doesn't get any more micro than that, but consumers are no doubt happier with the increased throughput. In general, I find most of your hand-waving about performance curious. You seem keen on making strong assertions about performance, but the standard currency for this sort of thing is benchmarks.

I did all of this with byteorder when I built it years ago. I'll do it again for you.

    $ curl -sOL https://gist.github.com/anonymous/042d05e1e480b89434a673b30534efd8/raw/d2c9a4516a57c26da23c8beaffd5ad583da0a889/Cargo.toml
    $ curl -sOL https://gist.github.com/anonymous/042d05e1e480b89434a673b30534efd8/raw/d2c9a4516a57c26da23c8beaffd5ad583da0a889/lib.rs
    $ RUSTFLAGS="--emit asm" cargo bench
    test bit_shifting ... bench:   1,999,496 ns/iter (+/- 53,427)
    test type_punning ... bench:     476,105 ns/iter (+/- 11,920)
(The `RUSTFLAGS="--emit asm"` dumps the generated asm to target/release/deps.)

The benchmark reads 1,000,000 64 bit integers from a buffer in memory and sums them.

Analyzing the hotspots of each benchmark using `perf` is instructive. For type_punning:

    $ perf record target/release/deps/benchbytes-a1cc37a72d289957 --bench type_punning
    $ perf report
The corresponding asm is:

    cmpq	$7, %rsi
    jbe	.LBB4_10
    movq	(%rbx), %rcx
    addq	(%rcx,%rax), %rdi
    addq	$8, %rax
    addq	$-8, %rsi
    cmpq	%rax, %rdx
    ja	.LBB4_6
Notice how tight this loop is. In particular, we're dealing with a single simple load to read our u64. Now let's repeat the process for bit shifting:

    $ perf record target/release/deps/benchbytes-a1cc37a72d289957 --bench bit_shifting
    $ perf report
The hotspot's corresponding asm is:

    .LBB5_6:
    	cmpq	$7, %rsi
    	jbe	.LBB5_10
    	movzbl	(%rdx,%rbx), %ecx
    	movzbl	1(%rdx,%rbx), %eax
    	shlq	$8, %rax
    	orq	%rcx, %rax
    	movzbl	2(%rdx,%rbx), %ecx
    	shlq	$16, %rcx
    	orq	%rax, %rcx
    	movzbl	3(%rdx,%rbx), %eax
    	shlq	$24, %rax
    	orq	%rcx, %rax
    	movzbl	4(%rdx,%rbx), %ecx
    	shlq	$32, %rcx
    	orq	%rax, %rcx
    	movzbl	5(%rdx,%rbx), %eax
    	shlq	$40, %rax
    	orq	%rcx, %rax
    	movzbl	6(%rdx,%rbx), %ecx
    	shlq	$48, %rcx
    	movzbl	7(%rdx,%rbx), %edi
    	shlq	$56, %rdi
    	orq	%rcx, %rdi
    	orq	%rax, %rdi
    	addq	%rdi, %r12
    	addq	$8, %rbx
    	addq	$-8, %rsi
    	cmpq	%rbx, %r11
    	ja	.LBB5_6
It's no surprise that the type punning approach is faster here. (N.B. Compiling with `RUSTFLAGS="-C target-cpu=native"` seems to permit some auto-vectorization to happen, but I don't observe any noticeable improvement to the benchmark times for bit_shifting. In fact, it seems to get a touch slower.)

I could be reasonably accused of micro-optimizing here, but I do feel like reading 1,000,000 integers from a buffer is a pretty generalizable use case, and the performance difference here in particular is especially dramatic. Finding a real world problem that this helps is left as an exercise to the reader. (I've exceeded my time budget for a single HN comment.)

> It's beyond dispute that the gains from SeaHash primarily come from how it refactored its inner loop to operate on a 64-bit word instead of 8 8-bit words.

Do you feel anyone has contested this point? I note your use of the word "primarily." If type punning gives a 10% boost to something that is already fast, do you care? If not, do you think other people might care? If they do, then what exactly is your point again?

Note that I am responding to your criticism of byteorder in particular. I don't really know whether the OP's optimization of reading little-endian integers is actually worthwhile or not. I would hazard a guess, but would suspend certainty until I saw a benchmark. (And even then, it is so incredibly easy to misunderstand a benchmark.)


  Notice how tight this loop is. In particular, we're dealing with a single simple load to read our u64. 
Notice that you're reading the data into a statically allocated buffer, and doing it in such a way that it's trivial for the compiler to prove alignment. This is a classic case where the benchmark is irrelevant for a general purpose implementation.

Try running the code so that the buffer is dynamically allocated, and so that the first access is unaligned.

Now, I'm not saying that type-punning can't be faster, but to do it properly from a general-purpose library it should be done correctly so that every case is as fast as possible.

Assuming I'm correct and that the modified benchmark sees substantially different results, reimplement byteorder such that it produces the same tight loop even when the data isn't aligned.

I don't think it can be done without modifying the byteorder interface to expose something more iterator-like, because it needs to maintain state across invocations for doing the initial unaligned parse followed by the aligned parse.

If you can get it done in a reasonable amount of time[1], look at the difference between type-punning and byte-loading. I'll bet that relative difference will be much smaller than the difference between the unaligned performance before you refactored the interface, and the unaligned performance after refactoring the interface. In that case my point would stand--the most important part is refactoring code at a higher-level; gains quickly diminish thereafter.

If my argument is over-specified, that's because it's meant as a rule of thumb. Specifying a rule of thumb but then qualifying it with "unless" is counter-productive. For inexperienced engineers "unless" is an excuse to avoid the rule; for experienced engineers "unless" is implied.

Note that I'm no stranger to optimizing regular expressions. I wrote a library to transform PCREs (specifically, a union of thousands of them, many of which used zero-width assertions that required non-trivial transformations and pre- and post-processing of input) into Ragel+C code and got a >10x improvement over PCRE. After that improvement micro-optimizations were the last thing on our minds. (RE2 couldn't even come close to competing; and unlike re2c, the Ragel-based solution would compile on the order of minutes, not lifetimes.)

We eventually got to >50x improvement by doubling-down on the strategy and paying someone to modify Ragel internally to improve the quality of the transformations.

[1] Doubtful as I bet it's non-trivial and you have much better things to do with your time. But I would very much like to see just the benchmark numbers after making the initial changes--dynamic allocation and unaligned access. I don't have a Rust dev environment. I'll try to do this myself later this week if I can. However, given that I've never written any Rust code whatsoever it'd be helpful if somebody copy+pasted the code to dynamically allocate the buffer. I can probably figure the rest out from there.


Hi, author of Hyperscan (https://github.com/01org/hyperscan) here.

I strongly suspect we don't support enough of this:

> many of which used zero-width assertions that required non-trivial transformations and pre- and post-processing of input

... to really support your use case. But we're interested in the workload, especially as we're looking at extensions to handle more of the zero-width assertion cases. We'll never be able to handle some of them in streaming mode (they break our semantics and the assumption that stream state is a fixed size for a given set of regular expressions).

Can you share anything about what you're doing with zero-width assertions?


> Notice that you're reading the data into a statically allocated buffer

It is not statically allocated. The data is on the heap. The give-away is that the data is in a `Vec`, which is always on the heap.

> and so that the first access is unaligned

I modified both benchmarks in this fashion:

    let mut sum: u64 = 0;
    let mut i = 1;
    while i + 8 <= data.len() {
        sum += LE::read_u64(&data[i..]);
        i += size_of::<u64>();
    }
    sum
The results indicate that both benchmarks slow down. The gap is narrowed somewhat, but the absolute difference is still around 4x (as it was before):

    test bit_shifting ... bench:   2,293,921 ns/iter (+/- 65,243)                                                                                                                                                        
    test type_punning ... bench:     659,350 ns/iter (+/- 15,550)
The loop is not so tight any more:

    .LBB4_6:
    	leaq	-8(%rcx), %rdi
    	cmpq	%rdi, %rsi
    	jb	.LBB4_11
    	cmpq	$7, %rax
    	jbe	.LBB4_12
    	movq	(%rbx), %rdi
    	addq	-8(%rdi,%rcx), %rdx
    	addq	$8, %rcx
    	addq	$-8, %rax
    	cmpq	%rsi, %rcx
    	jbe	.LBB4_6

> Now, I'm not saying that type-punning can't be faster, but to do it properly from a general-purpose library it should be done correctly so that every case is as fast as possible.

You haven't actually told me what is improper with byteorder. I think that I've demonstrated that type punning is faster than bit-shifts on x86.

You have mentioned other workloads where the bit-shifts may parallelize better. I don't have any data to support or contradict that claim, but if it were true, then I'd expect to see a benchmark. In that case, perhaps there would be good justification for either modifying byteorder or jettisoning it for that particular use case. With that said, the data seems to indicate that the current implementation of byteorder is better than using bit-shifts, at least on x86. If I switched byteorder to bit-shifts and things got slower, I have no doubt that I'd hear from folks whose performance at a higher level was impacted negatively.

> Note that I'm no stranger to optimizing regular expressions. I wrote a library to transform PCREs (specifically, a union of thousands of them, many of which used zero-width assertions that required non-trivial transformations and pre- and post-processing of input) into Ragel+C code and got a >10x improvement over PCRE. After that improvement micro-optimizations were the last thing on our minds. We eventually got to >50x improvement by doubling-down on that strategy and modifying Ragel internally. Much like micro-optimizations RE2 couldn't even come close to competing; and unlike re2c, the Ragel-based solution would compile on the order of minutes, not lifetimes.

My regex example doesn't have anything to do with regexes really. I'm simply pointing out that a micro-optimization can have a large impact, and is therefore probably worth doing. This is in stark contrast to some of your previous comments, which I found particularly strongly worded ("irrational" "premature" "bad" "incorrect"). For example:

> It's all sort of ironic, which I suppose was the point upthread--this is an example of the irrational urge for premature optimization and of bad programming idioms being hauled into Rust land completely unhindered by Rust's type safety features. And the better, correct, and likely more performant way of accomplishing this task could have been done just as safely from C as it could from Rust.

Note that I am not making the argument that one shouldn't do problem-driven optimizations. But if I'm going to maintain general purpose libraries for regexes or integer conversion, then I must work within a limited set of constraints.

(OT: Neither PCRE nor RE2 (nor Rust's regex engine) are built to handle thousands of patterns. You might consider investigating the Hyperscan project, which specializes in that particular use case (but uses finite automata, so you may miss some things from PCRE): https://github.com/01org/hyperscan)


Compilers understand memcpy, especially in the context of type punning (historically the recommended standards-compliant way to do it) where one has small constant sizes. The copy_nonoverlapping "function" is actually a compiler intrinsic, but even if it weren't, compilers like LLVM recognise calls to "memcpy", and even loops that reimplement memcpy, and canonicalise them all to the same internal representation.


There's no way to solve that problem without just forbidding unsafe code entirely.

That's not at all clear. It's worth looking at unsafe code and asking "why was this necessary"? What couldn't you do within the language? As patterns reoccur, it may become clear what new safe primitives are needed.


> As patterns reoccur, it may become clear what new safe primitives are needed.

In these cases you have a choice between inventing a safe language primitive or inventing a safe library primitive. This exists in most cases for seemingly-safe operations, like the byteorder crate in this case.

If it were a language primitive it would be just as trustworthy as the corresponding verified library primitive.


Why is it better to add safe primitives directly to the compiler rather than implementing them in libraries?


The compiler can look at more data to decide if something is valid, or can be optimized.

C++ is trying to add move semantics via templates, but can't get all the way to Rust's borrow checker that way.


OK, but we're not talking about the borrow checker, we're talking about byteorder. There's nothing about the byteorder crate that would benefit from being added to the compiler.

In fact it would make it less safe, since we'd be debugging code that emits LLVM IR instead of writing in an actual language.


Right, but in this case it doesn't need to. I have yet to see an example of an operation that:

- should be safe in Rust but isn't

- needs /compiler/ support to work well (can't be done cleanly as a library)

- isn't already on the track for implementation (non-lexical lifetimes, SEME regions)

You did mention uninitialized arrays but uninitialized data is inherently unsafe. It's not an operation that can be made safe. Instead, you make it safe by encoding the invariants specific to your use case in your code and creating a safe wrapper -- these invariants differ by use case, so it can't be made a generic operation.


You did mention uninitialized arrays but uninitialized data is inherently unsafe. It's not an operation that can be made safe.

Sure it can. You just need primitives which can be used in asserts such as

    is_initialized(tab,i,j)
indicating that an array is initialized within those limits. Then you can write asserts such as

    assert(is_initialized(tab,i,j-1));
    ... initialize tab[j]
    assert(is_initialized(tab,i,j));
Standard program verification technology. Verification of unsafe sections is a useful goal, and deserves language support. Hand-waving about "encoding the invariants specific to your use case in your code" is insufficient. You need to write them down and prove them. Then you can eliminate them from the run-time code.


You want us to add dependent types to Rust (which is what you just proposed)? Half the time I see you complaining about Rust you're complaining that it's too complicated!


> You just need primitives which can be used in asserts such as

Sounds like you're going along the path of a dependent type system (in this specific case)? Yes, that could be done, and would perhaps let you reduce a couple of unsafe blocks in the implementation of Vec and other buffer-based abstractions (but not get rid of all of them).

FWIW there is active work going on for formal verification of Rust (both safe and unsafe code), in the RustBelt project.

In general making unsafe blocks run formal verification would be an interesting thing to do (and would solve this problem completely). I don't think it deserves language support, however (nice-to-have, not must-have). This is a very different goal from your original point of adding a few language features that ease writing lower level abstractions.

--------

Ultimately, you're right. pcwalton did say "There's no way to solve that problem without just forbidding unsafe code entirely," but this is a possible alternative -- have language support for scoped formal verification that allows you to use "unsafe" operations safely. I think this is an extreme solution to what I consider to be a mostly nonexistent problem.

For really security sensitive code this would indeed be very useful (and is probably a big motivator behind the RustBelt project). Or just use SPARK or something.

But for most Rust users I think the current system is pretty robust and provides enough primitives to write clean, easy-to-verify abstractions with (verifiable) safe API boundaries. (When I say "verify" here I mean it in the informal sense.) I haven't come across unsafe code doing contortions, and I have had the (mis?)fortune of going through a lot of unsafe code. The only rough edges are with FFI, and these are mostly due to a lot of things about unsafe code being underspecified (which don't crop up as often in pure Rust unsafe code, but do crop up when you throw some C/C++ FFI into the mix). There is active work on specifying the exact semantics of unsafe code, however, so that should be fixable once it happens.


Don't forget there's a middle ground between not having them and manual, formal verification. It started with Eiffel with basic contracts that checked properties during testing and/or runtime. That did well in commercial deployments. SPARK took it formal with a basic, boolean encoding for programmer understanding. It uses a subset of Ada to prove absence of all kinds of error conditions without runtime checks or manual proof. It can optionally do more with a theorem prover but optional. Eschersoft did Perfect Developer to do it in high-level language with C, Java, or Ada generation. So, at least three are doing it in products with industry deployments with two highly concerned about performance in low-level applications.

Although I'm not getting into this dispute, I will add in general that Rust might benefit from such contracts or push-button verification of key properties as deployed successfully in Eiffel, SPARK, Ada 2012, and Perfect Developer. A language subset might be used like in SPARK to allow automated verification of those sections against common types of errors. Three follow-up benefits will be easier changes/integrations in maintenance phase, automated test generation from specs, and aiding dynamic analysis by giving it invariants to look at. Could be optimization benefits but I'm not qualified to say on that. Intuitively seems possible like using minimum-sized, data structure for a number range in spec or type. Stuff like that.

These techniques are really under-utilized despite being proven out many times over in high-reliability products.


Yeah, this exists, and would be interesting. I again think that it's a bit too extreme a solution to be baked into Rust itself, but I'd love a SPARKish Rust variant.


I'd like both. SPARK's stuff was ported to Ada 2012. It can be done for Rust as well. The trick is to make it optional so people don't have to pay attention to it. Maybe even have editors filter it out for people not paying attention to it. At the least, it being used in standard library and OS API's would let it enforce correct usage of those in debug/testing mode. 80/20 rule says that should have some impact given how much damage we've seen C and Java developers do misusing the foundational stuff.


Yeah, me too. However, I think we should wait for the formal verification of Rust to be completed before trying this. While it is possible to make something SPARKish without complete formal verification, it's probably better to build it using concepts learned during the formal verification.


> We're probably going to see "unsafe" code that assumes good behavior on the part of the caller.

I have yet to see any of this.

I have noticed that it's harder to write correct unsafe code when it comes to parallelism and FFI, but parallelism has always been a hard problem and the FFI problems generally come from the fact that you need to know the invariants being upheld on the other end, which is trickier.

But for this kind of unsafe code -- designing (non-parallel) abstractions -- upholding invariants is pretty straightforward.


> I have yet to see any of this.

mem::forget-pocalypse was this. (Rc/Arc, Vec::drain, thread::scoped)

Any UB bug that results from an overflow is kind of implicitly this.

BTreeMap::range still has a UB bug from trusting the caller! I literally asked you to fix it! https://github.com/rust-lang/rust/issues/33197

Bugs happen man.


> mem::forget-pocalypse was this.

I would say that this is from a time when the invariants were not understood. In particular, the fact that leaking is safe to do in safe code was not known.

(The invariants are still not completely understood, but there's work to specify that, and IMO they're understood enough to be able to avoid unsafe bugs)

> BTreeMap::range still has an UB bug from trusting the caller!

Fair :) I'd completely forgotten about that one.


> I would say that this is from a time when the invariants were not understood.

Yeah, but it's not like "oh this is an obvious thing to consider trusting the caller about". It's an exceptionally niche problem that you'd only know about if someone told you about it. Especially since a Rust programmer shouldn't be expected to write unsafe code often, if ever!

Similarly: not trusting traits to be implemented correctly. Not trusting closures to not-unwind.


Fair. I'm not saying that your average Rust programmer will be able to deal with unsafe code immediately. But I do think that at this stage the list of things you can and cannot rely on (and the invariants you must uphold) is clear enough that in theory you could make a checklist to deal with this. The nomicon provides much of the background for folks wanting to figure this out and write unsafe code.

These days I've been writing a lot of unsafe code (for FFI) and I do want to get around to penning a concise guide (or just expanding the nomicon). But I'm mostly waiting for the unsafe code subteam to figure out a couple things before doing this (specifically, the exact boundaries of rust's noalias UB becomes important in FFI and this is not specified yet).

But yeah, it's not necessarily obvious. I'd like to make it easier to get this understanding of unsafe code though.


One of the barriers I've erected in my own head is whether my unsafe code is generic or not. As soon as your unsafe code starts depending on an arbitrary T (or some trait that T satisfies), then the scope of what you need to consider seems to widen quite a bit. I tend to either avoid this type of unsafe or find a way to constrain my problem. Using `unsafe` for pointer tricks or efficient movement of memory on concrete data types feels more self-contained to me, and therefore easier to verify.

(I don't have any particular point to make btw. Just sharing thoughts.)


Yeah, this is super important. I thought about it and I too tend to think very hard about generics here.

As with all such things, it's harder to enumerate what's in your head :)


Hmm, a quick checklist-style thing is a pretty good idea!



