You can do linear lists and trees in Rust. The problem is backlinks. If you refcount everything, you can have backlinks, but otherwise there's a safety problem. Backlinks require an invariant which covers two variables, and you can't express that in Rust.
Back pointers are a special sort of pointer from an ownership perspective. They don't carry ownership, but are locked in an invariant relationship with the pointer that does. If you could express that in Rust, it would be compile-time checkable. Rust needs a way to say "field Q of struct T2 is a backpointer to field P of struct T1".
During pointer manipulation, that rule will momentarily be broken. But it's still checkable. It just needs some analysis. First, the checker would have to identify a block of code in which a backpointer was being manipulated, and locate the corresponding manipulation of the forward pointer. This is the code block of interest. It's an atomic transaction, in the database sense. Maybe the user would have to identify the code block with something like "atomic", with syntax like "unsafe".
Then, the checker would have to establish that if the invariant held at entry to the code block, it will hold at exit from the code block. This is a simple application of program verification technology. If the atomic section is small, this is not difficult.
Typical uses would be updating doubly-linked lists, rebalancing trees, and pulling part of a DOM-like tree out and moving it somewhere else.
This is one class of unsafe code - transient. It's the easiest to check, because it's a local problem. At the end of the code block, a safe state has been reestablished.
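To make the transient class concrete, here is a minimal sketch (hypothetical node layout with raw pointers, not taken from any real implementation) of a doubly-linked insertion. The prev/next invariant is broken in the middle of one small block and restored before it ends, which is exactly the region such a checker would have to reason about:

```rust
struct Node {
    value: i32,
    next: *mut Node,
    prev: *mut Node, // back pointer: carries no ownership, tied to `next` by the invariant
}

/// Invariant: for any nodes a, b: (*a).next == b if and only if (*b).prev == a.
unsafe fn insert_after(a: *mut Node, new: *mut Node) {
    let b = (*a).next;
    // --- invariant momentarily broken inside this block ---
    (*new).prev = a;
    (*new).next = b;
    (*a).next = new;
    if !b.is_null() {
        (*b).prev = new;
    }
    // --- invariant holds again at exit ---
}

fn main() {
    use std::ptr::null_mut;
    let a = Box::into_raw(Box::new(Node { value: 1, next: null_mut(), prev: null_mut() }));
    let c = Box::into_raw(Box::new(Node { value: 3, next: null_mut(), prev: null_mut() }));
    let b = Box::into_raw(Box::new(Node { value: 2, next: null_mut(), prev: null_mut() }));
    unsafe {
        (*a).next = c;
        (*c).prev = a;
        insert_after(a, b);
        assert_eq!((*(*a).next).value, 2);
        assert_eq!((*(*c).prev).value, 2);
        // Reclaim the heap nodes.
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
        drop(Box::from_raw(c));
    }
}
```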
The second class of unsafe code involves partially valid data structures. This problem lies underneath "Vec", where the underlying block of storage has preallocated, uninitialized space for later growth. This is harder, because unsafe state persists outside unsafe code sections. To deal with this, it's necessary to somehow attach invariants to the data structure. If you have an array A which is valid from elements 0 to n, you need something like "valid_array(A,0,n)". Then,
when you initialize an element n+1, you change the validity limits. The checker needs a few simple theorems, such as "valid_array(A,0,n) and valid_element(A[n+1]) implies valid_array(A,0,n+1)" to check this. This is old and completely automatable technology; we did it in the Pascal-F verifier 35 years ago, and Dafny does it today.
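As a concrete (and heavily simplified) illustration of that second class, here is a sketch of a fixed-capacity vector in today's Rust where only elements 0..len are initialized; the comments mark where the hypothetical valid_array/valid_element predicates above would change. The names are made up, and this is not how the real Vec is written:

```rust
use std::mem::MaybeUninit;

struct TinyVec<T> {
    buf: Box<[MaybeUninit<T>]>, // capacity slots, possibly uninitialized
    len: usize,                 // invariant: buf[0..len] are initialized, i.e. valid_array(buf, 0, len)
}

impl<T> TinyVec<T> {
    fn with_capacity(cap: usize) -> Self {
        let buf: Box<[MaybeUninit<T>]> = (0..cap).map(|_| MaybeUninit::uninit()).collect();
        TinyVec { buf, len: 0 } // valid_array(buf, 0, 0) holds trivially
    }

    fn push(&mut self, value: T) {
        assert!(self.len < self.buf.len(), "no spare capacity in this sketch");
        self.buf[self.len].write(value); // valid_element(buf[len]) now holds...
        self.len += 1;                   // ...so valid_array(buf, 0, len+1) holds
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            // SAFETY: i < len, and buf[0..len] are initialized by the invariant.
            Some(unsafe { self.buf[i].assume_init_ref() })
        } else {
            None
        }
    }
}

fn main() {
    let mut v = TinyVec::with_capacity(4);
    v.push("hello");
    assert_eq!(v.get(0), Some(&"hello"));
    assert_eq!(v.get(1), None);
    // (Drop is omitted for brevity; initialized elements would leak here.)
}
```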
Handling these two classes of unsafe code takes care of a sizable fraction of the unsafe code really needed in Rust. With the support described above, most of the unsafe code in pure Rust could be eliminated.
Outside of those two classes, most unsafe code in Rust involves external interfaces to other languages.
Unsafe code which doesn't fit into any of those classes needs to be looked at very hard.
> The checker needs a few simple theorems, such as "valid_array(A,0,n) and valid_element(A[n+1]) implies valid_array(A,0,n+1)" to check this.
This is basically dependent types. Rust doesn't really want to go that far at the language level.
> Handling these two classes of unsafe code takes care of a sizable fraction of the unsafe code really needed in Rust.
Not really. All the other datastructures in the stdlib would still need unsafe. Not really a good metric.
The Rustbelt project at MPI-SWS is doing formal verification of Rust including unsafe code, and that approach will be far better IMO. It won't let you eliminate unsafe from your code, but might make it easier to prove that your unsafe code is performing safe operations.
How would you implement the robin hood Hash{Set,Map} with this? Or BTreeMap?
Maybe I misunderstood your proposal, at first it seemed like dependent types (which makes vectors awesome but really doesn't address other things), but your Pascal thing makes it sound more like complete formal verification, which is what Rustbelt is doing. I don't expect that to become part of the language (maybe? would be cool), but it would help reason about unsafe code.
Rustbelt is a research project, and they're using Coq, which is interactive and requires a lot of user effort. What I'm talking about is doing it within the language and automatically.
Looking at BTreeMap, the extensive use of unwrap_unchecked seems to be unnecessary. That looks like premature optimization - omitting the check saves maybe two instructions on the main path.
The other unsafe code there seems to mostly involve backpointers, which was covered above. "Drop" apparently is unsafe only for backpointer reasons, although the comments don't make this clear.
To me, Rust is very much about doing "premature optimization", in the sense that small overheads add up and should be avoided where possible.
BTW, I do think the language ought to have an answer to backpointers. In fact, there is an interesting crate named 'rental' that tries to make a subset of cases safe; but to do it properly requires more concepts in the type system, especially immovable types.
It's tricky, though. The problem with backpointers is that they destroy uniqueness. You can't have 'node: &mut Node' if Node is the target of backpointers, because you could do 'let node2 = &node.left.parent'; you could then take a reference to some field of node2 and mutate it via the original, leading to use-after-free, etc. But without uniqueness, how do you mutate? One option is to use `Cell`, a.k.a. allowing mutation via shared references - but then if you have a reference to a node, you can't take a reference to its child without worrying it could be freed from under you. So you need reference counting. Another option is to use `RefCell`, dynamically tracking whether a value is borrowed, but that has its own overhead (especially if the code is multithreaded, then you need a mutex!) and generally makes it pretty easy to write code that crashes (safely, but still a crash from a user's perspective).
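For contrast, here is roughly what the reference-counting route looks like in today's Rust: the back pointer is a non-owning Weak, mutation goes through RefCell's dynamic checks, and the forward/back invariant is maintained by hand. A sketch, not a recommendation:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,      // back pointer: no ownership
    children: RefCell<Vec<Rc<Node>>>, // forward pointers: owning
}

fn main() {
    let root = Rc::new(Node { value: 0, parent: RefCell::new(Weak::new()), children: RefCell::new(vec![]) });
    let child = Rc::new(Node { value: 1, parent: RefCell::new(Weak::new()), children: RefCell::new(vec![]) });

    // The forward/back invariant is our responsibility, not the compiler's.
    *child.parent.borrow_mut() = Rc::downgrade(&root);
    root.children.borrow_mut().push(Rc::clone(&child));

    // Following the back pointer means upgrading the Weak at runtime.
    if let Some(p) = child.parent.borrow().upgrade() {
        println!("child {} has parent {}", child.value, p.value);
    }
}
```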
I'm sure there are constructs that can solve this problem in many instances, but as far as I know, it's hard to make general without a very complex proof system.
I think you can solve the backpointer problem in a relatively straightforward way if Rust had true linear types. Possibly not as simple as it seems, though.
I'm talking about simple type constraints. Think Eiffel, not F* - entry and exit conditions for sections where there is temporary unsafety, and constraints on types for persistent unsafety. Universal quantifiers only, and decidable theories only. Dafny's level of prover.
You can implement a hash map safely on top of an array; indeed you could do it with `Box<[T]>` today. You would just miss the optimization where missing buckets can be represented by zero in the "hash value" field rather than an explicit Option.
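To make that concrete, here is a deliberately naive sketch (fixed capacity, no resizing or deletion, made-up names) of an open-addressing map built entirely from safe code, with missing buckets as an explicit Option:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct NaiveMap<K, V> {
    buckets: Box<[Option<(K, V)>]>,
}

impl<K: Hash + Eq, V> NaiveMap<K, V> {
    fn with_capacity(cap: usize) -> Self {
        NaiveMap { buckets: (0..cap.max(1)).map(|_| None).collect() }
    }

    fn slot(&self, key: &K) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() as usize) % self.buckets.len()
    }

    fn insert(&mut self, key: K, value: V) {
        let start = self.slot(&key);
        for i in 0..self.buckets.len() {
            let idx = (start + i) % self.buckets.len();
            let occupied_by_other = matches!(&self.buckets[idx], Some((k, _)) if *k != key);
            if occupied_by_other {
                continue; // linear probing: try the next bucket
            }
            self.buckets[idx] = Some((key, value));
            return;
        }
        panic!("table full (a real map would grow and rehash here)");
    }

    fn get(&self, key: &K) -> Option<&V> {
        let start = self.slot(key);
        for i in 0..self.buckets.len() {
            let idx = (start + i) % self.buckets.len();
            match &self.buckets[idx] {
                Some((k, v)) if k == key => return Some(v),
                Some(_) => continue, // occupied by a different key, keep probing
                None => return None,
            }
        }
        None
    }
}

fn main() {
    let mut m = NaiveMap::with_capacity(8);
    m.insert("answer", 42);
    assert_eq!(m.get(&"answer"), Some(&42));
    assert_eq!(m.get(&"missing"), None);
}
```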
> Doesn't Option() on a Rust pointer optimize down to a zero pointer value anyway?
There aren't pointers here though.
But yes, you could use NonZero<T> to mark the hash value as never-zero so that the option gets optimized. You'd still need unsafe code to construct the nonzero, but that's okay.
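Using today's std::num::NonZeroU64 (the stable descendant of the NonZero being discussed here), the niche optimization is easy to see, and the unchecked constructor is the small unsafe bit mentioned:

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

fn main() {
    // Option<NonZeroU64> is the same size as a plain u64: None is encoded as 0.
    assert_eq!(size_of::<Option<NonZeroU64>>(), size_of::<u64>());

    // The safe constructor returns None for zero...
    assert!(NonZeroU64::new(0).is_none());

    // ...and the unchecked one is the unsafe-but-cheap path.
    let h = unsafe { NonZeroU64::new_unchecked(0x9e37_79b9) };
    assert_eq!(h.get(), 0x9e37_79b9);
}
```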
At least for the Hashmap I don't think the code is premature optimization; it seems to be pretty "idiomatic" unsafe Rust, with unsafe used exactly where needed. I haven't really looked at the btreemap in a while.
It's a bit more complicated than marking the hash value as NonZero because there are currently two parallel arrays: one for hash values, one for (key, value) pairs. So there's nothing in the second array to indicate that an element is missing. (It used to be three arrays, with the keys and values separate, but that "optimization" was actually a terrible layout for the cache. I'd like to see a further improvement where key types that can be hashed trivially, e.g. integers, ditch the hash array altogether.)
I'm fine with the use of `unsafe` here. A hash table can be implemented using safe Rust, but you can squeeze a few percent more performance out of it by using a small amount of unsafe code, in a way that would be very difficult to make safe, without a super fancy dependent type system like ATS plus a ton of annotation overhead. The unsafe code can be vetted relatively thoroughly and is safe from the outside. That kind of practical compromise seems very much in the spirit of Rust...
In my programming experience, there are two kinds of code:
1. Code where my objects form a tree. Rust's ownership model is great for this. 95% of my code looks this way naturally, and maybe another 3% can be rewritten to look like this.
2. Code where my objects form a complex graph. At this point, I need to make a choice between manual pointer management (C++, unsafe Rust) and a garbage collector (lots of languages). Happily, Rust does have regular pointers and 'unsafe'.
If most of your code looks like (1), and only a small amount looks like (2), then Rust can be a big win. Personally, I really like the combination of low-level control, performance and safety.
But if I encounter a problem with a lot of (2), my first instinct is to reach for crates.io and look up an appropriate library. I do know how to use pointers in Rust and work with 'unsafe', but it's easier to let somebody else do it for me.
Also, if my graph is going to be short-lived, I've had success just allocing up an Arena<T>, making as many cycles as I want, and then collecting the whole thing at once after my computation is done.
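For reference, the arena trick looks roughly like this (assuming the typed_arena crate): cycles are fine because nothing is freed individually, and the whole graph goes away when the arena is dropped.

```rust
use std::cell::Cell;
use typed_arena::Arena;

struct Node<'a> {
    value: i32,
    next: Cell<Option<&'a Node<'a>>>,
}

fn main() {
    let arena = Arena::new();
    let a = arena.alloc(Node { value: 1, next: Cell::new(None) });
    let b = arena.alloc(Node { value: 2, next: Cell::new(None) });
    a.next.set(Some(b));
    b.next.set(Some(a)); // a cycle; no Rc, no unsafe

    let n = a.next.get().unwrap();
    println!("{} -> {}", a.value, n.value);
} // arena dropped here; every Node goes with it
```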
A common approach to 2) is the database approach ("data-oriented design") where you basically have tables and replace pointers by offsets into these tables.
That might not work if the situation is very uncontrolled and objects live and die very quickly. But in most cases you can just let a few objects die, and every once in a while do "garbage collection" manually by renumbering the still-alive objects to be consecutively indexed.
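A rough sketch of what that looks like in Rust (made-up names): nodes live in one Vec, edges are indices into it, and the occasional "manual GC" is a compaction pass that renumbers the survivors.

```rust
struct Node {
    alive: bool,
    edges: Vec<u32>, // indices into World::nodes instead of pointers
}

struct World {
    nodes: Vec<Node>,
}

impl World {
    fn add(&mut self) -> u32 {
        self.nodes.push(Node { alive: true, edges: vec![] });
        (self.nodes.len() - 1) as u32
    }

    fn link(&mut self, from: u32, to: u32) {
        self.nodes[from as usize].edges.push(to);
    }

    /// Occasional manual "garbage collection": drop dead nodes and
    /// renumber the live ones so they are consecutive again.
    fn compact(&mut self) {
        let mut remap = vec![u32::MAX; self.nodes.len()];
        let mut next = 0u32;
        for (i, n) in self.nodes.iter().enumerate() {
            if n.alive {
                remap[i] = next;
                next += 1;
            }
        }
        self.nodes.retain(|n| n.alive);
        for n in &mut self.nodes {
            n.edges.retain(|&e| remap[e as usize] != u32::MAX); // drop edges to dead nodes
            for e in &mut n.edges {
                *e = remap[*e as usize];
            }
        }
    }
}

fn main() {
    let mut w = World { nodes: vec![] };
    let a = w.add();
    let b = w.add();
    w.link(a, b);
    w.nodes[b as usize].alive = false; // let this object "die"
    w.compact();                       // a survives as index 0; its edge to b disappears
    assert!(w.nodes[a as usize].edges.is_empty());
}
```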
Usually the result is very clean, performant and modular code.
It's clean and modular for all the reasons that E.F. Codd preached all his life.
It's performant because the tables approach is not micro-managing allocations - each table is only one allocation.
You will be hard pressed to detect a difference between (single array + relative index) and raw pointers (= absolute index). There's even machine-level support for relative addressing.
You also write most of your code to operate on slices (tables or contiguous subsets of tables) instead of only one row per function call. Mike Acton rightfully says "where there's one, there's many". This approach is obviously great for performance because it avoids function-call overhead and because it's cache-friendly.
By the way, what's Rust's story to avoid referencing dead items in these tables?
I've thought about this issue quite a bit because I had a use case (using scala) with lots of graphical data structures and GC mark phases were totally killing my performance, and I thought about using rust until I found out my graph shenanigans wouldn't work there either.
One of those harebrained research-y ideas I have bouncing around in the back of my head is to fix this. The thing about graphical structures is that there is a huge body of work in graph theory that could be used to provably and deterministically cover >90% (ballpark) of these data structure use cases, but likely at the cost of compiler performance. And if I were even remotely competent with Rust macros, I would totally hook up some annotation macro that would call out to Z3/CVC4 to determine satisfiability and then rewrite the code safely (even if it uses unsafe blocks under the covers) or throw an error. But that's for another time :)
The off-the-shelf standard library solutions to #2 are Rc<RefCell<T>> (single threaded) and Arc<Mutex<T>> (multi-threaded). Those aren't super high performance, and they're a little verbose, but they usually satisfy the borrow checker.
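For completeness, the two forms look like this; just a sketch of the shape, not tuned for performance:

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

struct Node {
    value: i32,
    edges: Vec<Rc<RefCell<Node>>>,
}

fn main() {
    // Single-threaded: shared, mutable, cyclic graph nodes.
    let a = Rc::new(RefCell::new(Node { value: 1, edges: vec![] }));
    let b = Rc::new(RefCell::new(Node { value: 2, edges: vec![] }));
    a.borrow_mut().edges.push(Rc::clone(&b));
    b.borrow_mut().edges.push(Rc::clone(&a)); // note: this cycle leaks unless broken with Weak
    println!("node {} has {} edge(s)", a.borrow().value, a.borrow().edges.len());

    // Multi-threaded equivalent.
    let counter = Arc::new(Mutex::new(0));
    let handle = {
        let counter = Arc::clone(&counter);
        thread::spawn(move || {
            *counter.lock().unwrap() += 1;
        })
    };
    handle.join().unwrap();
    println!("counter = {}", *counter.lock().unwrap());
}
```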
> Rustaceans usually opt for contiguous data layout (known in C/C++ lingo as array-of-struct or struct-of-array depending on priorities), which is more cache-friendly than reference-heavy data structures anyway.
Yes! This is a point that I try to hammer home that's missed by many people who write C/C++ on a daily basis. Everyone wants to use the fanciest data structures when most of the time arrays will be faster and simpler to use.
C++ programmers usually prefer contiguous data layout as well. I mean std::vector is just a contiguous array that dynamically reallocates and copies/move-constructs everything as needed.
But many interesting data structures are hard to write in a memory-efficient manner without resorting to non-contiguous nodes. Even a hashtable often will use linked lists within each bucket for collision resolution. You can argue that an open-addressing scheme is more cache-friendly, but it also has downsides, i.e. performance degrades faster as the load factor gets higher.
Many other interesting data structures, especially some lock-free structures, are simply impractical to implement as single contiguous arrays.
Of course, any node-based structure can make use of a memory pool that allocates blocks from one or more larger contiguous buffers, but there will still be pointers interleaved throughout the structure.
All in all, it remains true that Rust doesn't really provide any safety above C++ in regard to writing these kinds of node-based data structures, and saying "don't ever write node-based data structures" is just pointless. Yes, Rust has some downsides and tradeoffs. Is it so bad to just say that out loud?
> All in all, it remains true that Rust doesn't really provide any safety above C++ in regard to writing these kinds of node-based data structures,
Yup! Rust does not fix all your problems, sadly. Maybe some of them. :-)
If you're working with graph-like structures in Rust, then you have two choices:
1. You can work with pointers using 'unsafe' blocks, and then wrap everything inside a nice, safe API. This is a good tradeoff if your data structure is only a tiny portion of your code, or if it's reusable. (Most of the data structures in Rust's 'std' crate are written like this.)
2. You can work with indices instead of pointers. This is the approach taken by petgraph (https://docs.rs/petgraph/0.4.3/petgraph/), which is currently one of the best graph libraries for Rust.
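A small usage sketch of option 2, assuming petgraph's Graph API: add_node/add_edge hand back index values, not references, so cycles and back edges are unremarkable.

```rust
use petgraph::graph::Graph;

fn main() {
    // Nodes and edges live in petgraph's internal vectors; we hold
    // NodeIndex handles instead of pointers.
    let mut g: Graph<&str, u32> = Graph::new();
    let a = g.add_node("a");
    let b = g.add_node("b");
    let c = g.add_node("c");
    g.add_edge(a, b, 1);
    g.add_edge(b, c, 2);
    g.add_edge(c, a, 3); // a cycle, no fight with the borrow checker

    for n in g.neighbors(a) {
        println!("a -> {}", g[n]);
    }
}
```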
Not all Rust code needs to be safe. It's OK to write "unsafe" code and put it behind a "safe" API. If I can eliminate 98% of the opportunities for pointer bugs, I'm happy to eyeball the remaining 2% manually.
> All in all, it remains true that Rust doesn't really provide any safety above C++ in regard to writing these kinds of node-based data structures
Debatable. It certainly does for the consumers of these structures, and data structures are consumed a lot more often than they are rewritten.
> Yes, Rust has some downsides and tradeoffs. Is it so bad to just say that out loud?
Of course not. But that doesn't mean you shouldn't expect some pushback if you're incorrectly describing those tradeoffs.
Is it really so bad to say "at the end of the day, the difficulty of complex graphs in safe Rust is symptomatic not of a flaw in Rust but of how hard it is to build these things without memory errors even in C or C++" out loud?
How often do you really need those interesting structures (and the memory fragmentation that comes with them)? I've seen countless times where a developer reached for std::hash_map/linked_list when there will never be more than 10 values in their dataset. In that case an array would be at least as fast and much easier on your data layout.
Also if you're trying to implement lock-free data structures then safe/unsafe pointer access is going to be the least of your worries :).
When you need them you really need them. A Patricia Trie for example, which includes back pointers to ancestor nodes, is simply an ideal structure for prefix search.
How about a rephrasing: how many times do you really need to write these?
Rust will make it tricky to implement these, but then you can use the datastructure safely as much as you want. It's a "write once use everywhere" thing.
> Everyone wants to use the fanciest data structures when most of the times arrays will be faster and simpler to use.
I'm not sure I comprehend this, since I don't often encounter code where people use advanced data structures to do simple things. For most collections, you're typically talking about an array of some kind, since you just want to iterate through many objects doing the same thing. However, there is a reason for more complex data structures to exist. KDTrees, for example, are great for performing nearest-neighbour search, and I can't see any situation where I'd want to use a straight array if I knew I was doing nearest-neighbour searches.
In the most broad, general case I agree: array-of-struct or struct-of-arrays is probably the way to go. You can even see this in the data science world where data frames (effectively struct-of-arrays) are the tool of the trade. However, there are algorithms where if you want an advanced data structure, you really want it (e.g. KDTree). Is it really so common in systems programming to see people trying to incorporate all sorts of magic from The Art of Computer Programming into their programs? I can't say I've ever seen it myself (obvious disclaimer that while I'm a programmer by trade I don't read too many large Rust / C++ projects).
Isn't this a bit of a weird formulation? What matters isn't actually layout... what matters is access patterns. If you lay out a 32G array and access it randomly you're not much better off than just doing loads of pointer chasing. (Alright, prefetching might help a little bit, but...)
I think this misses the problem.
Should you write your own data-structures?
Not unless absolutely necessary.
But almost everyone knows that.
So why are people complaining about data-structures?
For me, writing the data-structures is the canary in the mine.
It's the next step from hello world when trying to pick up a new language.
Most importantly, trying to write a few simple data-structures hints at the difficulty that will be encountered writing a 'real' program.
For people like me, if writing trivial data-structures in Rust is a great challenge, it shows that writing other, less trivial things in Rust will be much more challenging than might be preferred.
Hand waving the comment "writing data-structures in Rust is hard" with "well you shouldn't be doing that anyway" misses, I think, the point that Rust (while very neat) is generally a difficult language to pick up.
[Edit]
Again, the issue isn't data-structures or about them.
The issue is: Can I write something from scratch in this language 1) at all and 2) with some amount of daily progress.
Writing data-structures is a test case of my ability to think and program in the context of Rust. It serves to answer the question: Can I write a program that implements a well-defined, well-understood construct so that I may learn the building blocks of Rust?
This leads up to an application, the behavior and design of which may not be extremely well defined in the context of Rust and requires more thinking when working out the data-structure in Rust.
If the answer to "can I write a data-structure in Rust" is "Ya totally got this makes sense" then writing an application will be relatively easy.
However if the answer is "Wow I did it but there were a lot of pain points and I still have no idea if I've done it in the right or canonical way" then writing an application is going to be very difficult.
> For people like me, if writing trivial data-structures in Rust is a great challenge, it shows that writing other, less trivial things in Rust will be much more challenging than might be preferred.
I understand the source of the sentiment, but also disagree with it. I've been writing Rust code for years and have never had occasion to write my own custom data structure, and (discounting FFI) I've reached for the `unsafe` keyword like twice (which I think I ended up removing anyway, because I was trying to be too clever).
But then again, my background is in higher-level languages (Java, Python) where writing data structures isn't your usual beginner task. I sympathize with C programmers to whom a linked list is the go-to toy learning program, but the urge to resort to unsafe shenanigans to implement self-referential data structures doesn't reflect the typical experience of using the language.
>if writing trivial data-structures in Rust is a great challenge, it shows that writing other, less trivial things in Rust will be much more challenging than might be preferred.
Data structures are exactly the abstractions built on top of raw memory. Rust is not really geared toward working with raw memory; it's geared toward working with abstractions that hide the raw memory (i.e., data structures). That's why writing data structures in Rust is hard, and it's also why that fact doesn't imply that "other things" (i.e., code that is not part of a data structure implementation) are hard.
In other words, the fundamental flaw with the assumption that there is a relationship between the difficulty of implementing data structures in Rust and the difficulty of writing applications in Rust is that the two involve very different programming models, and Rust has much more ergonomic support for one than the other.
> For me, writing the data-structures is the canary in the mine. It's the next step from hello world when trying to pick up a new language. Most importantly, trying to write a few simple data-structures hints at the difficulty that will be encountered writing a 'real' program.
You're missing the point of the post here, and drawing the exact correlation that the post is warning against.
Just because something is easy in C++ does not mean that it must be easy in other languages. In fact, writing good data structures in C++ is hard too (try writing an actually safe std::vector clone, it gets tricky), just that C++ will happily let you have an unsafe API; Rust will complain.
This is the old "grade the intelligence of a fish by its ability to climb a tree" problem. There is no intrinsic reason why the difficulty of writing datastructures should correlate with the difficulty of writing actual code. It is an easy task in C/C++, and thus it is often an early part of the learning experience.
> For people like me, if writing trivial data-structures in Rust is a great challenge, it shows that writing other, less trivial things in Rust will be much more challenging than might be preferred.
And this is the real issue, because your assumption doesn't really have any real basis. I don't mean to belittle your point, I can completely understand where you're coming from, but assuming that because writing data structures in Rust is hard, that "real" problems will be difficult as well, is wrong. There's nothing to back this up. Data structures are a very specific domain, often very far away from what you'll be doing in the average "real" program. There's a reason common data structures are often implemented and included in the stdlib of most languages.
What I find is that it's more common that writing data structures in other languages is easy, because most of the time, people aren't implementing them correctly, or safely. 99% of the linked lists I see in C++ fail to even implement the copy constructor, meaning they're going to be broken the moment you do a copy of the list.
> If the answer to "can I write a data-structure in Rust" is "Ya totally got this makes sense" then writing an application will be relatively easy.
> However if the answer is "Wow I did it but there were a lot of pain points and I still have no idea if I've done it in the right or canonical way" then writing an application is going to be very difficult.
I feel like one of the implicit points of the OP is that this isn't obviously true: writing a data structure is often a very different type of programming to writing a normal application. Most of the code I write isn't like a data structure, and is definitely not like a really good data structure: I can just glue together such code that others have written (or even I personally wrote once, a while ago) without having to worry about the details that it packages up/manages for me. (This is true in both Rust and C++, the latter of which I use day-to-day.)
> The issue is: Can I write something from scratch in this language 1) at all and 2) with some amount of daily progress.
> Writing data-structures is a test case of my ability to think and program in the context of Rust.
The point is that it's not the best test case. A better test case would be writing the programs you generally write. It'll still be hard to be honest, but really not quite as hard. I have enough familiarity with Rust to write application code myself, but I'd probably still struggle a lot with writing a data structure if I tried.
I've never interpreted that quote from Linus as being about e.g. choosing between a linked list and a hash table. Instead, I interpret that as saying that your program should focus first and foremost on your data (implying a competent understanding of your problem domain, such that you can model your structs effectively) with functions that serve your data rather than the other way around. IOW, I think he's channeling this quote from Fred Brooks:
"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious."
Is it doing fine? There's lots of "cool" stuff, but in my perception it's still a long way from "fine with real-world problems".
What's a complex real-world program written in Haskell? It seems to be good for trees (compilers like GHC, although it is kind of slow). Is there a non-toy graphics or graph-ical application that is both performant and written in Haskell?
Not denying that it's possible to write C-style in it, of course... only it's not fun.
It's interesting what this issue reveals about Rust culture.
After all, you can write basic (tree-like) data structures without resorting to `unsafe`. It's not that hard: just wrap all your nodes in `Rc`, plus either `RefCell` or `Cell` if you need mutability. Yeah, this adds some overhead, but really not very much. It'll probably still run faster than equivalent code in most safe languages, often using less memory, without GC pauses. Or even if it loses to some language in some tree microbenchmark, in a real application Rust will probably make up for it with better performance elsewhere.
But culturally, the baseline for comparison isn't safe languages. It's C/C++. Rust is supposed to be about zero cost abstractions, so non-zero-cost abstractions are suspect. And so people recommend raw pointers and `unsafe` - which is no less safe than C++, but still leaves newcomers with a bad taste in their mouths.
To some extent this attitude is built into the language itself. `Rc<RefCell<Foo>>` is ugly; needing two nested generics, one with a rather abstruse name, for what in other languages is just `Foo`, i.e. a basic object reference, makes you feel like you're doing something wrong. In the olden days of Rust, `RefCell` was called `Mut`, and there was a type `RcMut<Foo>` which combined `Rc` and `Mut`. Much more appealing to a newbie: `Mut` is what you use to add (inner) mutability, and `RcMut` is the mutable version of `Rc`, for the common case where you need both reference counting and mutability. No need for ugly nesting.
But well before 1.0, `Mut` was renamed and `RcMut` deemed unnecessary, and Rust ended up where it is today. Arguably this is a good thing, as it discourages unnecessary use of both reference counting and inner mutability, each of which comes with a runtime cost. But for newbies, I think it makes Rust look more intimidating than it needs to be.
Okay, first off, before I read the rest of the article, I want to complain about the statement:
> While you can hack together a “list” that will be backed by a Vec of nodes with indices to the next / previous item, this approach is quite wasteful – and gains little compared to using the Vec directly.
How is this wasteful or hacky?? This is THE proper way of writing data structures that are not strict trees. Sure, this isn't the correct way to write a list data structure, but there's almost no reason to use a linked list in general.
When most people ask for a linked list data structure, they probably don't mean "a vector of elements and a linked list of indexes into that vector" because it totally defeats the purpose of using a linked list in the first place.
I'm not saying there aren't reasons to use linked lists, but often the correct approach is a hybrid one. Additionally, for cache-aware programs, a vector of entries with integer indices is going to be more local.
Let's get this out of the way: Rust is great. Rust apologia like this article is not so great.
> As an aside, remember that the only difference to C/C++ is that if you write a “basic linked list” in them, all of your code will be unsafe.
There's a bit of mental gymnastics going on here. The word "unsafe" is performing double duty, since it means "memory safety not guaranteed by the compiler" in Rust and it means something else entirely when you are talking about C++, since memory safety was never guaranteed by the compiler in the first place. The other problem with this statement is that linked lists in C or C++ aren’t really that hard to get right, in fact, they’re easy. Maybe you draw out a diagram on pen and paper before you write the code, but you’re unlikely to be facing segfaults.
I admit I’m biased here, because I’ve been using Haskell for something like 15 years now, but I feel like the Haskell community acknowledges that Haskell’s type system gets in the way and prevents you from doing useful, interesting work, and that even a great library ecosystem isn’t enough to overcome this. That’s how safety generally works. It’s harder to write programs that do useful things, but in exchange, it’s also harder to write programs that behave unpredictably or do dangerous things. Because Rust and Haskell put you in such restrictive type systems, sometimes you have to break out to get real work done.
Haskell’s pitch, in my mind, is, “Let’s make it easy to reason about side effects and value semantics.” From the article, Rust’s pitch could be, “Let’s make it easy to reason about control- and data flow.” These are both evolutionary steps in the development of programming languages, all programming languages being somewhat flawed. Future languages will steal ideas from Rust the same way modern languages have stolen ideas from Haskell.
But apologia still leaves a bad taste in my mouth. The article says, “Is this a problem with Rust? Not at all.” There’s a worrying unwillingness to acknowledge that Rust is flawed, and the article describes Rust users as “Rustaceans” and makes broad generalizations about how they behave. This reminds me of the excesses of 2000s-era object-oriented programming. The comment about “Rust’s facilities for code reuse” could have been taken straight out of a press release for Java back in the late 1990s for all I know.
Rust is great, but this article is further cementing my distaste for the Rust community.
I think he means "unsafe" in the sense of Rust's 'unsafe' keyword, which allows you to work with pointers just as you normally would in C++. It doesn't mean all C++ is broken! Just that all C++ code has the right to use pointers or invoke undefined behavior without getting smacked down by the compiler.
He clearly states that code which depends on a linked list implementation is "unsafe" for that very reason.
To me that's not convincing argumentation. If code depends on a broken "unsafe" linked list implementation, that's bad no matter what language the code is.
What you say is true under some interpretation of "unsafe", but it doesn't relate to a broken "unsafe" linked list implementation.
Btw. since I'm not a Rust user I might miss something here. And I admit my post was a little short and thus maybe provoking. But just to let everybody know, I got at least 5 downvotes.
Given that answers to my post don't even see that OP implied a causal relationship, that doesn't help with my perception of the "Rust echo chamber on HN" (not my wording).
The point the OP was trying to convey (and the OP could have been clearer) is that Rust will allow you to button up an unsafe core behind a completely safe interface. That is, no use of the safe interface could ever possibly lead to memory unsafety, and this is checked by the compiler.
A core value proposition of Rust is not necessarily that you will never need to utter `unsafe`, but rather, you can build safe abstractions on top of `unsafe`.
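A tiny sketch of that idea (hypothetical type, not from any real crate): the internals use unsafe, but no caller of the public API can cause memory unsafety.

```rust
pub struct Buckets {
    data: Vec<u64>,
}

impl Buckets {
    pub fn new(n: usize) -> Self {
        assert!(n > 0);
        Buckets { data: vec![0; n] }
    }

    /// Safe for any key: the modulo keeps the index in bounds.
    pub fn get(&self, key: usize) -> u64 {
        let idx = key % self.data.len();
        // SAFETY: idx < self.data.len() because of the modulo above.
        unsafe { *self.data.get_unchecked(idx) }
    }
}

fn main() {
    let b = Buckets::new(8);
    assert_eq!(b.get(1_000_003), 0);
}
```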
While hiding unsafe parts of code behind a safe interface works great in Rust, especially since the ownership / borrowing rules express much more than is possible in other mainstream languages (in Java you wouldn't know if you are the owner of a returned collection, or whether it's just a view, etc.), there is one thing that I have never seen addressed in this kind of argument: writing unsafe code in Rust is harder than doing it in C/C++, because Rust introduces a whole new class of undefined behaviours that is just absent from C/C++. See [0], [1] or [2] for examples.
There are a bunch of things that are UB in C and C++ that are well-defined in Rust.
* Signed integer overflow
* Aliased pointers to different types (-fno-strict-aliasing)
* Probably more, but the formalised memory model is still up in the air. The ones I just listed are just the ones that are already decided...
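For instance, signed overflow in Rust either panics (with debug assertions) or wraps; it is never something the optimizer may assume cannot happen. A quick illustration:

```rust
fn main() {
    let x = i32::MAX;

    // The explicit forms are always defined, whatever the build settings:
    assert_eq!(x.wrapping_add(1), i32::MIN);
    assert_eq!(x.checked_add(1), None);
    assert_eq!(x.saturating_add(1), i32::MAX);

    // Plain `x + 1` would panic here under debug assertions instead of
    // being "optimized" on the assumption that it can never happen.
}
```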
True, in fact undefined-behaviour-wise I don't think there is any mainstream language that comes close to C/C++ in this respect. Alas, this is well known and addressed in various publications.

Conversely, things that could eventually be decided to be UB in Rust, but are not UB in C/C++, are rarely mentioned at all. It is quite interesting what the cost (in terms of UB) is of making the various optimizations that are claimed to be possible in Rust but not yet realized. Reading through some of Niko's proposals, it would seem that this cost is quite nontrivial.
That's not the reason why C/C++ code that uses a linked list is "unsafe". It's "unsafe" because it doesn't protect the use. In other words, because the client code is itself "unsafe". If the use were protected (whether dynamically with a lock or statically by some Rust-like language extension, I don't care), there would be no more of an issue than in Rust.
And if the linked list is itself broken, Rust can't help you either.
So I still don't see a causal relationship. Is it because I'm drunk?
In the rust world "unsafe" is synonymous with "isn't proven to be safe by the compiler". Under this definition, every C/C++ program that uses pointers is "unsafe" because the languages make no memory safety guarantees.
unique_ptr is unfortunately still unsafe (in the Rust sense): there's nothing in the language stopping use-after-move of the unique_ptr value itself (which is a null-dereference and undefined behaviour), nor references to the interior becoming dangling.
Owned data in Rust (which is all data by default, unlike C++ which must opt-in via unique_ptr) has stronger guarantees than those provided by unique_ptr. It's still possible to deref a unique_ptr after using std::move.
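A tiny illustration of that difference on the Rust side: using a Box (the rough analogue of unique_ptr) after moving it is a compile-time error rather than a runtime null dereference.

```rust
fn consume(b: Box<i32>) -> i32 {
    *b
}

fn main() {
    let b = Box::new(42);
    let v = consume(b); // `b` is moved into `consume` here
    println!("{}", v);
    // println!("{}", *b); // error[E0382]: borrow of moved value: `b`
}
```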
That may be technically true but in practice and rhetoric "unsafe" is often used in contexts with much broader, implicit connotations. For example, putting Rust-compiler-proven-safety as the top, first, or best feature to consider. In that context, saying "C++ isn't Rust-safe" is essentially defining Rust to be "better", not advancing a coherent argument that it is in fact better.
Given that in an unsafe block you still have to uphold the same invariants as in other blocks of code, your job is actually much harder in Rust than it is in C/C++.

Take a look at the examples in [0]; they would be perfectly valid in C/C++, but are potentially considered undefined behaviour in Rust.