Learn Rust with entirely too many linked lists (2019) (rust-unofficial.github.io)
388 points by goranmoomin on Feb 22, 2020 | 170 comments



Learning Rust by writing a linked list is like learning Python by writing a CPython extension. Sure, you'll learn a lot, and it may even be useful, but it's not the typical experience of using the language.

I've seen people completely confused that such a simple "CS 101" thing is such a mess in Rust. But nobody actually writes linked lists in Rust:

• The borrow checker wants to have clear single ownership, and doesn't support reference cycles. Doubly-linked lists in particular have mixed ownership. I'm not sure if it's even possible to prove at compile time that a list will never have a cycle, but it seems to be at least in Sufficiently Smart Compiler territory.

• Safe linked lists are in the std lib if you really need them, but std also has plenty of other containers that are more efficient on modern hardware.

Compile-time safety checks can't accept all valid programs (halting problem). Instead of going after that enormously complex problem, the borrow checker's rules are kept pretty simple. It's a feature, not a defect. Simplifying the problem to single ownership with shared-immutable vs exclusive-mutable references makes borrow checking much easier to reason about. If something doesn't fit these rules, you either use a different approach, or use `unsafe` and then wrap it in a higher-level safe abstraction that fits the model.
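For illustration, a minimal sketch of those two rules in action:

    fn main() {
        let mut s = String::from("hello");

        // Any number of shared, read-only borrows may coexist:
        let r1 = &s;
        let r2 = &s;
        println!("{} {}", r1, r2);

        // ...but a mutable borrow must be exclusive:
        let m = &mut s;
        m.push_str(" world");
        // println!("{}", r1); // error: the shared borrow `r1` may not
        //                     // overlap the mutable borrow `m`
        println!("{}", m);
    }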


> But nobody actually writes linked lists in Rust

It feels entirely like you didn't even start reading this. He spends a whole long first page bitching about Linked Lists and why nobody uses them in Rust.


I come back to this from time to time as a reference for doing the uglier things in Rust. Writing Iterator<Item=&T> with raw pointers, recursive drops, etc. It's better than reading the std source code, because it's got a guide right next to the code. I think I'm using it as intended. The title might be suboptimal/confusing for what it's trying to do. It's less a resource for newcomers than an illustration of someone bumping their head against the wall to do hard things.
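For a taste of what that looks like, a hedged sketch of an `Iterator<Item = &T>` built on raw node pointers (the layout and names here are illustrative, not the guide's exact code):

    use std::marker::PhantomData;

    struct Node<T> {
        elem: T,
        next: *const Node<T>, // null marks the end of the list
    }

    struct Iter<'a, T> {
        ptr: *const Node<T>,
        _marker: PhantomData<&'a Node<T>>, // ties the yielded &'a T to the list
    }

    impl<'a, T> Iterator for Iter<'a, T> {
        type Item = &'a T;

        fn next(&mut self) -> Option<&'a T> {
            if self.ptr.is_null() {
                None
            } else {
                // Safety: the surrounding list code must guarantee the
                // nodes stay valid (and unmutated) for the lifetime 'a.
                unsafe {
                    let node = &*self.ptr;
                    self.ptr = node.next;
                    Some(&node.elem)
                }
            }
        }
    }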


Also, the way destructors work makes it such that most custom linked lists are prone to stack overflows unless they use unsafe.
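For context, a minimal sketch of the problem and the usual iterative fix (this simple singly-linked case happens to be expressible in safe code):

    struct Node {
        elem: i32,
        next: Option<Box<Node>>,
    }

    struct List {
        head: Option<Box<Node>>,
    }

    impl Drop for List {
        fn drop(&mut self) {
            // The derived drop would recurse once per node and can blow
            // the stack on a long list; draining iteratively avoids that.
            let mut cur = self.head.take();
            while let Some(mut node) = cur {
                cur = node.next.take();
                // `node` is dropped here with `next` already detached,
                // so its own drop doesn't recurse.
            }
        }
    }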


For what it's worth, this is also true of C++, where the most "natural" way to write a linked list is `struct Node { unique_ptr<Node> next; }`


I think you missed the point. The guide is fairly upfront about it being a bad idea and a very atypical approach to learning Rust.


Doubly-linked lists are hard with Rust's ownership system. I've argued for Rust having backpointers as a built-in type the borrow checker understands. You have to maintain the invariant that if A points to B, B points back to A. A is an owning pointer, and B is a non-owning pointer locked in a relationship with A. Easy to check if the language lets you say that's what you're doing. Rust lacks that.

If you have safe backpointers, most tree-type data structures with backpointers can be constructed. A nice feature to have.


You can somewhat emulate this with Weak pointers:

https://doc.rust-lang.org/std/rc/struct.Weak.html

They don't count against ownership, but do bring some extra headaches of their own (reference counting overhead, etc.).
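A minimal sketch of the pattern (the `Node` shape here is illustrative):

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        value: i32,
        parent: RefCell<Weak<Node>>,      // backpointer: non-owning
        children: RefCell<Vec<Rc<Node>>>, // forward pointers: owning
    }

    fn main() {
        let parent = Rc::new(Node {
            value: 1,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(vec![]),
        });
        let child = Rc::new(Node {
            value: 2,
            parent: RefCell::new(Rc::downgrade(&parent)),
            children: RefCell::new(vec![]),
        });
        parent.children.borrow_mut().push(Rc::clone(&child));

        // The Weak upgrades to Some(parent) while the parent is alive,
        // and the parent -> child -> parent cycle doesn't leak.
        assert_eq!(child.parent.borrow().upgrade().map(|p| p.value), Some(1));
    }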


> I've argued for Rust having backpointers as a built-in type the borrow checker understands. You have to maintain the invariant that if A points to B, B points back to A. A is an owning pointer, and B is a non-owning pointer locked in a relationship with A.

Or you could simply give it a new type/semantic: owner.

You could even use a familiar unix-shorthand for it: ~. Thus Node<T> will have a Parent: ~Node<T>, which you can pass by ~self.

And graphs would suddenly be nice to work with.

> Rust lacks that.

I can’t be the only one who finds that ironic?

And I say that as someone who likes rust.


It is not particularly ironic; there's no actual design here. It's easy to say that a feature should exist (and your parent has said this, about this, repeatedly) but it's much harder to make sure that it's something that has the correct semantics (and your parent has never actually produced this, even though they've been asked, repeatedly).

(See the sibling comment by dan-robinson for just some of the issues here)


The absolute simplest semantic I can come up with here (too simple?) would be for a struct containing a non-null owner-pointer to be bound by its owner’s ownership/lifetime-scope.

For most commonly used graph-types this should lead to pretty simple one-directional ownership chains which should be doable (although probably not trivial) for a compiler to enforce.


As a non-Ruster, any particular reason they didn't include this? A lot of the code I've written over the years involves graphs, which have a lot of circular pointers and/or backpointers. I find it kinda weird they'd make such a common thing a PITA in a new language.


Rust's ownership rules aren't for the purpose of making the user's life hard; they are the least restrictive system that could be devised that allows the compiler to uphold Rust's safety guarantees. It was not too long ago that it was common knowledge that a programming language either has a garbage collector or has manual memory management, or possibly both. Safe Rust has neither, but not without the effect of making awkward certain kinds of programming that are fundamentally difficult to make safe.

The Rust question to ask here is: Do you really need pointers, specifically? Or would some other reference mechanism work? You could put all of your graph nodes into a linear data structure like an array, and have them point to each other by holding a list of indices instead of a list of pointers. Or you could give all of your nodes unique keys and keep them in a table, and have them hold references to each other by key. The compiler will not try to prove the correctness of your graph algorithm, and in the event of programmer error that leads to dangling references your program will have to handle the scenario of following an index or a key and not finding a value, so bugs will not introduce memory unsafety.
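A minimal sketch of that index-based approach (names illustrative):

    struct Node {
        value: i32,
        edges: Vec<usize>, // indices into Graph::nodes, not pointers
    }

    struct Graph {
        nodes: Vec<Node>,
    }

    impl Graph {
        fn add_node(&mut self, value: i32) -> usize {
            self.nodes.push(Node { value, edges: Vec::new() });
            self.nodes.len() - 1
        }

        fn add_edge(&mut self, from: usize, to: usize) {
            self.nodes[from].edges.push(to);
        }

        // A stale index is a logic bug, not memory unsafety:
        // `get` simply returns None.
        fn neighbor(&self, node: usize, i: usize) -> Option<&Node> {
            let idx = *self.nodes.get(node)?.edges.get(i)?;
            self.nodes.get(idx)
        }
    }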

There's also ongoing work on memory arenas in the nightly compiler, and I believe some libraries. Putting a graph into an arena is a good way to appease the compiler, because the entire arena will be freed at the same time, ensuring that dangling pointers will never exist between graph nodes.


> You could put all of your graph nodes into a linear data structure like an array, and have them point to each other by holding a list of indices instead of a list of pointers

Rust practitioners keep proposing this solution. It is like you have never heard of caches or do not understand that modern CPUs have multiple special prefetchers for pointer chasing and no prefetchers for "rust fanatics"

This solution will be SIGNIFICANTLY slower on any modern CPU. By a wide margin! Since you'll keep having to wait for main memory as the cache prefetchers have no idea about this insane scheme and will not prefetch the data for you. They will for actual pointers.


> This solution will be SIGNIFICANTLY slower on any modern CPU. By a wide margin!

I'm not an expert and would like to know more. Can you point me to any benchmark demonstrating this claim? I'd love to see the actual numbers.


Search for ArrayOfStructures or StructureOfArrays to see benchmarks similar to what you're asking for


> It is like you have never heard of caches

But isn’t that part of the point of putting nodes into an array? So the nodes are guaranteed to sit next to each other in memory, and so are quite likely to be in cache?

> do not understand that modern CPUs have multiple special prefetchers for pointer chasing and no prefetchers for "rust fanatics"

Doesn’t array indexing reduce down to pointer chasing anyways? Or are you saying that the prefetchers can’t see through “nested” array accesses (e.g. nodes[nodes[i].out[0]].data vs node.out[0].data)?


Yes that is what I am saying. See Intel manuals for patterns they can detect. Mostly it is strided accesses and use of pointers at fixed offsets of a structure which the previous pointer pointed to.


So I guess it comes down to whether the graph would fit into cache and the particulars of the access patterns? If the graph fits into cache as an array the prefetchers would be more or less unnecessary since everything is already there. Beyond that, I probably don’t have enough experience or knowledge to make a good guess.


Yes. If the entire thing fits yes, but then it is small enough that you might as well manage it in Basic or JavaScript. No real programming required


I don’t do much graph programming, and the few pieces of graph programming I’ve had to work with that needed to be high-performance fit in L1/L2. But for most non-pathological graphs (treelike, few backlinks from nodes deep in the graph to nodes shallow in the graph, nodes are small) you could arrange a statistical locality property that child nodes are usually located at nearby greater indices. Most operations on the graph would sweep left->right in small strides likely to be either within the same cache page or onto a page that has been prefetched by the linear memory access pattern prefetcher.

Certainly this is more awkward to write than pointer linking, but approximately the same performance should be achievable for sparse or treelike graphs. If you need to work with dense graphs larger than the size of the cache performance will suffer a lot, but this meets the criteria to justify using unsafe, at which point the implementation would look a lot like a C implementation.


Rust has a concept of ownership and borrowing. (I think) everything is fine when you just have immutable references everywhere. But as soon as you start taking mutable references (of which only one can exist at a time) or ownership (stronger than mutable references), then cycles break things.

You can get around it with Interior Mutability; it's just slightly more verbose.

Also, it isn't really that they made an easy thing hard. Properly maintaining the invariants with cyclical references is hard and unsafe. Rust exposes that difficulty in a very literal way.
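A minimal sketch of that escape hatch: `RefCell` moves the exclusive-mutability check from compile time to run time:

    use std::cell::RefCell;

    fn main() {
        let shared = RefCell::new(vec![1, 2, 3]);

        let a = &shared; // two shared references to the same cell
        let b = &shared;

        a.borrow_mut().push(4); // runtime-checked mutable borrow
        println!("{:?}", b.borrow()); // [1, 2, 3, 4]

        // Overlapping mutable borrows panic at runtime instead of
        // failing to compile:
        // let _m1 = shared.borrow_mut();
        // let _m2 = shared.borrow_mut(); // would panic: already borrowed
    }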


Complete blind guess here, but maybe a safer language could “borrow” from accounting systems. The basic idea is that you don’t need to validate every step as long as the outcome of a specific set of operations is atomically valid. Some SQL databases have it, like checks on foreign keys and column check-exprs on commit, not on insert (that’s how you make back-refs by {insert; insert}, and not {insert; insert null; update}, which is similar to the dl-list and graph problem). Or you check that sums on both sides match, but not before “committing” ops into a book.

Half of the subject code uses a sort of atomics (mem::replace), but they are too small not to leak complexity to “userland”. If they just replaced that with

  transaction {
    ...shuffle values...
  }
and checked at “}”, it would be much easier to reason about when programming, instead of building microbridges everywhere. If the code is not long and/or threaded, an analyzer could calculate the “balance” in a reasonable time just by looking at careless source code.

>Properly maintaining the invariants with cyclical references is hard and unsafe. Rust exposes that difficulty in a very literal way

But a solution to this problem is articulated easily: adjust the corresponding nodes if this one goes away. It could help with that instead of just exposing it, e.g. by tagging such types as heavily linked and demanding/deriving a [suboptimal] algorithm that would ensure correctness, or at least defining “corresponding” and “adjust”.

ps. I’m not familiar with rust, nor with discussions on it, maybe that was already discussed and refused or proven unreasonable at early design stages.


That's kind of how unsafe Rust works, except that the compiler doesn't check that you maintained the invariants. Your unsafe code is expected to uphold them so that safe code interacting with it remains safe.

> But a solution to this problem is articulated easily: adjust corresponding nodes if this one goes away. It could help with that instead of just exposing,

Articulating things easily doesn't mean they are easy. Now every value needs a backreference to every value that holds it and, during its destructor (which isn't guaranteed to run), it has to make those held references invalid?

I know just enough to know how difficult the problem is, in the general case. The folks thinking about these problems for their day jobs have looked into a lot of simple solutions, and they fall apart in cases that are too common or too valuable to not support.


Unsafe is not any kind of transaction; it is a deal under the bridge, quite the opposite. The Actix author fell victim to that recently, iirc. Anyway, I’m not arguing; it is clear how microcontrol works. My concern is why it has to be so hard for a developer to reason about referential balance, when a machine exists.

>Now every value needs a backreference to every value that holds it

If it didn’t, wouldn’t that be out of scope of graph/dl-list discussion?

It is interesting that solutions fall apart, because it seems like a warehouse-level problem to me. Any code path is just +1 -1 countable references with some matching-branching in the end. Are these design discussions archived somewhere, like a mailing list / technical rationale talks?

ps. I did not mean anything like “rust magically freeing graphs”. Only that “transaction {shuffle} check” thing. Like in physics, N spin in, N spin out, what’s where is not important, since nothing lost.


Historically, the classic graph structures are adjacency lists and adjacency matrices. Those are quite easy to represent in Rust if you use indices instead of pointers. Node/Pointer type graph structures aren't particularly efficient anyway, even in C or C++.


This is an enforced invariant. This results in proposals that it should be done by adding a system for general invariants and design by contract features. Those are then rejected as too complicated and something for the far future.

So, the usual answer is "use unsafe code." What could possibly go wrong?


It might actually be possible now with Pin. Could anyone actually knowledgeable about this comment?


I don't know much about Rust (and nearly nothing about Pin) but if this was true, don't you think someone would already have written a library providing doubly linked list without using unsafe?


In your graph, the nodes are not owning each other, are they?


No, but they do point to each other. For example, each node can have an array of all its edges, and each edge has two pointers, one to each node it connects. Via this you can walk around cycles, for example. And typically these can change when nodes are inserted or removed.


I would guess backpointers are tricky because they are types that need to care about the record field they are put inside in some other type. I would guess that it would be a lot of work to fit something like that into the language and make sure it’s correct. And it would surely become harder if backpointers were allowed between objects of different types.

One way one can implement a circular doubly linked list is to confine all pointer manipulation to precisely two operations:

1. Creation of a node x such that x.next = x = x.prev

2. Given two double-links A <-> B (ie A.next = B, B.prev = A) and C <-> D, swizzle the pointers so A <-> D and C <-> B. Note that if the links are equal then the operation is trivial. Note also that these links must be in the direction of the circular list (ie if you have a circular list A<->B<->C, you can’t “reverse” the A<->B link to give this operation a B<->A link)
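To make these two operations concrete, a hedged sketch over an index-based arena (illustrative only; it deliberately sidesteps the ownership question being discussed):

    // next[i] and prev[i] are the links of node i.
    struct Arena {
        next: Vec<usize>,
        prev: Vec<usize>,
    }

    impl Arena {
        // Operation 1: a fresh node x with x.next == x == x.prev.
        fn new_node(&mut self) -> usize {
            let x = self.next.len();
            self.next.push(x);
            self.prev.push(x);
            x
        }

        // Operation 2: given links a -> b and c -> d, swizzle so that
        // a -> d and c -> b. This splices two cycles into one, or
        // splits one cycle into two; if a == c it is a no-op.
        fn swizzle(&mut self, a: usize, c: usize) {
            let b = self.next[a];
            let d = self.next[c];
            self.next[a] = d;
            self.prev[d] = a;
            self.next[c] = b;
            self.prev[b] = c;
        }
    }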

One way to think of this is that every finite permutation (i.e. a disjoint product of cycles, i.e. a set of disjoint circular linked lists; and since permutations act on themselves, also a way of rearranging the links to transform one set of disjoint doubly linked lists into another) is a product of transpositions, which correspond precisely to operation 2. Operation 1 corresponds to the identity element, which may be thought of as a product of every cycle of size 1 rather than a product of 0 cycles.

Suppose you construct the type of the “double pointer”. You use some magic or unsafe code to make sure that the creation operation follows these rules (I don’t really know any rust but I think this rule could be enforced with some careful helper functions and linear types. But rust doesn’t have linear types so maybe there isn’t a type system way to enforce that creation is done correctly or even that it is validated at runtime).

But now what happens to the ownership with the magic swizzle operation? If the links are from different lists then it is splicing them together and the nodes should (I guess) now have the same owner/lifetime. If the links come from the same list then the operation splices them into two separate lists, so I guess the ownership should be split. The problem is that you don’t really have a way to do that because there isn’t really a way to know or specify at compile time whether two nodes are definitely in the same list or definitely not in the same list. (I guess you could enforce that every element has a pointer to some “owning object” for the list it’s in but now all your operations that were constant time are linear time).

Adding or removing elements is a special case of these operations.

It isn’t sufficient to always treat the result of the swizzle operation as if the lists have become the same list because if they have separated then you wouldn’t know to delete half of the list.

Maybe the reply is just that somehow the typesystem should allow backpointers but somehow not this swizzling operation. But I don’t see how that helps with the ownership problems of doubly linked lists.

Maybe the answer is that rust should get (or already has) some way to always know whether two things “definitely or definitely don’t alias when you include the transitive closure of all their pointers; but actually only some of their pointers,” but that sounds hard to define and even harder to have the type checker able to resolve.


I really value this guide.

Writing data structures in a language is one of the standard ways that I convince myself that I understand it. Rust really messes hard with that mindset.

This guide talked me through all the things I'm not used to worrying about in garbage collected languages, showed me the error messages that I'd been banging my head against, explained what they meant, and did so with humor.


This Gonzo approach of running headlong into a bad idea is refreshing.


I went through this a few years ago for fun. It's a great guided tour of the compiler errors when working with complex memory safety needs. The most important thing to know about these is that you would almost certainly never do these in a real project: lists are in `std` but even then the vast majority of cases should use `Vec`.


I am learning Rust due to its memory safety features, and comments like this rub me the wrong way for some reason -- they give me an uneasy feeling that I will run into unexpected limitations in the language because these corners are not exercised enough. Because people who know the language are saying that the kinds of data structures I deal with day in and day out in my work are not the "preferred" thing to do.

I do system software/networking software and I deal with a lot of linked lists and trees, and these comments make me feel that Rust may not be a good language for my use-case, in spite of me wanting the memory safety features.


Don't worry. The language has lots of support for the style you wish to work in.

As for the specific data structures, I recommend setting aside your knowledge of their utility and consider carefully why they are being criticized. Then run some experiments to convince yourself of what is true. If you come to the conclusion that the criticisms make sense, then you can proceed with the knowledge that the people in question have thought about the issue and came to good decisions. This should inspire hope that they have thought about other issues as well and may have valuable insights.

For the record, I concluded many years ago that some variant on arrays/vectors/whatever outperforms linked lists in most situations by a lot. (There are times when arrays/vectors/whatever can't work. But they are less common than most imagine.) And changes in hardware have made that more true over time. Trees, on the other hand, have a flexibility that lends itself to many problems for which it is hard to find other suitable structures. But even so, if performance is critical, it is worth jumping through a lot of hoops to make sure that all of the nodes that make up a tree are physically close together in memory, to improve the efficiency of your caches.


The comment applies word for word to C++ as well. Linked lists are just bad data structures for modern CPUs.

I don’t use Rust so I can’t speak to the quality of the std lib, but I don’t think it’s fair to use the parent comment as evidence either way.


No, it doesn't apply to C++ because the corners of the language required to write lists and trees are exercised enough and don't suffer from unexpected limitations.


The comment originating this discussion provides no evidence in any direction on the quality of the rust standard library. Merely that you wouldn't use these in practice, nor would you in C++.


Do you currently use C? C++ perhaps?

I mean basically no other language that currently comes to mind fiddles so much with the basics.

In Rust, just as in Java, people write highly optimized safe and fast data structures and others build upon that.

In C/C++ it seems every project reinvents the wheel to a large degree.

Or am I mistaken?


>In Rust, just as in Java, people write highly optimized safe and fast data structures and others build upon that.

I'm currently taking a class on API design with Josh Bloch (Java Collections, Effective Java, etc.), who pointed out that this wasn't the case until the late 90s or so - he pulled out an example of a KWIC system [1] described in a paper [2] from 1971:

Parnas 1971: This is a small system [that] could be produced by a good programmer within a week or two.

And then Josh proceeded to show his implementation of the same system, which he had written in <2 hours just by making use of the built-in collections, regular expressions, sorting, string formatting, etc.

It's really cool that these standard data structure implementations exist and are so accessible, making people orders of magnitude faster than before.

[1] https://en.wikipedia.org/wiki/Key_Word_in_Context [2] https://prl.ccs.neu.edu/img/p-tr-1971.pdf


I use C. And you are right, in my field there is a tendency to a large degree to re-invent the wheel of data-structures :-). Sometimes justified, but most times not. But mostly because C doesn't usually have well-known, industry-standard libraries for common data-structures. Or even if they do, it's kind of trivial to implement basic data-structures (not saying they will be bug-free!) instead of relying on some random distribution of a library from the Internet.

BTW, most times, the problem is not performance, but tight control of memory usage.


Yeah, and there is a reason for why there are not many "industry-standard libraries for common data-structures" out there in C. I think the reason (or one of the reasons) is that there are zillions of ways to implement them, and "one size does not fit all". I often have to ditch the standard library's implementation of this and that in other languages, too, because they are not fine-tuned enough for my use case.

That said, there are lots and lots of C libraries installed by default on Linux distributions (via the distribution's own package manager!) that are reused by other projects.


I think your point about control of memory usage is spot on. Most conversations tend toward cycle count and execution speed and fail to consider memory usage at all (I know I’m certainly guilty of it).


With rust and cargo it's often trivial to use a third party lib from a centralized location that has a rich collection of submitted libs to choose from. It's also fairly easy with python/pip and node/npm. There's nothing quite like it on the same scale and as widely used with c/c++, which means it's more difficult to pull in third party libs. It's often easier to just implement the stuff yourself. Or at least that's my experience. I'm not familiar with java.


> There's nothing quite like it on the same scale and as widely used with c/c++

I use my Linux distribution's package manager. It works fine for C, and it is fairly easy, too. You only have to learn to use one package manager; your system's package manager. Plus I am totally fine with "reinventing the wheel" if that wheel is just 2 lines of code. :)

Heck, I would even go out on a limb here and claim that it is on the same scale and as widely used with C as it is with Rust or Python, if not more. A typical Linux distribution contains quite a lot of C libraries alone upon which other projects written in C (or other languages, for that matter) depend. If the program you install via your system's package manager depends on a C library, it will get installed (obviously), and it is often reused by many other programs. Most C developers I know have the tendency to make their program depend on as few dependencies as possible. "Zero dependencies" is usually a "feature" or a selling point, and a good one at that, IMO. I prefer this over having a package manager for each programming language separately, depending on over 300 dependencies of which 80% is just 2-10 lines of code and so forth. Additionally, take this for example: if I want to cargo build two projects that depend on the same crate, it fetches and builds it twice, or at least it did a year ago. I found it to be odd. It is a waste of space and time. There are ways to solve this.


The distro package manager isn't integrated into the prominent c/c++ build systems such as autotools and cmake. There's a good bit more friction involved in making sure the relevant libs are installed and put in a place the build system can find, since those build systems aren't integrated with third-party package discovery and installation either. I'm not saying cargo makes this perfect; I'm saying it's a lot easier to add a line to Cargo.toml than to force users or other developers to ensure this is done via external mechanisms.

'cargo build' vs './configure; apt install something-missing; ./configure; apt install something-missing2; ./configure; make'


It is definitely not the case for C++. It has a decent standard library - when it comes to collections, richer than many other languages, in fact - and then there's Boost for more exotic needs.


It's just not needed to implement trivial data structures by yourself if it's already implemented before. Also linked lists might be sub-optimal, see other comments.


Linking my comment from elsewhere.

https://news.ycombinator.com/item?id=22393723


Linked lists are inherently niche on modern hardware.


Yes. Lists were fine when memory was random-access. Today, if you're linking all over memory, cache misses dominate performance. So contiguous data structures are preferred.


This isn't a linked list issue at all, but a problem with constructing the nodes. If you construct list nodes in a contiguous location, cache misses are a non-issue.


But that's fundamentally a function of the usage pattern. If the usage is "construct a bunch of nodes" and then nothing, then you should be using a vector-type structure anyway.

The promise of a linked list is being able to iterate and make constant-time insertions/deletions wherever you want. But the more such mutations occur, the more fragmented your memory will end up, and the more you'll get hit by the cache issues—especially if your list is large, exactly where the purported benefits of a linked list are strongest.


> If the usage is "construct a bunch of nodes" and then nothing, then you should be using a vector-type structure anyway.

Not if the nodes are variable-sized.


Again it depends a lot on the usage pattern. If you're going to be dereferencing a pointer off to some allocated-elsewhere object on every little operation or comparison then sure. But if there are sorts or searches or other operations that can take place against a fixed-size "index" object, then it may still make sense to have a vector-like or pre-allocated pool of those objects and only pay the cache hit cost when you do major operations on a particular instance.

Which is really just the cache doing its job. The linked-list anti-pattern is when you're constantly paging in big speculative chunks of memory only to access a single pointer and then move on.


That's no longer a linked list, it's a vector.


I think the argument is that if the nodes are fixed size then you can preallocate a big block of them slab-style, and for the pointers you just use indexes into that array rather than "real" pointers.

Potentially there's a small memory savings as well if you can use 2 or 4 bytes for the index instead of a full 8 byte pointer. But you're taking on a lot of complexity and tuning by going this route so you'd really have to test and make sure the gains were worth it.
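A minimal sketch of that slab-style layout (illustrative; a real version would also track free slots for reuse):

    // Nodes are preallocated in one Vec and linked by u32 indices.
    const NIL: u32 = u32::MAX;

    struct Node {
        value: i32,
        next: u32, // index into SlabList::nodes, or NIL
    }

    struct SlabList {
        nodes: Vec<Node>,
        head: u32,
    }

    impl SlabList {
        fn with_capacity(cap: usize) -> Self {
            SlabList { nodes: Vec::with_capacity(cap), head: NIL }
        }

        // Push to the front; the "pointer" we store is 4 bytes, not 8.
        fn push_front(&mut self, value: i32) {
            self.nodes.push(Node { value, next: self.head });
            self.head = (self.nodes.len() - 1) as u32;
        }

        fn iter(&self) -> impl Iterator<Item = i32> + '_ {
            let mut cur = self.head;
            std::iter::from_fn(move || {
                if cur == NIL {
                    return None;
                }
                let node = &self.nodes[cur as usize];
                cur = node.next;
                Some(node.value)
            })
        }
    }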


A bigger saving is in the allocation metadata. Every time you allocate something on the heap, you also need to save information about the size, maybe some flags and padding to align on a page. And if the entries are small, you also get better cache locality.

Putting things in an array works around all of that (unless you need to handle tombstones)


A reasonable allocator should already be grouping allocations of similar size together. In a whole lot of cases this means your metadata is no more than a single byte per object, and the padding is no more than you'd have inside an array.


True, it really depends on the allocator.


Yes, along with binary trees, but there are cache-friendly alternatives like unrolled linked lists and B-trees.


Seriously? Are trees also niche?


Yes. For the vast majority of uses, you should probably use a hash table instead.

There are still niche uses for both linked lists and trees, but the sad fact is that the relative time it takes to chase a random pointer compared to doing literally anything else keeps getting worse, and is already at the point where frequently linearly copying kilobytes of arrays is almost always faster than using a linked list.


...time it takes to chase a random pointer compared to doing literally anything else...

When I have had to do serious stuff with linked lists, trees, and so on, I found that it was much, much faster to allocate an array of nodes and build the linked list out of that, close together. There was a lot of complexity in doing so, but colocating data to match my access pattern was a big win.


I wouldn't say niche. B-trees and B+ trees with doubly linked lists connecting the leaf nodes are used in every major database server for indexes. Hash tables are generally used by optimizers to handle unindexed data in queries. Doesn't make hash tables niche.

> and is already at the point where frequently linearly copying kilobytes of arrays is almost always faster than using a linked list.

That's only half the story. Sure reading arrays in sequential order is fast. What about inserting or deleting items within an array? What are the costs to having to constantly resize arrays? It is very expensive.

At the end of the day, it's about finding the right data structure for the data and your needs.


If your data/algorithm calls for a certain dynamic structure, let's say a red-black tree, replacing it with array is pointless.

Last time I looked, kobjects in the Linux kernel were still dynamically linked in a bunch of ways, and tree-like data structures are widely used.


I think it's probably fair to call kernel development niche? And the fact that Linux uses linked lists a lot is not necessarily a strong case for linked lists, but just a matter of very specific constraints - like perhaps not wanting to optimize too much for underlying hardware?


I used Linux kernel as an example of easily accessible, extremely well known, performance tuned open source project. Of course dynamic data structures are used in countless other applications, like DBMS, web servers and web browsers. Really in nearly any non-trivial codebase.


Sure, but algorithms that explicitly need a linked structure for performance are very few compared to algorithms that have the same or better asymptotics with a hash table and a vector, or for which the input sizes are such that the better asymptotics don't make up for the large constants that pointer chasing causes.


If the algorithm calls for a balancing tree, you'd end up re-implementing it across a bunch of (linked) hash tables or over an array treated like a heap. Explicitly, or, if one does not know what they are doing, implicitly. In either case you would have no performance advantage.


If you know what you are doing, you may well realize a performance advantage.

For example LevelDB (which is conceptually based on Google BigTable's design) looks a lot like a balancing tree in its access patterns, but is a lot faster than a variety of alternatives. And it is faster in part because sequential data is stored sequentially in order instead of using a data structure that results in random access patterns.

And no. It doesn't use linked lists.


A hash table is vulnerable to collision attacks, if keys are derived from any kind of untrusted external input. Trees might be slow, but they're consistently slow.


Non sequitur. No one mentioned security, nor do any elementary data structures concern themselves with such higher-level things. If you're allowing unlimited untrusted data to control internal data structures, you're going to have a bad time regardless of which data structure you're using.


And Rust's default hashing implementation is hardened against these attacks.


> nor do any elementary data structures concern themselves with such higher level things

Not explicitly, but they concern themselves with worst-case performance, and tolerable worst-case performance implies security against a certain class of attacks (like HashDoS).


Security is a thing that is implicitly pervasive. If it doesn't apply, that should be explicit.

And yes, elementary data structures concern themselves with such things all the time - because, regardless of how they "must" be used, in practice they do get used on untrusted inputs all the time, simply because it's the easiest thing. Have you noticed how many languages and standard libraries have switched to random seeds for their hash tables lately?

And no, it's not true that you're going to "have a bad time regardless of the data structure". You're not going to have a bad time with an RB tree, regardless of where your inputs for keys come from - because it is a data structure that doesn't have a quirk of extremely bad perf on pathological inputs.


This has been mitigated in almost every language already by starting the hash calculation with a random per-process seed. It's a theoretical issue of hash tables, but we've got known practical solutions.
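In Rust's std, for instance, each hash map is seeded through a freshly created `RandomState`; a minimal sketch:

    use std::collections::hash_map::RandomState;
    use std::collections::HashMap;
    use std::hash::{BuildHasher, Hash, Hasher};

    fn hash_with(state: &RandomState, key: &str) -> u64 {
        let mut hasher = state.build_hasher();
        key.hash(&mut hasher);
        hasher.finish()
    }

    fn main() {
        // Two independently created states carry different random seeds,
        // so the same key (almost certainly) hashes differently:
        let s1 = RandomState::new();
        let s2 = RandomState::new();
        println!("{} vs {}", hash_with(&s1, "key"), hash_with(&s2, "key"));

        // HashMap::new() uses a fresh RandomState by default.
        let mut m: HashMap<&str, i32> = HashMap::new();
        m.insert("key", 1);
    }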


What kind of tree? A binary tree isn't great for many uses, and doesn't excel at much. But a nice fat B+ tree removes the vast majority of pointer-chasing latency.


I've got quite keen on B+ trees. It seems like the world where we had fast main memory and slow drive storage is not very different to the world where we have fast cache memory and slow main memory.


Trees and Lists are types of data structures. Both are very useful in all contexts.

Linked List is a special kind of List. It typically implies a non-sequential data layout.

With all of the modern layers of abstraction, non-sequential memory access typically means poor cache friendliness, i.e. poor performance.

Hopefully that helps explain why Linked Lists are considered niche, i.e. specific to embedded programming or to very special cases where benchmarks provide hard data justifying a Linked List.


Tree is a special case of a linked list.

Linked list is definitely sequential (it's a list!), and it can even be sequentially allocated in memory, depending on allocator implementation.


That's not necessarily true. You can implement binary trees as an array, so are trees a special case of arrays?

There can be sequential and non-sequential implementations of ADTs.


B-trees aren’t too bad.


> The most important thing to know about these is that you would almost certainly never do these in a real project: lists are in `std` but even then the vast majority of cases should use `Vec`.

Not relevant to rust, no?


That is true of Rust. The linked list is bad though, even irrespective of this argument about the utility of lists.


Lots of these comments seem confused about the purpose of this. It's not about using linked lists (that's easy, they're in std::collections), it's about implementing them. So the common non-kernel/embedded use cases are well supported, just use std::collections::LinkedList! But if you're in a no-std context then you might need this.
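For reference, using the std list takes no ceremony at all:

    use std::collections::LinkedList;

    fn main() {
        let mut list: LinkedList<i32> = LinkedList::new();
        list.push_back(2);
        list.push_front(1);
        list.push_back(3);

        assert_eq!(list.pop_front(), Some(1));
        for x in &list {
            println!("{}", x); // 2, then 3
        }
    }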


When I was first getting started with Rust this tutorial was eye-opening. It gave me a clear view into the somewhat impenetrable world of working with Boxes (pointers to the heap) directly, in the context of the borrow-checker.

It also convinced me that you usually just want to use the standard library data structures if you can :P


Can’t wait to follow this. I’ve been going through the official book these past couple weeks. As someone who’s never used systems languages (mainly a js / node person) I was able to write a redis-like database quickly on top of TCP. I’ve also picked up async / await and how it works in rust.

There’s so much to learn but the compiler is surprisingly helpful. I loved Typescript and it is like TS on steroids. When I get stuff compiling I have so much confidence.


>You're doing some awesome lock-free concurrent thing.

I can confirm lists are pretty rarely used. The four times I've used a linked list in 10 years of professional programming were:

- quick&dirty hashmap in C

- threadsafe queue

- keeping track of a set of objects that can't be copied (threads)

- LRU cache

In C++ at least, the nice thing about a list is you can push/pop in the front/back without a reallocation or invalidating iterators.


I've used linked lists a little more because I mostly do C rather than C++. Sometimes I end up using it as a quick-and-dirty vector when I don't want to bother writing something or using a library, though I typically use some kind of block/arena allocator vs. just malloc.


Linked lists are used a lot in game programming where the worst-case behaviour of std::vector and similar structures is undesirable. Linked lists may be slow for a variety of reasons but they're simple and predictable which is very nice when you're trying not to miss the 16.67ms per-frame window.


I thought it was the other way around: games using arrays and non-arbitrarily-growable vectors precisely to be able to iterate over everything quickly in a deterministic fashion. Especially because a lot of what games have to deal with is "visit every node".


An array of pointers you call a virtual method on is indistinguishable in "pointer-chase" time (with current caches, at least), to an embedded linked list. Embedded linked lists have better insertion/deletion costs, and are easier to manage, so they're used quite frequently in games.


That model (bag of polymorphic objects) is avoided nowadays in everything from games to simulation software.


They use arrays for static vertex, texture, etc. data to send to the GPU. They don't use arrays for things that are being created and destroyed all the time, such as units in a strategy game, they use lists for those things.


But do they use linked lists for them? I would have assumed you would use an arena for something like that, precisely because you already need to iterate over all of them and can represent any graph as keys into a generational index, so that reading stale keys is trivially handled.

Again, I'm not a game dev and this is just my understanding after reading up on these topics after watching https://youtu.be/aKLntZcp27M.


They certainly used linked lists all over the place in StarCraft, WarCraft, and Diablo [1]. I don't know that many game developers would use complicated data structures like the one you've described here. Linked lists are fantastic for insertion/removal in arbitrary places.

Game development is not really about showing off with cutting-edge CS research, it's about getting things done. Maybe your arenas with generational indices would be better, but they could take a long time to figure out and lead to a big mess that doesn't go anywhere.

[1] https://www.codeofhonor.com/blog/tough-times-on-the-road-to-...


Arenas aren't "cutting edge CS research," they're used widely in latency-sensitive applications like gaming. This article [1] describes how to implement one for a game engine. Basic versions are easy to understand and provide good results.

Starcraft/etc. are 20+ year old games and software development patterns as well as gaming hardware have changed in fundamental ways since then. Conventional wisdom nowadays is that cache misses on every list iteration are a lot worse than shuffling a few contiguous bytes around or marking things inactive when removing items.

[1] https://www.gamasutra.com/blogs/MichaelKissner/20151104/2582...


You are wrong on several fronts:

+ Linked lists are a thing of the past.

+ Game engines use complex data structures and algorithms. Some are cutting-edge research implemented from papers, especially in graphics.

+ A generational index is not complex.

Yes, game development (as opposed to game engine development or video/audio rendering) is a mess. That does not mean the actual technical fields involved are simple.


While I don't disagree on your first two points, I disagree with your third: there's nothing complex about a generational index.


Those are pretty old games written when caches were small and the CPU-memory speed gap wasn't as large as it is today.


No, they don't. You are talking about the state of affairs 20 years ago.


Used, yes, but not implemented. std::collections::LinkedList is a doubly-linked list suitable for games. It's basically just the no-std crowd that might need to implement their own.


This is great as an intro to Rust; I preferred it as a starter tutorial to the main Rust book or any other tutorial I tried.


I'm loving both. The book is comprehensive and teaches you even the most basic concepts, so it's great for a broader skill range.

But I love this one too because it digs into some CS archaeology and that helps me dig into the theory and history. I doubt I'll ever have to implement a linked list but knowing how they work and are implemented and their advantages and drawbacks is great.

Tangentially, there's a wonderful game called Human Resource Machine which teaches linked lists and other assembly-like programming without you even realising it.


The book is very comprehensive but it didn't click for me as a beginner, I found much of the exposition left me with unanswered questions while it went on to cover more ground. I'm not sure if those questions were answered later but I got lost very quickly. Maybe it's aimed at people with more C++ background?

This linked list tutorial answered every question I thought of almost exactly as I thought of them, which made it a joy to read. I also liked Rust By Example [1] over the book. Based on my experience I'd recommend this tutorial and then implementing something using Rust By Example as reference. But everyone learns differently!

[1] https://doc.rust-lang.org/rust-by-example/


Another great space of exercises in this vein is in-place operations on binary trees.

Specifically, rebalancing [1] proved particularly tricky for my students. I think it's a good litmus test for whether you understand ownership, borrowing, and algebraic data types.

[1] http://cs242.stanford.edu/f19/assignments/assign6/#22-bst-in...


A little more light-hearted, but in the same style: learn Ruby with Entirely Too Many Fizz Buzzes. See https://yukimotopress.github.io/fizzbuzz


I think the doubly linked list is to Rust's ownership semantics as the double slit experiment is to Quantum Mechanics: each exposes a seemingly counterintuitive behaviour, and truly understanding why requires a depth of understanding indicative of true mastery.


A while ago now, this was the tutorial that got me to the point where I could implement linked structures in Rust. Highly recommended.


I'll try this at some point. My first couple forays into Rust made me miss Java and C.


If you're anything like me, rust was a bit difficult at first due to ownership, lifetimes, traits, and the borrow checker. At some point though, things finally just clicked in my head and now I have the understanding I needed and my difficulties went away. After that point, I began writing safer code in every language because I was using the same ideas that rustc brute forced into my brain. It took probably a few weeks or so before I got it. Now I rarely have (those) issues, and when I do it's probably because I'm trying to do something weird or I was drinking and missed a place where I obviously should have used .clone() or maybe a reference. I'm just a hobby coder though, perhaps at a beginner-intermediate level with a background of writing unsafe python and c. Your mileage may vary.


I guess it just doesn't solve problems for me yet. The problems Rust solves I don't run into. Even with big Java apps with lots of concurrency and crap I hardly shoot myself in the foot. Oh well.


Seems to me that if rust doesn't solve problems you have any better than the other languages at your disposal, it might not be the best choice to solve those problems for you. That is very reasonable to me.


Right! Also I suppose I'm feeling very disagreeable today. :)


Downvotes by the Rust fanatics again! :)

Edit - it's just funny because it always happens. Say anything against Rust - get downvoted. It kind of reminds me of the really hardcore Linux community.


> Mumble mumble kernel embedded something something intrusive.

And this is why kernel/embedded/something/something developers don't take Rust as seriously as you want them to.

You can't simultaneously declare your language the best choice for system software development and treat the long-evolved patterns of those paradigms as a joke.

There are very good reasons for intrusive data structures, not least of which being their ability to operate in contexts where no heap is available. If you don't understand them or don't want to talk about them or want to limit your discussion to situations with different requirements, then say so.


The actual context of your quote:

> Just so we're totally 100% clear: I hate linked lists. With a passion. Linked lists are terrible data structures. Now of course there's several great use cases for a linked list:
>
> - You're writing a kernel/embedded thing and want to use an intrusive list.

So I’ve got no clue what you’re railing about. The project specifically acknowledges that there is a need in kerneldev for those data structures.

I’m a kernel dev using Rust for my kernel. I use both intrusive linked lists and growable vectors in it. The thing is, the sentiment expressed in the article really resonates with me: in most cases, growable vectors are a better choice, performance-wise.


The quote that the GP is talking about is included below, copy/pasted from the project page; the Mumble mumble line is the heading for a paragraph:

> Mumble mumble kernel embedded something something intrusive.
>
> It's niche. You're talking about a situation where you're not even using your language's runtime. Is that not a red flag that you're doing something strange?
>
> It's also wildly unsafe.

Also, prior to this the author claims the following, where the first line is also a section heading:

> I can't afford amortization
>
> You've already entered a pretty niche space

These fiat rulings based on one author's generalization of what is 'niche' are what I assume GP was commenting on. These are the kinds of dismissals that some developers take issue with, as GP states.


Kernel development is niche. Most of the time you don’t need linked lists.


When all you have is Rust, everything starts to look like an adjustable array.


Rust has singly linked, doubly linked (in its minimal std library) and intrusive linked lists - all with APIs that guarantee a lack of memory errors and data races at compile time. I don’t know any good kernel developer that wouldn’t like those guarantees, and I work in the Linux kernel professionally full time.


What’s an adjustable array? A vector?


A vector is a one-dimensional array. Adjustable arrays can have an arbitrary number of dimensions, although I'm not sure if that's a thing in Rust.


I've spent years programming rust and I'm not sure what you mean. Do you mean arrays of tuples, or nested arrays?


I mean a multi-dimensional array.


Thanks! Why does everything look like a multi-dimensional array for you in rust?


An array (as an ADT) can have an arbitrary number of dimensions, with the one-dimensional case often called a vector (and the two-dimensional case a matrix). An array can also be adjustable, in both the one-dimensional and multi-dimensional variants.

So a vector can be both adjustable and non-adjustable, and some languages do have both versions. Some languages have the adjustable one-dimensional array/vector as their only dynamic aggregate/ordered datatype.

And to answer the post I replied to originally, a vector in Rust is a one-dimensional adjustable array. To answer your question above, no, that's not what I meant; sorry for not being clear enough from the beginning.


Thanks for clarifying.

I think we're just using different definitions. The only real definition of an array I've encountered in work and school is that it's just a one-dimensional collection of elements, usually of static size. A vector, depending on context, is usually the same thing as an array but with the ability to conveniently change the size dynamically. Lists can be whatever you need them to be given the context; they just need to be sequential.

Of course you can represent higher dimension structures by linearizing indices (x + row_size * y, etc).

I think people are getting confused as most don't consider arrays to be arbitrarily dimensional without some scheme.

Completely off topic but you've reminded me of this great article: https://hypirion.com/musings/understanding-persistent-vector...


That just sounds like a graph.


How in heaven can an array sound like a graph?


Sounds like one to me.


If you can't tell when to break or not break rules then maybe you are not a kernel developer? It's almost like a King only following his own laws instead of being the one making them. Why are you a King again?


(Had to edit in a quote to what I was replying to because you fixed the original):

> The actual context of your fake quote:

Good grief, that was a verbatim quote! Here's a link to the exact text I quoted:

https://rust-unofficial.github.io/too-many-lists/#mumble-mum...


I think it must be stressed that this is an unofficial guide and not "the voice of the project", etc...


It explicitly says that there are good use cases, just that they’re very rare.


It does not. Maybe somewhere else it does. That section, verbatim, says:

> It's niche. You're talking about a situation where you're not even using your language's runtime. Is that not a red flag that you're doing something strange?
>
> It's also wildly unsafe.
>
> But sure. Build your awesome zero-allocation lists on the stack.

And this is (1) wildly misstating the requirements and (2) deeply offensive to those of us who work in those regimes.

Obviously, yes, there's room for a reasoned argument about this stuff and whether alternative paradigms can be deployed in heapless contexts. This isn't it.


> (2) deeply offensive to those of us who work in those regimes

You know how sometimes someone who is A Little Too Online gets offended because they think you said a Bad Thing, but it was actually just a typo or a straight misreading, and you try to explain that they’re reacting to something you didn’t even say let alone believe, but because they have already decided you are the Bad Person they interpret that as you “doubling down“ on the bad opinion that they attributed to you, so they get even angrier and even more convinced that you sincerely believe the Bad Thing?

I mean no judgment. We all have such sensitivities. But maybe now you see how easily you can wind up on the wrong side of a public debate by searching for offense where there is in fact none.


> And this is (1) wildly mistating the requirements and (2) deeply offensive to those of us who work in those regimes.

Care to expand how? While not a kernel dev, I've had my share of use cases where I've done exactly this kind of thing in gamedev, and I simply cannot bring myself to disagree with what's been said, or see what's offensive.

It's strange; debugging when the pointers get corrupted by other code exhibiting UB is painful; it's a potential multithreading hazard; and flat contiguous arrays are frequently more appropriate - but it's sometimes useful. The passage isn't arguing that an alternative paradigm can - or even should - be deployed in a heapless context. It's explicitly admitting that intrusive linked lists are an appropriate paradigm there.


> It's a fine data structure with several great use cases, but those use cases are exceptional, not common.


https://github.com/Amanieu/intrusive-rs implements what you're looking for with no heap. There's no need to flame or misrepresent what the book says (no one said embedded was a joke). The Rust embedded community is strong.


You're like the sixth person to argue I'm somehow "misrepresenting" what is being said in this book? I'm quoting it directly. The "flame" is in the original, and I'm responding to it.

Take it out. It's bad.


Assuming that you're not trolling, I'll explain how you misinterpreted what you read: the section that you quoted is a sub-heading under:

> An Obligatory Public Service Announcement

which is an argument that:

> Linked lists are as niche and vague of a data structure as a trie.

The author then goes on to state that many people have contacted him to argue that linked lists are not niche and he puts each of those arguments under a heading:

> Mumble mumble kernel embedded something something intrusive.

is one such heading. Inside of it, he continues the argument that linked lists for embedded no-heap scenarios are niche and—to continue his argument—we therefore shouldn't be teaching undergrads linked lists just like we don't teach them tries.


I think what ajross is arguing is that it is the author of the tutorial who is trolling here. Everything you and everyone else say against ajross' argument is true on the purely factual level; it is just that the tone of the PSA is needlessly aggressive and condescending.


To explain what I think this comment means. When working on embedded systems, you can interact with hardware devices by writing directly to specially mapped memory areas.

E.g., if you want to write text to a small screen, the kernel driver gives you a memory region that you write bytes to, and they're shown on the screen immediately, without requiring the CPU.


> To explain what I think this comment means. When working on embedded systems, you can interact with hardware devices by writing directly to specially mapped memory areas.

And why can't you do this in rust?


You can. Rust allows full memory addressing.


There's nothing to prevent you from doing this, although you may have to wrap it in an unsafe block.


They're saying more that intrusive data structures are really nice in situations that restrict heap allocation. The memory overhead outside of the structures linked together is O(1), because all of the per-object metadata is stored in the objects themselves. That means consumers generally don't have to be hardcoded for a certain number of objects like you would for an array/vector that doesn't have access to a heap.
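A hedged sketch of the idea (names illustrative; a production version needs much more care around aliasing, removal, and lifetimes):

    use std::ptr;

    struct Task {
        id: u32,
        next: *mut Task, // the link lives inside the object itself
    }

    struct TaskList {
        head: *mut Task,
    }

    impl TaskList {
        fn new() -> Self {
            TaskList { head: ptr::null_mut() }
        }

        // Safety: the caller must guarantee `task` outlives its
        // membership in the list and is not linked twice.
        unsafe fn push(&mut self, task: *mut Task) {
            (*task).next = self.head;
            self.head = task;
        }
    }

    fn main() {
        // Stack (or static) objects work: the list itself allocates nothing.
        let mut t1 = Task { id: 1, next: ptr::null_mut() };
        let mut t2 = Task { id: 2, next: ptr::null_mut() };

        let mut list = TaskList::new();
        unsafe {
            list.push(&mut t1);
            list.push(&mut t2);

            let mut cur = list.head;
            while !cur.is_null() {
                println!("task {}", (*cur).id);
                cur = (*cur).next;
            }
        }
    }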


My feelings, too. I do both embedded programming (in C/C++ and CUDA) and Erlang programming for a living. Erlang, too, doesn't let you (easily) write your own linked list structures, etc.

But for embedded programming with tight memory or performance constraints these data structures are essential so we use C++ or even C. They're well understood and the implementations have simple, elegant solutions.

For "safety" when we don't need absolute control, we'll choose a GC language like C# or F#. No need for the complication of Rust.


Rust gives you just as much control as C when you need it.


I may be mistaken, but if using unsafe does not allow for the ‘borrow-checker’ to be turned off and allow for code which doesn’t abide by the checker’s requirements, then it clearly does not give “as much control as C”. Again, I might have missed some of the subtleties of circumventing Rust’s static checking, but I don’t think I can purposely create some non-deterministic racy-appearing abstractions, which would be trivial to do using global variables in C.


You can't turn off the borrow checker for references, but Rust also provides raw pointers which are not subject to borrow checking (these are exactly like pointers in C, and can be cast to and from references (this is a no-op at runtime since they share the same memory representation)).

https://doc.rust-lang.org/1.30.0/book/first-edition/raw-poin...

You can create and manipulate raw pointers in safe code, but dereferencing them requires an `unsafe` block.
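A minimal sketch of that split:

    fn main() {
        let mut x: i32 = 42;

        // Creating and manipulating raw pointers is safe code:
        let p: *mut i32 = &mut x;
        // Even a dangling pointer is fine in Rust, as long as it is
        // never dereferenced (wrapping_* arithmetic has no UB by itself):
        let dangling = p.wrapping_add(1_000_000);
        println!("{:p}", dangling);

        // Dereferencing is what requires `unsafe`:
        unsafe {
            *p += 1;
        }
        println!("{}", x); // 43
    }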


AIUI, there are some hardware architectures where even creating a wild pointer might be undefined behavior, regardless of whether that pointer is subsequently dereferenced, and C inherits these requirements. This means that it might be desirable to restrict creation and manipulation of raw pointers to unsafe code in Rust as well, if this can be done without introducing undue incompatibilities.

(Rust editions would naturally allow for this: Rust 2021 would warn on creating/manipulating raw pointers in Safe Rust, and stop warning for "unnecessary" use of unsafe deriving from these operations; Rust 2024 would make these a hard error ouside `unsafe`.)


> AIUI, there are some hardware architectures where even creating a wild pointer might be undefined behavior, regardless of whether that pointer is subsequently dereferenced

Would you be able to point me to some references for such hardware? I'm not sure how that would work (at least based on my admittedly limited amount of experience). Wouldn't a pointer look like any other integer right up until it's used as a memory operand? Or would said architecture have some way to distinguish pointers from "regular" integers in registers?


Allegedly, some platforms have pointer trap representations, where a certain pointer can be created, but may not be used in any operations of that type. No modern systems have such trap representations for pointer types, but the C standard inherits their legacy, and, more importantly, C compilers use it as justification for certain types of optimizations. Since it's not a hardware limitation, Rust can perfectly well take the opposite path and say that the compiler may not use it for those optimizations.

https://stackoverflow.com/questions/6725809/trap-representat...


> No modern systems have such trap representations for pointer types

This may be incidentally true, but "address sanitizer"-like features are becoming more common on modern hardware, and while these do not currently trap on creation/manipulation of a 'wild' pointer (since, strictly speaking, a trap only happens on dereferencing), there's no solid reason to expect this to remain the case in the future.


I don't see how you could trap creation or manipulation, since those pointers are stored in registers and/or memory, and both are fundamentally untyped. How would the hardware even know that something is a pointer, on any architecture that is popular today?


Because you use typed instructions to access them. For example, on ARM with pointer authentication you’ll sign pointers and unsign them right before using them. If you forge a pointer it’ll cause a crash when it’s used because its signature will be incorrect.


But that would still happen at the point of dereference, no? Or does it allow to tag even operations like moves and adds?


> C compilers use it as justification for certain types of optimizations

I believe that the Rust compiler is free to make its own choices about what is considered valid, and which optimisations it wants to enable. It doesn't need to follow C's lead here.


Isn't that what I said?


Maybe. Or maybe you'd just change it so 'wild pointers' created in safe code are stored as ints on that architecture.


> if using unsafe does not allow for the ‘borrow-checker’ to be turned off and allow for code which doesn’t abide by the checker’s requirements

It allows the latter. 'Code that doesn't abide by the checker's requirements' uses separate facilities that are only allowed in unsafe code. This means that `unsafe` doesn't have to turn off anything, and further pinpoints the parts of the code where caution is needed in order to maintain the invariants that Safe Rust is based on.


Unsafe doesn’t turn off the borrow checker, but it does give you access to pointers that aren’t checked.


> if using unsafe does not allow for the ‘borrow-checker’ to be turned off and allow for code which doesn’t abide by the checker’s requirements

It does, that’s why it’s there.


Basically anything published about Rust is turning me away from the language.

It could be unfair though: Technical writing (and that includes humorous opinionated pieces) has declined dramatically in the last 10 years.

Or perhaps writing as a whole has declined.


It wouldn't hurt to be specific.



