Hacker News

To further prove the point about cache locality:

https://www.youtube.com/watch?v=0iWb_qi2-uI&t=44m50s




This is a strawman; it isn't a useful case for lists, since he needs to scan the list to find the position to add to or remove from.

If he used an intrusive list, removal would already beat the vector for relatively small N.

What I learn from this video is that Stroustrup is also misguided about linked lists: he thinks you always need a linear search to do useful operations. The whole point of lists is the O(1) operations, not the O(N) ones.

Indeed, std::list is the culprit here: "We shape our tools and then our tools shape us". std::list users become unaware that list operations can be applied to an element directly, without a linear search first.


Even O(1) operations on lists can break cache locality. Since you're chaining through pointers, all bets that your nodes are nearby are off. With vectors, they always are.


When you need to memmove your whole vector, you're going to thrash a lot more cache lines than when you touch two neighboring ones.


That assumes your list elements are in neighboring cache lines. (Also, how lists compare to vectors in real-life performance, as opposed to time complexity, really depends on how often your vector has to grow for your workload; often a reasonable size can be chosen up front to minimize this.) But your point is of course valid.

Though I have seen vector-like structure implementations that combine arrays and linked lists: when they run out of space, they allocate a new array and link it to the end of the old one. I've implemented memory pooling systems like this myself. It works well, because as long as you know which bucket/array the element is in, you have random access, you also have good linear scanning performance, and extending the size doesn't require memmoves.

I'm not saying that linked lists have no uses (my original comment about "never" was not meant entirely seriously; never is much too strong a word) - they certainly do - but I am saying that I feel they have much more limited use cases than most people seem to think.

I agree that intrusive linked lists have many advantages that non-intrusive linked lists do not have. The main advantage of intrusive data structures is, as you said, that membership in multiple structures lives in one place, allowing constant-time deletion from all of them.

Replying to your other comment:

Deleting from the middle of the list is an extremely common requirement, when you have objects that need to be enumerable in some order[s] and sometimes deleted.

I don't know what work you do in C or C++ (in other languages I happily use their list data types without caring whether they use linked lists under the hood, because chances are, if I cared about absolute performance, I'd have been using C or C++ to begin with), but in my own work, requiring both order and frequent insertion/deletion in the middle has been rare - or the elements and lists were so small that a linear scan and copy was fast. But I guess it depends on your work; your logic here certainly sounds reasonable to me.

When you need indexing, use a vector. When you need quick add/remove, use lists. When you need both, use both.

Sure, you could use, say, a vector of elements that are themselves nodes in an intrusive linked list. Using a random access data structure to index another data structure is something I've done myself. I suppose I'm mostly arguing against the naive use of linked lists that seems to be rampant, because I think that most uses do not do all the interesting things you mention, in which case, IMHO, my points still stand and linked lists have a narrow use case.

You don't need to jump around. Say you have a request object you found via a hash table. After handling it you decide to destroy it, so you want to delete it from the various lists it is in. You can do that in O(1) for each list, and it's relatively cache-local (only two extra random accesses, i.e. potential misses, per list).

See, this is the kind of thing that I was looking for when I asked you to elaborate. You're not really using the list as a convenient growable-array replacement (as you say yourself, you don't think of them as containers); you're using them as a secondary structure to maintain order in cases where middle removal or insertion is important (or maybe the other data structures are the secondary ones - whatever). I think I better understand what you meant all along now, and your reasoning makes sense to me. I can definitely see the use of this, and I agree that the problems of linked lists basically boil down to this: they're seen as containers.

I think in that case, everything I said is totally true. I was definitely somewhat wrong about intrusive lists in the cases where some combination of these is true: another structure is used when lookup is needed, middle insertion/deletion is common, copying is expensive, order is important, or elements are members of multiple lists where membership is commonly modified together.

I regularly have my data objects within a hash table, a list, and a search tree simultaneously and efficiently, with no dynamic allocation to sustain any of that.

This sounds good - I don't think most people really do this, though. The problem isn't just std::list (or std:: containers in general). Your original reply to me was "If you really think that, you've missed CS 101", but perhaps you missed CS 101, because in CS 101 (in mine, and in every course's notes and data structure books I've read) they teach the common non-intrusive linked-list container, for which my points are completely valid. They're not teaching your use cases, or if they are, it's certainly not nearly as common.

EDIT: I wanted to add that all of this really depends on what you are storing to begin with. If you are only storing an integer or two, the overhead of the links will kill you and your cache efficiency. If, on the other hand, you are storing a lot in each node, then a few extra pointers aren't going to do much. Also, if you're only storing a few bytes (or a few dozen), copying isn't all that expensive, and then whether lists or vectors are faster depends on access patterns, as I said in my previous comment. A lot of, e.g., my Python lists are lists of numbers. In high-performance C++ code, a lot of my vectors simply contained ids, numbers, points (2D or 3D), or similarly simple data.


It's nice to be able to have productive discussion on the internet!

We can agree on almost all points, I think.

The CS 101 bit you mention is a good point: intrusiveness matters for asymptotics, so it should be taught in academia rather than known only in (very small portions of) industry.

By the way, when you mention "a vector whose elements are linked list heads", a much more typical scenario is an array/vector whose elements contain one or more list heads.

You were wondering about the kinds of software I work on that need this stuff: high-performance storage controllers. We need to maintain a lot of concurrency, handle many kinds of failures, etc. We often require the same objects (e.g., an ongoing I/O request) to be looked up in various ways, associated with failure domains, timed out, etc. So we want them organized by many different data structures, and we need the O(1) deletion from all of those structures when an I/O request dies.

We also shun dynamic allocations, aside from relatively rare memory pools for large structures that contain the various allocations we need. Intrusive style allows us so many nice things within this style:

* Avoiding the runtime costs of dynamic allocations

* Avoiding the memory costs of extra indirections incurred by dynamic allocations and STL (non-intrusive) style

* Avoiding handling out-of-memory errors in every single code path: there are almost no allocations anywhere, almost all functions become "void" error-free functions!

* Having optimal asymptotics for our operations

* Having reusable generic data structures in C without templates (we dislike C++)

Compared to all this, the STL stuff is just horrible. The STL is widely considered a superb library, but I find it horrid.


It's nice to be able to have productive discussion on the internet!

I don't mind being wrong if it means I learn from it :)



