Intrusive lists in Doom 3 (gpfault.net)
148 points by liquidmetal on Dec 25, 2014 | 66 comments



They moved away from intrusive lists to vectors in the BFG edition for performance reasons. More about that here: http://fabiensanglard.net/doom3_documentation/DOOM-3-BFG-Tec...


Very interesting post - it seems that, like many things in computing, intrusive lists make a lot of sense until you start thinking about cache locality. Perhaps on machines of the time, branch prediction was slow enough that conditional code would be a limiting factor... but on modern architectures, it's all about that cache, 'bout that cache, no thrashing.


Anything involving pointers to arbitrary memory locations sounds great until data locality becomes an issue. Although if such data structures (especially sorted trees) are sufficiently useful, an object pool can help.

But a linked list of any kind should basically never be used when performance matters.


You can strike the intrusive. Your comment is valid for all kinds of linked lists.


Hell, it's valid for everything, period. Even vectors! Splitting a struct into hot/cold chunks is pretty common. Minimize the size of data being iterated on to maximize cache usage. Can result in big wins.
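For anyone who hasn't seen it, a minimal sketch of what hot/cold splitting can look like (the names and fields here are made up, not from any particular codebase): keep only the per-frame data in the struct you iterate, and push the rest behind a pointer.

    #include <memory>
    #include <string>
    #include <vector>

    struct ParticleCold {                    // touched rarely (spawn, debugging)
        std::string debug_name;
        float spawn_time = 0.0f;
    };

    struct ParticleHot {                     // touched every frame
        float x = 0, y = 0, z = 0;
        float vx = 0, vy = 0, vz = 0;
        std::unique_ptr<ParticleCold> cold;  // extra dereference, but only when needed
    };

    void update(std::vector<ParticleHot>& particles, float dt) {
        for (auto& p : particles) {          // iterates densely packed hot data only
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
        }
    }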


Huh, hadn't heard of hot/cold struct splitting before. This paper seems to indicate it has a nontrivial effect even in Java code: http://research.microsoft.com/en-us/um/people/trishulc/paper...


Instead of allocating each element of the list individually, you can do one better.

Allocate an array of elements. The prev and next don't point to memory but to indexes in the array.

How do you handle empty elements? Have a special flag that means "this element is blank" and use prev and next to point to the empty elements in the list! (So you are essentially storing two interleaved lists at the same time.)

Also store the first empty (and first full) element separately of course.

You will need to initialize the memory before you use it, filling all the prev and next pointers for the empty elements.

When you "allocate" an element you just update the prev and next pointers of all 5 elements, as needed.

If you think about it - malloc also needs to store a list of unused memory areas, so despite being a bit more complex you can speed things up quite a bit by doing this.

An optimization: since pre-linking all the empty elements touches memory you might never use, let a next pointer of 0 (or -1) implicitly mean "the next array index, not yet initialized".

By doing this you can allocate a 1GB array without worry, and allow the OS to only fault in pages as needed (i.e. there is no need to ever resize the array).
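A rough sketch of that scheme, with hypothetical names (not anyone's production allocator). It covers the index-linked used/free lists, with a high-water mark standing in for the "uninitialized next" trick; actually letting the OS fault pages in lazily would need the array reserved with mmap/VirtualAlloc rather than a std::vector.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Slot {
        int32_t prev = -1;
        int32_t next = -1;          // links are array indices, not pointers
        float   x = 0, y = 0;       // payload
    };

    class SlotPool {
    public:
        explicit SlotPool(std::size_t capacity) : slots_(capacity) {}

        // Take a slot from the free list (or the never-touched tail) and
        // push it onto the front of the used list. Returns -1 when full.
        int32_t allocate() {
            int32_t i;
            if (free_head_ != -1) {                    // reuse a released slot
                i = free_head_;
                free_head_ = slots_[i].next;
            } else if (high_water_ < (int32_t)slots_.size()) {
                i = high_water_++;                     // first use, no init needed
            } else {
                return -1;
            }
            slots_[i].prev = -1;
            slots_[i].next = used_head_;
            if (used_head_ != -1) slots_[used_head_].prev = i;
            used_head_ = i;
            return i;
        }

        // Unlink a slot from the used list and push it onto the free list.
        void release(int32_t i) {
            if (slots_[i].prev != -1) slots_[slots_[i].prev].next = slots_[i].next;
            else                      used_head_ = slots_[i].next;
            if (slots_[i].next != -1) slots_[slots_[i].next].prev = slots_[i].prev;
            slots_[i].next = free_head_;               // free list reuses the next field
            free_head_ = i;
        }

        Slot& operator[](int32_t i) { return slots_[i]; }

    private:
        std::vector<Slot> slots_;
        int32_t used_head_ = -1;
        int32_t free_head_ = -1;
        int32_t high_water_ = 0;   // slots past this have never been initialized
    };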


That's great until you start caring about address space use or total process commit charge. All you've done is build a dubious special-purpose heap allocator.


This approach is more common in lower level software where each allocation is carefully managed. The Linux kernel uses them, for example:

http://www.makelinux.net/ldd3/chp-11-sect-5

as does the Windows kernel:

http://msdn.microsoft.com/en-us/library/windows/hardware/ff5...


And BSD / OS X. Check out queue(3).


And on AmigaOS, pretty much everything in Exec (the OS kernel) was held together by doubly-linked-lists (of the intrusive kind). Coming from C I'd say that std::list is the 'strange kind' of lists, and intrusive lists are the 'common kind'. On the other hand I haven't been using lists for years, since growable arrays (like std::vector) are almost always faster.


I really prefer intrusive data structures, for the exact reasons outlined in the article.

I've myself written many implementations of intrusive structures. But now I'd like to present the greatest of them, a super-generic implementation of an intrusive AVL tree in C. See [1], and example [2].

- It relies on a user-defined concept of a "link". In particular, this means the structure can be in an array and linked with array indices, and can be copied correctly with a single memcpy. Of course the user may still use pointers as links.

- It supports implementation of user-defined associative operations (e.g. sum). This consists of the operations: CAvl_AssocSum (calculate the sum of all the nodes), CAvl_ExclusiveAssocPrefixSum (calculate the sum of elements from the first one up to but not including node X), and CAvl_FindLastExclusiveAssocPrefixSumLesserEqual (find the first node where the sum of all preceding nodes is >= X). Of course these all have O(log n) time complexity.

- The implementation is made generic by include files which rely on macros a lot. Other than the macro madness, I think this is better than C++ templates, due to easy #if, where in templates one would need to jump through hoops due to the non-existence of static_if.

[1] https://github.com/ambrop72/badvpn/tree/master/structure (see CAvl_*)

[2] https://github.com/ambrop72/badvpn/blob/master/examples/cavl... (also see cavl_test_tree.h in the same dir, defining some parameters for the structure)


> The implementation is made generic by include files which rely on macros a lot. Other than the macro madness, I think this is better than C++ templates, due to easy #if, where in templates one would need to jump through hoops due to the non-existence of static_if.

And either throw away type safety completely or end up with a huge macro soup (which is even worse to debug than template soup). I also don't understand your static_if remark.

I'm not saying intrusive lists are never useful but saying that intrusive lists are generally better than the STL is just terrible advice.

I'm not a huge fan of C++ but if there's something I miss dearly when writing C it's generic containers.


Well I don't want to get into a flame-war regarding intrusive/containing or C/C++, but I will say that type safety and intrusive/container are orthogonal things. One can have a type-unsafe container as well as a type-safe (generic!) intrusive structure. See my code for proof :) I think we'll agree that it's not too complicated and easy to use.

Implementation: https://github.com/ambrop72/aprinter/blob/master/aprinter/st...

Usage: https://github.com/ambrop72/aprinter/blob/master/aprinter/sy... (the BusyEventLoop keeps track of QueuedEvent objects).

If you wonder what the purpose of the Accessor is: it's for when you can't use the simple member-pointer based interface due to C++'s eager evaluation, and you can hack around that with a forward declaration of a custom Accessor class, followed by the complete definition later.

I think Boost intrusive structures are type safe too, but I'm not 100% sure. But Boost is another matter altogether.


I like intrusive lists. Another consideration is that a std::vector<Foo*> is as fast to iterate over as an intrusive list of Foos. The memory access pattern is the same.

Another perk is that an element can remove itself from the list without actually having to know anything about the list. This makes a lot of shutdown code super easy and clean.

I think I prefer intrusive lists to be inherited. That makes the interface for adding or removing objects from lists super clean. It also eliminates the extra obj pointer since the obj root address is known at compile time. Storing one object in multiple intrusive lists gets a little hairy and you need a tag parameter, but that's a relatively rare need in my experience, so it's fine.

One thing I don't like is the lack of good debugging info in Visual Studio. For C++ at least, its customization tools are seemingly not powerful enough to handle intrusive lists well. Not being able to clearly see all objects in the watch window is wildly frustrating. Things that make code harder to debug make me very, very cranky.


> Another consideration is that a std::vector<Foo*> is as fast to iterate over as an intrusive list of Foos.

A "normal" linked-list of strings or vectors always makes me sad -- the reasoning goes,

- The list elements need to live on the heap because they might live forever,
- The string object has to be fixed-size because objects are always (at least officially) fixed-size,
- The string storage has to live on the heap, because it can be arbitrarily large.

So printing out a linked-list of strings gets two pointer dereferences per string instead of one. I guess that's the cost of abstraction, though.

(This actually fits into the C++ zero-cost-abstraction narrative pretty well, though:

- Linked lists are not zero cost,
- C++ programmers don't like non-zero-cost abstractions,
- C++ programmers don't like linked lists.

:)


> C++ programmers don't like non-zero-cost abstractions

I think you’re confusing zero cost with zero overhead.

What C++ programmers like is that the compiler does not add overhead, for example calling `a[10]` will read the word at address `a + 10` and return that. No bounds checks are inserted, no function call is done to lookup the element in some implementation defined sparse data structure, etc.

But C++ programmers are generally not opposed to using things that have a cost associated with them (basically everything has a cost) — for example `std::map` is quite popular and certainly not “zero cost”, though it has its running time defined, and the implementation is actually using a self-balancing search tree rather than a hash table because we can give guarantees about the running time of the former, not the latter.


std::map is implemented using a tree and not a hash table because it's specified as an ordered data structure.


I am quite sure (based on interviews I’ve read) that Alexander Stepanov’s primary concern was having a data structure with known (fixed) complexity rather than a sorted container. The latter is just a nice side-effect.


In the worst-case scenario the time complexity of a hash table equals that of its bucket, which doesn't have to be a list. So the only advantage of map over unordered_map seems to be the ease of use: it only needs a comparator.


unordered_map pretty much has to have linked buckets due to its interface and complexity requirements.


"There can be a lot of projectiles, they appear and disappear quickly, and every time that happens we incur the cost of allocating/deallocating memory multiple times."

I write a lot of shoot-em-ups (both 2d and 3d). I never do this. I preallocate all of the bullet entities on level initialization and never destruct them during runtime. If the number is large enough, I do sort them by status (alive or dead) and exit processing when I encounter a dead projectile, but usually that's unnecessary.
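One possible shape for this (not necessarily the parent's exact scheme): preallocate a fixed array, keep live bullets packed at the front by swapping a killed bullet with the last live one, and stop iterating at the live count.

    #include <array>
    #include <cstddef>
    #include <utility>

    struct Bullet {
        float x = 0, y = 0;
        float vx = 0, vy = 0;
    };

    template <std::size_t MaxBullets>
    class BulletPool {
    public:
        Bullet* spawn() {
            if (live_count_ == MaxBullets) return nullptr;   // pool exhausted
            return &bullets_[live_count_++];
        }

        void kill(std::size_t i) {                           // i must be < live_count_
            std::swap(bullets_[i], bullets_[--live_count_]); // keep live ones contiguous
        }

        void update(float dt) {
            for (std::size_t i = 0; i < live_count_; ++i) {
                bullets_[i].x += bullets_[i].vx * dt;
                bullets_[i].y += bullets_[i].vy * dt;
            }
        }

    private:
        std::array<Bullet, MaxBullets> bullets_{};           // allocated once, up front
        std::size_t live_count_ = 0;
    };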


Do you pre-allocate all bullets per shooter? Or have one big bucket that all shooters use?

I take it that having a finite number of bullets in the game at any time is obviously an acceptable trade-off.


Just like that old joke -

  There are two ways to wash dishes -
  after a meal or before it.
Same with data containers, there are two mindsets. Either data is primary and can be optionally organized into containers, or containers are primary and they essentially own the data that's put in them. The STL model is the latter and "intrusive" lists are the former. I leave it up to you to decide which one is an ass-backward approach in the name of type safety :)


It depends on whether you have generic code (e.g. map, filter, sort etc.) that you want to apply to the list, vs efficiency concerns, or whether your code needs to iterate through the list from an item.

There's no one true way here. I personally find having a library of operations like map and filter more productive for most code that I write, rather than needing to specialize them per structure.


I didn't think these "non-intrusive lists" were actually used much in real-world programs. Even if allocations weren't a big issue performance-wise, the sheer amount of work required to manage the allocations themselves would put me off using them.


Just use Boost.Intrusive: http://www.boost.org/doc/libs/1_57_0/doc/html/intrusive.html

It makes the job much easier.


Except: boost.


You can also make these using this structure:

    template <class T>
    struct node { node<T> *next, *prev; };

    template <class T>
    struct intrusive_list : private node<T> { };
Your type foo then subclasses the node<foo> type, and in some places intrusive_list<foo> has to statically cast them back down.

This means you don't need that messy "T *owner;" pointer anywhere.

The downside is that if you want your type to be a member of multiple intrusive lists, you have to make spurious parent classes and inherit from those.
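A possible fleshing-out of that sketch, with the details guessed rather than taken from any real code: the list head is itself a node, so an empty list points back at itself, and iteration casts each node back down to the element type with static_cast.

    #include <cstdio>

    template <class T>
    struct node { node<T> *next, *prev; };

    template <class T>
    struct intrusive_list : private node<T> {
        intrusive_list() { this->next = this->prev = this; }

        void push_back(T& item) {
            node<T>* n = &item;              // T derives from node<T>
            n->prev = this->prev;
            n->next = this;
            this->prev->next = n;
            this->prev = n;
        }

        template <class F>
        void for_each(F f) {
            for (node<T>* n = this->next; n != this; n = n->next)
                f(*static_cast<T*>(n));      // cast back down to the element type
        }
    };

    struct foo : node<foo> { int value = 0; };

    int main() {
        intrusive_list<foo> list;
        foo a, b;
        a.value = 1; b.value = 2;
        list.push_back(a);
        list.push_back(b);
        list.for_each([](foo& f) { std::printf("%d\n", f.value); });
    }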


One possible alternative is having your objects be members of "lists of slightly different type". For example,

    template <ptrdiff_t offset, class T>
    struct node {
        node<offset, T> *next, *prev;
        T& operator*() { return *(this-offset); }
    };
Maybe? It has been a while since I've written C++. I guess you have a big problem setting up your base class in a portable way, though, because the types of its member nodes depend on their offsets, and I don't think you can do that.

    class Foo {
        int data;
        node<offsetof(Foo, list1_elem), Foo> list1_elem;
        node<offsetof(Foo, list2_elem), Foo> list2_elem;
    };
:/


I'm not sure if that would work, because the offset of list1_elem could depend on the template instantiation of list1_elem, since that could (ostensibly) affect the alignment. But you can drop the template altogether on the list element type and put the offsetof expression in the list type declaration. An example of that is linked by some other comment in this thread: http://www.codeofhonor.com/blog/avoiding-game-crashes-relate...

The base class option is, I think, best only for the case where you have only one kind of list you want the object to be part of.

Also, your pointer should be cast to char* before you do offsetof arithmetic on it.
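A rough sketch of that alternative (hypothetical names, not the code from the linked article): the link node is untyped, and the list head stores the byte offset of the link field, supplied via offsetof where the list is declared, so one object can sit in several lists without templating anything on the offset. Note the char* casts for the offset arithmetic.

    #include <cstddef>

    struct Link {
        Link* next = nullptr;
        Link* prev = nullptr;
    };

    template <class T>
    class OffsetList {
    public:
        explicit OffsetList(std::size_t link_offset) : offset_(link_offset) {
            head_.next = head_.prev = &head_;
        }

        void push_back(T& item) {
            Link* n = linkOf(item);
            n->prev = head_.prev;
            n->next = &head_;
            head_.prev->next = n;
            head_.prev = n;
        }

        T* first() {
            return head_.next == &head_ ? nullptr : itemOf(head_.next);
        }

    private:
        // char* arithmetic converts between the element and its embedded link.
        Link* linkOf(T& item) const {
            return reinterpret_cast<Link*>(reinterpret_cast<char*>(&item) + offset_);
        }
        T* itemOf(Link* link) const {
            return reinterpret_cast<T*>(reinterpret_cast<char*>(link) - offset_);
        }

        Link head_;              // sentinel node; the list is circular
        std::size_t offset_;     // byte offset of the Link member inside T
    };

    // The offsetof expression lives where the list is declared, so one object
    // can carry links for several lists without templating on the offset.
    struct Foo {
        int data = 0;
        Link list1_elem;
        Link list2_elem;
    };

    OffsetList<Foo> list1(offsetof(Foo, list1_elem));
    OffsetList<Foo> list2(offsetof(Foo, list2_elem));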


I don't understand. Is it really well explained?

Isn't ListNode just a way to index a list? Isn't this Point2D just "marked" as being on a certain list?

Why does he say "less dereferencing"?

> Dereference the appropriate "next" pointer, get the next object from memory.

the prev/next pointer is in the ListNode, not in the object itself, so it's obviously a dereference...

I did not understand it very well... I can understand that a linked list will be faster thanks to branch prediction, but what does it have to do with an intrusive list?


> the prev/next pointer is in the ListNode, not in the object itself, so it's obviously a dereference...

The ListNode is embedded in the Point, so their storage is actually combined. The memory layout would be something like

  0x0  x
  0x4  y
  0x8  next
  0xc  prev
  0x10 owner
with the total structure being 20 bytes if pointers are 32-bit.

> Aren't ListNode just a way to index a list ? Isn't this Point2D just "marked" to be on a certain list ?

The list itself is formed entirely by the ListNodes themselves. There is no backing array to index into; the storage for points (and nodes, which are contained in points) is created with an individual malloc per point. The way you reference a list is by holding a reference to the head, which is just a regular ListNode.


If you use a std::list<T>, doesn't it use nodes like

    T data
    p next
    p previous
    p owner
anyway? It shouldn't create a pointer to the data, just like a std::vector<T> isn't an array of pointers to T.


Let's say I have a bunch of Bullet objects, and each Bullet belongs to multiple lists. In that case, std::list<Bullet> doesn't work and I have to use std::list<Bullet*> instead, thus the extra pointer dereference.


Only without the owner pointer. But intrusive lists don't need an owner pointer either.


so it's a preallocation technique, a little like an object pool


The thing about linked lists is their Big O time looks nice, in theory, but in reality they do not work well with how most CPUs are designed. (Because they make the CPU cache mostly useless, and clock speeds aren't really improving, so the CPU cache is pretty much the driver of better performance at this point.) I'd rather just use a vector/array and take the hit when I need to resize it, most of the time.


Oh, look, it's my post :)


Here's another example [1] of the same. Blizzard games from WC2 to SC2 use this code nearly verbatim (testing 1 instead of INT_MAX for 64-bit compat in newer titles).

1. http://www.codeofhonor.com/blog/avoiding-game-crashes-relate...


Wayland also uses intrusive circular linked lists - but it doesn't need an owner pointer, since that pointer can be computed from a pointer to the link and the offset of that link in the owner structure. I wonder if Doom 3 really needed it.
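For reference, a sketch of that computation (this is how wl_container_of in Wayland and container_of in the Linux kernel work; the surrounding names here are made up):

    #include <cstddef>

    struct link_t { link_t *prev, *next; };

    struct surface {            // hypothetical owning structure
        int width, height;
        link_t link;            // embedded link; no back-pointer to the owner
    };

    // Recover the owner from a pointer to its embedded link by subtracting
    // the link's offset within the owning type.
    #define CONTAINER_OF(ptr, type, member) \
        reinterpret_cast<type*>(reinterpret_cast<char*>(ptr) - offsetof(type, member))

    surface* surface_from_link(link_t* l) {
        return CONTAINER_OF(l, surface, link);
    }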


It's funny how this technique looks like something coming from a newbie programmer. It reminds me of parallel arrays. It looks like something done by an amateur, but hey, it's actually cache-efficient these days.


Does anybody know what would the performance gain be in this case?


There isn't one. std::list<T> has the same space requirements, memory locality and capabilities as an intrusive list. Period.

The selling point of an 'intrusive' list, where all your objects are always 'list ready', is that it allows the insertion and removal of elements without copying them, which is beneficial if your lists experience a lot of churn. The thing is, you can achieve this without intrusion anyway by keeping all your 'not in a list' items... well, in a list. You then get memory management for free and can use splice()[0] to efficiently move nodes around.

My philosophy on lists is: if you've never used or needed to splice then you probably don't really need a linked list. Splicing is what linked lists are all about.

[0] http://en.cppreference.com/w/cpp/container/list/splice
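To make the splice() suggestion concrete, a small sketch (hypothetical types): idle objects are parked in a "free" std::list and moved between lists with splice, which relinks the node in O(1) without allocating or copying the element.

    #include <iostream>
    #include <list>

    struct Bullet { float x = 0, y = 0; };

    int main() {
        std::list<Bullet> free_bullets(100);   // preallocate 100 nodes up front
        std::list<Bullet> live_bullets;

        // "Spawn": move one node from the free list into the live list.
        live_bullets.splice(live_bullets.end(), free_bullets, free_bullets.begin());

        // "Kill": move it back. The Bullet object itself is never copied.
        live_bullets.front().x = 42.0f;
        free_bullets.splice(free_bullets.begin(), live_bullets, live_bullets.begin());

        std::cout << free_bullets.front().x << "\n";  // 42, same node came back
        return 0;
    }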


What about modifying the elements while keeping them on multiple lists?

With intrusive list you need only modify any element on a single place (A bullet location as an example).

How could you achieve this with std::list?


I don't know what you mean. How do you keep an object on multiple lists with only one pair of pointers?


That's the point of intrusive lists. You add a pair of pointers for every list the object is in. So when you do your physics pass using the "moving objects" list, the value of the bullet is automatically updated in the "renderable objects" list without having to go through renderable objects separately.

It was explained in the original article "If you want your structure to be in two lists at the same time, you have to add another field: " with an example.
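A tiny sketch of what that looks like (hypothetical names, not the article's exact code): the object embeds one link per list it can be on, so both lists refer to the same object and any field update is visible through either list.

    struct ListNode { ListNode *prev = nullptr, *next = nullptr; };

    struct Bullet {
        float x = 0, y = 0;       // updated once, seen through every list
        ListNode moving_link;     // link for the "moving objects" list
        ListNode render_link;     // link for the "renderable objects" list
    };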


For games and other micro-optimised uses, yeah... super-fine control over where you put your pointer pairs with respect to other data members and cache lines could be warranted. If you're optimising at that level then fine, I'd do it (with Boost.Intrusive). Otherwise I'd pick up Boost.MultiIndex with a couple of 'sequenced' (linked node) or random_access (vector of pointers) indices. This is non-intrusive and has the construction and overhead you want.

I just wanted to spell out that in most use cases intrusion is a sucky trade-off with no benefit.


If you're storing pointers to elements in the list, you only have to change the pointed object.


OK, then.

I'll give you a pointer to an element and you remove it from all containers that it is currently on, in O(1).

The only way to do that is to keep a bunch of iterators in the item itself, which is virtually identical to the intrusive container model except it involves more pointers, more heap allocations and introduces an extra invariant into the data model (the iterator stored in the item must point at a list that actually contains said item). And the only benefit you get for all this pain and suffering is that you don't have to do any "funny" casting via container_of() contraptions.

The point is that once data items sit in more than one container, the intrusive container model generally results in simpler code that is also more efficient.
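For illustration, a sketch of the workaround being described (hypothetical names): each item carries an iterator into every list it belongs to, and removal goes through std::list::erase(iterator), which is O(1); the invariant that those iterators really refer to this item is yours to maintain.

    #include <list>

    struct Bullet {
        float x = 0, y = 0;
        std::list<Bullet*>::iterator in_moving;      // must point at this Bullet
        std::list<Bullet*>::iterator in_renderable;  // must point at this Bullet
    };

    struct Lists {
        std::list<Bullet*> moving;
        std::list<Bullet*> renderable;

        void add(Bullet* b) {
            b->in_moving     = moving.insert(moving.end(), b);
            b->in_renderable = renderable.insert(renderable.end(), b);
        }

        void remove(Bullet* b) {            // O(1), but only if the invariant holds
            moving.erase(b->in_moving);
            renderable.erase(b->in_renderable);
        }
    };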


Then "Less Dereferencing" is not achieved as there are 2 pointer dereferences per access to an element instead of just one, and that goes against the claim "std::list<T> has the same space requirements, memory locality and capabilities of an intrusive list. Period."

I have no idea how to implement something that has same memory locality and capabilities as intrusive list with std::list and that's why I'm asking how it's done. Storing a pointer does not have same memory locality.


Doesn't std::list<T> generate different code for each type T, at least naively? The generated code could end up taking up more space even if the data structures on which it operates look the same in memory.


13.7% on average


73.6% of statistics are made up. I had to look that up: http://www.businessinsider.com/736-of-all-statistics-are-mad...


I think I remember Turbo Pascal's container library using a similar approach.


> " Every time a new projectile is created it has to be added to a bunch of lists. There can be a lot of projectiles, they appear and disappear quickly, and every time that happens we incur the cost of allocating/deallocating memory multiple times." Lists can be pre-allocated though


I am somewhat confused by this article, since to me it sounds like the writer wants the reader to think intrusive lists are "trickier and weird", when they have been my bread and butter during all my C work... And I guess I am not alone.


You're not alone. I spent almost 20 years programming with intrusive lists before I ever heard about non-intrusive lists. Of course, they weren't known as "intrusive lists", everybody called them "linked lists".


They are trickier though, because you either have to create a new type for each list and write all the accompanying functions to work on that type (the PointListNode in the article), or you can make it more generic by discarding type safety and using tricks like "container_of" (that's actually my preferred approach when writing C; at least it lets me use the Linux kernel's list.h and I don't have to reinvent the wheel).

In C you don't have much of a choice to make a generic container but in C++ the templated std::list is much safer.

I'm a bit wary of that article to be honest, it's interesting but it oversells the intrusive list a bit IMO and I can definitely imagine inexperienced C++ coders deciding to use intrusive lists instead of the STL because of it. But at any rate that would be very premature optimization in my book. Even if intrusive lists happen to be faster in a certain use case (and I'd like to see benchmarks before I believe it) I'd feel much more comfortable using std::list while developing/debugging.


The problem with std::list is that it's almost always worse than the alternatives. I would recommend using std::vector as the "default" container, and then use another container if the performance of std::vector is not good enough. Or, put another way, I think std::list is a premature optimization, because you are saying that insertions and deletions are frequent enough that you need them to be O(1), even at the expense of extra memory usage and tons of cache misses for simple traversal.

The std::vector container will outperform std::list surprisingly often. Just think of std::vector as your default container.


Yeah, I was expecting some secret magic and it was just a basic linked list like the ones used to weed out incoming freshmen at my university who couldn't understand pointers.


I think they are more likely to feel "tricky and weird" when coming from a higher-level language like C++ or Java that has linked-list data structures in its standard library.


Higher- or lower-level domains, I feel, would be more accurate; when I write my C-with-templates embedded code there's no STL, so I'm quite aware of intrusive containers, even though I rarely code straight C. My favorite variant is multiply-included linked lists, which I've seen in a number of one-off embedded JITs.


The article contains many false claims and its main statement that intrusive lists are better than non-intrusive ones is simply wrong.

Outright wrong:

1) There is no additional dereferencing needed when using his 'ListNode' implementation, since the object is stored directly in the ListNode.

2) If you want objects to be non-pointer members of ListNodes of several lists, you can achieve that with both intrusive and non-intrusive lists in exactly the same way (contrary to the author's claim).

The article is a disgusting example of someone trying to prove a point he heard somewhere else. So it is not his original idea and he is failing hard.

Why are there no comments on HN (except for jokoon) addressing that? Instead everybody uses the comment system as a platform to show off his own bad ass C++ skills... stupid. You deserve each other


Your contribution would be very welcome here and upvoted, if you kept it factual (your points #1 and #2 are interesting and fine; your speculation that the author heard this elsewhere is almost okay, if you had elaborated.)

But in your last two paragraphs you switch to using the language of trolls ('disgusting example', 'stupid', 'you deserve each other'). no need - you went from being an obvious upvote to an obvious downvote and probably risk being hellbanned. (Where you see your contributions but nobody else does - then nobody can see you no matter how good your points are.)

see commenting guidelines: https://news.ycombinator.com/newsguidelines.html

I hope you will read that and be a positively contributing part of this community.


How exactly can the same object (not a copy) be an element of two non-intrusive lists?



