If I'm reading this right, their array resizing was linear rather than geometric -- definitely a radical difference from the usual implementations! That is, each time they called realloc, they added a constant amount of capacity instead of doing the usual thing, which is multiplying it by some constant (e.g. doubling it). Consider what happens in the most naive implementation, where realloc calls malloc+memcpy+free, taking time proportional to the size of the buffer to be copied: appending n times to such an array would take O(n^2) time, rather than O(n) as you'd get from a linked list or from an array that grows geometrically.
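For anyone who hasn't seen the difference spelled out, here's a rough C sketch of the two growth policies (struct and function names are made up, not their actual code):

    #include <stdlib.h>

    /* Illustrative growable array of ints. */
    typedef struct {
        int    *data;
        size_t  len;
        size_t  cap;
    } vec;

    /* Linear growth: every realloc adds a fixed chunk.  If realloc has to
       copy each time, n appends do roughly n^2/(2*CHUNK) element copies in
       total, i.e. still O(n^2). */
    int push_linear(vec *v, int x) {
        enum { CHUNK = 64 };
        if (v->len == v->cap) {
            int *p = realloc(v->data, (v->cap + CHUNK) * sizeof *v->data);
            if (!p) return -1;
            v->data = p;
            v->cap += CHUNK;
        }
        v->data[v->len++] = x;
        return 0;
    }

    /* Geometric growth: capacity doubles.  Each element gets copied O(1)
       times on average, so n appends cost O(n) total (amortized O(1) each). */
    int push_geometric(vec *v, int x) {
        if (v->len == v->cap) {
            size_t new_cap = v->cap ? v->cap * 2 : 16;
            int *p = realloc(v->data, new_cap * sizeof *v->data);
            if (!p) return -1;
            v->data = p;
            v->cap  = new_cap;
        }
        v->data[v->len++] = x;
        return 0;
    }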
Of course they can get away with it on most systems, because most memory allocators don't have that kind of naive implementation. They may do a geometric growth policy automatically; they may switch to grabbing memory from the OS's virtual memory system for allocations above a certain size; they can do all sorts of things that have the convenient side-effect of making the constant-growth algorithm here more efficient. But it shouldn't come as any great surprise that things can get weird on some systems with different memory management, when they've taken such an unusual approach to their array resizing.
Nice data supporting what I've been arguing for: an evidence-based approach to data structure and algorithm selection, backed by realistic measurements. There was a lot of work on automating this with specialized compilers in the '90s and early-to-mid 2000s. Not sure what they're doing now.
Another thing I used to bring up was using in-order execution and compiler-directed scratchpads (with an option for manual control) to get more predictable performance. Especially for real-time systems or for preventing covert channels.
That is, in practice, results from complexity theory seem to be totally neglected.
In our opinion, what happens is that software is so complex, and there are so many layers of software underneath the application code, that it is not even clear which data structures are better, at least for the simple case we describe here.
In principle, I applaud the initiative to put the sacred theories handed down from on high to the test. We need more of this, and much earlier: every CS 101 course should be presenting something similar.
On the other hand, what were these guys expecting? They are testing a LinkedList(TM), for goodness' sake! Even if you assume hardware that follows the abstract machine of the C standard to the letter, all that toying around with pointers should be a good predictor that things are going to be slow, regardless of what Complexity Theory says. And, speaking of Complexity Theory, at the sample sizes they were using, Big-O analysis doesn't even apply: the asymptotic behaviour is irrelevant, and the option with the lowest fixed cost wins!
So, I find extremely annoying all that mumbo-jumbo about how the so-called complexity of software, the processor caches, etc., make analysis impossible. It is not that difficult: the guy who does not overthink stuff and packs everything close enough together to cause the fewest cache misses wins (and the idiot who needlessly spreads data around and uses indirection everywhere loses).
A follow-up article comparing slight variants of the same data structure, tuned to take advantage of modern hardware, would provide great insights!
I spent all of last weekend trying to figure out why a linked list was slow compared to a flat array, no matter the operation or data type.
In the end I put my Big O knowledge in the drawer and stripped linked lists from the program I was working on, and got the 100x improvement in overall performance I was looking for.
I've spent the last year wondering why compiling and linking took so long. Now I've been looking at the source of our toolchain and see that not only does it use linked lists for everything, it also appends everything at the end -- without storing a pointer to the last element in the head. Fixing the O(n) append improves performance a lot.
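For what it's worth, the fix is basically just carrying a tail pointer around. A minimal sketch of the idea in C (hypothetical types, not our toolchain's actual code):

    #include <stdlib.h>

    typedef struct node {
        struct node *next;
        void        *value;
    } node;

    typedef struct {
        node *head;
        node *tail;   /* the pointer that was missing */
    } list;

    /* O(1) append: no walk to the end of the list on every call. */
    int list_append(list *l, void *value) {
        node *n = malloc(sizeof *n);
        if (!n) return -1;
        n->value = value;
        n->next  = NULL;
        if (l->tail) l->tail->next = n;
        else         l->head = n;
        l->tail = n;
        return 0;
    }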
I was using a linked list to append things at the head, which was supposed to be its optimal use case. I refactored the function it calls to take an n argument, meaning it leaves the first n holes in the returned vector for the calling function to fill (see the sketch at the end of this comment).
I had also used a linked list of characters to represent strings, so I could iterate through them one by one. That was very slow compared to iterating through an array.
Linked lists are supposed to be O(1) for inserting at the head and O(n) for iterating. In practice, though, that's no match for an array's O(1) assignment and O(n) iteration, which come with far smaller constant factors.
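If the "n holes" description is hard to follow, it amounts to roughly this (a made-up sketch, not my actual code): the callee returns a flat array with the first n slots left empty for the caller to fill, so nothing ever has to be prepended.

    #include <stdlib.h>

    /* Hypothetical sketch: instead of prepending to a linked list, the
       callee returns a flat array whose first n_holes slots are left for
       the caller to fill in. */
    int *build_values(size_t n_holes, size_t n_values, size_t *out_len) {
        int *out = malloc((n_holes + n_values) * sizeof *out);
        if (!out) return NULL;
        for (size_t i = 0; i < n_values; i++)
            out[n_holes + i] = (int)i;   /* stand-in for the real payload */
        *out_len = n_holes + n_values;
        return out;
    }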
Big-O doesn't care about constant factors, but in practice you do. Linked lists are one of the shittiest data structures on modern computers because random memory access is one of the most expensive operations you can do. In fact, on modern machines a random access is not an O(1) operation, it's O(log n), due to the TLB.
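If anyone wants to see the constant factor for themselves, here's a crude benchmark sketch (arbitrary size, no error handling). Even with the nodes allocated back to back like this, the array traversal usually wins, and on a fragmented heap the gap gets much wider:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    typedef struct node { struct node *next; long v; } node;

    int main(void) {
        enum { N = 10000000 };            /* arbitrary element count */
        long *arr = malloc(N * sizeof *arr);
        node *head = NULL;
        for (long i = 0; i < N; i++) {
            arr[i] = i;
            node *n = malloc(sizeof *n);  /* one allocation per element */
            n->v = i;
            n->next = head;
            head = n;
        }

        clock_t t0 = clock();
        long s1 = 0;
        for (long i = 0; i < N; i++) s1 += arr[i];     /* sequential walk */
        clock_t t1 = clock();
        long s2 = 0;
        for (node *n = head; n; n = n->next) s2 += n->v;   /* pointer chasing */
        clock_t t2 = clock();

        printf("array: sum %ld in %.3fs   list: sum %ld in %.3fs\n",
               s1, (double)(t1 - t0) / CLOCKS_PER_SEC,
               s2, (double)(t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }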
I am not 100% sure I understand your description, but if you want a standard drop-in replacement for linked lists, you want the array deque. It can do fast insertion and deletion at both ends.
The only operation it cannot do is O(1) deletion of an arbitrary element (even one in the middle) that you hold an iterator to. But in my experience you rarely need that.
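For reference, by "array deque" I mean basically a growable ring buffer. A stripped-down C sketch of the idea (ints only, no shrinking, minimal error handling):

    #include <stdlib.h>

    /* Minimal growable ring-buffer deque of ints (illustrative sketch). */
    typedef struct {
        int    *buf;
        size_t  head;   /* index of the first element */
        size_t  len;
        size_t  cap;
    } deque;

    static int deque_grow(deque *d) {
        size_t new_cap = d->cap ? d->cap * 2 : 16;
        int *nb = malloc(new_cap * sizeof *nb);
        if (!nb) return -1;
        /* Unroll the ring into the new buffer starting at index 0. */
        for (size_t i = 0; i < d->len; i++)
            nb[i] = d->buf[(d->head + i) % d->cap];
        free(d->buf);
        d->buf  = nb;
        d->head = 0;
        d->cap  = new_cap;
        return 0;
    }

    int deque_push_back(deque *d, int x) {
        if (d->len == d->cap && deque_grow(d) != 0) return -1;
        d->buf[(d->head + d->len) % d->cap] = x;
        d->len++;
        return 0;
    }

    int deque_push_front(deque *d, int x) {
        if (d->len == d->cap && deque_grow(d) != 0) return -1;
        d->head = (d->head + d->cap - 1) % d->cap;
        d->buf[d->head] = x;
        d->len++;
        return 0;
    }

    int deque_pop_front(deque *d, int *out) {
        if (d->len == 0) return -1;
        *out = d->buf[d->head];
        d->head = (d->head + 1) % d->cap;
        d->len--;
        return 0;
    }

    int deque_pop_back(deque *d, int *out) {
        if (d->len == 0) return -1;
        d->len--;
        *out = d->buf[(d->head + d->len) % d->cap];
        return 0;
    }

Everything stays in one or two contiguous allocations, which is exactly why it behaves so much better than a node-per-element list on real hardware.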
I dunno, I've seen enough split-memory architectures (helllllo, PS3) that you can't just say cache lines are the be-all and end-all. Although paying attention to them is certainly better than what the current crop of dynamic languages does.