And that was also my initial implementation of it in the Sciter Engine (https://sciter.com), but after some testing I found that these two structures work better:
struct element;  // forward declaration

struct node {
    element* parent;
    uint node_index; // its index in parent->children
};

struct element : node {
    vector<node*> children;
};
a) it better suits DOM manipulation/traversal needs,
b) it is more compact and faster,
c) it is more CSS friendly (for things like :nth-child(2)),
d) yet the structure ensures that there are no loops in the tree.
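For illustration, here is a minimal sketch (using std::vector and helper names of my own, not Sciter's actual API) of why this layout is convenient: sibling navigation and :nth-child-style access become plain index arithmetic on the parent's vector.

#include <cstddef>
#include <vector>

struct element;                        // forward declaration

struct node {
    element*    parent = nullptr;
    std::size_t node_index = 0;        // its index in parent->children
};

struct element : node {
    std::vector<node*> children;
};

// nth-child-style access is a single O(1) index lookup.
inline node* nth_child(const element* e, std::size_t n) {
    return n < e->children.size() ? e->children[n] : nullptr;
}

// Sibling navigation just steps the cached index within the parent.
inline node* next_sibling(const node* n) {
    if (!n->parent) return nullptr;
    std::size_t i = n->node_index + 1;
    return i < n->parent->children.size() ? n->parent->children[i] : nullptr;
}

inline node* prev_sibling(const node* n) {
    if (!n->parent || n->node_index == 0) return nullptr;
    return n->parent->children[n->node_index - 1];
}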
A problem with the vector representation is that it makes some common DOM operations (removeChild and insertBefore) O(N) in the number of children. This can be a major problem in some cases. Gecko actually used to have that sort of representation and we have been trying to move away from it, for precisely that reason.
If your DOM is more static, I agree it's a better representation.
removeChild and insertBefore on a DOM tree are always O(N) operations, or at least they cause a full scan of the children later, no matter how the children are stored.
Think of rules like :nth-child(4) { border:1px solid; }.
> removeChild and insertBefore on DOM tree are always O(N) operations
It depends. If you insert only one node, I agree you end up doing some O(N) work later in many cases (though not nearly always). If you remove/insert k nodes at the front, in the vector case you do O(kN) work while in the doubly-linked-list case you do O(k+N) work at worst. In the case when k == N the difference is between O(N) and O(N^2).
And yes, this is a problem in practice. We've seen popular web libraries doing this to remove all the kids of an element:
while (foo.firstChild)
    foo.removeChild(foo.firstChild);
which will be O(N^2) in a vector implementation.
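To make the cost concrete, here is a rough sketch (my own code, not any engine's) of removeChild over a vector-of-children layout: the erase shifts every later pointer and every later child's cached index has to be rewritten, so the loop above pays O(N) per iteration and O(N^2) overall.

#include <cstddef>
#include <vector>

struct element;

struct node {
    element*    parent = nullptr;
    std::size_t node_index = 0;    // cached position in parent->children
};

struct element : node {
    std::vector<node*> children;
};

// Removing one child: erase() shifts all following pointers, and each
// shifted child's cached index has to be decremented - O(N) per call.
void remove_child(element* parent, node* child) {
    std::size_t i = child->node_index;
    parent->children.erase(parent->children.begin() + i);
    for (std::size_t j = i; j < parent->children.size(); ++j)
        parent->children[j]->node_index = j;
    child->parent = nullptr;
}

// The "remove every first child" pattern quoted above then does O(N)
// work N times: O(N^2) in total.
void clear_children(element* e) {
    while (!e->children.empty())
        remove_child(e, e->children.front());
}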
So the real question is what the amortized cost of an insert/remove is. Unless your web page is flushing out layout after every DOM modification (which is sadly too common), you will only redo selector matching every however many DOM mutations. In many cases "however many" is "thousands".
The cases where I've run into this being a problem involved thousands to tens of thousands of children. At 100 children there's really no problem, I agree.
> So we will have 5000 a[i] = a[i-1] operations in total.
Plus the index updates, unless you do those lazily, right? Again, not a problem at 100 kids, but at thousands of kids the cache misses start being pretty annoying.
> That's why I have Element.clear() and Element.prepend|append|insert([array of nodes]) methods.
Well, sure. Browsers could expose better APIs for this stuff. And web pages could use those APIs. And then browsers may face different tradeoffs in terms of internal representation.
As things stand, though, we don't have those better APIs, and if we added them right now it would take a decade or more for people to use them consistently enough that browsers could optimize based on that...
> As in any case, calling a DOM method (like any other function in script) is comparable to 100 of a[i] = a[i-1]
Last I measured, the overhead of a call to a DOM method (so not counting the work it actually does) in Firefox was on the order of 30-40 clock ticks. a[i] = a[i-1] is likely at least four instructions (two lea, two mov on x86 for example) unless you vectorize things. But probably about comparable in the vectorization case, yes.
> I mean that such O(N) problems are better to be solved by specialized native functions.
In an ideal world, I agree. I wish we lived in that world!
And just to be clear, the use case where you are doing repeated lookups/traversals of a linked list is uncommon?
If not, you could always go with a hybrid implementation, e.g., using a vector with a changelog and applying the modifications only when you traverse the structure, or using some persistent sequence data structure with high branching degree. Either one would use less memory than a linked list, and frankly, I've never encountered a situation where a performance critical data structure was best implemented as a doubly linked list... The memory overhead for large structures was always prohibitive.
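As a concrete (and very simplified) illustration of that lazy-application idea, with names of my own: removals only tombstone the slot and the vector is compacted on the next traversal, so k removals followed by one traversal cost O(N + k) instead of k separate O(N) erases. A full changelog supporting arbitrary inserts would follow the same pattern.

#include <cstddef>
#include <vector>

struct node;   // the child type, whatever it is

// Child list that defers the expensive part of removals: remove_at() only
// tombstones the slot, and compact() runs once, on the next read.
class lazy_children {
    std::vector<node*> items_;
    std::size_t        removed_ = 0;

    void compact() {
        if (removed_ == 0) return;
        std::size_t out = 0;
        for (node* n : items_)
            if (n) items_[out++] = n;   // keep live entries, preserving order
        items_.resize(out);
        removed_ = 0;
    }

public:
    void append(node* n) { items_.push_back(n); }

    // O(1): just mark the slot; in a DOM child list the node's cached
    // node_index would tell us which slot to mark.
    void remove_at(std::size_t slot) { items_[slot] = nullptr; ++removed_; }

    // Readers get a gap-free vector; the O(N) cleanup happens at most
    // once per batch of removals.
    const std::vector<node*>& view() { compact(); return items_; }
};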
No, that's common too. Think childNodes traversal.
So you do need something in addition to the linked list to make sure that at least in-order childNodes traversals are O(N), not O(N^2), in common cases (one common fix is sketched after this comment).
Memory use is a concern, but not the overriding one. A very large web page (e.g. the HTML spec) has maybe 500k nodes. An extra 4 words of memory per node, on 64-bit, then translates to ~16MB of RAM. If you look at actual memory consumption on that page, you will see that 16MB is not _that_ huge a deal, unfortunately.
The big problem with a doubly linked list is loss of memory locality and the resulting cache misses on traversal. That's something browsers are still working on figuring out, honestly; there is various work around trying to arena-allocate nodes and whatnot, but it has various drawbacks as well (e.g. arenas being kept alive by a single node, which you can fix by making your nodes relocatable in memory, but then you need to use handles instead of pointers, etc, etc).
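Loosely modelled on what engines do for the traversal problem mentioned above (my own simplified code): the child list keeps a cached (index, node) cursor, so a sequential childNodes[0..N-1] walk over the linked list costs O(1) per step instead of O(i), and any mutation simply invalidates the cache.

#include <cstddef>

struct dom_node {
    dom_node* next_sibling = nullptr;
    dom_node* prev_sibling = nullptr;
};

struct child_list {
    dom_node*   first        = nullptr;
    dom_node*   cached_node  = nullptr;   // where the last item() call ended up
    std::size_t cached_index = 0;

    dom_node* item(std::size_t i) {
        dom_node*   n;
        std::size_t at;
        if (cached_node && i >= cached_index) {   // resume from the cached position
            n  = cached_node;
            at = cached_index;
        } else {                                  // otherwise start from the head
            n  = first;
            at = 0;
        }
        while (n && at < i) { n = n->next_sibling; ++at; }
        if (n) { cached_node = n; cached_index = at; }
        return n;
    }

    // Any insertion/removal must drop the cursor (or adjust it).
    void invalidate_cache() { cached_node = nullptr; cached_index = 0; }
};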
Sounds like it might be interesting to try JIT-style techniques for data here, to help cache footprint and memory usage. Strawman idea: a chameleon container with an opportunistic "fast path" implementation for the common case, and then a generic "deopt" implementation that would kick in on demand, per node. The "fast path" container could maybe have compressed pointers in 16-bit fields too.
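A very rough sketch of that idea, with all names and thresholds mine: the common small case lives in an inline buffer (the "fast path"), and the container deopts to an ordinary heap vector when it outgrows it. Actually compressing the inline entries to 16-bit fields would additionally require arena-allocated, relocatable nodes, which is not shown.

#include <cstddef>
#include <vector>

struct node;

class chameleon_children {
    static constexpr std::size_t kInline = 4;   // arbitrary fast-path capacity
    node*                inline_[kInline] = {};
    std::size_t          inline_count = 0;
    std::vector<node*>*  heap = nullptr;        // the generic "deopt" form

public:
    void push_back(node* n) {
        if (!heap && inline_count < kInline) {  // fast path: stay inline
            inline_[inline_count++] = n;
            return;
        }
        if (!heap)                              // deopt: spill into a real vector
            heap = new std::vector<node*>(inline_, inline_ + inline_count);
        heap->push_back(n);
    }

    std::size_t size() const { return heap ? heap->size() : inline_count; }
    node* at(std::size_t i) const { return heap ? (*heap)[i] : inline_[i]; }

    ~chameleon_children() { delete heap; }
};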
Would there be any value in using something like a gap-buffer to store children? I know it's usually used with text, but I can't think of any reason it couldn't be used here. I have no idea though if it would be a good idea.
Removing one element means invalidating the container's layout. That means you will need to relayout the container, which is again a scan of the children. And so on.
My gut feeling is that the only concern here would be highly dynamic trees that are doing something other than appending or prepending elements to their children. I'd be curious to see benchmarks checking that.
On that note, I'm always bemused that there are not more discussions of algorithms for tree traversal that don't use a queue or a stack. I think you have everything you need here to accomplish that.
Even prepending... One would have to touch every sibling to update the index. So prepends are now linear in the number of siblings. Prepend a few hundred items to a node, and I'm sure one would see the pain in profiling.
Now, whether that's a realistic use case is another question. But at the same time, you know someone on the Internet is doing it that way...
I was assuming you'd use a standard deque that could hide the complexity of prepending. I didn't consider that the ordinal within the parent actually matters, though. So great point! :)
And yes, this was essentially my question. What are the realistic numbers in either scenario.
Right. If you don't cache the ordinal, then "next sibling" and "previous sibling" operations have to scan the array.
But it's a relatively easy performance enhancement to have a validity flag for the ordinals, and only rewrite them when a read requires them. That way, write-heavy workloads won't be punished.
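A small sketch of that enhancement (names are mine): mutations only flip a dirty flag, and the ordinals are rebuilt in one O(N) pass the first time a read needs them, so write-heavy bursts are not charged per write.

#include <cstddef>
#include <vector>

struct child {
    std::size_t node_index = 0;        // ordinal within the parent
};

struct parent {
    std::vector<child*> children;
    bool ordinals_valid = true;

    void insert_at(std::size_t i, child* c) {
        children.insert(children.begin() + i, c);
        ordinals_valid = false;        // don't renumber the siblings yet
    }

    void ensure_ordinals() {
        if (ordinals_valid) return;
        for (std::size_t i = 0; i < children.size(); ++i)
            children[i]->node_index = i;
        ordinals_valid = true;
    }

    child* next_sibling(child* c) {
        ensure_ordinals();             // pay the O(N) renumbering once per batch of writes
        std::size_t i = c->node_index + 1;
        return i < children.size() ? children[i] : nullptr;
    }
};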
Well, almost all HTML/CSS layout algorithms are O(N), so the number of immediate children of an element is naturally limited to, say, 500-1000 sub-nodes. Otherwise the whole thing will not move. Note that GMail uses pagination with 500-800 elements on the page.
So we are speaking about vectors of pointers that contain ~1000 elements. memmove() performance (used for the fairly rare operation of inserting elements in the middle) on a modern CPU is comparable to the cost of accessing the left/right elements in a doubly-linked list for insertion, once you consider cache misses, etc.
Unrelated question: how would you compare Sciter to Electron? If I'm seeing this right, your thing is just for the UI, whereas Electron has the whole application running in a browser-like environment, yes?
How is communication handled between the business logic and the HTML?
When you say pointers, do you actually mean pointers?
I've implemented something like what you describe before, in a game engine for a managed language.
Instead of using pointers, though, I used integers throughout.
The actual state was maintained in a flat vector of element blocks, and the blocks were stable.
The nice thing about integers on a GC'd platform is that they are opaque to the GC, so zero time was spent on pointer chasing.
The other nice thing is you could literally dump the entire state to disk or restore it pretty much in one go.
Looks good though, and I agree (as you mention further down) that state can often be part of multiple lists/trees simultaneously, so it makes better sense to keep them separate.
This is more of a general comment and may not necessarily be true in your implementation, but I see this argument brought up a lot: "vectors are arrays, therefore smaller (because only one pointer is needed per element) and faster than linked lists". However, one must remember that vectors are not statically sized arrays --- they dynamically size to hold their contents, and for the resizing to be (amortised) constant-time, there will on average always be some amount of unused space in the dynamic array, and you still need the (one) pointer to this dynamic array in the node itself, plus additional overhead to keep track of size/used. That's the "smaller" argument debunked. As for faster, that's not always true either, since simple forward/backward scans will require two indirections, one to get the dynamic array's address and another to get the pointer to the node itself. That's two memory accesses to areas which might not be close together at all. Compare this to one indirection (follow the next pointer) for a linked list, that is also made to an address in the vicinity of the rest of the node.
In summary: the linked-list will be faster for forward/backward scans and inserts/deletes and slower at indexing, while the vector will be faster for indexing and slower for forward/backward scans. The linked list will be smaller too.
If no insertions or deletions happen between accesses, a vector scan can also be done with only a single indirection, by simply incrementing the pointer to the node. This also means better locality than a linked list, where the next pointer may be close to the current node, while the next node can be anywhere. In a vector, they are simply next to each other.
That's true when you have a `vector<node>`, but the structure was described as a `vector<node*>`. So what's adjacent to your current position is the pointer to the next node, not the node itself.
Which one is optimal depends a lot on the size of a `node`, as moving elements around in the DOM will require memory copies in the first case.
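To make the indirection counts in this sub-thread concrete, here is a small sketch (illustrative types of my own) of the three scan shapes being compared:

#include <vector>

struct list_node { int value = 0; list_node* next = nullptr; };
struct vec_node  { int value = 0; };

// Linked-list scan: one node load per step, but the nodes themselves
// can be scattered anywhere in memory.
int sum_list(const list_node* n) {
    int total = 0;
    for (; n; n = n->next) total += n->value;
    return total;
}

// vector<node*> scan: the pointer array is contiguous, but each element
// still costs an extra indirection to reach the node it points at.
int sum_ptr_vector(const std::vector<vec_node*>& v) {
    int total = 0;
    for (const vec_node* n : v) total += n->value;
    return total;
}

// vector<node> scan: nodes are adjacent, so the walk is a single pointer
// increment with no extra indirection - but moving nodes around in the
// DOM now means copying them.
int sum_value_vector(const std::vector<vec_node>& v) {
    int total = 0;
    for (const vec_node& n : v) total += n.value;
    return total;
}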
struct Node does not need a first_child pointer when it is a TextNode/CommentNode/etc., as those are just terminals.
Yet the text node here
<div>text</div>
does not need next/previous pointers, as it is a lone child.
In the DOM tree case, the children-vector layout always takes less memory than the "children-dl-list" layout, with only one exception - an element with a single child node, where the amount of memory needed is the same as in the "children-dl-list" case.
> As for faster, that's not always true either, since simple forward/backward scans will require two indirections, one to get the dynamic array's address and another to get the pointer to the node itself.
Can this be stored in a register when you're accessing multiple locations in the array in sequence?
> The new tree structure is simple and straightforward; it uses four pointers instead of the usual five to maintain connectivity: parent, first-child, next, and previous sibling
That straightforward four-pointer structure is not the best for the HTML DOM, neither memory-wise nor for the typical algorithms applied to it.
With all due respect (it looks like some seriously impressive work you've done), how do you know they're using "typical algorithms?" Maybe they've devised something new or uncommon, or significant new optimizations of the "typical algorithms." In general I'd give a company like Microsoft the benefit of the doubt that they know what they're doing, and have considered and rejected alternative approaches — at least when it comes to objectively-measurable technical goals like "create a fast data structure to represent a DOM tree," if not always subjective big-picture goals like "make a good web browser" (even in that case, IE was, for its time, a good web browser — the best — until Ballmer let it languish).
Your data structure looks to me like it's roughly optimal for the case where the DOM tree is generated once and then never or rarely modified, and nodes have several children relatively frequently and no children relatively rarely. But in the age of single-page apps and other JavaScript-heavy jQuery/Angular/React-type sites filled with DOM manipulation and AJAX, it's more important (especially in the increasingly common pathological cases) to make it inexpensive to insert, delete and move nodes within the DOM tree.
With your data structure, every time you create a new `Node`, even one without any children, you have to allocate a new `vector`, which is a lot more expensive than just three pointers. Every insertion or move of a Node to an arbitrary location means an extremely expensive random-access insert into a vector (especially if it triggers a realloc), and every insertion, move or deletion at an arbitrary location requires updating the `node_index` of every sibling (old and new, in the case of a move), which also defeats the advantage of vectors over linked lists (avoiding indirection).
"Your data structure looks to me like it's roughly optimal for the case where the DOM tree is generated once and then is never or rarely modified"
Check GMail, for example. It has just one list - an element that has 100 children. Other elements have from 0 to 10 children.
And that 100-element list is normally populated as a whole.
"even one without any children, you have to allocate a new `vector`"
You don't need to allocate anything if the vector is empty...
The vector's data buffer gets allocated only when you put something in the vector - at least in my case. So an empty vector is just one pointer - nothing to write home about.
"every insertion, move or deletion in an arbitrary location requires updating the `node_index` of every sibling"
On any DOM insertion you almost always need to scan all siblings anyway, for quite a few reasons, e.g. to reset styles that could be affected by `this ~ that` or something like `:nth-child(odd)`.
This does not work; elements/nodes need stable addresses so they are O(1) addressable.
Elements/nodes can participate in multiple lists at the same time: rendering lists (a.k.a. the rendering tree) and things like Document.querySelectorAll(), which returns a NodeList.
Can I conclude that your earlier statement "yet the structure ensures that there are no loops in the tree." isn't correct? I could easily create a loop like this:
It is C++, and std::vector has an operator[] that makes a vector behave like a conventional C "pointer to data" for array accesses (in fact, the compiler will likely optimize the access to exactly what you would get if the declaration were a C array of pointers to node).
I can see now why DOM manipulation was so catastrophic for performance on IE5-8, all that bookkeeping of character positions must have been a killer once the DOM tree was of any significant size.
Makes me wonder whether, if edits to DOM nodes on legacy IE were focused on attribute values only and the string lengths of those edited attributes never changed, one could bypass the bookkeeping and get good performance gains.
I see something like this and think about how much an ensemble approach to the data structure would help. This keeps popping up in places where you have a very general abstraction (like the DOM) and want to support a broad set of operations and use cases (read-only, insert heavy, traversal heavy, etc.) and so you often choose the structure which supports most of these cases pretty well. But you sacrifice a lot of perf by doing so.
What I'm wondering is how well you could do perf-wise by having the DOM be composed of a heterogeneous set of structures, each of which is chosen based on the history of what operations have been performed on it (e.g. we're appending a lot here, let's use a vector of children). This is all similar in spirit and goals to:
- JIT compiling, but for the data structures, and not code
This is interesting and something I've suspected for a long while. I remember struggling big-time with older IE versions giving some really strange results when trying to perform DOM node manipulations on improperly nested HTML (which in turn often came from copy-pasting from Word), erroneous results that definitively hinted towards DOM tree operations actually happening on flat substrings.
What MS is explaining in the article is that they've merely started using the same data structure as all the other engines.
It was probably a huge amount of work, but that's more like not dragging the web backwards than pushing it forward.
I used a cheap regex to look for these tags instead of parsing the DOM properly to avoid unnecessary allocations and parsing overhead. So after reading the article, I guess I'm still in the 90s!
> TreeWriter::AppendChild(parent, child);
That's exactly the right approach. OO must die. And the argument for why method calls are (typically) wrong already at the syntactic level can be found in the article.