Hacker News new | past | comments | ask | show | jobs | submit login

It's a little trickier when you do it left to right.

Suppose you have a DOM tree like this:

    div #alert - p, p, p
           - div #alert - p [lots of children]
                        - p - div #alert [lots of children]
[right to left]

In order to find all Ps with an #alert parent, you first get a set of all paragraph elements. Most likely a list of all paragraphs is kept in memory, so it can be enumerated quickly. Since DOM trees are generally well balanced the number of parents of any one paragraph is going to be roughly O( 2log(|DOMSIZE|) ) and it can be enumerated in C, so this is basically nothing (compared to how grossly inefficient javascript is). Total complexity: O( |P NODES| * 2log( |DOMSIZE| ) ). Memory complexity: 1 linked list of size of result set.

[left to right]

If we were to look at it from left to right, it gets trickier. You have to find all #alert nodes (easy & fast), but then you have to take all children of all #alert nodes. Every sub-tree of the DOM can be a significant fraction of the entire DOM tree, or perhaps the _entire_ dom tree. Not only that, but you CANNOT filter for all "p" children, because then you'd need to have lookup tables for every level in the tree (hugely inefficient). So you're left with one solution, that is to enumerate all children (and a DOM node can have a few thousand child nodes easily). But, wait, one #alert contains another #alert. We're already looking at O( |DOMSIZE| * |DOMSIZE| ), and now we have to consider uniqueness as well. We cannot alloc a hash map to keep track of uniqueness, because that would completely fragement your memory space; you'd end up allocating&deallocating hashmaps for every selector call. So you end up having to traverse the result set linked list for every node you add, to see if it isn't already in there. This, needless to say, is another O(N^2) detour. Total worst case complexity for M selectors:

    O( M * |DOMSIZE|^2 + M * |RESULTSET|^2 ).
Memory complexity: M + 1 linked list.

Difference becomes more profound as complexity increases:

    #uniq p p span b
(R2L) find all "b" nodes, append those with matching parents to linked list. Return linked list

(L2R) find all #uniq nodes, enumerate all "p" children. Make sure all "p" children are unique. Take all children of these "p" nodes, filter for "p" children, make sure result set is unique. Take all children of these "p" nodes, filter for "span", make sure.... and so it goes.

So that's why.




Your examples aren't valid HTML, so that approach is fine for quirks mode. But what about valid pages that declare a DOCTYPE? Shouldn't those be parsed from left to right, so the first matching ID that is encountered wins (with any other matching IDs considered errors and disregarded)?

OT, this was one of the problems XHTML was supposed to solve, since it was originally assumed by many that pages that weren't well-formed XML would not be rendered at all, forcing the developers to fix the code. This didn't happen, and developers are still at the mercy of how individual browsers implementat quirks mode.


You have to write R2L parsing logic anyway, because it's what you want most of the time (e.g. "div span a.quote"). Yes, in the special case where #uniq identifiers are used and you know the page to be valid (X)HTML and you know that you get a substantial speedup by filtering on the #uniq node first L2R parsing is preferable. But how realistic is that, really?

Isn't it just easier to teach web developers to write their DOM selectors in a specific way? The R2L approach is (a) easy to understand (b) has predictable (and stable) performance (c) doesn't malloc (d) is easy to implement. I see this as a simple case of "good enough".


But then the browser would have to validate the page, and then render or re-render the document, wouldn't it?


that approach is fine for quirks mode

And it is also fine in standards mode - double win.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: