A Shocking Truth About CSS (twoalex.com)
141 points by _pius on March 9, 2010 | 66 comments



Please, using more CSS selectors than DOM elements isn't going to kill a site. In fact, it isn't even close to being a bottleneck. We are talking milliseconds. If you want speed gains, properly compress your images!


I've known about how selectors are processed for some time; I discovered it when I was looking into optimization strategies for JavaScript and figured I should look into how CSS is processed. I made a variety of tweaks, including finally just removing nearly all the CSS, and (somewhat disappointed) did not see any visible change in the speed at which the browser rendered the page. Not even on IE6/7. Anecdotal evidence only, but to Jim72's point, I wouldn't waste time on it.


Wait, are you saying that a computer capable of executing millions of instructions per second doesn't take very long to do a few comparisons on a dozen or so elements? It takes me a long time to do by hand, so I figured the computer would be really slow too!


> a computer capable of executing millions of instructions per second

Billions, not millions.


Billions of cycles per second. But executing an instruction frequently takes 20 cycles or so. That reduces instructions to hundreds of millions.


The core i7 can issue 4 instructions, and execute 6 micro-ops per cycle. As long as your code is not a single huge dependency chain, you are mostly reading data out of L1 cache, and you don't use complex instructions like division, you can expect to get at least 2 instructions per clock. I have profiled real code (crypto stuff) that got a tad bit over 3 instructions per cycle. That's 10 billion instructions a second on the processor it ran on.

Most instructions have not taken over 2 cycles on x86 since P4. This is a mesofact, update your worldview. :)


Except that pipelining, multiple cores, and hyperthreading then bring it back up. So, let's not get overly pedantic here.


Well, it doesn't happen in real life, but an Intel i7 is capable of issuing 4 instructions per clock cycle (sometimes 5, with a little microcode cheating). A 3.2 GHz quad-core could issue 51.2 billion instructions per second.


Presumably any modern processor would use some form of pipelining, which I think would take instructions back up into the billions. (The Pentium 4, if I recall correctly, had a 20+ stage pipeline.)


From CSS to processor design in just 5 comments! Only on HN ;)


Long pipelines cause issues when branches get mis-predicted. The half processed instructions have to be thrown away.

What you want is a CPU with a short pipeline, a fast clock rate and if possible instruction reordering and multiple dispatch.

Large caches and good branch prediction help a lot too.


Scroll to the bottom, and:

11:44:33 AM Alex K: that makes it sound like it may not be worth huge optimization?

11:44:46 AM Alex M: no, not really


Steve's final conclusion:

“For most web sites, the possible performance gains from optimizing CSS selectors will be small, and are not worth the costs.”

http://www.stevesouders.com/blog/2009/03/10/performance-impa...


Exactly. I have an obsession with "optimized" markup, meaning avoiding unnecessary elements, ids, and classes. Let's not forget that all this stuff sits in markup and gets downloaded on each request. Even if that does not add up to much, I still want to have my code "clean". There is one particular site that inspires me: http://camendesign.com/ — take a look, not a single id or class.


He did this, however, at the expense of convoluted selectors, pseudo-elements, and pattern matching. Personally, I don't find his implementation to be very clean. I guess it's better to have the mess in your closet than out in the open, though.


It's not every day you can experience all the astonishment of overturning a fundamental assumption without any of the pain of forced changes. I enjoyed that conversation. (I'm Alex K, btw.)

Thanks for all the comments, everyone! Very cool to have that post picked up by Hacker News.


Most of the time this certainly falls under pointless micro-optimisation, but if you are doing a complex web application with CPU-intensive operations, it can be really handy to know how CSS is applied (and how reflow/redraw works) to help the user experience.


Google describes this and a bunch of other great stuff about how to optimize page rendering: http://code.google.com/speed/page-speed/docs/rendering.html#...


Can anyone point to where this is required by the css specification? Or is this an implementation detail? The reality may be that all the browsers do it this way, but why couldn't someone add some more smarts to choose selectors based on selectivity, just as a SQL query planner would?


It’s quicker to walk up a DOM tree than it is to walk down - that’s the root reason why browsers have all implemented it this way.


Yeah, but if we do something like this:

    #alert > p { background: yellow }
Surely the intelligent thing for the CSS engine is to look for the one-and-only #alert, rather than work backwards from all the <p>'s? (I am assuming that nodes with ids have their own fast index.)
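
A rough sketch of what I mean (hypothetical TypeScript, assuming the browser keeps an ID → node index; nothing here reflects how real engines work):

    // Hypothetical left-to-right, ID-first matching for "#alert > p".
    // The ID lookup is O(1) via the document's ID index; only direct
    // children need checking because of the ">" combinator.
    function matchIdFirst(doc: Document): Element[] {
        const alert = doc.getElementById("alert");
        if (!alert) return [];
        return Array.from(alert.children).filter(el => el.tagName === "P");
    }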


It's a little trickier when you do it left to right.

Suppose you have a DOM tree like this:

    div #alert - p, p, p
           - div #alert - p [lots of children]
                        - p - div #alert [lots of children]
[right to left]

In order to find all Ps with an #alert parent, you first get a set of all paragraph elements. Most likely a list of all paragraphs is kept in memory, so it can be enumerated quickly. Since DOM trees are generally well balanced, the number of parents of any one paragraph is going to be roughly O( log₂(|DOMSIZE|) ), and it can be enumerated in C, so this is basically nothing (compared to how grossly inefficient JavaScript is). Total complexity: O( |P NODES| * log₂( |DOMSIZE| ) ). Memory complexity: 1 linked list of size of result set.

[left to right]

If we were to look at it from left to right, it gets trickier. You have to find all #alert nodes (easy & fast), but then you have to take all children of all #alert nodes. Every sub-tree of the DOM can be a significant fraction of the entire DOM tree, or perhaps the _entire_ DOM tree. Not only that, but you CANNOT filter for all "p" children, because then you'd need to have lookup tables for every level in the tree (hugely inefficient). So you're left with one solution, that is to enumerate all children (and a DOM node can have a few thousand child nodes easily). But, wait, one #alert contains another #alert. We're already looking at O( |DOMSIZE| * |DOMSIZE| ), and now we have to consider uniqueness as well. We cannot alloc a hash map to keep track of uniqueness, because that would completely fragment your memory space; you'd end up allocating & deallocating hashmaps for every selector call. So you end up having to traverse the result set linked list for every node you add, to see if it isn't already in there. This, needless to say, is another O(N^2) detour. Total worst case complexity for M selectors:

    O( M * |DOMSIZE|^2 + M * |RESULTSET|^2 ).
Memory complexity: M + 1 linked list.

Difference becomes more profound as complexity increases:

    #uniq p p span b
(R2L) find all "b" nodes, append those with matching parents to linked list. Return linked list

(L2R) find all #uniq nodes, enumerate all "p" children. Make sure all "p" children are unique. Take all children of these "p" nodes, filter for "p" children, make sure result set is unique. Take all children of these "p" nodes, filter for "span", make sure.... and so it goes.

So that's why.
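
To make the R2L side concrete, here's a minimal sketch (illustrative TypeScript, not real engine code) of matching a descendant selector like "#alert p" right to left: enumerate the candidate rightmost elements once, and for each one walk its single chain of parents.

    // Right-to-left matching of "#alert p": start at each candidate <p>
    // and walk UP the tree looking for an ancestor that matches "#alert".
    // Each element has exactly one parent chain, so this is O(depth) per candidate.
    function hasMatchingAncestor(el: Element, ancestorSelector: string): boolean {
        for (let node = el.parentElement; node !== null; node = node.parentElement) {
            if (node.matches(ancestorSelector)) return true;
        }
        return false;
    }

    const hits = Array.from(document.getElementsByTagName("p"))
        .filter(p => hasMatchingAncestor(p, "#alert"));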


Your examples aren't valid HTML, so that approach is fine for quirks mode. But what about valid pages that declare a DOCTYPE? Shouldn't those be parsed from left to right, so the first matching ID that is encountered wins (with any other matching IDs considered errors and disregarded)?

OT, this was one of the problems XHTML was supposed to solve, since it was originally assumed by many that pages that weren't well-formed XML would not be rendered at all, forcing the developers to fix the code. This didn't happen, and developers are still at the mercy of how individual browsers implement quirks mode.


You have to write R2L parsing logic anyway, because it's what you want most of the time (e.g. "div span a.quote"). Yes, in the special case where #uniq identifiers are used, you know the page to be valid (X)HTML, and you know that you get a substantial speedup by filtering on the #uniq node first, L2R parsing is preferable. But how realistic is that, really?

Isn't it just easier to teach web developers to write their DOM selectors in a specific way? The R2L approach (a) is easy to understand, (b) has predictable (and stable) performance, (c) doesn't malloc, and (d) is easy to implement. I see this as a simple case of "good enough".


But then the browser would have to validate the page, and then render or re-render the document, wouldn't it?


> that approach is fine for quirks mode

And it is also fine in standards mode - double win.


id attributes are unique in well formed pages, but then a lot of HTML is not well formed. Browsers do maintain a hashmap (or equiv) of id to node though, so in that case you’re right, that should be faster. Maybe the overhead of parsing and choosing multiple code paths is slower than just walking the tree?


Can you please explain why this is so?


A DOM node can have multiple children but only one parent. To go up the tree and check for a hit you simply call (in pseudocode) node.parent.isOK?.parent.isOK? etc. To go down the tree you have to not only choose your search strategy (depth-first, breadth-first) but also check a whole lot more nodes in the hope that you’ll find what you want.
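
For contrast, a sketch of the downward direction (illustrative TypeScript only): unlike the single parent chain above, every descendant is a potential match, so the whole subtree has to be visited.

    // Downward search: recursively visit every descendant of "node".
    // Compare with the upward walk, which touches at most one parent per level.
    function downwardSearch(node: Element, selector: string): Element[] {
        const found: Element[] = [];
        for (const child of Array.from(node.children)) {
            if (child.matches(selector)) found.push(child);
            found.push(...downwardSearch(child, selector));
        }
        return found;
    }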


(from the header:)

The Two Alexs

Alexes? Alex's? Alexi? Alexim?

In hackanonical form, Alexen. As in oxen, vaxen, boxen.


Hackanonical is my new favorite word. Thank you for that.


IIRC oxen is a vestigial dual (not plural) form, so quite appropriate.


According to http://www.etymonline.com/index.php?term=ox -- "Oxen is the only true survival in Mod.Eng. of the O.E. weak plural."

From what I can tell, it looks like OE nouns could have either "strong" or "weak" declensions, and this is different from the dual form. Someone on another forum claims that "brethren" is another example.


May well be true; my source is an old State of the Onion talk from Larry Wall, who, although a linguist, is probably not the most qualified in the world :)


So four ox = oxenen?


"Alices", by analogy with "codex".


It's a Greek name, so I'd go with that for plurals, rather than German or Latin templates.


Alexes is the best of the bunch.



Sorry about that, just made it up on the fly. Hacker(-speak) + canonical (form) portmanteau.


Alexen is awesome. I just added it to the header. (Hackanonical is even more awesome, but it sadly doesn't fit there.)

Thanks for the words!

Alex K

PS I appreciate how strong the codex/codices example is, but, well, Alices, not so much.


Take Alex's, Alexi, Alexim, Alexen, and boxen out back to be euthanized along with the awful virii and anything else that would lead Cory Doctorow to chortle while typing.

Hackanonical, however, is without flaw. It should be in the name of this site!


As in regexen. :)


And Socksen.


Micro-optimization... blah.

Why do I really doubt Alex's yoga site needs to worry about this?


John Resig (of jQuery fame) talks about why this approach was taken in the design of the Sizzle engine that powers jQuery. It's towards the end of this talk (which is quite good, by the way). I would find the exact time, but my work internet thinks it's inappropriate content and won't let me watch the video :(


could you post a link to it?


Reminds me of how you may optimize your JavaScript DOM/style manipulations to avoid recalculating the styles too often. It was mentioned in this Google I/O session on GWT performance: http://www.youtube.com/watch?v=q9hhENmVTWg#t=32m38s (slides located here: http://code.google.com/events/io/2009/sessions/MeasureMillis... )

This is where Speed Tracer comes in handy ( http://code.google.com/webtoolkit/speedtracer/ )
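
The usual shape of that optimization (a generic sketch, not taken from the linked talk; ".item" is just an illustrative selector) is to batch layout reads separately from style writes, so the browser doesn't have to recompute layout inside the loop:

    const items = Array.from(document.querySelectorAll<HTMLElement>(".item"));

    // Thrashing: each offsetWidth read after the previous write forces a layout recalculation.
    items.forEach(el => {
        el.style.width = (el.parentElement!.offsetWidth / 2) + "px";
    });

    // Batched: do all the layout reads first, then all the style writes.
    const widths = items.map(el => el.parentElement!.offsetWidth);
    items.forEach((el, i) => {
        el.style.width = (widths[i] / 2) + "px";
    });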


If you want to know how much CSS selector matching (sometimes called Style Recalculation) is affecting your site you can try Speed Tracer:

https://chrome.google.com/extensions/detail/ognampngfcbddbfe...

It's Chrome-only but you can sometimes get cross-browser insights. It's also a lot easier to conduct a cursory investigation using this tool than it is to start changing your selectors.


I do not like their blog gimmick.


FWIW, the post is a slightly cleaned up version of an actual conversation I had with my husband. (I'm K, he's M.) It's not a format I'm likely to reuse (or have used before), but it fit the occasion and captured very well our feelings of surprise at learning how CSS parses selectors.

Plus, it's always fun to try out new formats, even -- or maybe especially -- if they don't stick.


The definitive guide to this was written by then-Mozilla dev Dave Hyatt:

https://developer.mozilla.org/en/Writing_Efficient_CSS

That's from April 21st, 2000 - almost a decade ago.


11:44:33 AM Alex K: that makes it sound like it may not be worth huge optimization?

11:44:46 AM Alex M: no, not really

11:44:56 AM Alex M: I’m a purist though

11:45:18 AM Alex M: so this makes my world

Last two lines scare the hell out of me.


I don't want to read several pages of badly spoken chat English to find the conclusion. Could someone be so kind as to explain what the shocking truth is?


That tables really are the best; CSS is good for changing fonts all at once, though. I read it all for you, so don't worry about double-checking.


Thanks!


The "Two Alexes" are just discovering (Feb 2010) what Steve Souders explained in detail at http://www.stevesouders.com/blog/2009/06/18/simplifying-css-... [which is probably the site you want to visit on "How CSS strategies affect site performance"] (June 2009).


Being "scooped" by a mere 6 months on a 10-year-old technology doesn't really make their realization dated.

It's new to me, and I bet it's new to most news.yc-ers.


Scooped by 10 years on a 10-year-old technology, not 6 months. http://news.ycombinator.com/item?id=1180429


/me mumbles something about Reading The Fine Source. Then you don't have to guess what your browser is doing.


Trying to figure out HTML from reading the Firefox source is kind of insane.


Yeah, OK. So just be ignorant instead.


Do you read the source of the Linux kernel to understand how to use bash? How about the source of a wifi driver before connecting to a network?

Your premise is ridiculous.


I don't do those things, but that's not because it's a bad method; it's because I don't care. If there was some bug in some system call, or if I needed to investigate why hardware is misbehaving, of course I'd read the Linux source. That's one of the great things about open source software; you don't have to treat it like a black box. You can just look at the internals and see how they work.

(And I have proof that I do actually do this; I had a misbehaving joystick, and the Linux fork on my GitHub contains the fix.)

Anyway, in conclusion, if you want to know what algorithm Firefox uses to process CSS selectors, just read the Firefox source code. If you don't give a damn, then don't. But if you care and you do nothing, that's just silly.


No no, his premise is delicious!

yummy trollspam.


Don't think about how fast it is unless you have solid evidence that it is slow.



