A Shocking Truth About CSS (twoalex.com)
141 points by _pius on March 9, 2010 | 66 comments



Please, using more CSS selectors than DOM elements isn't going to kill a site. In fact, it isn't even close to being a bottleneck. We are talking milliseconds. If you want speed gains, properly compress your images!


I've known about how selectors are processed for some time; I discovered it when I was looking into optimization strategies for JavaScript and figured I should look into how CSS is processed. I made a variety of tweaks, including finally just removing nearly all the CSS, and (somewhat disappointed) did not see any visible change in the speed at which the browser rendered the page. Not even on IE6/7. Anecdotal evidence only, but to Jim72's point, I wouldn't waste time on it.


Wait, are you saying that a computer capable of executing millions of instructions per second doesn't take very long to do a few comparisons on a dozen or so elements? It takes me a long time to do by hand, so I figured the computer would be really slow too!


> a computer capable of executing millions of instructions per second

Billions, not millions.


Billions of cycles per second. But executing an instruction frequently takes 20 cycles or so. That reduces instructions to hundreds of millions.


The core i7 can issue 4 instructions, and execute 6 micro-ops per cycle. As long as your code is not a single huge dependency chain, you are mostly reading data out of L1 cache, and you don't use complex instructions like division, you can expect to get at least 2 instructions per clock. I have profiled real code (crypto stuff) that got a tad bit over 3 instructions per cycle. That's 10 billion instructions a second on the processor it ran on.

Most instructions have not taken over 2 cycles on x86 since P4. This is a mesofact, update your worldview. :)


Except that pipelining, multiple cores, and hyperthreading then bring it back up. So, let's not get overly pedantic here.


Well, it doesn't happen in real life, but an Intel i7 is capable of issuing 4 instructions per clock cycle (sometimes 5, with a little microcode cheating). A 3.2 GHz quad-core could issue 51.2 billion instructions per second.


Presumably any modern processor would use some form of pipelining, which I think would take instructions back up into the billions. (The Pentium 4, if I recall correctly, had a 20+ stage pipeline.)


From CSS to processor design in just 5 comments! Only on HN ;)


Long pipelines cause issues when branches get mis-predicted. The half processed instructions have to be thrown away.

What you want is a CPU with a short pipeline, a fast clock rate and if possible instruction reordering and multiple dispatch.

Large caches and good branch prediction help a lot too.


Scroll to the bottom, and:

11:44:33 AM Alex K: that makes it sound like it may not be worth huge optimization?

11:44:46 AM Alex M: no, not really


Steve's final conclusion:

“For most web sites, the possible performance gains from optimizing CSS selectors will be small, and are not worth the costs.”

http://www.stevesouders.com/blog/2009/03/10/performance-impa...


Exactly. I have an obsession with "optimized" markup, meaning avoiding unnecessary elements, ids, and classes. Let's not forget that all this stuff sits in markup and gets downloaded on each request. Even if that does not add up to much, I still want to have my code "clean". There is one particular site that inspires me: http://camendesign.com/ — take a look, not a single id or class.


He did this, however, at the expense of convoluted selectors, pseudo-elements, and pattern matching. Personally, I don't find his implementation to be very clean. I guess it's better to have the mess in your closet than out in the open, though.


It's not every day you can experience all the astonishment of overturning a fundamental assumption without any of the pain of forced changes. I enjoyed that conversation. (I'm Alex K, btw.)

Thanks for all the comments, everyone! Very cool to have that post picked up by Hacker News.


Most of the time this certainly falls under pointless micro-optimisation, but if you are doing a complex web application with CPU-intensive operations, it can be really handy to know how CSS is applied (and how reflow/redraw works) to help the user experience.


Google describes this and a bunch of other great stuff about how to optimize page rendering: http://code.google.com/speed/page-speed/docs/rendering.html#...


Can anyone point to where this is required by the css specification? Or is this an implementation detail? The reality may be that all the browsers do it this way, but why couldn't someone add some more smarts to choose selectors based on selectivity, just as a SQL query planner would?


It’s quicker to walk up a DOM tree than it is to walk down - that’s the root reason why browsers have all implemented it this way.


Yeah, but if we do something like this:

    #alert > p { background: yellow }
Surely the intelligent thing for the CSS engine is to look for the one-and-only #alert, rather than work backwards from all the <p>'s? (I am assuming that nodes with ids have their own fast index.)
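
A rough sketch of what I mean (hypothetical TypeScript, assuming the browser keeps an ID → node index; nothing here reflects how real engines work):

    // Hypothetical left-to-right, ID-first matching for "#alert > p".
    // The ID lookup is O(1) via the document's ID index; only direct
    // children need checking because of the ">" combinator.
    function matchIdFirst(doc: Document): Element[] {
        const alert = doc.getElementById("alert");
        if (!alert) return [];
        return Array.from(alert.children).filter(el => el.tagName === "P");
    }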


It's a little trickier when you do it left to right.

Suppose you have a DOM tree like this:

    div #alert - p, p, p
           - div #alert - p [lots of children]
                        - p - div #alert [lots of children]
[right to left]

In order to find all Ps with an #alert parent, you first get a set of all paragraph elements. Most likely a list of all paragraphs is kept in memory, so it can be enumerated quickly. Since DOM trees are generally well balanced, the number of parents of any one paragraph is going to be roughly O( log₂(|DOMSIZE|) ), and it can be enumerated in C, so this is basically nothing (compared to how grossly inefficient JavaScript is). Total complexity: O( |P NODES| * log₂( |DOMSIZE| ) ). Memory complexity: 1 linked list of size of result set.

[left to right]

If we were to look at it from left to right, it gets trickier. You have to find all #alert nodes (easy & fast), but then you have to take all children of all #alert nodes. Every sub-tree of the DOM can be a significant fraction of the entire DOM tree, or perhaps the _entire_ DOM tree. Not only that, but you CANNOT filter for all "p" children, because then you'd need to have lookup tables for every level in the tree (hugely inefficient). So you're left with one solution, that is to enumerate all children (and a DOM node can have a few thousand child nodes easily). But, wait, one #alert contains another #alert. We're already looking at O( |DOMSIZE| * |DOMSIZE| ), and now we have to consider uniqueness as well. We cannot alloc a hash map to keep track of uniqueness, because that would completely fragment your memory space; you'd end up allocating & deallocating hashmaps for every selector call. So you end up having to traverse the result set linked list for every node you add, to see if it isn't already in there. This, needless to say, is another O(N^2) detour. Total worst case complexity for M selectors:

    O( M * |DOMSIZE|^2 + M * |RESULTSET|^2 ).
Memory complexity: M + 1 linked list.

Difference becomes more profound as complexity increases:

    #uniq p p span b
(R2L) find all "b" nodes, append those with matching parents to linked list. Return linked list

(L2R) find all #uniq nodes, enumerate all "p" children. Make sure all "p" children are unique. Take all children of these "p" nodes, filter for "p" children, make sure result set is unique. Take all children of these "p" nodes, filter for "span", make sure.... and so it goes.

So that's why.
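
To make the R2L side concrete, here's a minimal sketch (illustrative TypeScript, not real engine code) of matching a descendant selector like "#alert p" right to left: enumerate the candidate rightmost elements once, and for each one walk its single chain of parents.

    // Right-to-left matching of "#alert p": start at each candidate <p>
    // and walk UP the tree looking for an ancestor that matches "#alert".
    // Each element has exactly one parent chain, so this is O(depth) per candidate.
    function hasMatchingAncestor(el: Element, ancestorSelector: string): boolean {
        for (let node = el.parentElement; node !== null; node = node.parentElement) {
            if (node.matches(ancestorSelector)) return true;
        }
        return false;
    }

    const hits = Array.from(document.getElementsByTagName("p"))
        .filter(p => hasMatchingAncestor(p, "#alert"));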


Your examples aren't valid HTML, so that approach is fine for quirks mode. But what about valid pages that declare a DOCTYPE? Shouldn't those be parsed from left to right, so the first matching ID that is encountered wins (with any other matching IDs considered errors and disregarded)?

OT, this was one of the problems XHTML was supposed to solve, since it was originally assumed by many that pages that weren't well-formed XML would not be rendered at all, forcing the developers to fix the code. This didn't happen, and developers are still at the mercy of how individual browsers implement quirks mode.


You have to write R2L parsing logic anyway, because it's what you want most of the time (e.g. "div span a.quote"). Yes, in the special case where #uniq identifiers are used, you know the page to be valid (X)HTML, and you know that you get a substantial speedup by filtering on the #uniq node first, L2R parsing is preferable. But how realistic is that, really?

Isn't it just easier to teach web developers to write their DOM selectors in a specific way? The R2L approach (a) is easy to understand, (b) has predictable (and stable) performance, (c) doesn't malloc, and (d) is easy to implement. I see this as a simple case of "good enough".


But then the browser would have to validate the page, and then render or re-render the document, wouldn't it?


> that approach is fine for quirks mode

And it is also fine in standards mode - double win.


id attributes are unique in well formed pages, but then a lot of HTML is not well formed. Browsers do maintain a hashmap (or equiv) of id to node though, so in that case you’re right, that should be faster. Maybe the overhead of parsing and choosing multiple code paths is slower than just walking the tree?


Can you please explain why this is so?


A DOM node can have multiple children but only one parent. To go up the tree and check for a hit you simply call (in pseudocode) node.parent.isOK?.parent.isOK? etc. To go down the tree you have to not only choose your search strategy (depth-first, breadth-first) but also check a whole lot more nodes in the hope that you’ll find what you want.
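
For contrast, a sketch of the downward direction (illustrative TypeScript only): unlike the single parent chain above, every descendant is a potential match, so the whole subtree has to be visited.

    // Downward search: recursively visit every descendant of "node".
    // Compare with the upward walk, which touches at most one parent per level.
    function downwardSearch(node: Element, selector: string): Element[] {
        const found: Element[] = [];
        for (const child of Array.from(node.children)) {
            if (child.matches(selector)) found.push(child);
            found.push(...downwardSearch(child, selector));
        }
        return found;
    }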


(from the header:)

The Two Alexs

Alexes? Alex's? Alexi? Alexim?

In hackanonical form, Alexen. As in oxen, vaxen, boxen.


Hackanonical is my new favorite word. Thank you for that.


IIRC oxen is a vestigial dual (not plural) form, so quite appropriate.


According to http://www.etymonline.com/index.php?term=ox -- "Oxen is the only true survival in Mod.Eng. of the O.E. weak plural."

From what I can tell, it looks like OE nouns could have either "strong" or "weak" declensions, and this is different from the dual form. Someone on another forum claims that "brethren" is another example.


May well be true; my source is an old State of the Onion talk from Larry Wall, who, although a linguist, is probably not the most qualified in the world :)


So four ox = oxenen?


"Alices", by analogy with "codex".


It's a Greek name, so I'd go with that for plurals, rather than German or Latin templates.


Alexes is the best of the bunch.



Sorry about that, just made it up on the fly. Hacker(-speak) + canonical (form) portmanteau.


Alexen is awesome. I just added it to the header. (Hackanonical is even more awesome, but it sadly doesn't fit there.)

Thanks for the words!

Alex K

PS I appreciate how strong the codex/codices example is, but, well, Alices, not so much.


Take Alex's, Alexi, Alexim, Alexen, and boxen out back to be euthanized along with the awful virii and anything else that would lead Cory Doctorow to chortle while typing.

Hackanonical, however, is without flaw. It should be in the name of this site!


As in regexen. :)


And Socksen.


Micro-optimization... blah.

Why do I really doubt Alex's yoga site needs to worry about this?


John Resig (of jQuery fame) talks about why this approach was taken in the design of the Sizzle engine that powers jQuery. It's towards the end of this talk (which is quite good, by the way). I would find the exact time, but my work internet thinks it's inappropriate content and won't let me watch the video :(


could you post a link to it?


Reminds me of how you may optimize your JavaScript DOM/style manipulations to avoid recalculating the styles too often. It was mentioned in this Google I/O session on GWT performance: http://www.youtube.com/watch?v=q9hhENmVTWg#t=32m38s (slides located here: http://code.google.com/events/io/2009/sessions/MeasureMillis... )

This is where Speed Tracer comes in handy ( http://code.google.com/webtoolkit/speedtracer/ )
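
The usual shape of that optimization (a generic sketch, not taken from the linked talk; ".item" is just an illustrative selector) is to batch layout reads separately from style writes, so the browser doesn't have to recompute layout inside the loop:

    const items = Array.from(document.querySelectorAll<HTMLElement>(".item"));

    // Thrashing: each offsetWidth read after the previous write forces a layout recalculation.
    items.forEach(el => {
        el.style.width = (el.parentElement!.offsetWidth / 2) + "px";
    });

    // Batched: do all the layout reads first, then all the style writes.
    const widths = items.map(el => el.parentElement!.offsetWidth);
    items.forEach((el, i) => {
        el.style.width = (widths[i] / 2) + "px";
    });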


If you want to know how much CSS selector matching (sometimes called Style Recalculation) is affecting your site you can try Speed Tracer:

https://chrome.google.com/extensions/detail/ognampngfcbddbfe...

It's Chrome-only but you can sometimes get cross-browser insights. It's also a lot easier to conduct a cursory investigation using this tool than it is to start changing your selectors.


I do not like their blog gimmick.


FWIW, the post is a slightly cleaned up version of an actual conversation I had with my husband. (I'm K, he's M.) It's not a format I'm likely to reuse (or have used before), but it fit the occasion and captured very well our feelings of surprise at learning how CSS parses selectors.

Plus, it's always fun to try out new formats, even -- or maybe especially -- if they don't stick.


The definitive guide to this was written by then-Mozilla dev Dave Hyatt:

https://developer.mozilla.org/en/Writing_Efficient_CSS

That's from April 21st, 2000 - almost a decade ago.


11:44:33 AM Alex K: that makes it sound like it may not be worth huge optimization?

11:44:46 AM Alex M: no, not really

11:44:56 AM Alex M: I’m a purist though

11:45:18 AM Alex M: so this makes my world

Last two lines scare the hell out of me.


I don't want to read several pages of badly spoken chat English to find the conclusion. Could someone be so kind as to explain what the shocking truth is?


That tables really are the best; CSS is good for changing fonts all at once, though. I read it all for you, so don't worry about double-checking.


Thanks!


The "Two Alexes" are just discovering (Feb 2010) what Steve Souders explained in detail at http://www.stevesouders.com/blog/2009/06/18/simplifying-css-... [which is probably the site you want to visit on "How CSS strategies affect site performance"] (June 2009).


Being "scooped" by a mere 6 months on a 10-year-old technology doesn't really make their realization dated.

It's new to me, and I bet it's new to most news.yc-ers.


Scooped by 10 years on a 10-year-old technology, not 6 months. http://news.ycombinator.com/item?id=1180429


/me mumbles something about Reading The Fine Source. Then you don't have to guess what your browser is doing.


Trying to figure out HTML from reading the Firefox source is kind of insane.


Yeah, OK. So just be ignorant instead.


Do you read the source of the Linux kernel to understand how to use bash? How about the source of a wifi driver before connecting to a network?

Your premise is ridiculous.


I don't do those things, but that's not because it's a bad method; it's because I don't care. If there was some bug in some system call, or if I needed to investigate why hardware is misbehaving, of course I'd read the Linux source. That's one of the great things about open source software; you don't have to treat it like a black box. You can just look at the internals and see how they work.

(And I have proof that I do actually do this; I had a misbehaving joystick, and the Linux fork on my GitHub contains the fix.)

Anyway, in conclusion, if you want to know what algorithm Firefox uses to process CSS selectors, just read the Firefox source code. If you don't give a damn, then don't. But if you care and you do nothing, that's just silly.


No no, his premise is delicious!

yummy trollspam.


Don't think about how fast it is unless you have solid evidence that it is slow.



