Why do browsers match CSS selectors from right to left?

CSS selectors are matched by browser engines from right to left, so they first find the children and then check their parents to see whether they match the rest of the rule.

  1. Why is this?
  2. Is it just because the spec says so?
  3. Would it affect the eventual layout if it were evaluated from left to right?

To me, the simplest way to do it would be to start with the selectors that match the fewest elements. So IDs first (as they should only return one element). Then maybe classes, or an element type that has the fewest nodes. For example, there may be only one span on the page, so go directly to that node with any rule that references a span.

Here are some links backing up my claims:

  1. http://code.google.com/speed/page-speed/docs/rendering.html
  2. https://developer.mozilla.org/en/Writing_Efficient_CSS

It sounds like it is done this way to avoid having to look at all the children of a parent (which could be many) rather than at the ancestors of a child, where there is only one parent per level. Even if the DOM is deep, RTL matching would only look at one node per level, rather than multiple nodes per level as in LTR matching. Is it easier/faster to evaluate CSS selectors LTR or RTL?

Gaddy answered 26/4, 2011 at 22:3 Comment(55)
3. No - no matter how you read it, the selector always matches the same set of elements. - Context
The parsing approach you suggest wouldn't be very efficient, since it requires accessing the DOM a lot. I'd parse it from left to right, and presumably selector engines such as jQuery's parse it from left to right too. - Frag
@Sime Vidas - Then why is it done right to left? - Gaddy
@Frag - But why don't browser engines? - Gaddy
@Frag I thought jQuery's Sizzle engine traversed right to left and used built-in optimizations (like when the selector begins with an ID). - Linders
For what it's worth, a browser can't assume that your IDs are unique. You could stick the same id="foo" all over your DOM, and a #foo selector would need to match all those nodes. jQuery has the option of saying that $("#foo") will always return just one element, because they're defining their own API with its own rules. But browsers need to implement CSS, and CSS says to match everything in the document with the given ID. - Luminescence
The question seems to be about selector matching and not about selector parsing. - Champion
@Boris Zbarsky: Both the HTML and CSS specifications say that IDs must be unique. Matching everything is a decision made so that error recovery can be performed on bad documents, not because the spec says so. - Schmo
@Quentin: The CSS3 spec on ID selectors says an ID selector matches "any element" with that ID rather than "the element" with it. - Inconsequent
@Inconsequent: it also says "What makes attributes of type ID special is that no two such attributes can have the same value in a conformant document". - Schmo
@Schmo In a "nonconformant" (to HTML) document IDs can be non-unique, and in those documents CSS requires matching all elements with that ID. CSS itself places no normative requirements on IDs being unique; the text you cite is informative. - Luminescence
@Boris Zbarsky What jQuery does depends on the code path within jQuery. In some cases, jQuery uses the NodeSelector API (native querySelectorAll). In other cases, Sizzle is used. Sizzle doesn't match multiple IDs but QSA does (AYK). The path taken depends on the selector, the context, and the browser and its version. jQuery's Query API uses what I have termed "Native First, Dual Approach". I wrote an article on that, but it is down, though you may find it here: fortybelow.ca/hosted/dhtmlkitchen/JavaScript-Query-Engines.html - Antisana
@BorisZbarsky: At no point in the entire spec -- in any version of the spec -- does it mention the result of attempting to apply CSS to an invalid document. (In fact, one of the errata is a note to fix some ill-formed HTML, rather than taking that opportunity to add a note that the styles should still be applied.) As for that bit about IDs being unique? That is normative, as is every other bit of text in the spec unless otherwise stated. - Aquarius
@Aquarius There is no such thing as "invalid document" as far as CSS is concerned. CSS just works on an abstract element tree. HTML defines how this tree is constructed, and it defines how to do it even for documents that don't meet the authoring conformance criteria. So the result of applying CSS to a document with duplicated IDs is in fact completely defined by the combination of the two specs. As for it being normative, it sure is: as an authoring requirement. Duplicated ids won't validate, but that doesn't affect how UAs are supposed to process them. - Luminescence
@Aquarius Here's an example using the DOM (which also works on an element tree) instead of CSS. Take a look at dom.spec.whatwg.org/#dom-document-getelementbyid and note the use of "the first element, in tree order". Why do you think it says that, exactly? Because ids can in fact be duplicated if the author did not author a validating document, and the DOM has to deal with that. - Luminescence
The DOM does. CSS does not. It'd be just as correct to apply #whatever's styles to exactly the one element returned by document.getElementById('whatever'). It is explicitly stated in CSS that IDs are unique. If you break that rule, there's no rule about how the implementation has to behave. - Aquarius
@BorisZbarsky Because the DOM has to be able to handle invalid documents, and it is specifying how to handle a situation that shouldn't exist without blowing up? - Battle
@RobertMcKee Sure, but the same considerations apply to CSS. - Luminescence
@Aquarius The thing is, the CSS working group members don't agree with you, browser implementations don't agree with you, and web sites depend on the behavior browser implementations have: it's actually common for sites to have repeated IDs and depend on styling the id to style all the elements. So other than theoretical purity, what is the benefit of your argument, if I might ask? - Luminescence
@BorisZbarsky You've made some decent points; however, I don't see how you think the CSS working group members see things your way when the only things I've seen in the spec say otherwise. Some browser implementations can/will style multiple elements with the same ID value the same (not all, but all of the top 4 at least), and it really isn't common for sites to have repeated IDs and depend on it. It's actually pretty rare to see one, and I don't think I've ever seen one depend on it. Can you name 3? I can tell you the most commonly used JavaScript library does NOT agree with you. - Battle
Sorry, I should say 2 of the most commonly used JavaScript libraries (jQuery and prototype.js) will only return one element when asked to select by id. Prototype: jsfiddle.net/CTZT6 jQuery: jsfiddle.net/CTZT6/1 - Battle
MooTools also returns only a single element when given an id selector: jsfiddle.net/CTZT6/2 - Battle
@BorisZbarsky: The benefit is in not having to worry about behavior that hasn't been specified anywhere. The CSS working group doesn't seem to agree with either of us, considering they have not made a single official statement either way -- except for the official statement in the specs that IDs are unique, of course, which lends more weight to my argument than yours. :P No browser on Earth is required to act the way it does, as far as CSS is concerned, and relying on unspecified behavior is simply bad coding. - Aquarius
@BorisZbarsky: The benefit is also in promoting guaranteed consistent behavior between CSS and JS. If your IDs aren't unique, you have absolutely no guarantees either way about what will happen -- unless you want to slap a "this site works in..." disclaimer on your page. - Aquarius
@RobertMcKee I think they see things my way because I talked to them. Which you can too: just send mail to [email protected]. And as for sites, there are lots of templating things that use ids for styling when they should be using classes.... and no, jQuery doesn't do that when you use its selector stuff, but that doesn't mean sites don't depend on the CSS engine doing it. - Luminescence
@Aquarius Not worrying about behavior that "isn't specified anywhere" is pointless if it's de facto standard. Of course this behavior is specified: CSS says to match things that have a given ID and that the document language defines what it means to have an ID. HTML defines which things have which IDs. The result is that multiple elements can have the same ID. You have to really try to twist the specs around and lean hard on the non-normative parts to come to any other conclusion. - Luminescence
@BorisZbarsky: It takes nothing more than a literal read of all the existing normative documentation. No twisting required. It actually takes more, like a dependence on behavior that's not documented anywhere, to reach the conclusion that undefined behavior is somehow defined. - Aquarius
@Aquarius Oh, I see the core issue. There is no such thing as "undefined behavior" in web specs (not good ones, at least). We tried that; it was an abysmal failure. Specs now define behavior in all situations. If you're seeing undefined behavior, either it's a spec bug or you're missing something. - Luminescence
@BorisZbarsky: Show me the spec that defines how CSS must behave when the UA is presented with an invalid document. The CSS spec certainly doesn't, and in fact goes out of its way not to. - Aquarius
@BorisZbarsky: In fact, I'll even settle for de facto standards in this case. Show me where browser makers have documented their respective UAs' behavior in such a case. - Aquarius
@Aquarius where does the CSS spec say anything of the sort? It doesn't even have a concept of "invalid document" in normative text. Furthermore, do note that dev.w3.org/csswg/selectors4/#id-selectors explicitly says "It is possible in non-conforming documents for multiple elements to match a single ID selector." - Luminescence
@BorisZbarsky: See, now, text somewhat resembling that is basically what needs to be in a finished spec before you can say "CSS requires" anything. It needs to be more explicitly worded before it counts as a requirement, though; something like, "In a non-conforming document, an ID selector must match all elements whose ID attribute is the same as the identifier in the selector." "Possible" is more a "may" than a "must", and it's quite easy to picture a UA dropping illegal ID attributes from the document unless it is explicitly required to keep them around for some reason. - Aquarius
@Aquarius The text I cite is informative, just like the rest of this text. There is no change in what the spec requires, just a clearer explanation for people who seemed to be confused about what it requires. Which is why it doesn't say "must": that's nonsense for informative text. - Luminescence
@BorisZbarsky: So, then, there is still no normative text describing a requirement for any particular behavior in this case. - Aquarius
@Aquarius Sure there is. You're just wilfully ignoring it. - Luminescence
@BorisZbarsky: Then show me. Make it obvious enough that even I can't refute it. All it would take is a link to normative text plainly stating the requirement. I've asked a half dozen times now to see such text, and you have yet to produce a link. I've looked over the specs a number of times myself, and can not find words spelling out any such requirement -- and the only text I have seen yet either conspicuously avoids stating one, or explicitly states that IDs must be unique. Usually both. - Aquarius
There is no normative text saying this explicitly, just like there is no normative text explicitly saying that two elements with the same class are matched by a class selector and no normative text saying the browser should not crash when parsing CSS. The relevant normative bits are w3.org/TR/css3-selectors/#id-selectors which says "An ID selector represents an element instance that has an identifier that matches the identifier in the ID selector." It does not define how to tell what identifier the element has; that's up to the document language, so HTML. - Luminescence
@Aquarius And the document language spec has "The id attribute specifies its element's unique identifier (ID). [DOM]" That's it. That's the full extent of the normative text on the matter for UA implementors. If you just implement it without trying to read between the lines, you get the behavior that every browser has. - Luminescence
@BorisZbarsky: And if you do "read between the lines", it is entirely possible to write a UA that follows the spec to the letter and still doesn't do what you claim (without evidence) that the spec "requires". Since by definition, a duplicate unique identifier can not exist, there's no text saying what the browser has to do with documents that have them. It could drop all but the first duplicate, for example, and be 100% HTML5- and CSS3-compliant. It could keep the first, or the last, or a random one of the dupes and be compatible with HTML4 -- and, again, with CSS3. - Aquarius
@BorisZbarsky: A browser that did either one of those things wouldn't even be able to style more than one #whatever, cause there could never be more than one #whatever. And as far as every spec I've ever seen is concerned, that browser is still 100% compliant with all of them. - Aquarius
@Aquarius Reading between the lines with specs will get you into trouble. In particular, you will not in fact be spec-compliant in that case, because the spec authors assume that you do NOT read between the lines. If they had to assume that you do and cover every possible between-line-reading in the spec, the specs would get ridiculously unwieldy. - Luminescence
@Aquarius In any case, if you think that the specs need improvement, feel free to submit feedback. [email protected] for CSS, [email protected] for HTML. - Luminescence
@BorisZbarsky: I consider it part of my job to read between the lines -- to check my assumptions and differentiate between what is actually required and what is just the most commonly implemented mechanism today. Yes, fully specifying error behavior leads down a very deep rabbit hole, and going all the way makes the spec unwieldy...but any behavior not specified is by definition unspecified, and is thus open to interpretation. IMO HTML5 should never have gone down the hole in the first place, for precisely this reason. (Well, that and that going all the way would make valid HTML irrelevant.) - Aquarius
@Aquarius Your idea of how web specs work is not how they work, because doing it that way leads to tremendous interop issues. Which is why modern web specs aim to fully specify all behavior. If you see a case where the behavior seems to be unspecified, then either the spec is buggy or you're reading it wrong (and it might still be buggy because it can be read wrong). - Luminescence
@BorisZbarsky: The entire point of defining syntax and conformance criteria is to say "If your stuff looks like this, implementations will understand it. If it doesn't, we can't guarantee the desired results." The second sentence is what makes the first sentence worth saying at all. A language spec can not mandate behavior (other than "consider the content not $LANGUAGE") for every non-conforming case. That's far outside its scope, and attempting to do so turns it into a UA spec -- and in HTML's case, would make the validity of the language itself entirely irrelevant. - Aquarius
The "tremendous interop issues" due to invalid HTML were/are because the document doesn't conform to the rules of the language it claims to be using. Garbage in, garbage out. The rules are not secret; the important ones aren't even complicated. There's no excuse for breaking them, and the last thing a worthwhile language spec should be doing is supporting said breakage. - Aquarius
@Aquarius No, the entire point of defining web specs is to make sure there is interop. Defining syntax is largely there so that you can define clear extensibility points. CSS specs aim to mandate behavior for all non-conforming content, as does HTML, and have for years. And the "interop issues" I'm talking about were not due to invalid HTML but rather to CSS problems with either UAs failing to implement the spec requirements correctly because they read between the lines or the spec explicitly calling things undefined and then being forced to spec nonsense UAs implemented and sites assumed. - Luminescence
@Aquarius in any case, the other confusion here is that the primary purpose of the CSS specs is in fact to be a UA spec, not a language spec. The other was tried, and failed. - Luminescence
@BorisZbarsky: The fact that one even can read between the lines means that the spec does not cover all cases. The spec authors have either deliberately avoided doing so or failed miserably. (I'm assuming the former, cause (1) all available evidence points in that direction, notwithstanding your ascriptions of intent to the authors, and (2) these aren't idiots. :P) Either way, a literal read of the specs leaves a not-insignificant chunk of error behavior unspecified. And a literal read is the only option when you're trying to figure out what is actually required in those cases. - Aquarius
As for defining syntax, semantics, etc., if you're right, then it's all moot. I can write tag soup and have it work just as reliably for all time. Why care if the HTML actually conforms at all? - Aquarius
Why care at all? If all you care about is how it looks in visual UAs, there is no reason; this is why almost no one cares in practice. If you want your HTML to be more maintainable and more semantically meaningful, and perhaps faster to render, there may be reasons to care. - Luminescence
I'm not even sure anyone but you two can figure out what's going on with this long of a conversation. We do have chat for these extended discussions. Anything you really want to keep around should be put into the question or an answer, especially if it's clarifying information. Stack Overflow does not handle discussion in comments well. - Shrunk
I can, and I can say that most of whatever was being discussed belongs not on SO but in a mailing list. - Inconsequent
Maybe I should have commented here earlier, but it's all very simple: either play by the rules, or risk cross-browser inconsistencies. That's all there is to it. If you reason "I don't care for the rules, because it looks the way I want in the browser", then you haven't tested in enough browsers. If you have duplicate IDs in your HTML, different browsers will return different results. Look at this document with IE, Edge, Chrome, Firefox. I rest my case. - Pederast
I'm interested in learning at what stage of the CRP (critical rendering path) this happens. - Dwelling
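The duplicate-ID behavior argued over in this thread can be modeled side by side. The model assumes a flat list of elements that is already in tree order. An ID selector, as browsers implement it, matches every element that carries the ID. document.getElementById returns only the first such element in tree order, which is the DOM spec wording quoted above. Element, match_id_selector and get_element_by_id are hypothetical names, not a real DOM binding.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)  # identity equality, like DOM nodes
class Element:
    tag: str
    id: Optional[str] = None

def match_id_selector(elements, ident):
    """Model of '#ident' in a stylesheet: every element whose ID equals ident matches."""
    return [el for el in elements if el.id == ident]

def get_element_by_id(elements, ident):
    """Model of document.getElementById: the first matching element in tree order."""
    return next((el for el in elements if el.id == ident), None)

# A non-conforming document where two elements share id="foo" (listed in tree order).
doc = [Element("div", "foo"), Element("p"), Element("span", "foo")]
```

In this model, styling #foo affects both the div and the span, while the DOM lookup only ever hands back the div.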

Keep in mind that when a browser is doing selector matching it has one element (the one it's trying to determine style for) and all your rules and their selectors and it needs to find which rules match the element. This is different from the usual jQuery thing, say, where you only have one selector and you need to find all the elements that match that selector.

If you only had one selector and only one element to compare against that selector, then left-to-right makes more sense in some cases. But that's decidedly not the browser's situation. The browser is trying to render Gmail or whatever and has the one <span> it's trying to style and the 10,000+ rules Gmail puts in its stylesheet (I'm not making that number up).

In particular, in the situation the browser is looking at, most of the selectors it's considering don't match the element in question. So the problem becomes one of deciding that a selector doesn't match as fast as possible; if that requires a bit of extra work in the cases that do match, you still win due to all the work you save in the cases that don't match.

If you start by just matching the rightmost part of the selector against your element, then chances are it won't match and you're done. If it does match, you have to do more work, but only proportional to your tree depth, which is not that big in most cases.
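Here is a minimal sketch of the browser's side of the problem described above: one element checked against a list of rules. It assumes selectors built only from descendant combinators, with compound selectors of the form tag#id.class. The rightmost compound selector is tested first, and the ancestor walk only happens when that cheap test passes. Element, parse_compound, matches_compound and matching_rules are hypothetical names for illustration, not any engine's API.

```python
import re

class Element:
    """Hypothetical minimal element: tag, optional id, classes, parent link."""
    def __init__(self, tag, id=None, classes=(), parent=None):
        self.tag, self.id = tag, id
        self.classes, self.parent = frozenset(classes), parent

def parse_compound(text):
    """'li.item#x' -> (tag, id, classes); only tag, #id and .class are supported."""
    tag = re.match(r"[a-zA-Z][\w-]*", text)
    ids = re.findall(r"#([\w-]+)", text)
    return (tag.group(0) if tag else None, ids[0] if ids else None,
            frozenset(re.findall(r"\.([\w-]+)", text)))

def matches_compound(el, compound):
    tag, id_, classes = compound
    return ((tag is None or el.tag == tag) and (id_ is None or el.id == id_)
            and classes <= el.classes)

def matching_rules(el, selectors):
    """Return the selectors that match el, plus how many were rejected at step one."""
    matched, rejected_fast = [], 0
    for sel in selectors:
        parts = [parse_compound(p) for p in sel.split()]
        if not matches_compound(el, parts[-1]):
            rejected_fast += 1           # cheap: only the element itself was examined
            continue
        remaining, node = parts[:-1], el.parent
        while remaining and node is not None:   # work bounded by the tree depth
            if matches_compound(node, remaining[-1]):
                remaining.pop()
            node = node.parent
        if not remaining:
            matched.append(sel)
    return matched, rejected_fast

html = Element("html")
body = Element("body", parent=html)
nav = Element("nav", id="top", parent=body)
span = Element("span", classes=["label"], parent=nav)
rules = ["#menu ul li a", "p span", "nav span.label", "div.sidebar", "span", "table td"]
```

For this span, three of the six rules are discarded after looking only at the span itself. Only the survivors cost any ancestor walk.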

On the other hand, if you start by matching the leftmost part of the selector... what do you match it against? You have to start walking the DOM, looking for nodes that might match it. Just discovering that there's nothing matching that leftmost part might take a while.
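To make the contrast concrete, this sketch answers the same question ("does this element match this selector?") both ways and counts how many nodes each approach touches. It assumes the same simplified selector model: tag, #id and .class compounds with descendant combinators only. All names are hypothetical, and node-visit counts are only a rough proxy for real cost.

```python
import re

class Element:
    """Minimal stand-in for a DOM node (hypothetical, for illustration only)."""
    def __init__(self, tag, id=None, classes=()):
        self.tag, self.id, self.classes = tag, id, frozenset(classes)
        self.parent, self.children = None, []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def parse_compound(text):
    """'li.item#x' -> (tag, id, classes); only tag, #id and .class are supported."""
    tag = re.match(r"[a-zA-Z][\w-]*", text)
    ids = re.findall(r"#([\w-]+)", text)
    return (tag.group(0) if tag else None, ids[0] if ids else None,
            frozenset(re.findall(r"\.([\w-]+)", text)))

def matches_compound(el, compound):
    tag, id_, classes = compound
    return ((tag is None or el.tag == tag) and (id_ is None or el.id == id_)
            and classes <= el.classes)

def walk(el):
    """Yield el and all of its descendants in tree order."""
    yield el
    for child in el.children:
        yield from walk(child)

def matches_ltr(root, el, selector):
    """Left to right: find the leftmost part somewhere in the document, then go down."""
    parts = [parse_compound(p) for p in selector.split()]
    visits, frontier = 0, []
    for node in walk(root):                 # no obvious starting point: scan everything
        visits += 1
        if matches_compound(node, parts[0]):
            frontier.append(node)
    for part in parts[1:]:
        found = {}
        for cand in frontier:
            for child in cand.children:
                for node in walk(child):    # every descendant of the candidate
                    visits += 1
                    if matches_compound(node, part):
                        found[id(node)] = node
        frontier = list(found.values())
    return any(node is el for node in frontier), visits

def matches_rtl(el, selector):
    """Right to left: start at el itself, then climb one ancestor per step."""
    parts = [parse_compound(p) for p in selector.split()]
    visits = 1
    if not matches_compound(el, parts[-1]):
        return False, visits
    remaining, node = parts[:-1], el.parent
    while remaining and node is not None:
        visits += 1
        if matches_compound(node, remaining[-1]):
            remaining.pop()
        node = node.parent
    return not remaining, visits

root = Element("html")
body = root.add(Element("body"))
li = body.add(Element("div", id="menu")).add(Element("ul")).add(Element("li"))
link = li.add(Element("a"))
span = body.add(Element("p")).add(Element("span"))
```

For the span, the right-to-left check gives up after one node. The left-to-right check has to scan the whole (tiny) document before it can say no. Both functions accept exactly the same elements, as the first comment on the question notes; only the amount of work differs.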

So browsers match from the right; it gives an obvious starting point and lets you get rid of most of the candidate selectors very quickly. You can see some data at http://groups.google.com/group/mozilla.dev.tech.layout/browse_thread/thread/b185e455a0b3562a/7db34de545c17665 (though the notation is confusing), but the upshot is that for Gmail in particular two years ago, for 70% of the (rule, element) pairs you could decide that the rule does not match after just examining the tag/class/id parts of the rightmost selector for the rule. The corresponding number for Mozilla's pageload performance test suite was 72%. So it's really worth trying to get rid of those 2/3 of all rules as fast as you can and then only worry about matching the remaining 1/3.

Note also that there are other optimizations browsers already do to avoid even trying to match rules that definitely won't match. For example, if the rightmost selector has an id and that id doesn't match the element's id, then there will be no attempt to match that selector against that element at all in Gecko: the set of "selectors with IDs" that are attempted comes from a hashtable lookup on the element's ID. So the 70% figure above counts rules that already had a pretty good chance of matching, and they still don't match after considering just the tag/class/id of the rightmost selector.
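This is a rough model of that kind of pre-filtering, extended to the tag and class pre-filtering mentioned in the comments below. Each rule is filed under one key taken from its rightmost compound selector: its ID, else a class, else its tag. An element consults only the buckets for its own tag, ID and classes, plus a catch-all bucket. This is a simplified illustration, not Gecko's actual data structure, and bucket_key and RuleHash are hypothetical names.

```python
import re
from collections import defaultdict

def bucket_key(selector):
    """Bucket by the rightmost compound: its id, else its first class, else its tag."""
    last = selector.split()[-1]
    ids = re.findall(r"#([\w-]+)", last)
    if ids:
        return ("id", ids[0])
    classes = re.findall(r"\.([\w-]+)", last)
    if classes:
        return ("class", classes[0])
    tag = re.match(r"[a-zA-Z][\w-]*", last)
    return ("tag", tag.group(0)) if tag else ("universal", None)

class RuleHash:
    """Hypothetical rule index; only the returned candidates get a full selector match."""
    def __init__(self, selectors):
        self.buckets = defaultdict(list)
        for sel in selectors:
            self.buckets[bucket_key(sel)].append(sel)

    def candidates(self, tag, id=None, classes=()):
        keys = [("tag", tag), ("universal", None)]
        if id is not None:
            keys.append(("id", id))
        keys.extend(("class", c) for c in classes)
        # A rule keyed on an id/class/tag the element lacks is never looked at.
        return [sel for key in keys for sel in self.buckets.get(key, [])]

rules = RuleHash(["#nav a", "#menu li", "ul li.item", "span", "div p", "div#main", "*"])
```

An li with class "item" is offered only "#menu li", "*" and "ul li.item". A rule like "div#main" is never even considered for an element without that ID.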

Luminescence answered 28/4, 2011 at 4:36 Comment(9)
As a little bonus, it makes more sense to read it RTL than LTR even in English. An example: https://mcmap.net/q/65661/-css-combinator-precedence/… - Inconsequent
Note that RTL matching only applies across combinators. It doesn't drill down to the simple selector level. That is, a browser takes the rightmost compound selector, or sequence of simple selectors, and attempts to match it atomically. Then, if there's a match, it follows the combinator leftwards to the next compound selector and checks the element in that position, and so on. There is no evidence that a browser reads each part of a compound selector RTL; in fact, the last paragraph shows precisely otherwise (id checks always come first). - Inconsequent
Actually, by the time you're matching selectors, at least in Gecko, the tagname and namespace come first. The id (as well as the tagname and classnames) is considered in a pre-filtering step that eliminates most rules without really trying to match the selectors. - Luminescence
That might help depending on what optimizations the UA is doing, but it won't help the pre-filtering step in Gecko I describe above. It might help a second filtering step, though, which works on IDs and classes and is used for descendant combinators only. - Luminescence
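The descendant-combinator filter mentioned here can be sketched as a running record of the IDs and classes on the current element's ancestors. The record is updated on the way into and out of each subtree. If a selector's left-hand compounds require an ID or class that no ancestor has, the selector cannot match, so the full matcher is skipped. Some engines use a counting Bloom filter for this kind of check. This sketch substitutes an exact collections.Counter for readability, and AncestorFilter and its methods are hypothetical names.

```python
import re
from collections import Counter

def element_keys(id=None, classes=()):
    """The IDs and classes an element contributes while it is an ancestor."""
    keys = [("class", c) for c in classes]
    if id is not None:
        keys.append(("id", id))
    return keys

def required_ancestor_keys(selector):
    """IDs/classes in every compound left of the rightmost one (descendant combinators)."""
    keys = []
    for compound in selector.split()[:-1]:
        keys += [("id", i) for i in re.findall(r"#([\w-]+)", compound)]
        keys += [("class", c) for c in re.findall(r"\.([\w-]+)", compound)]
    return keys

class AncestorFilter:
    """Exact counting stand-in for a probabilistic ancestor filter."""
    def __init__(self):
        self.counts = Counter()

    def push(self, id=None, classes=()):   # called when descending into an element
        self.counts.update(element_keys(id, classes))

    def pop(self, id=None, classes=()):    # called when leaving that element again
        self.counts.subtract(element_keys(id, classes))

    def may_match(self, selector):
        """False: the selector cannot match here. True: run the full matcher."""
        return all(self.counts[key] > 0 for key in required_ancestor_keys(selector))
```

The filter only gives a necessary condition. A True answer still needs the real right-to-left match, but a False answer saves it entirely. Tags are left out here because the comment mentions only IDs and classes.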
@Benito Ciaro: Not to mention specificity problems as well. - Inconsequent
"other optimizations browsers already do": it would help if developers didn't treat CSS rules like confetti. - Festoon
@Inconsequent Thanks for pointing that out; I think that's a very important detail to note. - Urbai
Interesting. Is there a profiler around that suggests changes to a CSS structure? I'm working on a web-based data warehouse solution with a huge number of <tr> and <td> elements in a table setup but not much else. How could I optimize this? - Severin
Would you mind sharing a link to the source? I've been looking on MDN, and nothing substantial on this came up. - Dwelling
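The point made in the comments above can be sketched: each compound selector is matched against one element as a unit, and only the combinators are followed right to left. The sketch supports the descendant (space) and child (>) combinators, and assumes spaces around '>' and compound selectors of the form tag#id. The helper names are hypothetical. Once a child combinator appears, a single greedy climb is no longer enough. For div > article span, the nearest article ancestor may lack a div parent, so the matcher backtracks to the next article further up.

```python
import re

class Element:
    """Hypothetical minimal element with a parent link."""
    def __init__(self, tag, id=None, parent=None):
        self.tag, self.id, self.parent = tag, id, parent

def parse_compound(text):
    """'article#outer' -> (tag, id); enough for this example."""
    tag = re.match(r"[a-zA-Z][\w-]*", text)
    ids = re.findall(r"#([\w-]+)", text)
    return (tag.group(0) if tag else None, ids[0] if ids else None)

def matches_compound(el, compound):
    tag, id_ = compound
    return (tag is None or el.tag == tag) and (id_ is None or el.id == id_)

def parse(selector):
    """'div > p span' -> [(compound, combinator joining it to the compound on its left)]."""
    steps, combinator = [], " "
    for token in selector.split():      # assumes spaces around '>'
        if token == ">":
            combinator = ">"
            continue
        steps.append((parse_compound(token), combinator))
        combinator = " "
    return steps

def matches(el, steps, i=None):
    """Match steps[i] against el as a unit, then follow its combinator leftwards."""
    if i is None:
        i = len(steps) - 1
    compound, combinator = steps[i]
    if not matches_compound(el, compound):
        return False
    if i == 0:
        return True
    if combinator == ">":               # child combinator: only the parent qualifies
        return el.parent is not None and matches(el.parent, steps, i - 1)
    node = el.parent                    # descendant combinator: try every ancestor
    while node is not None:
        if matches(node, steps, i - 1):
            return True
        node = node.parent
    return False

html = Element("html")
div = Element("div", parent=html)
outer = Element("article", id="outer", parent=div)
section = Element("section", parent=outer)
inner = Element("article", id="inner", parent=section)
span = Element("span", parent=inner)
```

Here matches(span, parse("div > article span")) succeeds only because the loop moves past article#inner, whose parent is a section, and on to article#outer, whose parent is the div.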

Right-to-left parsing, also called bottom-up parsing, is actually efficient for the browser.

Consider the following:

#menu ul li a { color: #00f; }

The browser first checks for a, then li, then ul, and then #menu.

This is because, as the browser is scanning the page, it just needs to look at the current element/node and all the previous nodes/elements that it has scanned.
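The order of checks described above can be traced with a small sketch. It records which part of #menu ul li a is tested against which element, starting from the a element itself. It assumes descendant combinators only, with each part either a tag or an #id. Element, simple_match and trace_rtl are hypothetical names.

```python
class Element:
    """Hypothetical minimal element with a parent link."""
    def __init__(self, tag, id=None, parent=None):
        self.tag, self.id, self.parent = tag, id, parent

    def label(self):
        return f"{self.tag}#{self.id}" if self.id else self.tag

def simple_match(el, part):
    """Supports only 'tag' or '#id' parts, which is all this example needs."""
    return el.id == part[1:] if part.startswith("#") else el.tag == part

def trace_rtl(el, selector):
    """Return (matched, [(selector part, element it was tested against), ...])."""
    parts = selector.split()
    trace = [(parts[-1], el.label())]
    if not simple_match(el, parts[-1]):
        return False, trace
    remaining, node = parts[:-1], el.parent
    while remaining and node is not None:
        trace.append((remaining[-1], node.label()))
        if simple_match(node, remaining[-1]):
            remaining.pop()
        node = node.parent
    return not remaining, trace

body = Element("body")
ul = Element("ul", parent=Element("div", id="menu", parent=body))
link = Element("a", parent=Element("li", parent=ul))
for part, tested in trace_rtl(link, "#menu ul li a")[1]:
    print(f"{part:>6} tested against <{tested}>")
```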

The thing to note is that the browser starts processing the moment it gets a complete tag/node and doesn't need to wait for the whole page. The exception is when it finds a script: then it temporarily pauses, finishes executing the script, and carries on.

If it did it the other way round, it would be inefficient. The browser would find the element it was scanning on the first check, but would then be forced to keep looking through the document for all the additional selectors. For this, the browser would need the entire HTML and might have to scan the whole page before it could start CSS painting.
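Here is a sketch of the incremental argument in this answer, under heavy simplifications. When a start tag arrives, its ancestors are exactly the elements that are currently open. A right-to-left check can therefore run at once, without the rest of the document. The ("open", tag)/("close", tag) token format is a hypothetical stand-in for a real tokenizer. Selectors are assumed to be tag names joined by descendant combinators. Scripts, sibling selectors and later DOM changes are ignored.

```python
def match_while_parsing(tokens, selector):
    """tokens: ("open", tag) / ("close", tag) in document order (assumed well formed).
    Returns the indexes of start tags whose element matched the moment it arrived."""
    parts = selector.split()          # tag-only compounds joined by descendant combinators
    open_elements, matched = [], []
    for index, (kind, tag) in enumerate(tokens):
        if kind == "close":
            open_elements.pop()       # that element's subtree is finished
            continue
        if tag == parts[-1]:          # rightmost part first
            remaining = parts[:-1]
            for ancestor in reversed(open_elements):   # nearest ancestor first
                if remaining and ancestor == remaining[-1]:
                    remaining.pop()
            if not remaining:
                matched.append(index)
        open_elements.append(tag)     # it is now an ancestor of what follows
    return matched

tokens = [("open", "div"), ("open", "ul"), ("open", "li"), ("open", "a"),
          ("close", "a"), ("close", "li"), ("close", "ul"), ("close", "div"),
          ("open", "a"), ("close", "a")]
```

The first a (index 3) is matched by "ul li a" as soon as its start tag is seen. The second a has no open ancestors, so it is rejected just as quickly. Neither decision depends on any token that comes later.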

This is contrary to how most libraries traverse the DOM. There the DOM is already constructed, so they don't need to scan the entire page; they just find the first element and then go on matching the others inside it.

Osteal answered 18/2, 2014 at 10:4 Comment(0)

It allows for cascading from the more specific to the less specific. It also allows a short circuit in application. If the more specific rule applies in all aspects that the parent rule applies to, all parent rules are ignored. If there are other bits in the parent, they are applied.

If you went the other way around, you would format according to parent and then overwrite every time the child has something different. In the long run, this is a lot more work than ignoring items in rules that are already taken care of.

Clop answered 26/4, 2011 at 22:13 Comment(1)
That's a separate issue. You do the cascading by sorting the rules by specificity and then matching against them in specificity order. But the question here is why, for a given rule, you match its selectors in a particular way. - Luminescence
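One simple way to picture the separation this comment describes: matching decides which rules apply to an element. The cascade then picks the winning declaration by specificity, with later source order breaking ties. The specificity function below counts only IDs, classes and type selectors. It deliberately ignores attributes, pseudo-classes, !important and origins. specificity and cascade are hypothetical helper names.

```python
import re

def specificity(selector):
    """(ids, classes, types) for selectors built only from tags, .classes and #ids."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    types = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return ids, classes, types

def cascade(matching_rules):
    """matching_rules: (selector, value) pairs in source order, all already matched.
    Sort by (specificity, source position); the last entry wins."""
    ordered = sorted(enumerate(matching_rules),
                     key=lambda item: (specificity(item[1][0]), item[0]))
    return ordered[-1][1][1]

rules = [("#nav a", "red"), ("a", "blue"), ("nav .link", "green"), ("ul li a", "black")]
```

Specificity tuples compare lexicographically, so one ID (1, 0, 1) beats three type selectors (0, 0, 3), and "#nav a" wins here. None of this depends on the direction in which each selector was matched, which is the comment's point.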

© 2022 - 2024 — McMap. All rights reserved.