How does querySelector work under the hood? [closed]

Everyone knows what DOM selectors like document.getElementById(...) and document.querySelector(...) do and how you can use them with classes, attributes, ids and so on.

But I was not able to find how they work under the hood (I can find perf test comparisons, but I am interested in the theory). I know that the HTML page is loaded and parsed by the browser and that the DOM tree is constructed. But how does each of the selectors traverse the DOM tree to find the elements?

I have taken a look at the spec for the parsing algorithm and read the really nice explanation of how browsers work. Although it gives an excellent explanation of HTML and CSS parsing and of the rendering flow, it does not explain how each of these selectors traverses the tree to find the elements.

I assume that in order to find something like .black or span it needs to traverse the whole tree, but to find #id it may consult some additional data structure, which would make it much faster. Please do not write your assumptions; I am looking for concrete knowledge backed by a specification or by the implementation in some browser.

Ionone answered 7/9, 2014 at 23:2 Comment(7)
I think this would be better asked at programmers.stackexchange.com – Piceous
That's an implementation detail, and will vary by which engine you are using. You'll have to read the source code of the various implementations if you want to know. See en.wikipedia.org/wiki/List_of_ECMAScript_engines as a starting point. – Kymry
@Kymry I do not really think so. This is a pretty basic feature and most probably it will be implemented very similarly in the major browsers. – Ionone
How browsers work can only be specified by the people who write them, and each may be different. The various specifications do not define implementation details, only how they must appear to work. Sites like MDN and MSDN will not provide enlightenment on browser internals. E.g. I'd guess that browsers create an index of IDs for use with getElementById, and probably something similar for popular CSS selectors like class and tag name, but finding a specification that defines that may be a challenge. – Poorly
@salvadordali Agreed ... but you said "concrete knowledge", which can only be gained from reading the source. Maybe someone who knows an engine implementation will reply, but even then you only get concrete knowledge about one implementation. – Kymry
@Kymry take a look at one of my links, how browsers work. They give a pretty complete explanation of how HTML is parsed, how CSS rules are applied and so on. Yes, they derived it by looking at Chrome, FF and Opera, and they all behave pretty much the same. So I assume that query selectors might also behave the same. By "concrete knowledge" I meant that I would not want people just guessing answers, and if someone gives me an explanation of the implementation in any major browser, this will suffice for me. – Ionone
@SalvadorDali - If you would like to know how they do that "under the hood", here's the source code for DOM handling in WebKit: trac.webkit.org/browser/trunk/Source/WebCore/dom and in Gecko: lxr.mozilla.org/mozilla/source/content/base/public – Quartile

Inspecting Firefox's source and reading the related documentation will help get some initial insight.
Once the document is fetched, it's passed to the parser (see: /mozilla/parser/html/) which will chew through the document and generate a content tree. The central parts of the parser are written in Java (/mozilla/parser/html/javasrc/) and then translated to C++ for building, so be ready to have a good time when you want to read the rest of the source.

Looking at the parser's source (/mozilla/parser/html/javasrc/TreeBuilder.java), namely an excerpt from the function startTag:

1579         if (errorHandler != null) {
1580             // ID uniqueness
1581             @IdType String id = attributes.getId();
1582             if (id != null) {
1583                 LocatorImpl oldLoc = idLocations.get(id);
1584                 if (oldLoc != null) {
1585                     err("Duplicate ID \u201C" + id + "\u201D.");
1586                     errorHandler.warning(new SAXParseException(
1587                             "The first occurrence of ID \u201C" + id
1588                             + "\u201D was here.", oldLoc));
1589                 } else {
1590                     idLocations.put(id, new LocatorImpl(tokenizer));
1591                 }
1592             }
1593         }

Turning attention to line 1590 and keeping in mind that earlier in the same file we have:

459     private final Map<String, LocatorImpl> idLocations = new HashMap<String, LocatorImpl>();

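We can see that the ids encountered during parsing are kept in a simple hash map, keyed by the id string (in this excerpt the map is used to report duplicate ids). Looking up how classes are processed is an exercise left to the reader.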

Different DOM methods, for example document.getElementById(...), are connected to this hash map through glue code and several layers of object hierarchy; see "How is the web-exposed DOM implemented?" on ask.mozilla.org.
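
To make the idea concrete, here is a minimal sketch in Java of an id lookup backed by a hash map, next to a naive whole-tree walk for class names. Everything in it is hypothetical: SketchElement, SketchDocument and their methods are illustrative names, not Gecko's classes, and the tree walk is only a baseline for comparison, not a claim about how any engine resolves classes. It also assumes the tree is built top-down in document order, so that "first inserted" coincides with "first in the tree".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal element type; not Gecko's real node class.
final class SketchElement {
    final String tag;
    final String id;                 // null when the element has no id
    final List<String> classes;
    final List<SketchElement> children = new ArrayList<>();

    SketchElement(String tag, String id, String... classes) {
        this.tag = tag;
        this.id = id;
        this.classes = List.of(classes);
    }
}

// Hypothetical document that indexes ids as elements are attached.
final class SketchDocument {
    final SketchElement root = new SketchElement("html", null);
    private final Map<String, SketchElement> idIndex = new HashMap<>();

    // Attach child under parent and record the child's own id.
    // Assumption: elements are attached one at a time, top-down,
    // so the first element recorded for an id is also first in the tree.
    SketchElement append(SketchElement parent, SketchElement child) {
        parent.children.add(child);
        if (child.id != null) {
            idIndex.putIfAbsent(child.id, child); // duplicates do not replace the first entry
        }
        return child;
    }

    // Id lookup: a single hash-map probe, no tree walk.
    SketchElement getElementById(String id) {
        return idIndex.get(id);
    }

    // Naive baseline for class lookup: depth-first walk over the whole tree.
    List<SketchElement> getElementsByClassName(String name) {
        List<SketchElement> out = new ArrayList<>();
        collect(root, name, out);
        return out;
    }

    private static void collect(SketchElement node, String name, List<SketchElement> out) {
        if (node.classes.contains(name)) {
            out.add(node);
        }
        for (SketchElement child : node.children) {
            collect(child, name, out);
        }
    }
}

final class IdIndexDemo {
    public static void main(String[] args) {
        SketchDocument doc = new SketchDocument();
        SketchElement body = doc.append(doc.root, new SketchElement("body", null));
        doc.append(body, new SketchElement("span", "title", "black"));
        doc.append(body, new SketchElement("div", null, "black"));

        System.out.println(doc.getElementById("title").tag);            // prints "span"
        System.out.println(doc.getElementsByClassName("black").size()); // prints 2
    }
}
```

The point of the sketch is the cost difference: the id lookup touches one map entry regardless of document size, while the class lookup visits every element. A real engine also has to keep such an index in sync when ids change or nodes are removed, which this sketch leaves out.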

Saintpierre answered 8/9, 2014 at 0:3 Comment(1)
Excellent answer!! – Derris
