How does querySelector works under the hood? [closed]

Everyone knows what DOM selectors like document.getElementByID(...) and document.querySelector(...) do and how you can use it with classes, attributes, id and so on.

But I was not able to find how does it work under the hood (I can find perf test comparisons but I am interested in theory). I know that the html page is load, parsed by the browser and the DOM tree is constructed. But how does each of the selectors traverses the DOM tree to find the elements.

I have took a look at a spec for parsing algorithm and read really nice explanation how Browsers work, but also it gives excellent explanation about HTML, CSS parsing and rendering flow it does not give explanation how each of these selectors traverses this tree to find the elements.

I assume that in order to find something like .black or span it needs to traverse the whole tree, but to find #id it may be traversing some additional data structure and thus making it much faster. Please do not write your assumptions, I am looking for concrete knowledge with backup to specification or implementation in some browsers.

Inspecting Firefox's source and reading the related documentation will help get some initial insight.
Once the document is fetched, it's passed to the parser (see: /mozilla/parser/html/) which will chew through the document and generate a content tree. The central parts of the parser are written in Java (/mozilla/parser/html/javasrc/) and then translated to C++ for building, so be ready to have a good time when you want to read the rest of the source.

Looking at the parser's source (/mozilla/parser/html/javasrc/TreeBuilder.java), namely an excerpt from the function startTag:

1579         if (errorHandler != null) {
1580             // ID uniqueness
1581             @IdType String id = attributes.getId();
1582             if (id != null) {
1583                 LocatorImpl oldLoc = idLocations.get(id);
1584                 if (oldLoc != null) {
1585                     err("Duplicate ID \u201C" + id + "\u201D.");
1586                     errorHandler.warning(new SAXParseException(
1587                             "The first occurrence of ID \u201C" + id
1588                             + "\u201D was here.", oldLoc));
1589                 } else {
1590                     idLocations.put(id, new LocatorImpl(tokenizer));
1591                 }
1592             }
1593         }

Turning attention to line 1590 and keeping in mind that earlier in the same file we have:

459     private final Map<String, LocatorImpl> idLocations = new HashMap<String, LocatorImpl>();

We can see that node ids are kept in a simple hash map. Looking up how classes are processed is an exercise left to the reader.

Different DOM methods, for example document.getElementByID(...), are connected to this hash map through glue code and a plethora of object hierarchy, see "How is the web-exposed DOM implemented?" on ask.mozilla.org.

Recommended topics

Hot tags