When to use NodeIterator

This benchmark compares querySelectorAll + .forEach against a NodeIterator:

// toArray is assumed to convert the static NodeList to an Array
// (e.g. Array.prototype.slice.call)
toArray(document.querySelectorAll("div > a.klass")).forEach(function (node) {
  // do something with node
});

var filter = {
    acceptNode: function (node) {
        var condition = node.parentNode.tagName === "DIV" &&
            node.classList.contains("klass") &&
            node.tagName === "A";

        return condition ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
    }
};
// FIREFOX Y U SUCK (older Firefox required all four arguments)
var iter = document.createNodeIterator(document, NodeFilter.SHOW_ELEMENT, filter, false);
var node;
while (node = iter.nextNode()) {
    // do something with node
}

Now either NodeIterators suck or I'm doing it wrong.

Question: When should I use a NodeIterator?

In case you don't know, DOM4 specifies what NodeIterator is.

Hiroshima answered 29/10, 2011 at 19:52 Comment(0)

It's slow for a variety of reasons. Most obvious is the fact that nobody uses it, so quite simply far less time has been spent optimizing it. The other problem is that it's massively re-entrant: every node requires calling into JS to run the filter function.

If you look at revision three of the benchmark, you'll find I've added a reimplementation of what the iterator is doing using getElementsByTagName("*") and then running an identical filter on that. As the results show, it's massively quicker. Going JS -> C++ -> JS is slow.
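
For illustration, here's a minimal sketch of that kind of reimplementation, assuming the same predicate as the question's filter (the exact code in the benchmark revision may differ):

// Sketch of the getElementsByTagName("*") approach: grab every element once
// and apply the same predicate purely in JS.
var all = document.getElementsByTagName("*");
var matches = [];
for (var i = 0; i < all.length; i++) {
    var el = all[i];
    if (el.tagName === "A" &&
        el.classList.contains("klass") &&
        el.parentNode.tagName === "DIV") { // same predicate as the NodeIterator filter
        matches.push(el);
    }
}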

Filtering the nodes entirely in JS (the getElementsByTagName case) or C++ (the querySelectorAll case) is far quicker than doing it by repeatedly crossing the boundary.

Note also that selector matching, as used by querySelectorAll, is comparatively smart: it matches right-to-left and relies on pre-computed caches (most browsers will iterate over a cached list of all elements with the class "klass", check whether each is an a element, and then check whether its parent is a div), so it doesn't even need to iterate over the entire document.
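
Conceptually, that strategy for "div > a.klass" amounts to something like the following (a rough JS analogue only; real engines do this in C++ against their caches):

// Right-to-left matching: start from the cached, rightmost candidate set
// (elements with class "klass") and work leftwards through the selector.
var candidates = document.getElementsByClassName("klass");
var matched = [];
for (var i = 0; i < candidates.length; i++) {
    var el = candidates[i];
    if (el.tagName === "A" && el.parentNode.tagName === "DIV") {
        matched.push(el);
    }
}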

Given that, when should you use NodeIterator? Basically never in JavaScript, at least. In languages such as Java (undoubtedly the primary reason there's an interface called NodeIterator), it will likely be just as quick as anything else, because the filter then runs in the same language as the DOM implementation. Apart from that, the only other time it makes sense is in languages where the memory cost of creating a Node object is far greater than that of the node's internal representation.

Risteau answered 30/10, 2011 at 14:14 Comment(0)

NodeIterator (and TreeWalker, for that matter) are almost never used, for a variety of reasons. This means that information on the topic is scarce and answers like @gsnedders' come to be, which fail to mention any of the API's unique features. I know this question is almost a decade old, so excuse my necromancy.

1. Initiation & Performance

It is true that the initiation of a NodeIterator is way slower than a method like querySelectorAll, but that is not the performance you should be measuring.

The thing about NodeIterators is that they are live-ish in the way that, just like an HTMLCollection or live NodeList, you can keep using the object after initiating it once.
The NodeList returned by querySelectorAll is static and will have to be re-initiated every time you need to match newly added elements.
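
A minimal sketch of that reuse (the selector and markup are illustrative): the iterator is created once, and later calls to nextNode() pick up elements added after the last visited node, whereas the static NodeList would have to be re-queried.

// The iterator is created once and keeps walking the live tree
const iter = document.createNodeIterator(
  document.body,
  NodeFilter.SHOW_ELEMENT,
  node => node.matches('div > a.klass') ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP
)

let node
while (node = iter.nextNode()) { /* initial matches */ }

// Elements appended later are picked up by the same iterator object…
const div = document.createElement('div')
div.innerHTML = '<a class="klass" href="#">new</a>'
document.body.appendChild(div)

while (node = iter.nextNode()) { /* …visits the newly added a.klass */ }

// …whereas document.querySelectorAll('div > a.klass') would have to be run again.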

This version of the jsPerf puts the NodeIterator in the preparation code. The actual test only tries to loop over all newly added elements with iter.nextNode(). You can see that the iterator is now orders of magnitude faster.

2. Selector performance

Okay, cool, caching the iterator after initiation makes this example faster than querying. How you use the API still matters for iteration speed, though. In this version, you can observe another significant difference. I've added 10 classes (done[0-9]) that the selectors shouldn't be matching. The iterator loses about 10% of its speed, while the querySelectors lose 20%.

This version, on the other hand, shows what happens when you add another div > at the start of the selector. The iterator loses 33% of its speed, while the querySelectors get a speed INCREASE of 10%.

Removing the initial div > at the start of the selector, like in this version, shows that both methods become slower, because they match more elements than in earlier versions. As expected, the iterator is relatively more performant than the querySelectors in this case.

This means that filtering on the basis of a node's own properties (its classes, attributes, etc.) is probably faster with a NodeIterator, while having a lot of combinators (>, +, ~, etc.) in your selector probably means querySelectorAll is faster.
This is especially true for the descendant (space) combinator. Selecting elements with querySelectorAll('article a') is way easier than manually looping over all parents of every a element, looking for one that has a tagName of 'ARTICLE'.

P.S. in §3.2, I give an example of how the exact opposite can be true if you want the opposite of what the space combinator does (exclude a tags with an article ancestor).

3. Impossible selectors

3.1 Simple hierarchical relationships

Of course, manually filtering elements gives you practically unlimited control. This means that you can filter out elements that would normally be impossible to match with CSS selectors. For example, CSS selectors can only "look back": selecting divs that are preceded by another div is possible with div + div, but selecting divs that are followed by another div is impossible.

However, inside a NodeFilter, you can achieve this by checking node.nextElementSibling.tagName === 'DIV'. The same goes for every selection CSS selectors can't make.
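
A hedged sketch of such a filter (with a null check added, since the last element in a parent has no next element sibling):

// Every <div> that is immediately followed by another <div>,
// which a (2019-era) CSS selector cannot express
const divsBeforeDivs = document.createNodeIterator(
  document.body,
  NodeFilter.SHOW_ELEMENT,
  node =>
    node.tagName === 'DIV' &&
    node.nextElementSibling &&
    node.nextElementSibling.tagName === 'DIV'
      ? NodeFilter.FILTER_ACCEPT
      : NodeFilter.FILTER_SKIP
)

let el
while (el = divsBeforeDivs.nextNode()) {
  // el is a <div> whose next element sibling is also a <div>
}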

3.2 More global hierarchical relationships

Another thing I personally love about NodeFilters is that, when passed to a TreeWalker, you can reject a node and its whole sub-tree by returning NodeFilter.FILTER_REJECT instead of NodeFilter.FILTER_SKIP.

Imagine you want to iterate over all a tags on the page, except for ones with an article ancestor.
With querySelectors, you'd type something like

let a = document.querySelectorAll('a')
a = Array.prototype.filter.call(a, function (node) {
  // walk up the ancestor chain; drop this <a> if any ancestor is an <article>
  while (node = node.parentElement) if (node.tagName === 'ARTICLE') return false
  return true
})

While in a NodeFilter, you'd only have to type this

return node.tagName === 'ARTICLE' ? NodeFilter.FILTER_REJECT : // ✨ Magic happens here ✨
       node.tagName === 'A'       ? NodeFilter.FILTER_ACCEPT :
                                    NodeFilter.FILTER_SKIP
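
For completeness, a sketch of how that filter might be wired into a TreeWalker; the root node and variable names are illustrative.

// FILTER_REJECT on an <article> prunes its entire subtree,
// so <a> elements inside it are never even visited
const walker = document.createTreeWalker(
  document.body,
  NodeFilter.SHOW_ELEMENT,
  node => node.tagName === 'ARTICLE' ? NodeFilter.FILTER_REJECT :
          node.tagName === 'A'       ? NodeFilter.FILTER_ACCEPT :
                                       NodeFilter.FILTER_SKIP
)

let link
while (link = walker.nextNode()) {
  // link is an <a> with no <article> ancestor inside document.body
}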

In conclusion

NodeIterators and TreeWalkers shouldn't be instantiated for loads of loops and definitely shouldn't replace one-off loops. For all intents and purposes, they are just alternative methods for keeping track of lists/trees of nodes, with the latter also having a handy bit of sugar added with FILTER_REJECT.

There's one main advantage both NodeIterators and TreeWalkers have to offer:

  • Live-ishness, as discussed in §1

However, don't use them when any of the following is true:

  • The instance is only going to be used once or a few times
  • You're querying complex hierarchical relationships that CSS selectors can express
    (e.g. body.no-js article > div > div a[href^="/"])
Uigur answered 3/10, 2019 at 14:45 Comment(7)
This is a first-rate answer. It covers multiple cases with the pros and cons of each, explains why things work like they do, and includes an excellent summary at the end. More importantly, this was written towards the end of 2019. A LOT has happened to browsers and the web in the 8 years since the official "answer". If possible, this should become the official answer for 2020 and the foreseeable future. – Visualize
"NodeIterator (and TreeWalker, for that matter) are almost never used" - I've had the pleasure of using it in a few projects. I agree it is rare to need it, BUT sometimes you do need to go deep into a node tree, and this really helps. I would say a developer might use these ~5 times total in their life. – Illdisposed
@Gust jsperf.com/qsa-vs-node-iterator/17 doesn't actually benchmark the same things: the node iterator benchmark only iterates over the iterator on the first iteration of the benchmark loop (the setup code is only run once, not before each run); for all further runs nextNode returns null immediately. Creating the iterator is actually really cheap, because it doesn't actually have to do much at all! – Risteau
More generally, at this point the original benchmarks are rather worthless: JS VMs nowadays are smart enough (and know enough about the DOM objects) to realise the loops are no-ops and avoid the iteration entirely(!). – Risteau
I've tried contacting JSPerf's creator regarding the timing inconsistencies. I'd noticed something was off when I moved things around, but he never responded and I couldn't find much useful information myself. Thanks for looking into this; will edit sometime soon! – Uigur
@GustvandeWal mrale.ph/blog/2012/12/15/microbenchmarks-fairy-tale.html is old but still fundamentally accurate; #36160294 is a question here about benchmarking. – Risteau
If we go by the knowledge that creating the iterator is really cheap, then this would appear to be a better benchmark. Note that this takes the filter function out of the benchmark loop and shares it with the manual iteration. – Risteau