Why does the new "matchAll" in JavaScript return an iterator (vs. an array)?

ES2020 contains a new String.prototype.matchAll method, which returns an iterator. I'm sure I'm missing something dumb/obvious, but I don't see why it doesn't just return an array instead.

Can someone please explain the logic there?

EDIT: Just to clarify something from the comments, I'm operating on the assumption that iterators haven't simply replaced arrays as the new way all JS APIs going forward will return multiple values. If I missed that memo, and all new JS functions do return iterators, a link to said memo would 100% qualify as a valid answer.

But again, I suspect that such a blanket change wasn't made, and that the makers of JavaScript made a specific choice, for this specific method, to have it return an iterator ... and the logic of that choice is what I'm trying to understand.

Enneagon asked 12/4, 2020 at 15:32 Comment(14)
I guess it's a replacement for exec, which we used to call instead of matchAll to get the same functionality: with the g flag, match won't give you every match together with its capture groups the way repeated exec calls do, so matchAll was proposed to provide that directly. – Daguerre
But exec returns an Array, not an iterator. From MDN: "The exec() method executes a search for a match in a specified string. Returns a result array, or null." – Enneagon
It's not like a normal array: the regex keeps track of the lastIndex of the match, and on the next call exec searches from there. The array holds the value of the current match and its capture groups. – Daguerre
"If the match succeeds, the exec() method returns an array (with extra properties index and input; see below) and updates the lastIndex property of the regular expression object." (MDN) – Daguerre
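
For comparison, the exec() loop Daguerre describes looked roughly like this before matchAll existed. This is a sketch only: execAll is a hypothetical helper name, not a built-in, and stepping lastIndex by one past an empty match is a simplification (matchAll itself also accounts for the u flag when advancing past an empty match).

```javascript
// Hypothetical helper: collect every match (with capture groups) the
// pre-ES2020 way, by calling exec() repeatedly on a global regex.
// Each successful call advances re.lastIndex, so the next call resumes
// searching where the previous match ended.
function execAll(str, re) {
  if (!re.global) throw new TypeError('execAll needs a regex with the g flag');
  const matches = [];
  re.lastIndex = 0; // start from the beginning of the string
  let m;
  while ((m = re.exec(str)) !== null) {
    matches.push(m);
    // Avoid an infinite loop on zero-length matches (e.g. /x*/g).
    if (m[0] === '') re.lastIndex++;
  }
  return matches;
}

const text = 'a1 b2 c3';
const re = /([a-z])(\d)/g;

// match() with the g flag returns only the full matches, without groups:
console.log(text.match(re)); // → ['a1', 'b2', 'c3']

// The exec() loop and matchAll() both expose the groups of every match:
console.log(execAll(text, re).map((m) => m[1])); // → ['a', 'b', 'c']
console.log([...text.matchAll(re)].map((m) => m[1])); // → ['a', 'b', 'c']
```

Note that execAll builds the whole array eagerly, which is exactly the behavior the later comments contrast with an iterator.
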
If you return an array, the complete result has to be known when the function call finishes. Returning an iterator allows the next result to be evaluated at the time it is requested. Depending on the use case, this can have benefits for memory and/or responsiveness. – Samp
Isn't it to protect memory? An array has to be precomputed and allocated, while an iterator can be implemented lazily. This also means that if only a few iterations are made, the rest doesn't even need to be computed. – Kickstand
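
To make "lazy" concrete, here is a sketch of a userland matchAll-like generator that counts how many regex searches actually run. lazyMatches, take and the searches counter are hypothetical names for illustration, not the built-in's implementation; per the spec, though, matchAll's own iterator likewise performs one search per next() call.

```javascript
// Counts how many times the regex engine is asked to search.
let searches = 0;

// Hypothetical lazy matcher: exec() runs only when the consumer asks
// for the next value. The regex is copied so the caller's lastIndex is
// never mutated, and the g flag is added if it is missing.
function* lazyMatches(str, re) {
  const copy = new RegExp(re.source, re.flags.includes('g') ? re.flags : re.flags + 'g');
  while (true) {
    searches++; // one regex search per requested value
    const m = copy.exec(str);
    if (m === null) return;
    if (m[0] === '') copy.lastIndex++; // step past empty matches
    yield m;
  }
}

// Take only the first n values from any iterable, then stop pulling.
function take(iterable, n) {
  const out = [];
  if (n <= 0) return out;
  for (const item of iterable) {
    out.push(item);
    if (out.length === n) break; // closes the generator: no further searches
  }
  return out;
}

const words = 'one two three four five six';
const firstTwo = take(lazyMatches(words, /\w+/), 2).map((m) => m[0]);
console.log(firstTwo); // → ['one', 'two']
console.log(searches); // → 2: the remaining words were never searched for
```

An array-returning API would have performed all seven searches (six hits plus the final miss) before the caller saw the first result.
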
Just to focus things: I agree there are obviously pros and cons to iterables vs. arrays. I'm not questioning that. But at the same time, look: iterables haven't replaced arrays in the JS language. The API is still full of things that return arrays. So my question isn't "why are iterables better/worse?", it's "why, in this specific method, seemingly going against the trend of previous regex stuff, did they decide an iterator was what matchAll should return, instead of an array?" – Enneagon
Protecting the app from excessive memory usage is just safer. If the language has the means to make APIs safer, why not use them? – Kickstand
By that logic, no new function should ever return an array now that iterators exist, because iterators are just safer arrays ... but again, iterators are not "arrays 2.0" in JavaScript. The makers of JS did not just decide "all methods in ES versions after the iterator one will return iterators instead of arrays, because they are the new superior array" ... so saying (more or less) "iterators have clear advantages over arrays" (while 100% accurate) doesn't answer the question. – Enneagon
@Enneagon The older regex functions "go against the trend" only because they were defined before iterators existed; that does not mean new functions should not use iterators. And changing exec to return an iterator is not possible without breaking existing code. As for "no new function should ever return an array now that iterators exist": for certain tasks you can estimate well how large the result will be and how long it will take to calculate. For matchAll it depends on the regular expression and the input, so having a function that doesn't force you to compute the full result can indeed be helpful. – Samp
:) Are you saying that old regex methods did return an array, but one with an iterator-like structure, and they only didn't use iterators because iterators didn't exist yet? That now that iterators do exist, they are a good solution for this specific problem for specific reasons, and because of those reasons an iterator was the more natural option here? Because something like that (if a person were to outline those specifics) almost sounds like an answer ... – Enneagon
Yes, that would be my guess at a reasonable explanation. But to know why they decided to do it that way, you would have to ask the committee members of the specification team ;) – Samp
Not at all. If there's an obvious logical reason, and it's explained well, you 100% don't need a quote from a committee member to get your answer upvoted/accepted. – Enneagon
I don't think this should be the answer. It's just guessing based on reasonable arguments. – Kickstand

This is described in the proposal document:

Many use cases may want an array of matches - however, clearly not all will. Particularly large numbers of capturing groups, or large strings, might have performance implications to always gather all of them into an array. By returning an iterator, it can trivially be collected into an array with the spread operator or Array.from if the caller wishes to, but it need not.
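
As the quoted proposal says, turning the iterator into an array is a one-liner when you do want every match. A short sketch (the log string and idRe pattern are made-up illustration data):

```javascript
const log = 'id=7 id=42 id=1000';
const idRe = /id=(\d+)/g;

// Spread the iterator into an array when you want every match at once:
const all = [...log.matchAll(idRe)];
console.log(all.length); // → 3

// Array.from accepts an optional mapping function, so a capture group
// can be extracted while collecting:
const ids = Array.from(log.matchAll(idRe), (m) => Number(m[1]));
console.log(ids); // → [7, 42, 1000]
```

matchAll works on an internal copy of the regex, so calling it twice on the same idRe is safe here.
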

.matchAll is lazy. When using the iterator, the regex will only evaluate the next match in the string once the prior match has been iterated over. This means that if the regex is expensive, the first few matches can be extracted, and then your JS logic can make the iterator bail out of trying further matches.

For a trivial example of the lazy evaluation in action:

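```javascript
// Alternative 1 ('a') matches at index 0. Alternative 2, (x+x+)+y., can never
// match (nothing follows the final 'y'), and its nested quantifiers make the
// engine backtrack exponentially before giving up.
for (const match of 'axxxxxxxxxxxxxxxxxxxxxxxxxxxxy'.matchAll(/a|(x+x+)+y./g)) {
  if (match[0] === 'a') {
    console.log('Breaking out');
    break; // stop pulling from the iterator: the 2nd match is never attempted
  }
}
console.log('done');
```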

Without the break, the regular expression would go on to attempt a second match, starting at the first x, which triggers catastrophic backtracking and is a very expensive operation.

If matchAll returned an array, and iterated over all matches immediately while creating the array, it would not be possible to bail out.
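
The same early exit can be packaged as a small helper. This is a sketch; firstMatch is a hypothetical name, not a built-in.

```javascript
// Return the first match accepted by `predicate`, or null if none is.
// Returning from inside the for...of loop stops consuming matchAll's lazy
// iterator, so the engine never searches beyond the accepted match.
function firstMatch(str, re, predicate) {
  for (const m of str.matchAll(re)) {
    if (predicate(m)) return m;
  }
  return null;
}

// The pathological input from above: accepted on the first match, so the
// catastrophically backtracking second search is never attempted.
const early = firstMatch('axxxxxxxxxxxxxxxxxxxxxxxxxxxxy', /a|(x+x+)+y./g, (m) => m[0] === 'a');
console.log(early.index); // → 0

// Ordinary use: the first date whose month is 10.
const oct = firstMatch('2020-12-04, 2020-10-26', /(\d{4})-(\d{2})-(\d{2})/g, (m) => m[2] === '10');
console.log(oct[0]); // → '2020-10-26'
```

By contrast, `[...str.matchAll(re)].find(predicate)` collects every match before testing any of them, which is the eager behavior described above: on the pathological input it would run the expensive second search.
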

Depalma answered 26/10, 2020 at 23:40 Comment(2)
It works, but my TypeScript setup reports: ESLint: iterators/generators require regenerator-runtime, which is too heavyweight for this guide to allow them. Separately, loops should be avoided in favor of array iterations. (no-restricted-syntax) Any idea how to adapt it? – Sperm
You can iterate through the matches with a traditional for (let i = 0; ...) loop to avoid the iterator. That will still require a "loop", but that can't be avoided easily, so don't worry about it. – Depalma
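
Another option that keeps the laziness is to drive the iterator by hand with next(). This sketch assumes the error comes from a no-restricted-syntax configuration that bans ForOfStatement (the rule name in the message suggests so, but the exact config is an assumption); a while loop is not a for...of statement, so that selector does not match it. collectUntil is a hypothetical helper.

```javascript
// Consume matchAll's iterator manually instead of with for...of.
// Stops as soon as `stop` returns true; later matches are never searched for.
function collectUntil(str, re, stop) {
  const it = str.matchAll(re);
  const out = [];
  let step = it.next();
  while (!step.done) {
    if (stop(step.value)) break;
    out.push(step.value[0]); // keep the full matched text
    step = it.next();
  }
  return out;
}

console.log(collectUntil('a b STOP c d', /\w+/g, (m) => m[0] === 'STOP')); // → ['a', 'b']
```

If you need every match anyway and want no loop at all, `Array.from(str.matchAll(re), (m) => m[0])` also avoids for...of, at the cost of being eager.
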
