Within a RegExp character set, a hyphen-minus character (your standard keyboard dash) denotes a range of character codes between the two characters it separates. The exceptions are when it is escaped (\-) or when it does not separate two characters because it is either the final character of the class or the first character (after the optional caret that inverts the class).
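A small sketch of the positions described above, runnable in Node.js or a browser console. The object name cases and the two test characters ("b" and "-") are illustrative choices, not anything from the original question.

```javascript
// Where the hyphen sits decides whether it means "range" or "literal -".
const cases = {
  range: /^[a-c]$/,       // a-c is a range: a, b, c
  escaped: /^[a\-c]$/,    // escaped: the set {a, -, c}
  first: /^[-ac]$/,       // first in the class: literal
  last: /^[ac-]$/,        // last in the class: literal
  afterCaret: /^[^-ac]$/, // first after the caret: literal, then the set is negated
};

// For each pattern, record whether it matches "b" and whether it matches "-".
const results = {};
for (const [name, re] of Object.entries(cases)) {
  results[name] = ["b", "-"].map((ch) => re.test(ch));
}
console.log(results);
```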
Three examples of character ranges: a simple example, an advanced example, and a bug:
- [a-z] is pretty straightforward because it works the way we expect it to, though this is actually because the character codes happen to be sequential. Another way of writing this is [\x61-\x7a].
- [!-~] is not at all straightforward, at least until you look at a character map and learn that ! is the first printable ASCII character after the space and ~ is the last (of "lower ASCII"), so this is a way of saying "all printable lower ASCII characters except the space"; it is equivalent to [\x21-\x7e].
- [A-z] has a switched case in it. You may dislike the fact that this range (which is [\x41-\x7a]) also accepts six non-letter characters: [, \, ], ^, _ and `.
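To check these three claims, one can compare each range with its hex-escape form over all 128 ASCII code points and list what [A-z] accepts beyond letters. This is a quick sketch; the helper names ascii, same and extras are made up for the example.

```javascript
// All 7-bit ASCII characters, code points 0 through 127.
const ascii = Array.from({ length: 128 }, (_, i) => String.fromCharCode(i));

// Two single-character classes are equivalent on ASCII if they agree on every character.
const same = (a, b) => ascii.every((ch) => a.test(ch) === b.test(ch));

console.log(same(/[a-z]/, /[\x61-\x7a]/)); // true
console.log(same(/[!-~]/, /[\x21-\x7e]/)); // true

// Characters matched by [A-z] that are not ASCII letters (code points 0x5B to 0x60).
const extras = ascii.filter((ch) => /[A-z]/.test(ch) && !/[A-Za-z]/.test(ch));
console.log(extras.join(" ")); // [ \ ] ^ _ `
console.log(extras.length);    // 6
```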
Now let's examine your regex of /[\w-+]/u. Regex101 gives a more informative error:
You can not create a range with shorthand escape sequences
Since \w is not itself a character (but rather a collection of characters), an abutting dash must either be taken literally or be treated as an error. When you invoke it with the /u flag to trigger fullUnicode, you enter a stricter mode and therefore get an error.
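The difference between the two modes can be observed directly. This sketch builds the pattern with the RegExp constructor rather than a literal, because an invalid regex literal may be reported when the script is parsed, before any code runs; compiles and lenient are hypothetical names used only here.

```javascript
// Returns true if the source compiles with the given flags, false on a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so let it propagate
  }
}

console.log(compiles(String.raw`[\w-+]`, ""));  // true: lenient mode accepts it
console.log(compiles(String.raw`[\w-+]`, "u")); // false: rejected in u mode

// In lenient mode the class means: any word character, "-" or "+".
const lenient = new RegExp(String.raw`[\w-+]`);
console.log(["a", "-", "+", "!"].map((ch) => lenient.test(ch)));
```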
The error I get from "foo".match(/[\w-+]/u) in Firefox 64.0 is:
SyntaxError: character class escape cannot be used in class range in regular expression
This is slightly more informative than the error you got since it actually tells you the problem is with the escape (though not why it's a problem).
According to ECMAScript 2015's RegExpBuiltinExec() logic:
- If fullUnicode is true, then
- e is an index into the Input character list, derived from S, matched by matcher. Let eUTF be the smallest index into S that corresponds to the character at element e of Input. If e is greater than or equal to the length of Input, then eUTF is the number of code units in S.
- Let e be eUTF.
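This step only maps the end of a match back to a code-unit index in S, though; the stricter parsing comes from the pattern grammar itself. ECMAScript makes it an early SyntaxError for either end of a class range to be a class escape such as \w, and only the web-compatibility grammar in Annex B relaxes this (treating the hyphen literally) for patterns without the u flag.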
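The quoted steps deal with where a match ends: with u, the matcher walks code points, and the end position is converted back to UTF-16 code units. A small illustration using lastIndex on a global regex; the string is an arbitrary example containing one astral character.

```javascript
// "\u{1F600}" is a single code point stored as two UTF-16 code units.
const s = "\u{1F600}x";
console.log(s.length); // 3: two code units for the emoji plus one for "x"

// With u, "." consumes the whole surrogate pair; lastIndex is reported in code units.
const withU = /./gu;
withU.exec(s);
console.log(withU.lastIndex); // 2

// Without u, "." consumes only the first (high) surrogate.
const withoutU = /./g;
withoutU.exec(s);
console.log(withoutU.lastIndex); // 1
```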
The solution is to either escape your hyphen-minus or else put it last (or first): /[\w\-+]/u or /[\w+-]/u or /[-\w+]/u. I personally always put it last.
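A quick check that the three spellings are interchangeable: each compiles with the u flag and accepts exactly word characters, "-" and "+". The sample characters are arbitrary.

```javascript
// The three fixed forms of the class, all with the u flag.
const fixes = [/[\w\-+]/u, /[\w+-]/u, /[-\w+]/u];
const samples = ["a", "Z", "0", "_", "-", "+", "!", " "];

// Each line should show the same true/false pattern for the samples.
for (const re of fixes) {
  console.log(re.source, samples.map((ch) => re.test(ch)));
}
```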
[\w-+] doesn't make any kind of sense. It looks like one of the engines is too lenient. – Carmina
… [\w-+] gets accepted at all. The range "from any word character to the plus symbol" makes absolutely no sense, so if you want to match the minus symbol, escape it: [\w\-+], and that'll work in simple as well as unicode matching. – Pooka
… pattern attribute of an HTML input element. I can't figure out what the person who wrote the code was trying to do, and I am not implying that using character classes in ranges is a smart idea. – Spurge
… - is a literal character or range delimiter inside []. Looks like JS doesn't treat it as a range delimiter when it's after an escape sequence, since it wouldn't make sense. – Unpractical
… - was only treated as a literal character if it is the last character in the range. /[\w-+]/.test('-') returns true, so you might be right, but that is not definite proof. – Spurge
… - is treated as a literal if it appears in a position where it cannot be interpreted as indicating a range. I'm searching in the normative document for confirmation. That doesn't explain why it doesn't work with u, though. – Drambuie
… u modifier makes the regex engine parse the regex expression in a more strict way. All chars that do not have to be escaped must not be escaped and those that should must be escaped. All ambiguity must be avoided. – Misinterpret
… - in the allowed places, or escape it. – Unpractical
… u flag and reject it with the u flag. The only real differences are the details of how they handle it and the error messages they produce. – Spurge
… s modifier, infinite length lookbehind. As for u, in Chrome, you may use Unicode property classes like \p{L}. – Misinterpret
… - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassRanges, the beginning or end limit of a range specification, or immediately follows a range specification". – Drambuie
… \w) or if the first ClassAtom's character value is greater than the second ClassAtom's character value. (link) – Misinterpret
… u flag. – Spurge