I had been bothered about this for long, so I finally researched this and give you this long winded reason for why things are the way they are.
From the spec:
Section 11.9.4 The Strict Equals Operator ( === )
The production EqualityExpression : EqualityExpression === RelationalExpression
is evaluated as follows:
- Let lref be the result of evaluating EqualityExpression.
- Let lval be GetValue(lref).
- Let rref be the result of evaluating RelationalExpression.
- Let rval be GetValue(rref).
- Return the result of performing the strict equality comparison
rval === lval. (See 11.9.6)
So now we go to 11.9.6
11.9.6 The Strict Equality Comparison Algorithm
The comparison x === y, where x and y are values, produces true or false.
Such a comparison is performed as follows:
- If Type(x) is different from Type(y), return false.
- If Type(x) is Undefined, return true.
- If Type(x) is Null, return true.
- If Type(x) is Number, then
...
- If Type(x) is String, then return true if x and y are exactly the
same sequence of characters (same length and same characters in
corresponding positions); otherwise, return false.
That's it. The triple equals operator applied to strings returns true iff the arguments are exactly the same strings (same length and same characters in corresponding positions).
So ===
will work in the cases when we're trying to compare strings which might have arrived from different sources, but which we know will eventually have the same values - a common enough scenario for inline strings in our code. For example, if we have a variable named connection_state
, and we wish to know which one of the following states ['connecting', 'connected', 'disconnecting', 'disconnected']
is it in right now, we can directly use the ===
.
But there's more. Just above 11.9.4, there is a short note:
NOTE 4
Comparison of Strings uses a simple equality test on sequences of code
unit values. There is no attempt to use the more complex, semantically oriented
definitions of character or string equality and collating order defined in the
Unicode specification. Therefore Strings values that are canonically equal
according to the Unicode standard could test as unequal. In effect this
algorithm assumes that both Strings are already in normalized form.
Hmm. What now? Externally obtained strings can, and most likely will, be weird unicodey, and our gentle ===
won't do them justice. In comes localeCompare
to the rescue:
15.5.4.9 String.prototype.localeCompare (that)
...
The actual return values are implementation-defined to permit implementers
to encode additional information in the value, but the function is required
to define a total ordering on all Strings and to return 0 when comparing
Strings that are considered canonically equivalent by the Unicode standard.
We can go home now.
tl;dr;
To compare strings in javascript, use localeCompare
; if you know that the strings have no non-ASCII components because they are, for example, internal program constants, then ===
also works.
JavaScript case insensitive string comparison
on #2141127 – DativeJavascript : remove accents/diacritics in strings
on #991404 – Dative