Note that this issue only seems to affect outdated versions of Firefox, so unless you explicitly need to support those old versions, you could choose to just not bother at all. The behavior for your example is the same in all modern browsers (since the change in Firefox). This can be verified using jsvu + eshost:
$ jsvu # Update installed JavaScript engine binaries to the latest version.
$ eshost -e '"\xDF".normalize("NFKC").toLowerCase().toUpperCase().toLowerCase()'
#### Chakra
ss
#### V8 --harmony
ss
#### JavaScriptCore
ss
#### V8
ss
#### SpiderMonkey
ss
#### xs
ss
But you asked how to solve this problem, so let’s continue.
Step 4 of https://tc39.github.io/ecma262/#sec-string.prototype.tolowercase states:
Let cuList
be a List where the elements are the result of toLowercase(cpList)
, according to the Unicode Default Case Conversion algorithm.
This Unicode Default Case Conversion algorithm is specified in section 3.13 Default Case Algorithms of the Unicode standard.
The full case mappings for Unicode characters are obtained by using the mappings from SpecialCasing.txt
plus the mappings from UnicodeData.txt
, excluding any of the latter mappings that would conflict. Any character that does not have a mapping in these files is considered to map to itself.
[…]
The following rules specify the default case conversion operations for Unicode strings. These rules use the full case conversion operations, Uppercase_Mapping(C)
, Lowercase_Mapping(C)
, and Titlecase_Mapping(C)
, as well as the context-dependent mappings based on the casing context, as specified in Table 3-17.
For a string X
:
- R1
toUppercase(X)
: Map each character C
in X
to Uppercase_Mapping(C)
.
- R2
toLowercase(X)
: Map each character C
in X
to Lowercase_Mapping(C)
.
Here’s an example from SpecialCasing.txt
, with my annotation added below:
00DF ; 00DF ; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S
<code>; <lower>; <title> ; <upper> ; (<condition_list>;)? # <comment>
This line says that U+00DF ('ß'
) lowercases to U+00DF (ß
) and uppercases to U+0053 U+0053 (SS
).
Here’s an example from UnicodeData.txt
, with my annotation added below:
0041 ; LATIN CAPITAL LETTER A; Lu;0;L;;;;;N;;;; 0061 ;
<code>; <name> ; <ignore> ; <lower>; <upper>
This line says that U+0041 ('A'
) lowercases to U+0061 ('a'
). It doesn’t have an explicit uppercase mapping, meaning it uppercases to itself.
Here’s another example from UnicodeData.txt
:
0061 ; LATIN SMALL LETTER A; Ll;0;L;;;;;N;; ;0041; ; 0041
<code>; <name> ; <ignore> ; <lower>; <upper>
This line says that U+0061 ('a'
) uppercases to U+0041 ('A'
). It doesn’t have an explicit lowercase mapping, meaning it lowercases to itself.
You could write a script that parses these two files, reads each line following these examples, and builds lowercase/uppercase mappings. You could then turn those mappings into a small JavaScript library that provides spec-compliant toLowerCase
/toUpperCase
functionality.
This seems like a lot of work. Depending on the old behavior in Firefox and what exactly changed (?) you could probably limit the work to just the special mappings in SpecialCasing.txt
. (I’m making this assumption that only the special casings changed in Firefox 55, based on the example you provided.)
// Instead of…
function normalize(string) {
const normalized = string.normalize('NFKC');
const lowercased = normalized.toLowerCase();
return lowercased;
}
// …one could do something like:
function lowerCaseSpecialCases(string) {
// TODO: replace all SpecialCasing.txt characters with their lowercase
// mapping.
return string.replace(/TODO/g, fn);
}
function normalize(string) {
const normalized = string.normalize('NFKC');
const fixed = lowerCaseSpecialCases(normalized); // Workaround for old Firefox 54 behavior.
const lowercased = fixed.toLowerCase();
return lowercased;
}
I wrote a script that parses SpecialCasing.txt
and generates a JS library that implements the lowerCaseSpecialCases
functionality mentioned above (as toLower
) as well as toUpper
. Here it is: https://gist.github.com/mathiasbynens/a37e3f3138069729aa434ea90eea4a3c Depending on your exact use case, you might not need the toUpper
and its corresponding regex and map at all. Here’s the full generated library:
const reToLower = /[\u0130\u1F88-\u1F8F\u1F98-\u1F9F\u1FA8-\u1FAF\u1FBC\u1FCC\u1FFC]/g;
const toLowerMap = new Map([
['\u0130', 'i\u0307'],
['\u1F88', '\u1F80'],
['\u1F89', '\u1F81'],
['\u1F8A', '\u1F82'],
['\u1F8B', '\u1F83'],
['\u1F8C', '\u1F84'],
['\u1F8D', '\u1F85'],
['\u1F8E', '\u1F86'],
['\u1F8F', '\u1F87'],
['\u1F98', '\u1F90'],
['\u1F99', '\u1F91'],
['\u1F9A', '\u1F92'],
['\u1F9B', '\u1F93'],
['\u1F9C', '\u1F94'],
['\u1F9D', '\u1F95'],
['\u1F9E', '\u1F96'],
['\u1F9F', '\u1F97'],
['\u1FA8', '\u1FA0'],
['\u1FA9', '\u1FA1'],
['\u1FAA', '\u1FA2'],
['\u1FAB', '\u1FA3'],
['\u1FAC', '\u1FA4'],
['\u1FAD', '\u1FA5'],
['\u1FAE', '\u1FA6'],
['\u1FAF', '\u1FA7'],
['\u1FBC', '\u1FB3'],
['\u1FCC', '\u1FC3'],
['\u1FFC', '\u1FF3']
]);
const toLower = (string) => string.replace(reToLower, (match) => toLowerMap.get(match));
const reToUpper = /[\xDF\u0149\u01F0\u0390\u03B0\u0587\u1E96-\u1E9A\u1F50\u1F52\u1F54\u1F56\u1F80-\u1FAF\u1FB2-\u1FB4\u1FB6\u1FB7\u1FBC\u1FC2-\u1FC4\u1FC6\u1FC7\u1FCC\u1FD2\u1FD3\u1FD6\u1FD7\u1FE2-\u1FE4\u1FE6\u1FE7\u1FF2-\u1FF4\u1FF6\u1FF7\u1FFC\uFB00-\uFB06\uFB13-\uFB17]/g;
const toUpperMap = new Map([
['\xDF', 'SS'],
['\uFB00', 'FF'],
['\uFB01', 'FI'],
['\uFB02', 'FL'],
['\uFB03', 'FFI'],
['\uFB04', 'FFL'],
['\uFB05', 'ST'],
['\uFB06', 'ST'],
['\u0587', '\u0535\u0552'],
['\uFB13', '\u0544\u0546'],
['\uFB14', '\u0544\u0535'],
['\uFB15', '\u0544\u053B'],
['\uFB16', '\u054E\u0546'],
['\uFB17', '\u0544\u053D'],
['\u0149', '\u02BCN'],
['\u0390', '\u0399\u0308\u0301'],
['\u03B0', '\u03A5\u0308\u0301'],
['\u01F0', 'J\u030C'],
['\u1E96', 'H\u0331'],
['\u1E97', 'T\u0308'],
['\u1E98', 'W\u030A'],
['\u1E99', 'Y\u030A'],
['\u1E9A', 'A\u02BE'],
['\u1F50', '\u03A5\u0313'],
['\u1F52', '\u03A5\u0313\u0300'],
['\u1F54', '\u03A5\u0313\u0301'],
['\u1F56', '\u03A5\u0313\u0342'],
['\u1FB6', '\u0391\u0342'],
['\u1FC6', '\u0397\u0342'],
['\u1FD2', '\u0399\u0308\u0300'],
['\u1FD3', '\u0399\u0308\u0301'],
['\u1FD6', '\u0399\u0342'],
['\u1FD7', '\u0399\u0308\u0342'],
['\u1FE2', '\u03A5\u0308\u0300'],
['\u1FE3', '\u03A5\u0308\u0301'],
['\u1FE4', '\u03A1\u0313'],
['\u1FE6', '\u03A5\u0342'],
['\u1FE7', '\u03A5\u0308\u0342'],
['\u1FF6', '\u03A9\u0342'],
['\u1F80', '\u1F08\u0399'],
['\u1F81', '\u1F09\u0399'],
['\u1F82', '\u1F0A\u0399'],
['\u1F83', '\u1F0B\u0399'],
['\u1F84', '\u1F0C\u0399'],
['\u1F85', '\u1F0D\u0399'],
['\u1F86', '\u1F0E\u0399'],
['\u1F87', '\u1F0F\u0399'],
['\u1F88', '\u1F08\u0399'],
['\u1F89', '\u1F09\u0399'],
['\u1F8A', '\u1F0A\u0399'],
['\u1F8B', '\u1F0B\u0399'],
['\u1F8C', '\u1F0C\u0399'],
['\u1F8D', '\u1F0D\u0399'],
['\u1F8E', '\u1F0E\u0399'],
['\u1F8F', '\u1F0F\u0399'],
['\u1F90', '\u1F28\u0399'],
['\u1F91', '\u1F29\u0399'],
['\u1F92', '\u1F2A\u0399'],
['\u1F93', '\u1F2B\u0399'],
['\u1F94', '\u1F2C\u0399'],
['\u1F95', '\u1F2D\u0399'],
['\u1F96', '\u1F2E\u0399'],
['\u1F97', '\u1F2F\u0399'],
['\u1F98', '\u1F28\u0399'],
['\u1F99', '\u1F29\u0399'],
['\u1F9A', '\u1F2A\u0399'],
['\u1F9B', '\u1F2B\u0399'],
['\u1F9C', '\u1F2C\u0399'],
['\u1F9D', '\u1F2D\u0399'],
['\u1F9E', '\u1F2E\u0399'],
['\u1F9F', '\u1F2F\u0399'],
['\u1FA0', '\u1F68\u0399'],
['\u1FA1', '\u1F69\u0399'],
['\u1FA2', '\u1F6A\u0399'],
['\u1FA3', '\u1F6B\u0399'],
['\u1FA4', '\u1F6C\u0399'],
['\u1FA5', '\u1F6D\u0399'],
['\u1FA6', '\u1F6E\u0399'],
['\u1FA7', '\u1F6F\u0399'],
['\u1FA8', '\u1F68\u0399'],
['\u1FA9', '\u1F69\u0399'],
['\u1FAA', '\u1F6A\u0399'],
['\u1FAB', '\u1F6B\u0399'],
['\u1FAC', '\u1F6C\u0399'],
['\u1FAD', '\u1F6D\u0399'],
['\u1FAE', '\u1F6E\u0399'],
['\u1FAF', '\u1F6F\u0399'],
['\u1FB3', '\u0391\u0399'],
['\u1FBC', '\u0391\u0399'],
['\u1FC3', '\u0397\u0399'],
['\u1FCC', '\u0397\u0399'],
['\u1FF3', '\u03A9\u0399'],
['\u1FFC', '\u03A9\u0399'],
['\u1FB2', '\u1FBA\u0399'],
['\u1FB4', '\u0386\u0399'],
['\u1FC2', '\u1FCA\u0399'],
['\u1FC4', '\u0389\u0399'],
['\u1FF2', '\u1FFA\u0399'],
['\u1FF4', '\u038F\u0399'],
['\u1FB7', '\u0391\u0342\u0399'],
['\u1FC7', '\u0397\u0342\u0399'],
['\u1FF7', '\u03A9\u0342\u0399']
]);
const toUpper = (string) => string.replace(reToUpper, (match) => toUpperMap.get(match));
toLocaleLowerCase()
? developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/… – VicegerenttoLowerCase()
: ecma-international.org/ecma-262/6.0/… vs ecma-international.org/ecma-262/5.1/#sec-15.5.4.16 – VicegerentString.fromCodePoint(223)
can be written as a string literal to simplify the example a bit:'\xDF'
. – Heeler