When a string is not a string? Unicode normalization weirdness in JavaScript

I have run into what is, to me, some serious weirdness with string behavior in Firefox when using the .normalize() Unicode normalization function.

Here is a demo; view the console in Firefox to see the problem.

Suppose I have a button with an id of "NFKC":

<button id="NFKC">NFKC</button>

Get a reference to that, easy enough:

document.querySelector('#NFKC')
// <button id="NFKC">

Since this button has an id of NFKC, we can get at that string as follows:

document.body.querySelector('#NFKC').id
// "NFKC"

Stick that string in a variable:

var s1 = document.body.querySelector('#NFKC').id

By way of comparison, assign the very same string to a variable directly:

var s2 = 'NFKC'

So of course:

s1 === s2
// true

And:

s1 == s2
// true
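
To rule out invisible differences (stray whitespace, look-alike characters), one can compare the two strings code unit by code unit. The sketch below is a diagnostic under stated assumptions: `describeCodeUnits` is a hypothetical helper, and `s1` is built with `'NFKC'.split('').join('')` as a stand-in for the DOM-derived value (in the browser you would use `document.body.querySelector('#NFKC').id` instead).

```javascript
// Hypothetical diagnostic helper: lists each UTF-16 code unit as 4-digit hex.
function describeCodeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).padStart(4, '0'));
  }
  return units.join(' ');
}

// Assumed stand-in for s1 (the DOM value); s2 is the literal from above.
const s1 = 'NFKC'.split('').join('');
const s2 = 'NFKC';

console.log(describeCodeUnits(s1)); // 004e 0046 004b 0043
console.log(describeCodeUnits(s2)); // 004e 0046 004b 0043
console.log(s1 === s2);             // true
```

If the code units match, the strings are identical as far as the language is concerned, which suggests the engine rather than the data is at fault.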

Now’s the part where my head explodes.

To normalize a string, you pass one of NFC, NFD, NFKC, or NFKD to .normalize(), like this:

'á'.normalize('NFKC')
// "á"

Of course, depending on the normalization form you choose, you get different code points:

'á'.normalize('NFC').length == 1
// true
'á'.normalize('NFD').length == 2
// true
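
To see the code points behind those lengths, the following sketch runs all four forms over a precomposed "á". It assumes the input is U+00E1 and writes it as an escape so the source file's encoding does not matter. For this particular character the compatibility forms (NFKC, NFKD) give the same result as the canonical ones (NFC, NFD).

```javascript
// Precomposed "á" (U+00E1), written as an escape.
const input = '\u00E1';

for (const form of ['NFC', 'NFD', 'NFKC', 'NFKD']) {
  const out = input.normalize(form);
  // Array.from iterates by code point; format each as U+XXXX.
  const codePoints = Array.from(out, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
  console.log(form, out.length, codePoints.join(' '));
}
// NFC 1 U+00E1
// NFD 2 U+0061 U+0301
// NFKC 1 U+00E1
// NFKD 2 U+0061 U+0301
```

The decomposed forms split the letter into a base "a" plus a combining acute accent, which is why their length is 2.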

Either way, the point is: pass one of the four strings corresponding to the normalization forms to .normalize(), and you get a normalized string back.

Since we know that s1 (the string we retrieved from the DOM) and s2 are THE SAME STRING (s1 === s2 is true), then obviously we can use either to normalize a string:

'á'.normalize(s2)
// "á"
// well yeah, because s2 IS 'NFKC'.

Naturally, s1 will behave exactly the same way, right?

'á'.normalize(s1)
// RangeError: form must be one of 'NFC', 'NFD', 'NFKC', or 'NFKD'

Nope.

So the question is: why does it appear that s1 is not equal to s2 as far as .normalize() is concerned, when s1 === s2 is true?

This doesn’t happen in Chrome, the only other browser I’ve tested so far.
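
One possible workaround, offered as an untested assumption rather than anything confirmed in this thread: since the literal `s2` works, use the DOM-derived string only as a lookup key and hand `.normalize()` a string literal. `FORMS` and `safeNormalize` are hypothetical names, and whether this sidesteps the affected Firefox versions has not been verified here.

```javascript
// Hypothetical workaround sketch: map the requested form onto a string
// literal, so .normalize() only ever receives a literal value.
const FORMS = { NFC: 'NFC', NFD: 'NFD', NFKC: 'NFKC', NFKD: 'NFKD' };

function safeNormalize(str, form) {
  // Own-property check so inherited keys such as "toString" are rejected.
  // Unlike .normalize(), this sketch requires an explicit form (no NFC default).
  if (!Object.prototype.hasOwnProperty.call(FORMS, form)) {
    throw new RangeError("form must be one of 'NFC', 'NFD', 'NFKC', or 'NFKD'");
  }
  return str.normalize(FORMS[form]);
}

// Assumed stand-in for s1; in the browser this would be the button's id.
const formFromDom = 'NFKC'.split('').join('');
console.log(safeNormalize('\u00E1', formFromDom).length); // 1
```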

UPDATE

This was a bug in Firefox and has been fixed.

Bin asked 19/3, 2015 at 19:39. Comments (13):
Gowan: and your question is?
Bin: Why s1 isn't === s2. I will make the question more obvious.
Gongorism: It might help if you can set up a demo that reproduces the issue.
Bin: I will set up a demo that reproduces the issue.
Floret: @Gongorism Here's a demo: jsfiddle.net/qbxm49h2/1. An error is produced on the last test ('á'.normalize(s1)) in FF36.
Floret: @pat I'd say this is clearly not correct behavior; this looks like a pretty clear bug for the FF bug tracker.
Bin: I have added a bug report at bugzilla.mozilla.org/show_bug.cgi?id=1145326
Floret: I found the C++ implementation of String.prototype.normalize for Firefox. It looks like the problem is related to however formStr is set.
Mcgill: It's not just a DOM issue. These also fail: var s2 = 'NFKC'.split('').join(''); and var s2 = 'NFKCabc'.replace('abc','');. But this doesn't: var s2 = 'N'+'F'+'K'+'C';. Weird.
Rhizo: Node.id isn't in the MDN docs, or in the WHATWG spec.
Gongorism: @cdosborn: it is in that very spec as well as on MDN.
Rhizo: Gah... I was looking at the Node API, right you are.
Deer: Could you post your update as an answer and accept it? Congrats on finding this bug, BTW!

I'm not sure if this will help, but the documentation for String.prototype.normalize() states:

This is an experimental technology, part of the Harmony (ECMAScript 6) proposal. Because this technology's specification has not stabilized, check the compatibility table for usage in various browsers. Also note that the syntax and behavior of an experimental technology is subject to change in future version of browsers as the spec changes.

And the compatibility table is:

Feature        | Chrome | Firefox (Gecko) | Internet Explorer        | Opera | Safari
Basic support  | 34     | 31 (31)         | 11 on Windows 10 Preview | (Yes) | Not supported

However, the last update to this page was Nov 18, 2014.
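
Given that the table lists engines without the method at the time, code that relies on it might feature-detect first. The sketch below is a hedged illustration: `canNormalize` and `normalizeIfSupported` are hypothetical names, and returning the input unchanged when the method is missing is a simplistic assumption (a real application might load a polyfill instead).

```javascript
// Detect String.prototype.normalize before relying on it.
function canNormalize() {
  return typeof String.prototype.normalize === 'function';
}

// Assumed fallback: return the input unchanged if normalize is unavailable.
function normalizeIfSupported(str, form) {
  return canNormalize() ? str.normalize(form) : str;
}

console.log(canNormalize());                               // true in current Node.js
console.log(normalizeIfSupported('\u00E1', 'NFD').length); // 2
```

Note that feature detection only confirms the method exists; it would not have caught the Firefox bug discussed above, where the method existed but rejected some valid form strings.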

Alexina answered 19/3, 2015 at 20:15
