Unicode in Rhino

For some reason, Unicode strings don't behave properly in Rhino, Mozilla's JavaScript engine. If I enter or manipulate Unicode text in the REPL, it comes back as gibberish.

js> 'тотальная киборгизация'
B>B0;L=0O :81>@3870F8O

ASCII characters work just fine.

js> 'reprap for everyone'
reprap for everyone

Unix commands work fine too:

$ echo 'тотальная киборгизация'
тотальная киборгизация

JVM output is fine too. This program prints the Cyrillic text correctly:

class Test {
    public static void main(String[] args) {
        System.out.println("тотальная киборгизация");
    }
}

Java and Rhino versions are:

$ java -version
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.10.1)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
$ rhino
Rhino 1.7 release 3 2012 05 18

Locales:

$ echo $LC_TYPE

$ echo $LANG
en_US.UTF-8

Changing LC_ALL to en_US.UTF-8 doesn't help.

Is this problem related to the Stack Overflow question about JavaScript using UCS-2?

What's the problem, and how can I use proper Unicode in the Rhino REPL?

Islander answered 13/12, 2012 at 14:19 Comment(6)
I don't see the same problem. I'm using Rhino 1.7 release 2 2009 03 22 and java version "1.6.0_26" Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-9M3425) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode) on Mac OS X 10.5.8. (Kaine)
Try eliminating Rhino from the environment. What happens when you give the Unix command line the command echo 'тотальная киборгизация', without running Rhino? (Kaine)
The output you get, after accounting for control characters, is actually UTF-16, not UTF-8. (Given this, the fact that the plain ASCII works is peculiar.) You might try compiling and running this Java to see if the VM's settings are to blame: class Test { public static void main(String[] args) { System.out.println("тотальная киборгизация"); } } (Sealskin)
I just figured this out: the plain ASCII is (probably) just as broken as the Cyrillic and only seems to work because the interspersed nulls between the characters are not displayed. (Sealskin)
Try using Unicorn instead, Rhino should know it better. (Orate)
I am not sure which JavaScript shell you are using: developer.mozilla.org/en-US/docs/Web/JavaScript/Shells. Is it the JS shell (developer.mozilla.org/en-US/docs/SpiderMonkey/…) or the Rhino shell (developer.mozilla.org/en/docs/Rhino/Shell)? (Corrosion)
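
Sealskin's two comments above can be checked with a small sketch. It assumes the shell wrote each character as two UTF-16 bytes (the high-byte-first order is an assumption and does not affect the result here) and that the terminal silently dropped the control bytes. The function name `visibleUtf16Bytes` is made up for this illustration; it models the symptom and is not Rhino's actual output code.

```javascript
// Hypothetical helper: model a terminal that receives UTF-16 bytes
// and shows only the printable ASCII ones.
function visibleUtf16Bytes(s) {
  var visible = "";
  for (var i = 0; i < s.length; i++) {
    var unit = s.charCodeAt(i);            // one 16-bit code unit
    var bytes = [unit >> 8, unit & 0xFF];  // high byte, low byte (assumed order)
    for (var j = 0; j < bytes.length; j++) {
      // Control bytes such as 0x00 and 0x04 are assumed to be invisible.
      if (bytes[j] >= 0x20 && bytes[j] < 0x7F) {
        visible += String.fromCharCode(bytes[j]);
      }
    }
  }
  return visible;
}

var cyrillic = visibleUtf16Bytes("тотальная киборгизация");
// cyrillic is "B>B0;L=0O :81>@3870F8O", the gibberish from the question
var ascii = visibleUtf16Bytes("reprap for everyone");
// ascii is "reprap for everyone": only 0x00 bytes were dropped
```

The Cyrillic letters used here all sit in the U+04xx range, so each one loses its 0x04 byte and leaves a stray ASCII character behind, while ASCII text loses only nulls and looks intact.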

It should be noted that JavaScript doesn't handle Unicode properly, since it predates UTF-16. (It exposes strings as 16-bit code units in the manner of UCS-2, which is similar to UTF-16 but not the same.)

This writeup explains the problem well and provides libraries and workarounds.
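
A minimal sketch of what the 16-bit limitation means in practice: a character outside the Basic Multilingual Plane is stored as two code units (a surrogate pair), so `length` and `charCodeAt` see two "characters". The helper `countCodePoints` is a hypothetical name, written in plain ES5 so that it should also run in older engines such as Rhino 1.7.

```javascript
// U+1F4A9 lies outside the BMP, so it is written as a surrogate pair.
var astral = "\uD83D\uDCA9";
var units = astral.length;                      // 2 code units, not 1
var first = astral.charCodeAt(0).toString(16);  // "d83d", the high surrogate

// Hypothetical helper: count code points by treating each valid
// surrogate pair as a single character.
function countCodePoints(s) {
  var count = 0;
  for (var i = 0; i < s.length; i++) {
    var hi = s.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length) {
      var lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        i++; // skip the low surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

var points = countCodePoints(astral);           // 1
var bmpOnly = countCodePoints("киборгизация");  // 12, same as its length
```

Cyrillic letters each fit in a single code unit, so this limitation alone would not produce the gibberish shown in the question.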

Dragonnade answered 7/11, 2013 at 14:36 Comment(0)
