Why do I need to escape Unicode in Java source files?

Please note that I'm not asking how but why. I also don't know whether it's an RCP-specific problem or something inherent to Java.

My Java source files are encoded in UTF-8.

If I define my string literals like this:

    new Language("fr", "Français"),
    new Language("zh", "中文")

It works as I expect when I use the strings in the application launched from Eclipse as an Eclipse application:

[Screenshot: the language names are displayed correctly]

But it fails when I launch the .exe built by the "Eclipse Product Export Wizard":

[Screenshot: the exported application, where the language names do not display correctly]

The solution I use is to escape the characters like this:

    new Language("fr", "Fran\u00e7ais"), // Français
    new Language("zh", "\u4e2d\u6587") // 中文

There is no problem in doing this (all my other strings are in properties files; only the language names are hardcoded), but I'd like to understand why it is needed.

I thought the compiler converted Java string literals when building the bytecode. So why is the Unicode escaping necessary? Is it wrong to use high-range Unicode characters in Java source files? What exactly happens to those characters at compilation, and how does that differ from the handling of escaped characters? Is the problem just related to the RCP cache?
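A minimal sketch of the difference the last questions ask about, assuming the export build decodes the sources with some single-byte charset (ISO-8859-1 stands in for it here; `EscapeVsRaw` and `survivesMismatch` are hypothetical names). javac first decodes the file's bytes into characters using the source encoding, and only afterwards translates `\u` escapes (JLS §3.3). A raw `ç` is two bytes in UTF-8 and gets misdecoded, whereas `\u00e7` is six ASCII bytes that decode identically under any ASCII-compatible charset.

```java
import java.nio.charset.StandardCharsets;

public class EscapeVsRaw {
    // Simulates a source line saved as UTF-8 and read back by a tool that
    // assumes ISO-8859-1; returns true if the text comes back unchanged.
    static boolean survivesMismatch(String sourceText) {
        byte[] onDisk = sourceText.getBytes(StandardCharsets.UTF_8);
        String asRead = new String(onDisk, StandardCharsets.ISO_8859_1);
        return asRead.equals(sourceText);
    }

    public static void main(String[] args) {
        // The line with a raw c-cedilla character in the literal
        // (written with an escape here so this file itself stays ASCII).
        String rawLine = "new Language(\"fr\", \"Fran\u00e7ais\"),";
        // The same line with the escape spelled out as plain ASCII text
        // (the doubled backslash yields one literal backslash).
        String escapedLine = "new Language(\"fr\", \"Fran\\u00e7ais\"),";

        System.out.println("raw line survives:     " + survivesMismatch(rawLine));     // false
        System.out.println("escaped line survives: " + survivesMismatch(escapedLine)); // true
    }
}
```

In other words, escaping does not change what the compiler produces; it only removes the dependence on the tool guessing the file's encoding correctly.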

Conlon asked 27/6, 2012 at 13:4. Comments (14):
It appears that the Eclipse Product Export Wizard is not interpreting your files as UTF-8. Perhaps you need to run Eclipse's JVM with the encoding set to UTF-8 (-Dfile.encoding=UTF8 in eclipse.ini)? - Dogged
While this does not really explain why it happens, it does suggest an alternative solution and indicates that the export wizard, for whatever reason, doesn't seem to honor the project's encoding properly: #6891579 - Kilah
To confirm @Matt Ball's explanation, which I think is correct, try setting the following option in the wizard: "Use class files compiled in the workspace" - Belmonte
@Jiddo: it does explain why it happens: "not interpreting your files as UTF-8", so it's interpreting them as another encoding that is incompatible with UTF-8. - Deakin
@MattBall It works. Please post it as an answer. But I'd like to understand why Eclipse doesn't know what encoding to use when exporting, even though UTF-8 is the encoding defined in preferences/general/workspace and it knows how to compile the files. At the very least, an option in the export wizard or the .plugin file seems to be needed. - Klina
@brunoconde Could you please specify where this option is? - Klina
@Deakin Indeed. What I meant was that it didn't explain why the files are not interpreted as UTF-8, which is what I took the question to be about. Sorry about the confusion. - Kilah
@dystroy It is in the Export wizard, on the "Options" tab. - Belmonte
@brunoconde I use the "Eclipse Product Export Wizard" from the .product file. I don't have tabs :\ - Klina
@dystroy Sorry, I have a plugin environment, not RCP. It seems the RCP wizard doesn't have this option. - Belmonte
OK, thanks for your help. Your observation points to a need similar to the one I was referring to. - Klina
@Jiddo: it's not interpreting the files as UTF-8 because that's not their encoding when imported into or created in Eclipse. - Deakin
@Deakin Those files are generally treated by Eclipse as UTF-8, judging by the correct display. This is due to the preference set in preferences/general/workspace. - Klina
@dystroy It's probably just a bug in Eclipse's Product Export Wizard. Such things are disturbingly common in a lot of tools; many developers just don't understand or test encoding issues. - Dialectical

It appears that the Eclipse Product Export Wizard is not interpreting your files as UTF-8. Perhaps you need to run Eclipse's JVM with the encoding set to UTF-8 (-Dfile.encoding=UTF8 in eclipse.ini)?
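To make this concrete, here is a small sketch (the class `MisreadDemo` and its `misread` helper are hypothetical, not part of Eclipse) that saves the two language names as UTF-8 bytes and reads them back as ISO-8859-1, standing in for whatever non-UTF-8 charset the export build may have assumed. It also prints the running JVM's default charset and the `file.encoding` property, which is what the `-Dfile.encoding=UTF8` suggestion sets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisreadDemo {
    // Encodes text as the editor saved it, then decodes the bytes with the
    // charset a build tool assumed.
    static String misread(String text, Charset savedAs, Charset readAs) {
        return new String(text.getBytes(savedAs), readAs);
    }

    public static void main(String[] args) {
        // Charset the JVM falls back to when no encoding is given
        // (platform-dependent before JDK 18, UTF-8 by default from JDK 18 on).
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        String fr = "Fran\u00e7ais"; // French, with c-cedilla
        String zh = "\u4e2d\u6587";   // the two Chinese characters
        String frMisread = misread(fr, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        String zhMisread = misread(zh, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Print lengths rather than the strings, so the console's own
        // encoding cannot garble the output further.
        System.out.println("fr: " + fr.length() + " -> " + frMisread.length() + " chars"); // fr: 8 -> 9 chars
        System.out.println("zh: " + zh.length() + " -> " + zhMisread.length() + " chars"); // zh: 2 -> 6 chars
    }
}
```

Each UTF-8 byte becomes its own character when misdecoded, which is the kind of garbling one would expect in literals compiled with the wrong source encoding.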

(Copypasta'd at OP's request)

Dogged answered 27/6, 2012 at 13:26

When a plug-in is exported, it is compiled through a process separate from the normal build inside the IDE. There is a known bug whereby this build process (PDE.Build) disregards the text encoding configured in the IDE.

The export can be made to work properly by specifying the text encoding in the build.properties file of your plug-in:

    javacDefaultEncoding.. = UTF-8
Panthia answered 6/7, 2013 at 10:50
