How to compile a java source file which is encoded as "UTF-8"?
U

11

33

I saved my Java source file with its encoding set to UTF-8 (using Notepad; by default Notepad's encoding type is ANSI) and then tried to compile it using:

javac -encoding "UTF-8" One.java

but it gave this error message:

One.java:1: illegal character: \65279

?public class One {

^
1 error

Is there any other way I can compile this?

Here is the source:

public class One {
    public static void main( String[] args ){
        System.out.println("HI");
    }
} 
Undertrump answered 12/11, 2009 at 23:37 Comment(0)
S
49

Your file is being read as UTF-8; otherwise a character with value "65279" could never appear. javac expects your source code to be in the platform default encoding unless told otherwise, according to the javac documentation:

If -encoding is not specified, the platform default converter is used.

Decimal 65279 is hex FEFF, which is the Unicode Byte Order Mark (BOM). It's unnecessary in UTF-8, because UTF-8 is always encoded as an octet stream and doesn't have endianness issues.
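
As a rough illustration of the numbers above, here is a minimal sketch (the class and method names are made up for this example). It prints the decimal value of U+FEFF, shows the three bytes that represent it in UTF-8, and shows that Java's standard UTF-8 decoder hands the BOM back as an ordinary leading character rather than dropping it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class BomDemo {
    /** Returns the UTF-8 encoding of U+FEFF as a hex string. */
    static String bomAsUtf8Hex() {
        byte[] bytes = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    /** Decodes the bytes as UTF-8 and returns the code of the first character. */
    static int firstCharCode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8).charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(0xFEFF);          // prints 65279
        System.out.println(bomAsUtf8Hex());  // prints EF BB BF
        // The first bytes of a file saved by Notepad as UTF-8: BOM, then "pub..."
        byte[] start = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'p', 'u', 'b'};
        System.out.println(firstCharCode(start)); // prints 65279
    }
}
```

The last line is the same value that shows up in the "illegal character: \65279" message from the question.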

Notepad likes to stick in BOMs even when they're not necessary, but some programs don't like finding them. As others have pointed out, Notepad is not a very good text editor. Switching to a different text editor will almost certainly solve your problem.

Samathasamau answered 12/11, 2009 at 23:47 Comment(5)
+1: here's the alternative: use notepad++, editplus or consorts or, only if you've a good grasp on coding/building/running java at commandline, an IDE like EclipseOlwena
Notepad is decent for telling if a file is UTF-8 (or UTF-16) but this issue is a pretty serious one IMHO and a lot of people get tripped up by it.Chor
Notepad sticks a BOM in there so that later on it (or anything else that understands BOMs) can figure out that the file is most likely UTF-8.Superimpose
That saved my day! From now on, only Notepad++ in my computer.Airlee
@JohanBoulé: What are you talking about? The answer doesn't even mention Notepad++, it's only mentioned in the comments. When the answer says "Notepad" it's referring to the Windows built-in application "Notepad", not Notepad++.Samathasamau
T
21

Open the file in Notepad++ and select Encoding -> Convert to UTF-8 without BOM.

Tsana answered 10/6, 2012 at 12:18 Comment(0)
H
15

This isn't a problem with your text editor, it's a problem with javac! The Unicode spec says the BOM is optional in UTF-8; it doesn't say it's forbidden! If a BOM can be there, then javac HAS to handle it, but it doesn't. Actually, using the BOM in UTF-8 files IS useful to distinguish an ANSI-coded file from a Unicode-coded file.

The proposed solution of removing the BOM is only a workaround and not the proper solution.

This bug report indicates that this "problem" will never be fixed : https://web.archive.org/web/20160506002035/http://bugs.java.com/view_bug.do?bug_id=4508058

Since this thread is in the top 2 google results for the "javac BOM" search, I'm leaving this here for future readers.

Highminded answered 20/1, 2015 at 10:50 Comment(1)
The general Java change for all UTF-8 streams was reverted due to JDK-6378911 impacting code that expects to read the BOM. It would need to be fixed in javac itself.Peccant
E
9

Try javac -encoding UTF8 One.java

Without the quotes and it's UTF8, no dash.

See this forum thread for more links

Etrem answered 12/11, 2009 at 23:44 Comment(1)
Problem is the BOM which notepad adds and which javac doesn't needUndertrump
A
6

As an example, consider the following program, which uses Telugu words as identifiers.

Program (UnicodeEx.java)

class UnicodeEx {  
    public static void main(String[] args) {   
        double ఎత్తు = 10;  
        double వెడల్పు = 25;   
        double దీర్ఘ_చతురస్ర_వైశాల్యం;  
        System.out.println("The Value of Height = "+ఎత్తు+" and Width = "+వెడల్పు+"\n");  
        దీర్ఘ_చతురస్ర_వైశాల్యం = ఎత్తు * వెడల్పు;  
        System.out.println("Area of Rectangle = "+దీర్ఘ_చతురస్ర_వైశాల్యం);  
    }  
}

Save this program as "UnicodeEx.java" and change the Encoding to "unicode" when saving.

How to Compile

javac -encoding "unicode" UnicodeEx.java

How to Execute

java UnicodeEx

The Value of Height = 10.0 and Width = 25.0

Area of Rectangle = 250.0

Alesha answered 28/8, 2014 at 5:59 Comment(1)
I had trouble with UTF-8 encoded source files with a UTF-8 BOM. Converting to UTF-16 LE (with corresponding BOM) and adding -encoding unicode to the javac command line compiled fine.Unrig
G
4

I know this is a very old thread, but I was experiencing a similar problem with PHP instead of Java and Google took me here. I was writing PHP on Notepad++ (not plain Notepad) and noticed that an extra white line appeared every time I called an include file. Firebug showed that there was a 65279 character in those extra lines.

Actually both the main PHP file and the included files were encoded in UTF-8. However, Notepad++ has also an option to encode as "UTF-8 without BOM". This solved my problem.

Bottom line: an editor saving as UTF-8 may insert this extra BOM character at the start of each file unless you instruct it to use UTF-8 without BOM.

Graveyard answered 3/2, 2012 at 20:21 Comment(0)
N
0

Works fine here, even edited in Notepad. Moral of the story is, don't use Notepad. There's likely an unprintable character in there that Notepad is either inserting or happily hiding from you.

Necking answered 12/11, 2009 at 23:45 Comment(2)
The BOM (Byte Order Mark) is a non-printable character, which means it is meant to be hidden from the editing window. Any good text editor should however be aware of the presence of this mark and honor whatever information it contains. Using a hexadecimal/binary editor should allow you to check how the BOM is constructed. The BOM only causes problems with badly written or non-unicode-compliant tools, and any tool which breaks in the presence of a BOM should be fixed ASAP (it's 2015 for god's sake ...!). Here's more info about the BOM : en.wikipedia.org/wiki/Byte_order_markHighminded
But I totally agree with the whole "not using Notepad" idea :)Highminded
I
0

I had the same problem. To solve it, I opened the file in a hex editor and found three "invisible" bytes at the beginning of the file. I removed them, and compilation worked.
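
For anyone without a hex editor at hand, the same check and removal can be sketched in a few lines of Java. This is only an illustration; the class and method names are invented here, and it assumes the three bytes are the UTF-8 BOM (EF BB BF).

```java
import java.util.Arrays;

// Hypothetical helper, shown only to illustrate the manual hex-editor fix.
class BomStripper {
    /** True if the data starts with the three UTF-8 BOM bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns a copy without the leading BOM, or the input itself if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'H', 'I'};
        System.out.println(hasUtf8Bom(withBom));          // prints true
        System.out.println(stripUtf8Bom(withBom).length); // prints 2
    }
}
```

To apply this to a real file, the bytes could be read with Files.readAllBytes and written back with Files.write after stripping.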

Ipomoea answered 21/9, 2014 at 11:48 Comment(1)
Those "three invisible bytes" are what is called the BOM (Byte Order Mark) : en.wikipedia.org/wiki/Byte_order_markHighminded
S
0
  • Open your file with WordPad or any other editor except Notepad.

  • Select Save As type as Text Document - MS-DOS Format

  • Reopen the Project

Salutation answered 10/5, 2016 at 16:13 Comment(1)
That is terrible advice… What would happen to the Unicode characters that are already in the document?Knowles
T
0

To extend the existing answers with a solution for Linux users:

To remove the BOM on all .java files at once, go to your source directory and execute

find -iregex '.*\.java' -type f -print0 | xargs -0 dos2unix

Requires find, xargs and dos2unix to be installed, which should be included in most distributions. The first statement finds all .java files in the current directory recursively, the second one converts each of them with the dos2unix tool, which is intended to convert line endings but also removes the BOM.

The line-ending conversion should have no effect, because the files should already use Linux \n line endings if your version control is configured correctly. Be warned, though, that dos2unix converts line endings as well, in case you have one of those rare cases where that is not intended.
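
Where dos2unix is not available, or where line endings must stay untouched, a rough Java equivalent of the command above might look like the sketch below. The class and method names are hypothetical. It walks a directory tree, matches .java files case-insensitively (like -iregex), and rewrites only files that start with the UTF-8 BOM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical utility: removes the UTF-8 BOM from .java files, nothing else.
class RemoveBomRecursively {
    /** Strips the BOM from every .java file under root; returns how many files changed. */
    static int removeBoms(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        int changed = 0;
        for (Path source : sources) {
            byte[] data = Files.readAllBytes(source);
            boolean hasBom = data.length >= 3
                    && (data[0] & 0xFF) == 0xEF
                    && (data[1] & 0xFF) == 0xBB
                    && (data[2] & 0xFF) == 0xBF;
            if (hasBom) {
                // Write back everything after the three BOM bytes.
                Files.write(source, Arrays.copyOfRange(data, 3, data.length));
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Uses the current directory when no argument is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Files changed: " + removeBoms(root));
    }
}
```

Because the files are modified in place, it is worth running this on a clean version-control checkout so the changes can be reviewed or reverted.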

Tiatiana answered 26/4, 2017 at 14:31 Comment(0)
D
0

In IntelliJ IDEA (Settings > Editor > File Encodings), the project encoding was "windows-1256". So I used the following code to convert static strings to UTF-8:

protected String persianString(String persianString) throws UnsupportedEncodingException {
    return new String(persianString.getBytes("windows-1256"), "UTF-8");
}

Now it is OK! Depending on the file encoding, you should change "windows-1256" to the proper one.

Donelladonelle answered 10/6, 2019 at 12:43 Comment(0)
