I have a file which has characters like: " Joh 1:1 ஆதியிலே வார்த்தை இருந்தது, அந்த வார்த்தை தேவனிடத்திலிருந்தது, அந்த வார்த்தை தேவனாயிருந்தது. "
www.unicode.org/charts/PDF/U0B80.pdf
When I use the following code:
bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, "UTF8"));
The output is boxes and other weird characters like this:
"�P�^����O֛���;�<�aYՠ؛"
Can anyone help?
This is the complete code:
File f=new File("E:\\bible.docx");
Reader decoded=new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
char[] buffer = new char[1024];
int n;
StringBuilder build=new StringBuilder();
while(true){
n=decoded.read(buffer);
if(n<0){break;}
build.append(buffer,0,n);
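bufferedWriter.write(buffer, 0, n); // write only the n chars just read, not the whole buffer
}
bufferedWriter.flush(); // push any buffered output to System.out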
The StringBuilder value shows the correct Unicode characters, but when they are displayed in the output window they appear as boxes.
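For reference, here is a minimal, self-contained sketch of the same copy loop. It assumes the input is a plain-text UTF-8 file (not a docx); the class name `Utf8CopySketch`, the `copy` helper, and the command-line path argument are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8CopySketch {

    /** Copies all characters from in to out and returns how many were copied. */
    static int copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[1024];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Write only the n chars that were just read; the rest of the
            // buffer may still hold stale data from an earlier iteration.
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush(); // make sure nothing stays behind in the writer's buffer
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical path to a plain-text UTF-8 file.
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
            copy(in, out);
        }
    }
}
```

Whether the console then shows the text correctly still depends on the console's own encoding and font.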
Found the answer to the problem! The encoding is correct (i.e. UTF-8): Java reads the file as UTF-8 and the resulting String holds the right characters. The problem is that the font used by NetBeans' output panel has no glyphs for these characters. After changing the font for the output panel (NetBeans -> Tools -> Options -> Miscellaneous -> Output tab) I got the expected result. The same applies when the text is displayed in a JTextArea (its font needs to be changed too). But we can't change the font of the Windows cmd prompt.
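To find out which installed fonts could be used for the output panel or a JTextArea, something along these lines may help. It is only a sketch: the class and method names are made up, and it relies on `Font.canDisplayUpTo`, which returns -1 when a font has glyphs for every character of the string.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TamilFontCheck {

    /** Family names of installed fonts that have glyphs for every char of text. */
    static List<String> fontsThatCanDisplay(String text) {
        Set<String> families = new LinkedHashSet<>();
        for (Font font : GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts()) {
            // canDisplayUpTo returns -1 if the whole string can be displayed
            if (font.canDisplayUpTo(text) == -1) {
                families.add(font.getFamily());
            }
        }
        return new ArrayList<>(families);
    }

    /** True if text contains at least one code point of the Tamil script. */
    static boolean containsTamil(String text) {
        return text.codePoints()
                   .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.TAMIL);
    }

    public static void main(String[] args) {
        String sample = "ஆதியிலே வார்த்தை இருந்தது";
        System.out.println("Contains Tamil: " + containsTamil(sample));
        for (String family : fontsThatCanDisplay(sample)) {
            System.out.println("Can display sample: " + family);
        }
    }
}
```

Any family it prints can then be applied to a text component, for example `textArea.setFont(new Font(family, Font.PLAIN, 14))`. If the list is empty, no installed font covers the sample text.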
For a docx file, you need a docx reader. You cannot read it as if it were plain text. The problem is not the language, it is the file format. – Predestinate
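To illustrate the comment: a docx file is a ZIP archive, and the body text normally lives in the entry `word/document.xml`. The sketch below pulls the text out using only the JDK. The class and method names are made up, the namespace is the one used by the common (transitional) WordprocessingML format, and it ignores headers, footnotes, and nested content such as text boxes; a dedicated library such as Apache POI is the more robust choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxTextSketch {

    // Namespace of the w: prefix in transitional WordprocessingML documents.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Opens the docx as a ZIP archive and extracts the text of word/document.xml. */
    static String readDocx(Path docx) throws Exception {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; is this a docx file?");
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractText(in);
            }
        }
    }

    /** Collects the text of all w:t elements, one output line per w:p paragraph. */
    static String extractText(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        // The XML parser honours the encoding declared in the XML itself.
        Document doc = factory.newDocumentBuilder().parse(documentXml);

        StringBuilder text = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }
}
```

The returned String can then be written to the console or a JTextArea as before, for example via `DocxTextSketch.readDocx(Paths.get("E:\\bible.docx"))`.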