I'm struggling with a strange file name encoding issue when listing directory contents in Java 6 on both OS X and Linux: File.listFiles() and related methods seem to return file names in a different encoding than the rest of the system.
Note that it is not merely the display of these file names that is causing me problems. I'm mainly interested in doing a comparison of file names with a remote file storage system, so I care more about the content of the name strings than the character encoding used to print output.
Here is a program to demonstrate. It creates a file with a Unicode name, then prints URL-encoded versions of the file name obtained from the directly created File and from the same file when listed under its parent directory (you should run this code in an empty directory). The results show the different encoding returned by the File.listFiles() method.
import java.io.File;
import java.net.URLEncoder;

// Class name is arbitrary; run this in an empty directory.
public class FileNameTest {
    public static void main(String[] args) throws Exception {
        String fileName = "Trîcky Nåme";
        File file = new File(fileName);
        file.createNewFile();
        System.out.println("File name: " + URLEncoder.encode(file.getName(), "UTF-8"));
        // Get parent (current) dir and list file contents
        File parentDir = file.getAbsoluteFile().getParentFile();
        File[] children = parentDir.listFiles();
        for (File child : children) {
            System.out.println("Listed name: " + URLEncoder.encode(child.getName(), "UTF-8"));
        }
    }
}
Here's what I get when I run this test code on my systems. Note the %CC versus %C3 character representations.
OS X Snow Leopard:
File name: Tri%CC%82cky+Na%CC%8Ame
Listed name: Tr%C3%AEcky+N%C3%A5me
$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
KUbuntu Linux (running in a VM on same OS X system):
File name: Tri%CC%82cky+Na%CC%8Ame
Listed name: Tr%C3%AEcky+N%C3%A5me
$ java -version
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.1) (6b18-1.8.1-0ubuntu1)
OpenJDK Client VM (build 16.0-b13, mixed mode, sharing)
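For reference, the two outputs look like the same text in two Unicode normalization forms: %C3%AE is the UTF-8 encoding of the precomposed character î (U+00EE), while i%CC%82 is a plain i followed by the combining circumflex U+0302. The sketch below reproduces both strings from one literal using java.text.Normalizer. The class and helper names are made up for illustration, and the code never touches the file system, so it says nothing about which layer performs the conversion.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

public class NormalizationForms {
    // Hypothetical helper: normalizes a string to the given form, then URL-encodes it as UTF-8.
    static String encodedAs(String s, Normalizer.Form form) {
        try {
            return URLEncoder.encode(Normalizer.normalize(s, form), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Written with escapes so the source file's own encoding cannot interfere.
        String name = "Tr\u00EEcky N\u00E5me";
        System.out.println("NFC: " + encodedAs(name, Normalizer.Form.NFC)); // Tr%C3%AEcky+N%C3%A5me
        System.out.println("NFD: " + encodedAs(name, Normalizer.Form.NFD)); // Tri%CC%82cky+Na%CC%8Ame
    }
}
```

If this reading is right, the strings differ in their code points rather than in a byte-level character encoding, which would be consistent with file.encoding and the locale variables having no effect.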
I have tried various hacks to get the strings to agree, including setting the file.encoding system property and various LC_CTYPE and LANG environment variables. Nothing helps, nor do I want to resort to such hacks.
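Since the goal is to compare names with a remote store, one option that avoids environment settings is to bring both names to the same normalization form before comparing. This is a minimal sketch that assumes the mismatch is purely a composed versus decomposed difference; sameName is a hypothetical helper, and NFC is an arbitrary choice (NFD would work equally well as long as both sides use the same form).

```java
import java.text.Normalizer;

public class NameComparison {
    // Hypothetical helper: compares two names after normalizing both to NFC.
    static boolean sameName(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String decomposed = "Tri\u0302cky Na\u030Ame"; // base letters plus combining marks
        String composed = "Tr\u00EEcky N\u00E5me";     // precomposed characters
        System.out.println(decomposed.equals(composed));   // false
        System.out.println(sameName(decomposed, composed)); // true
    }
}
```

The same normalization would need to be applied to the names coming back from the remote system, since either side could hold either form.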
Unlike this (somewhat related?) question, I am able to read data from the listed files despite the odd names.
[...] .java files? I think you can use the file command to determine that. – Isaacs
[...] .java file from the file command: UTF-8 Unicode Java program text – Canorous
[...] "AB" (Latin script), another named "ΑΒ" (Greek script), and a third named "АВ" (Cyrillic script). Talk about security through gosh-that’s-hard-to-type-ness. :) I once had a machine named wraeththu, whose name nobody could ever type right to log into. Coulda been worse: could’ve spelt it like the original, which was wrǽþþu in Old English. :) – Emmaline
[...] java.nio.file.Paths.get(...) to return a valid path, calling exists() on an element returned from java.io.File.listFiles() may return false! Any ideas? – Ascocarp
[...] Paths.get(URI) to work... If the filename is in NFD it is reported as NFC, calling Paths.get still returns an NFC name. I'm really struggling to understand how this is not a bug (granted, we are off topic now as the OP was concerned with HFS). – Adna