PHP readdir problem with Japanese file names

6

4

I have the following code

<?php
if ($handle = opendir('C:/xampp/htdocs/movies')) {
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            echo $file."<br />\n";
        }
    }
    closedir($handle);
}
?>

When a file name contains multibyte characters, such as Japanese, it doesn't display properly. Instead of kyuukyoku Choujin R 究極超人あ~る it displays kyuukyoku Choujin R ?????~?

Is there any way to make it display the correct name, or at least keep the file downloadable by others?

Thanks for helping me :)

Abreaction answered 27/1, 2009 at 4:50 Comment(1)
This is not possible. See stackoverflow.com/questions/2887909 - Thirtytwo
9

I can't speak definitively for PHP, but I suspect it's the same basic problem that Python 2 had (before it later added special support for Unicode string filenames).

My belief is that PHP is dealing with filenames using the standard C library ‘open’-et-al functions, which are byte-based. On Windows (NT) these try to encode the real Unicode filename using the system codepage. That might be cp1252 (similar to ISO-8859-1) for Western machines, or cp932 (similar to Shift-JIS) on Japanese machines. For any characters that don't exist in the system codepage you will get a ‘?’ character, and you'll be unable to refer to that file.
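The '?' substitution described above can be reproduced without touching the filesystem. This is only a rough sketch: it assumes the mbstring extension is enabled and that the script itself is saved as UTF-8, and it uses mb_convert_encoding to stand in for the conversion Windows performs, which is an approximation rather than the actual system call.

```php
<?php
// Assumes this source file is saved as UTF-8 and mbstring is enabled.
$name = 'kyuukyoku Choujin R 究極超人あ~る';

// Use '?' for characters the target encoding cannot represent,
// which mirrors what the question's output shows.
mb_substitute_character(0x3F);

// Western codepage: none of the Japanese characters can be represented.
$lossy = mb_convert_encoding($name, 'Windows-1252', 'UTF-8');

// Japanese codepage: this particular name survives a round trip.
$cp932 = mb_convert_encoding($name, 'CP932', 'UTF-8');
$roundTrip = mb_convert_encoding($cp932, 'UTF-8', 'CP932');

echo $lossy, "\n";
echo $roundTrip === $name ? "cp932 round trip ok\n" : "cp932 round trip lost data\n";
```

Once a character has been replaced by '?', the original name cannot be recovered from the string, which is why the file can no longer be opened by that name.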

To get around this problem PHP would have to do the same as Python 3.0 and start using Unicode strings for filenames (and everything else), using the ‘_wopen’-et-al functions to get native-Unicode access to the filenames under Windows. I expect this will happen in PHP6, but for the moment you're probably pretty much stuffed. You could change the system codepage to cp932 to get access to the filenames, but you'd still get ‘?’ characters for any other Unicode characters not in Shift-JIS, and in any case you really don't want to make your application's internal strings all Shift-JIS as it's quite a horrible encoding.

If it's your own scripts choosing how to store files, I'd strongly suggest using simple primary-key-based filenames like ‘4356’ locally, putting the real filename in a database, and serving the files up using rewrites/trailing path parts in the URL. Keeping user-supplied filenames in your own local filenames is difficult and a recipe for security disasters even without having to worry about Unicode.
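A minimal sketch of that suggestion follows. The array stands in for a database table, and the names storedPath and contentDisposition are hypothetical helpers invented for illustration. The real name travels only in the Content-Disposition header, using the "filename*" form, so the local filename stays plain ASCII.

```php
<?php
// Hypothetical lookup table standing in for a database: id => original name.
$originalNames = [
    4356 => 'kyuukyoku Choujin R 究極超人あ~る',
];

// The on-disk name is just the numeric key, so it is always plain ASCII.
function storedPath(string $dir, int $id): string
{
    return rtrim($dir, '/') . '/' . $id;
}

// Build a Content-Disposition value that carries the real name as
// percent-encoded UTF-8, plus an ASCII fallback in which every
// non-ASCII character, quote and backslash becomes "_".
// Assumes $utf8Name is valid UTF-8.
function contentDisposition(string $utf8Name): string
{
    $fallback = preg_replace('/[^\x20-\x7E]|["\\\\]/u', '_', $utf8Name);
    return 'attachment; filename="' . $fallback . '"; filename*=UTF-8\'\''
        . rawurlencode($utf8Name);
}

$path = storedPath('C:/xampp/htdocs/movies', 4356);
$header = contentDisposition($originalNames[4356]);
```

A download script would then send header('Content-Disposition: ' . $header) and stream the file at $path. How a browser picks between the two filename parameters is up to the browser.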

Monteith answered 1/3, 2009 at 9:36 Comment(1)
+1: "Keeping user-supplied filenames in your own local filenames is difficult and a recipe for security disasters even without having to worry about Unicode." - Cowpox
2

As @bobince mentioned, PHP returns filenames in the encoding specified by the System Locale, which is the one used by non-Unicode-aware applications. If a character doesn't exist in the current system encoding, the filename will contain '?' in its place and the file will not be accessible.

You can try installing php-wfio.dll at https://github.com/kenjiuno/php-wfio, and refer to files via the wfio:// protocol.
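Something along these lines might work for the directory listing in the question. It is an untested sketch: it assumes the extension registers a stream wrapper named wfio and that names come back as UTF-8, and listDirectory is a hypothetical helper. When the wrapper is not registered, it falls back to the plain path.

```php
<?php
// Returns the entries of $dir, excluding "." and "..".
// Uses the wfio:// wrapper when it is registered (assumed to yield
// UTF-8 names); otherwise falls back to the plain path.
function listDirectory(string $dir): array
{
    $useWfio = in_array('wfio', stream_get_wrappers(), true);
    // "@" suppresses the warning for a missing directory; we return [] instead.
    $handle = @opendir($useWfio ? 'wfio://' . $dir : $dir);
    if ($handle === false) {
        return [];
    }
    $names = [];
    while (false !== ($file = readdir($handle))) {
        if ($file !== '.' && $file !== '..') {
            $names[] = $file;
        }
    }
    closedir($handle);
    return $names;
}

foreach (listDirectory('C:/xampp/htdocs/movies') as $file) {
    // Escape the name before echoing it into HTML.
    echo htmlspecialchars($file, ENT_QUOTES, 'UTF-8') . "<br />\n";
}
```

The page also needs to be served as UTF-8 for the names to render correctly in the browser.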

Hives answered 29/1, 2015 at 17:45 Comment(1)
This absolutely solves the problem of using scandir on a directory that contains Japanese file names. It returns the full Japanese filename without "?"! - Bechler
0

You missed two other references to the $file variable, mate, but that's for the better as I think I may've discovered a slightly more efficient method; give this a try:

<?php
if ($handle = opendir('C:/xampp/htdocs/movies')) {
    while (false !== ($file = readdir($handle))) {
        $file = mb_substr($file, mb_strrpos($file, '/') + 1);
        if ($file != "." && $file != "..") {
            echo $file . "<br />\n";
        }
    }
    closedir($handle);
}
?>
Beefcake answered 27/1, 2009 at 8:49 Comment(0)
0

sorry :)

Try this:

<?php
if ($handle = opendir('C:/xampp/htdocs/movies')) {
    while (false !== ($file = readdir($handle))) {
        $filename_utf16 = iconv("iso-8859-1", "utf-16", $file);
        if ($filename_utf16 != "." && $filename_utf16 != "..") {
            echo $filename_utf16 . "<br />\n";
        }
    }
    closedir($handle);
}
?>

Petroglyph answered 1/3, 2009 at 8:52 Comment(0)
-1

Replace any instance of $file with mb_substr($file, mb_strrpos($file, '/') + 1) and you should be good to go. Huzzah for multi-byte encoding!

Beefcake answered 27/1, 2009 at 6:27 Comment(0)
-1

I think Windows uses UTF-16 for file names. So try the mb_convert_encoding function to convert from the internal encoding to your output encoding:

// convert from UTF-16 to UTF-8
echo mb_convert_encoding($file, 'UTF-8', 'UTF-16');

Maybe you have to change some settings first (see mb_get_info).

Williamson answered 28/1, 2009 at 13:42 Comment(0)
