UTF-8 problems while reading CSV file with fgetcsv
G

7

41

I am trying to read a CSV file and echo its content, but the characters are displayed incorrectly.

Mäx Müstermänn -> Mäx Müstermänn

Encoding of the CSV file is UTF-8 without BOM (checked with Notepad++).

This is the content of the CSV file:

"Mäx";"Müstermänn"

My PHP script

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
</body>
</html>

I tried to use setlocale(LC_ALL, 'de_DE.utf8'); as suggested here, without success. The content is still displayed incorrectly.

What am I missing?

Edit:

An echo mb_detect_encoding($data[$c],'UTF-8'); gives me UTF-8 UTF-8.

echo file_get_contents("specialchars.csv"); gives me "Mäx";"Müstermänn".

And

print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))

gives me

Array ( [0] => Mäx [1] => Müstermänn )

What does it mean?

Gaona answered 16/1, 2012 at 15:23 Comment(1)
What happens when you do echo file_get_contents("specialchars.csv")? What happens when you do print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))?Wag
G
2

I have it working now (after removing the header command). I think the problem was that the PHP file itself was saved as ISO-8859-1. I set it to UTF-8 without BOM. I thought I had already done that, but perhaps I undid the change by accident.

Furthermore, I used SET NAMES 'utf8' for the database. Now it is also correct in the database.

Gaona answered 17/1, 2012 at 16:47 Comment(1)
If the imported file is of another charset than your code you may also need setlocale().Aversion
C
76

Try this:

<?php
$handle = fopen ("specialchars.csv","r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv ($handle, 1000, ";")) {
        $data = array_map("utf8_encode", $data); //added
        $num = count ($data);
        for ($c=0; $c < $num; $c++) {
            // output data
            echo "<td>$data[$c]</td>";
        }
        echo "</tr><tr>";
}
?>
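Note that utf8_encode() assumes its input is ISO-8859-1 and is deprecated as of PHP 8.2. If it is applied to data that is already UTF-8, the characters get encoded twice, which may explain some of the comments on this answer. A minimal sketch of a guarded conversion follows; the helper name toUtf8IfNeeded is hypothetical, the mbstring extension is assumed to be loaded, and any non-UTF-8 input is assumed to be ISO-8859-1:

```php
<?php
// Hypothetical helper (not part of the answer above): convert a field
// to UTF-8 only when it is not already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function toUtf8IfNeeded(string $field): string
{
    if (mb_check_encoding($field, 'UTF-8')) {
        return $field; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "M\xE4x";     // "Mäx" encoded as ISO-8859-1
$utf8   = "M\xC3\xA4x"; // "Mäx" encoded as UTF-8

// Same shape as the array_map() line in the answer.
$row = array_map('toUtf8IfNeeded', [$latin1, $utf8]);

var_dump($row[0] === $utf8); // bool(true)
var_dump($row[1] === $utf8); // bool(true)
```

The validity check is a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so knowing the real source encoding is always the safer option.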
Cockfight answered 23/10, 2014 at 13:34 Comment(4)
This totally removed the special characters with space, which is totally dangerous!!!Flanders
@robssanches The code above only works for Latin-alphabet characters; it does not work for other scripts, e.g. Chinese, Hindi, Hebrew, etc.Mindexpanding
This worked for me. So sad, that this helpful line is missing in official documentation de.php.net/manual/de/function.fgetcsv.phpC
I am having some trouble with this solution... Some characters, such as ’ (right single quotation mark) and … (ellipsis), do not work with utf8_encodeAdala
E
19

I encountered a similar problem: parsing a CSV file with special characters such as é, è, ö, etc.

The following worked fine for me:

To display the characters correctly on the HTML page, this header was needed:

header('Content-Type: text/html; charset=UTF-8');

In order to parse every character correctly, I used:

utf8_encode(fgets($file));

Don't forget to use the multibyte string functions in all subsequent string operations, for example:

mb_strtolower($value, 'UTF-8');
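A fuller sketch of how these pieces might fit together is shown here. It assumes the source file is ISO-8859-1 and uses mb_convert_encoding() in place of utf8_encode(), which performs the same ISO-8859-1 to UTF-8 conversion but is not deprecated in PHP 8.2+. The function name readLatin1Csv and the in-memory demo data are made up for illustration; in a web page, the header() call above would be sent before any output.

```php
<?php
// Sketch only: the source encoding (ISO-8859-1) and the ';' delimiter are assumptions.
// Reads the stream line by line, converts each line to UTF-8, then parses it as CSV.
function readLatin1Csv($file, string $delimiter = ';'): array
{
    $rows = [];
    while (($line = fgets($file)) !== false) {
        $line = mb_convert_encoding($line, 'UTF-8', 'ISO-8859-1');
        // Strip the line ending, then parse; enclosure and escape are passed explicitly.
        $rows[] = str_getcsv(rtrim($line, "\r\n"), $delimiter, '"', '\\');
    }
    return $rows;
}

// Demo with an in-memory stream instead of a real file.
$file = fopen('php://memory', 'w+');
fwrite($file, "\"M\xE4x\";\"M\xFCsterm\xE4nn\"\n"); // ISO-8859-1 bytes
rewind($file);

$rows = readLatin1Csv($file);
fclose($file);

// Multibyte-aware lowercasing of the second field.
echo mb_strtolower($rows[0][1], 'UTF-8'), PHP_EOL; // müstermänn
```

Reading with fgets() means quoted fields that contain line breaks will be split across rows; if the data can contain those, fgetcsv() on an already converted stream is the more robust route.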
Eddaeddana answered 27/1, 2014 at 13:52 Comment(2)
you just saved me a lot of time, thank you! I've been trying to solve this issue for ages..Swob
a full example code where utf8_encode(fgets($file)); is actually used would be niceOverblown
L
10

In my case the source file has windows-1250 encoding and iconv prints tons of notices about illegal characters in input string...

So this solution helped me a lot:

/**
 * getting CSV array with UTF-8 encoding
 *
 * @param   resource    &$handle
 * @param   integer     $length
 * @param   string      $separator
 *
 * @return  array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false)
    {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}

/**
 * automatic conversion of a windows-1250 or iso-8859-2 string into a utf-8 string
 *
 * @param   string  $s
 *
 * @return  string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;

    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);

    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}

Response to @manvel's answer - use str_getcsv instead of explode - because of cases like this:

some;nice;value;"and;here;comes;combinated;value";and;some;others

explode will split the string into these parts:

some
nice
value
"and
here
comes
combinated
value"
and
some
others

but str_getcsv will split the string into these parts:

some
nice
value
and;here;comes;combinated;value
and
some
others
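The difference can be checked with a short script. This is only an illustration using the sample line above; the enclosure and escape arguments are passed explicitly because newer PHP versions discourage relying on the default escape character.

```php
<?php
$line = 'some;nice;value;"and;here;comes;combinated;value";and;some;others';

// Naive split: ignores the quotes, so the quoted field is broken apart.
$naive = explode(';', $line);

// CSV-aware split: delimiter ';', enclosure '"', escape '\'.
$parsed = str_getcsv($line, ';', '"', '\\');

echo count($naive), PHP_EOL;  // 11
echo count($parsed), PHP_EOL; // 7
echo $parsed[3], PHP_EOL;     // and;here;comes;combinated;value
```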
Ludmilla answered 14/7, 2017 at 7:50 Comment(3)
Great answer! This is the only one that actually deals with wrong character encoding when manipulating CSV data with PHP. Either you encode your data properly before manipulating it, or you do it on the fly while reading. In my case, fgetcsv was returning broken output (nothing, not even NULL or FALSE, was returned!) without any PHP notice, because of a mis-encoding issue. You just saved me precious time with fgetcsvUTF8, because I had no way to re-encode the original data. I hate encoding issues. Thanks for sharing!Palpitate
This works really well. I have encountered one use case where it doesn't work. Not sure if you have any thoughts on it: Åland Islands - a row with that text in it will return ?land Islands using your function. Aside from that though I spotted no issuesBedad
Thank you for answering. Please describe how you managed to solve this problem.Peltate
A
8

Try putting this into the top of your file (before any other output):

<?php

header('Content-Type: text/html; charset=UTF-8');

?>
Aurilia answered 16/1, 2012 at 19:11 Comment(3)
If I put this on top I get �.Gaona
Perhaps I should mention that I upload the CSV file through a form with enctype="multipart/form-data" accept-charset="utf-8". If I put your code into the example, then it seems to work.Gaona
@Gaona that made a difference for me. Had 2 CSV's I was parsing, one had the accept-charset="utf-8" and the other didn't, and it didn't display correctly until I used this.Heliogravure
S
5

The problem is that the function reports the data as UTF-8 (you can check this with mb_detect_encoding) but does not actually convert it, so the characters are treated as if they were UTF-8. Therefore, you need to convert the data from its original encoding (Windows-1251, also known as CP1251) to UTF-8 using iconv. Since fgetcsv returns an array, I suggest writing a custom function:

function customfgetcsv(&$handle, $length, $separator = ';'){
    if (($buffer = fgets($handle, $length)) !== false) {
        return explode($separator, iconv("CP1251", "UTF-8", $buffer));
    }
    return false;
}
Scorpius answered 6/10, 2013 at 19:56 Comment(0)
M
0

For anyone finding this post who is currently using PhpSpreadsheet: I personally found @Petr Hladík's answer superb, and just want to add how to apply it to PhpSpreadsheet (I'm using v2.1):

  1. Open phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Csv.php
  2. Paste in the 2 methods from Petr's answer
  3. Find the loadStringOrFile() function
  4. Replace $rowData = fgetcsv($fileHandle, 0, $delimiter, $this->enclosure, $this->escapeCharacter);
  5. With $rowData = $this->fgetcsvUTF8($fileHandle,1000,$delimiter);

This obviously drops the enclosure and escapeCharacter arguments, so if they matter to you, amend the str_getcsv call and add $this->enclosure and $this->escapeCharacter as the two final parameters (see https://www.php.net/manual/en/function.str-getcsv.php)
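As a rough, standalone sketch of that amendment (not PhpSpreadsheet code): the function name fgetcsvUtf8WithEnclosure, the $sourceEncoding parameter, and the simplified "convert only if not valid UTF-8" check are assumptions made for illustration. They stand in for the fgetcsvUTF8 and autoUTF methods, and the mbstring and iconv extensions are assumed to be available.

```php
<?php
// Hypothetical standalone variant of fgetcsvUTF8() that also forwards
// the enclosure and escape character to str_getcsv().
// Assumption: input that is not valid UTF-8 is in $sourceEncoding.
function fgetcsvUtf8WithEnclosure(
    $handle,
    int $length,
    string $separator = ';',
    string $enclosure = '"',
    string $escape = '\\',
    string $sourceEncoding = 'WINDOWS-1250'
) {
    if (($buffer = fgets($handle, $length)) === false) {
        return false; // end of file or read error
    }
    if (!mb_check_encoding($buffer, 'UTF-8')) {
        $converted = iconv($sourceEncoding, 'UTF-8', $buffer);
        if ($converted === false) {
            return false; // conversion failed
        }
        $buffer = $converted;
    }
    return str_getcsv(rtrim($buffer, "\r\n"), $separator, $enclosure, $escape);
}

// Demo: "\x8A" is "Š" in Windows-1250; the first field contains the delimiter.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\"\x8Akoda;Fabia\";second\n");
rewind($handle);

$rowData = fgetcsvUtf8WithEnclosure($handle, 1000, ';', '"', '\\');
fclose($handle);

echo $rowData[0], PHP_EOL; // Škoda;Fabia
```

Inside the reader class, the last two arguments would be $this->enclosure and $this->escapeCharacter, as described above.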

Marietta answered 22/7, 2024 at 10:23 Comment(0)
