Unable to retrieve UTF-8 accented characters from Access via PDO_ODBC
Asked Answered
A

3

5

I am trying to get an Access DB converted into MySQL. Everything works perfectly, expect for one big monkey wrench... If the access db has any non standard characters, it wont work. My query will tell me:

Incorrect string value: '\xE9d'

If I directly echo out the rows text that has the 'invalid' character I get a question mark in a black square in my browser (so é would turn into that invalid symbal on echo).

NOTE: That same from will accept, save and display the "é" fine in a textbox that is used to title this db upload. Also if I 'save as' the page and re-open it up the 'é' is displayed correctly....

Here is how I connect:

$conn = new PDO("odbc:Driver={Microsoft Access Driver (*.mdb)};Dbq=$fileLocation;SystemDB=$securefilePath;Uid=developer;Pwd=pass;charset=utf;");

I have tried numerous things, including:

$conn -> exec("set names utf8");

When I try a 'CurrentDb.CollatingOrder' in access it tells me 1033 apparently that is dbSortGeneral for "English, German, French, and Portuguese collating order".

What is wrong? It is almost like the PDO is sending me a collation my browser and PHP does not fully understand.

Apperceive answered 4/2, 2015 at 0:59 Comment(4)
I solved a lot of encoding issues using mssql, mysql, utf8 and latin1. if you provide more of your code i can try to help you. i haven't understood where the error Incorrect string value: '\xE9d' appears exactly for example. so pls - more code!Retardation
Did you try using the utf8_encode() function on the string you retrieved from the Access database? That might work, provided that the accented characters you are retrieving are represented in the ISO-8859-1 character set (like é is).Batwing
Gord Thompson: Thanks, that worked (will run more test tomorrow). Based on a google search the 'ISO-8859-1' supports the 4 languages listed for 1033. So hopefully that will fix everything in all situations.... Not sure how to mark your post as the answer?Apperceive
Northkildonan: That was a mysqli error on a fail query.Apperceive
B
13

The Problem

When using native PHP ODBC features (PDO_ODBC or the older odbc_ functions) and the Access ODBC driver, text is not UTF-8 encoded, even though it is stored in the Access database as Unicode characters. So, for a sample table named "Teams"

Team
-----------------------
Boston Bruins
Canadiens de Montréal
Федерация хоккея России

the code

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr = 
        'odbc:' .
        'Driver={Microsoft Access Driver (*.mdb)};' .
        'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb;' .
        'Uid=Admin;';
$db = new PDO($connStr);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sql = "SELECT Team FROM Teams";
foreach ($db->query($sql) as $row) {
    $s = $row["Team"];
    echo $s . "<br/>\n";
}
?>
</body>
</html>

displays this in the browser

Boston Bruins
Canadiens de Montr�al
????????? ?????? ??????

The Easy but Incomplete Fixes

The text returned by Access ODBC actually matches the Windows-1252 character encoding for the characters in that character set, so simply changing the line

$s = $row["Team"];

to

$s = utf8_encode($row["Team"]);

will allow the second entry to be displayed correctly

Boston Bruins
Canadiens de Montréal
????????? ?????? ??????

but the utf8_encode() function converts from ISO-8859-1, not Windows-1252, so some characters (notably the Euro symbol '€') will disappear. A better solution would be to use

$s = mb_convert_encoding($row["Team"], "UTF-8", "Windows-1252");

but that still wouldn't solve the problem with the third entry in our sample table.

The Complete Fix

For full UTF-8 support we need to use COM with ADODB Connection and Recordset objects like so

<?php
header('Content-Type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Access character test</title>
</head>
<body>
<?php
$connStr = 
        'Driver={Microsoft Access Driver (*.mdb)};' .
        'Dbq=C:\\Users\\Public\\__SO\\28311687.mdb';
$con = new COM("ADODB.Connection", NULL, CP_UTF8);  // specify UTF-8 code page
$con->Open($connStr);

$rst = new COM("ADODB.Recordset");
$sql = "SELECT Team FROM Teams";
$rst->Open($sql, $con, 3, 3);  // adOpenStatic, adLockOptimistic
while (!$rst->EOF) {
    $s = $rst->Fields("Team");
    echo $s . "<br/>\n";
    $rst->MoveNext;
}
$rst->Close();
$con->Close();
?>
</body>
</html>
Batwing answered 5/2, 2015 at 10:35 Comment(2)
Hi Gord, I tried this code but it's not working for me. I'm also using the MS Access database and the data is in the Hindi language. I can see it in the database also. But using the above code it is printing ???? only.Fishbolt
it also did not work for me. my language is urdu from Pakistan in arabic like scriptSewell
T
3

A bit more easily to manipulate the data. (Matrix array).

function consulta($sql) {
        $db_path =  $_SERVER["DOCUMENT_ROOT"] . '/database/Registros.accdb';
        $conn = new COM('ADODB.Connection', NULL, CP_UTF8) or exit('Falha ao iniciar o ADO (objeto COM).'); 
        $conn->Open("Persist Security Info=False;Provider=Microsoft.ACE.OLEDB.12.0;Jet OLEDB:Database Password=ifpb@10510211298;Data Source=$db_path"); 
        $rs = $conn->Execute($sql);
        $numRegistos = $rs->Fields->Count;
        $index = 0;
        while (!$rs->EOF){      
            for ($n = 0; $n < $numRegistos; $n++) {
                if(is_null($rs->Fields[$n]->Value)) continue;
                $resultados[$index][$rs->Fields[$n]->Name] = $rs->Fields[$n]->Value;
                echo '.';
            }
            echo '<br>';
            $index = $index + 1;
            $rs->MoveNext();
        }
        $conn->Close();
        return $resultados;
    }

    $dados = consulta("select * from campus");

    var_dump($dados);
Trombone answered 26/8, 2019 at 23:15 Comment(0)
A
1

Found the following solution. True, I did not have the opportunity to test it on php. But I suppose it should work out.

In order for native PHP ODBC features (PDO_ODBC or the older odbc_ functions) and the Access ODBC driver to be able to correctly subtract texts in Unicode encoding, that stored in the Access database as Unicode character, it is need enables "Beta: Use Unicode UTF-8 for worldwide language support" in Region Settiongs of Windows Operetion System.

After I did this at me, many programs using the standard ODBC driver MC Access, began to display correct texts in Unicode encoding.

All Settings -> Time & Language -> Language -> "Administrative Language Settings"

enter image description here

Avar answered 21/9, 2021 at 12:24 Comment(1)
it also did not workSewell

© 2022 - 2024 — McMap. All rights reserved.