How to connect to MySQL using UTF8 within a perl script?
Asked Answered
S

2

20

In a nutshell:

Within a Perl-Script: How do I connect to MySQL in a way that allows to transmit the four-byte unicode character U+1F61C ("😜") from the perl script to a MySQL-Table where this character should be stored?

Using {mysql_enable_utf8 => 1} doesn't solve the problem.

In detail:

I have exactly the same problem as described in the Question ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x9C' for column 'comment' at row 1 and even with the same Unicode character (😜 = U+1F61C = FACE WITH STUCK-OUT TONGUE AND WINKING EYE) which produces the error message

DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9F\x98\x9C' for column ...

But I don't use PHP, I use Perl.

The accepted answer in the other question says:

  • Run MySQL 5.5 or later.
    I check the version:

    mysql> select version();
    +-------------------------+
    | version()               |
    +-------------------------+
    | 5.7.13-0ubuntu0.16.04.2 |
    +-------------------------+
    

    So it is 5.7 which is later than 5.5.
    ✅checked

  • Set table's character to utf8mb4.
    I check the character set of my database, my table and even of the reported column:

    mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA 
        -> WHERE schema_name = "myDatabase";
    +----------------------------+
    | default_character_set_name |
    +----------------------------+
    | utf8mb4                    |
    +----------------------------+
    
    mysql> SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
        ->        information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
        -> WHERE CCSA.collation_name = T.table_collation
        ->   AND T.table_schema = "myDatabase"
        ->   AND T.table_name   = "myTable";
    +--------------------+
    | character_set_name |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    
    mysql>   SELECT character_set_name FROM information_schema.`COLUMNS` 
        -> WHERE table_schema = "myDatabase"
        ->   AND table_name   = "myTable"
        ->   AND column_name  = "myColumn";
    +--------------------+
    | character_set_name |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    

    So my database, my table and the reported column all use the character set utf8mb4.
    ✅checked

  • Enable UTF8 on your MySQL connection.
    This seems to be the problem. The answer to the other question says:

    SET NAMES utf8, or use an option when connecting that similarly enables it.

I don't know how to SET NAMES utf8 within a perl script, so I did it how I did it over the last years. I think that this is "an option when connecting that similarly enables it".
It's at the end of the long line that begins with my $dbh = DBI->connect:

#!/usr/bin/perl -w
use strict;
use warnings;
use utf8;
use Encode;
use DBI;
binmode STDOUT, ":utf8";

#Here I connect using the parameter mysql_enable_utf8 (create database handle):
my $dbh = DBI->connect('DBI:mysql:database=myDatabase;host=localhost','aUser','aPassword',{mysql_enable_utf8 => 1});

#Prepare the statement (create statement handle):
my $sth = $dbh->prepare('INSERT INTO `myTable` (`myColumn`) VALUES(?);');

#This doesn't work: 
$sth->execute('😜');

#This doesn't work either: 
$sth->execute(encode_utf8('😜'));

#end processing:
$dbh->disconnect();
exit(0);

Both executes throw the same error (only the line number at the end changes):

DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9F\x98\x9C' for column 'myColumn' at row 1 at myTestScript.pl line 16.

What am I doing wrong?
How can I do it better?

Salim answered 23/8, 2016 at 8:17 Comment(3)
I wish all questions were written up like this one...Pisces
@Shadow: It did me cost about 1 hour, maybe this is why almost nobody does it.Tungusic
Well, everyone should follow this example. Took me only 5 mins to write the answer because the question was clear and well structured.Pisces
P
19

The problem is with the SET NAMES utf8 command. In MySQL the utf8 character set is not truly utf8, it supports characters up 3 bytes only and the character in question has 4 bytes:

The utf8 character set in MySQL has these characteristics:

• No support for supplementary characters (BMP characters only).

• A maximum of three bytes per multibyte character.

The true utf8 is the utf8mb4 that you use as character set in the field itself. So, use SET NAMES utf8mb4

So from Perl you should use {mysql_enable_utf8mb4 => 1} instead of {mysql_enable_utf8 => 1}.

Pisces answered 23/8, 2016 at 8:37 Comment(2)
Thanks! I just added the note about mysql_enable_utf8mb4 to the end of your answer, becasue this seems to be the way you say SET NAMES utf8mb4 in Perl.Tungusic
I must trust you on this one, since I do not really know perl, only MySQL :)Pisces
N
0

I tried so many times, in so many different ways, to cgi script works correctly to read input from STDIN, read html file, print it to STDOUT and search the inputed text on mysql correctly. The attribute mysql_enable_utf8mb4 and "SET NAMES utf8mb4" after the mysql connection works correctly with "meta charset='UTF-8'".

#!/usr/bin/perl
print "Content-type: text/html; charset=UTF-8\n\n";

#use utf8;
#use open ':utf8';
#binmode STDOUT, ":utf8";
#binmode STDIN , ":utf8";
#use encoding 'utf8';

our $dbh = DBI->connect("DBI:mysql:database=$database;host=$servername;port=$port",$username,$password, {PrintWarn => 0, PrintError => 0, mysql_enable_utf8mb4 => 1}) || die;
$dbh->do("SET NAMES utf8mb4");
Nietzsche answered 19/1, 2023 at 3:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.