How to change dplyr::tbl connection encoding to utf8?
Asked Answered
U

2

9

In a MySQL database, a table is encoded in utf8, but for some reason the connection is in latin1.

res <- RMySQL::dbSendQuery(con,"show variables like 'character_set_%'")
dbFetch(res)
           Variable_name                      Value
1     character_set_client                     latin1
2 character_set_connection                     latin1
3   character_set_database                    utf8mb4
4 character_set_filesystem                     binary
5    character_set_results                     latin1
6     character_set_server                     latin1
7     character_set_system                       utf8
8       character_sets_dir /usr/share/mysql/charsets/

This page explain how to set the connection's character set to utf8 using RMySQL.

RMySQL::dbGetQuery(con,"show variables like 'character_set_%'")
RMySQL::dbGetQuery(con,"set names utf8")

But I actually prefer to use the dplyr::tbl to query the database. Since the connection created by dplyr::src_mysql only has the possibility to send sql statements that create tables. What is the dplyr way to set the connection setting to use utf8 encoding?

Unscreened answered 19/6, 2015 at 7:45 Comment(4)
Have you seen this? #25935252Merl
@RomanLuštrik I have seen it thank you, I also convert character columns to utf8 using an ad-hoc solution with iconv() but it is a hack and I would rather read character vectors directly in utf8. Especially because the database is already in utf8.Unscreened
I feel your pain. Encoding is a bane for everyone working with non-latin1 data.Merl
I actually fixed this by changing the database server configuration settings to utf8, and choosing the option skip-character-set-client in /etc/mysql/my.cfUnscreened
S
4

I ran into the same problem, which I solved as follows:

foo_db <- src_mysql(host='0.0.0.0',user='dbuser',password='a_password',
                    dbname='FlightTimes',port=3336)
dbGetQuery(foo_db$con,'SET NAMES utf8')

I found this was possible by looking at the structure of foo_db via str(foo_db), seeing there was a con attribute of class MySQLConnection, then applying your dbGetQuery mantra.

Stefanysteffane answered 7/1, 2016 at 18:45 Comment(0)
U
2

Edit the server option file (located in /etc/mysql/my.cnf on a Debian system) and add the following options:

collation-server = utf8_unicode_ci
character-set-server = utf8
skip-character-set-client-handshake

The mysql server configuration file can also be edited with mysql-workbench.

After this change, dplyr::tbl fetches character vector encoded in utf-8.

Unscreened answered 24/6, 2015 at 6:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.