Python mysql-connector converts some strings into bytearray
Asked Answered
A

3

7

I am using python3 and pandas to connect to some sql database:

import pandas as pd
import mysql.connector

cnx = mysql.connector.connect(user='me', password='***',
                          host='***',
                          database='***')
df=pd.read_sql("select id as uid,refType from user where registrationTime>=1451606400",con=cnx)
cnx.close()

I am getting 2 columns: id and refType, both of them are of type string (varchar in SQL terminology). However, for some reason, refType column is correctly imported as string, but uid column is imported as bytearray. This is how they look:

df.head()
                                             uid  

0 [49, 54, 54, 57, 55, 54, 50, 55, 64, 97, 110]
1 [49, 54, 54, 57, 55, 54, 50, 56, 64, 105, 111]
2 [49, 48, 49, 53, 51, 50, 51, 50, 57, 53, 57, 5...
3 [57, 53, 52, 52, 56, 57, 56, 56, 49, 50, 57, 5...
4 [49, 54, 54, 57, 55, 54, 50, 57, 64, 105, 111]

                                         refType  

0 adx_Facebook.IE_an_ph_u8_-.cc-ch.g-f.au-ret7.c...
1 adx_Facebook.IE_io_ph_u4_-.cc-gb.g-f.au-toppay...
2 ad_nan_1845589538__CAbroadEOScys_-.cc-ca.g-f.a...
3 ad_offerTrialPay-DKlvl10-1009
4 adx_Facebook.IE_io_ph_u4_-.cc-us.g-f.au-topspe...

And this is how uid column is supposed to look:

[i.decode() for i in df['uid'][1:5]]

['16697628@io', '10153232959751867@fb', '954489881295911@fb', '16697629@io']

I don't understand neither why was it converted to bytearray nor how to choose to convert it to string. I couldn't find anything about it or similar questions in internet or pandas documentation. Of course, I can always convert that column to string after importing, but that is not preferred, because the shown sql query is just an example, and in real table there can be hundreds of columns that would be incorrectly imported as bytearrays. It would be real pain in the ass to manually find those columns and convert to string

The connector itself outputs the same bytearray:

cursor = cnx.cursor()
cursor.execute('select id as uid,refType from user where registrationTime>=1451606400 LIMIT 1')
cursor.fetchall()`

[(bytearray(b'16697627@an'), 'adx_Facebook.IE_an_ph_u8_-.cc-ch.g-f.au-ret7.cr-cys.dt-all.csd-291215.-')

The data types of the columns in SQL database are "Varchar(32)" for the first column (uid) and "Varchar(128)" for the second one (refType)

Aldine answered 22/1, 2016 at 10:53 Comment(7)
Possibly an issue with mysql.connector. See dev.mysql.com/doc/relnotes/connector-python/en/news-2-0-0.html. Can you show what the return value is from a plain execute/fetch: cursor = cnx.cursor(); cursor.execute('SELECT ... LIMIT 1'); cursor.fetchall()?Burge
you are right, the connector also gives [(bytearray(b'16697627@an'), 'adx_Facebook.IE_an_ph_u8_-.cc-ch.g-f.au-ret7.cr-cys.dt-all.csd-291215.-')Aldine
Can you try using SQLAlchemy? (create an engine and pass this to read_sql instead of cnx)Burge
Couldn't get SQLAlchemy to work on python3, but I tried read_sql with pymysql connector, and everything worked fine. I guess for me the problem is solved, although it would still be nice to find out what is wrong with mysql.connector, because it is now an official tool from MySQL themselves to connect via pythonAldine
Well, there is not really something wrong, as it is clearly stated in their docs that they return a bytearray instead of a string (for some py2/py3 compat reason ..). And the reason it looks odd in the output has to do with the formatting of a bytearray a pandas object.Burge
Than how come some strings are returned as bytearray, and some are not? (talking about the connector output, not the pandas)Aldine
True, indeed strange. Is the schema specification for the two columns exactly the same? Can you maybe add that as well to your question? But sorry, I am not familiar with mysql-connector, maybe you can raise it there as a bug report.Burge
M
5

Had the same issue with the package "mysql-connector". Installing "mysql-connector-python" instead did the trick for me.

pip install mysql-connector-python
Magnify answered 2/10, 2019 at 8:51 Comment(1)
Same here, using "mysql-connector" only (without the python) converts strings from database to ByteArrays. pip uninstall mysql-connector pip install mysql-connector-python And everything went back to normal.Kwon
S
0

This is strange indeed. I wonder if passing the parameter "coerce_float=False" to read_sql function would help for this situation.

Solstice answered 22/1, 2016 at 11:45 Comment(1)
Tried that, nothing changesAldine
T
0

Maybe try a different approach. Write SQL to CSV file with Python, and then read CSV file into Pandas.

import pyodbc
import csv
import pandas

cnxn = pyodbc.connect('DRIVER={Server Type};SERVER=YourServer;DATABASE=YourDatabase;UID=UserId;PWD=PassWord') 
cursor = cnxn.cursor()
query = cursor.execute("select id as uid,refType from user where registrationTime>=1451606400")
List = {}
for row in cursor.fetchall():
    List.update({row.uid:row.refType})
cnxn.close()

with open('C:\\file.csv', 'wb') as the_file:
    for key,value in CurrentCommentList.items():
        the_file.write(str(key).encode('utf-8') + ','.encode('utf-8') + 
        str(value).encode('utf-8') + '\n'.encode('utf-8'))

pd.read_csv('C:\\file.csv')
Trichite answered 11/7, 2018 at 13:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.