Python pandas to_sql 'append'
Asked Answered
S

2

8

I am trying to send monthly data to a MySQL database using Python's pandas to_sql command. My program runs one month of data at a time and I want to append the new data onto the existing database. However, Python gives me an error:

_mysql_exceptions.OperationalError: (1050, "Table 'cps_basic_tabulation' already exists")

Here is my code for connecting and exporting:

conn = MySQLdb.connect(host     = config.get('db', 'host'),
                       user     = config.get('db', 'user'),
                       passwd   = config.get('db', 'password'),
                       db       = 'cps_raw') 

combined.to_sql(name            = "cps_raw.cps_basic_tabulation",
               con              = conn,
               flavor           = 'mysql', 
               if_exists        = 'append')

I have also tried using:

from sqlalchemy import create_engine

Replacing conn = MySQLdb.connect... with:

engine = mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>

conn   = engine.connect().connection

Any ideas on why I cannot append to a database?

Thanks!

Sima answered 5/11, 2014 at 19:52 Comment(2)
Which version of pandas are you using?Blackcock
Have you tried passing con=engine instead of con=conn?Setsukosett
B
7

Starting from pandas 0.14, you have to provide directly the sqlalchemy engine, and not the connection object:

engine = create_engine("mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>")
combined.to_sql("cps_raw.cps_basic_tabulation", engine, if_exists='append')
Blackcock answered 5/11, 2014 at 20:12 Comment(2)
Thank you. Also @Blackcock thank you. I updated pandas 'sudo pip install --upgrade pandas', between both of these fixes, everything worked. However, I believe the main fix was updating pandas as I tried combinations of connecting to MySQL with no avail.Sima
It is good possible that updating pandas fixed your issue, but I should note that it is however recommended to use the sqlalchemy in any case, as using MySQL connection object directly is deprecated and will be removed in a future pandas version.Blackcock
G
1

Since I had the same error message and stumbled across this post I leave this here for others to find.

I found two ways to solve the duplicated table creation although I lack the insight as to why this solves it:

  1. Either pass the database name in the url when creating a connection
  2. or pass the database name as a schema in pd.to_sql.

Doing both does not hurt. Also, a few years later it is (again?) possible to pass the pure connection to pandas. My guess would be that in the previous answer by joris the first of my solution cases might have implicitly solved the problem.

```
#create connection to MySQL DB via sqlalchemy & pymysql
user = credentials['user']
password = credentials['password']
port = credentials['port']
host = credentials['hostname']
dialect = 'mysql'
driver = 'pymysql'
db_name = 'test_db'

# setup SQLAlchemy   
from sqlalchemy import create_engine 
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/' 
engine = create_engine(cnx) 

# create database
with engine.begin() as con:
    con.execute(f"CREATE DATABASE {db_name}")

############################################################
# either pass the db_name  vvvv - HERE- vvvv after creating a database
cnx = f'{dialect}+{driver}://{user}:{password}@{host}:{port}/{db_name}'      
############################################################
engine = create_engine(cnx) 

table = 'test_table'
col = 'test_col'
with engine.begin() as con:
    # this would work here instead of creating a new engine with a new link
    # con.execute(f"USE {db_name}")
    con.execute(f"CREATE TABLE {table} ({col} CHAR(1));")

# insert into database
import pandas as pd
df = pd.DataFrame({col : ['a','b','c']})

with engine.begin() as con:
    # this has no effect here
    # con.execute(f"USE {db_name}")
    df.to_sql(
        name= table,
        if_exists='append',
# passing con = cnx here would equally work
        con=con, 
############################################################
# or pass it as a schema vvvv - HERE - vvvv
        #schema=db_name,
############################################################
        index=False
    )```

Tested with python version 3.8.13, sqlalchemy 1.4.32 and pandas 1.4.2. Same problem might have appeared here and here.

Guth answered 15/6, 2022 at 20:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.