Here's the function that takes df, schemaname of the table, name of the table, the column that you want to use as a conflict in the name of conflict, and the engine created by create_engine of sqlalchemy. It updates the table with respect to conflict column. This is extended solution for the solution of @Ionut Ticus .
Don't use pandas.to_sql() together. pandas.to_sql() destroys the primary key setting. In this case, one need to set primary key by the ALTER query, which is a suggestion by the function below. Primary key does not necessarily be destroyed by pandas, one might haven't been set it. Error would be in that case:
there is no unique constraint matching given keys for referenced table? Function will suggest you to execute below.
engine.execute('ALTER TABLE {schemaname}.{tablename} ADD PRIMARY KEY ({conflictcolumn});
Function:
def update_query(df,schemaname,tablename,conflictcolumn,engine ):
"""
This function takes dataframe as df, name of schema as schemaname,name of the table to append/add/insert as tablename,
and column name that only other elements of rows will be changed if it's existed as conflictname,
database engine as engine.
Example to engine : engine_portfolio_pg = create_engine('postgresql://pythonuser:vmqJRZ#[email protected]/cetrm_portfolio')
Example to schemaname,tablename : weatherofcities.sanfrancisco , schemaname = weatherofcities, tablename = sanfrancisco.
"""
excluded = ""
columns = df.columns.tolist()
deleteprimary = columns.copy()
deleteprimary.remove(conflictcolumn)
excluded = ""
replacestring = '%s,'*len(df.columns.tolist())
replacestring = replacestring[:-1]
for column in deleteprimary:
excluded += "EXCLUDED.{}".format(column)+","
excluded = excluded[:-1]
columns = ','.join(columns)
deleteprimary = ','.join(deleteprimary)
insert_sql = """ INSERT INTO {schemaname}.{tablename} ({allcolumns})
VALUES ({replacestring})
ON CONFLICT ({conflictcolumn}) DO UPDATE SET
({deleteprimary}) = ({excluded})""".format( tablename = tablename, schemaname=schemaname,allcolumns = columns, replacestring= replacestring,
conflictcolumn= conflictcolumn,deleteprimary = deleteprimary, excluded=excluded )
conn = engine.raw_connection()
conn.autocommit = True
#conn = engine.connect()
cursor = conn.cursor()
i = 0
print("------------------------"*5)
print("If below error happens:")
print("there is no unique constraint matching given keys for referenced table?")
print("Primary key is not set,you can execute:")
print("engine.execute('ALTER TABLE {}.{} ADD PRIMARY KEY ({});')".format(schemaname,tablename,conflictcolumn))
print("------------------------"*5)
for index, row in df.iterrows():
cursor.execute(insert_sql, tuple(row.values))
conn.commit()
if i == 0:
print("Order of Columns in Operated SQL Query for Rows")
columns = df.columns.tolist()
print(insert_sql%tuple(columns))
print("----")
print("Example of Operated SQL Query for Rows")
print(insert_sql%tuple(row.values))
print("---")
i += 1
conn.close()