Build a dynamic update query in psycopg2

I have to construct a dynamic UPDATE query for PostgreSQL. It is dynamic because I have to determine beforehand which columns to update.

Given a sample table:

create table foo (id int, a int, b int, c int)

Then I construct the SET clause programmatically:

_set = {}
_set['a'] = 10
_set['c'] = None

After that I have to build the update query, and here I'm stuck. I have to construct this SQL UPDATE command:

update foo set a = 10, c = NULL where id = 1

How can I do this with a psycopg2 parameterized command (i.e. loop through the dict, if it is not empty, and build the SET clause)?

UPDATE

While I was sleeping I found the solution myself. It is dynamic, exactly how I wanted it to be :-)

create table foo (id integer, a integer, b integer, c varchar)

updates = {}
updates['a'] = 10
updates['b'] = None
updates['c'] = 'blah blah blah'
sql = "upgrade foo set %s where id = %s" % (', '.join("%s = %%s" % u for u in updates.keys()), 10)
params = updates.values()
print cur.mogrify(sql, params)
cur.execute(sql, params)

And the result is exactly what I needed (in particular, it handles the nullable and quoted columns):

"upgrade foo set a = 10, c = 'blah blah blah', b = NULL where id = 10"
Prop answered 13/1, 2016 at 18:33 Comment(1)
while sleeping? :) – Antipathetic

Using psycopg2.sql – SQL string composition module

The module contains objects and functions useful to generate SQL dynamically, in a convenient and safe way.

from psycopg2 import connect, sql

conn = connect("dbname=test user=postgres")

upd = {'name': 'Peter', 'age': 35, 'city': 'London'}
ref_id = 12

sql_query = sql.SQL("UPDATE people SET {data} WHERE id = {id}").format(
    data=sql.SQL(', ').join(
        sql.Composed([sql.Identifier(k), sql.SQL(" = "), sql.Placeholder(k)]) for k in upd.keys()
    ),
    id=sql.Placeholder('id')
)
upd.update(id=ref_id)  # merge the id value into the parameter dict so it fills the %(id)s placeholder
with conn:
    with conn.cursor() as cur:
        cur.execute(sql_query, upd)
conn.close()

Running print(sql_query.as_string(conn)) before closing the connection reveals this output:

UPDATE people SET "name" = %(name)s, "age" = %(age)s, "city" = %(city)s WHERE id = %(id)s
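
If the name of the key column is also only known at runtime, the same composition extends naturally (a sketch; key_col is a hypothetical variable holding that name):

key_col = 'id'  # hypothetical: key column name determined at runtime
sql_query = sql.SQL("UPDATE people SET {data} WHERE {key} = {id}").format(
    data=sql.SQL(', ').join(
        sql.Composed([sql.Identifier(k), sql.SQL(" = "), sql.Placeholder(k)]) for k in upd.keys()
    ),
    key=sql.Identifier(key_col),  # quoted like the other identifiers
    id=sql.Placeholder('id')
)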
Iconography answered 22/1, 2020 at 8:17 Comment(0)

There is actually a slightly cleaner way to do it, using the alternative column-list syntax:

sql_template = "UPDATE foo SET ({}) = %s WHERE id = {}"
sql = sql_template.format(', '.join(updates.keys()), 10)
params = (tuple(updates.values()),)
print cur.mogrify(sql, params)
cur.execute(sql, params)
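
As the first comment below suggests, the id can travel as a bound parameter as well; a sketch of the same approach:

sql = "UPDATE foo SET ({}) = %s WHERE id = %s".format(', '.join(updates.keys()))
params = (tuple(updates.values()), 10)  # the row of new values, then the id
print cur.mogrify(sql, params)
cur.execute(sql, params)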
Ilk answered 10/1, 2017 at 9:59 Comment(3)
this is good, but I also recommend using %s for the id value as well so that you can prevent SQL injection with psycopg2 - just in case this value is coming from some unknown client – Lenticularis
True story phouse! – Ilk
Thanks, this is exactly what I needed. Weird that this is so poorly documented. After looking deeper into this, it appears to be part of the main SQL spec and should be usable in any SQL language that conforms to the spec. – Kutch

No need for dynamic SQL. Supposing a is not nullable and b is nullable: coalesce() keeps the old value of a when the parameter is NULL, and the array-index trick keeps the old value of b when b_update is 0 and takes the new value when b_update is 1.

If you want to update both a and b:

_set = dict(
    id = 1,
    a = 10,
    b = 20, b_update = 1
)
update = """
    update foo
    set
        a = coalesce(%(a)s, a), -- a is not nullable
        b = (array[b, %(b)s])[%(b_update)s + 1] -- b is nullable
    where id = %(id)s
"""
print cur.mogrify(update, _set)
cur.execute(update, _set)

Output:

update foo
set
    a = coalesce(10, a), -- a is not nullable
    b = (array[b, 20])[1 + 1] -- b is nullable
where id = 1

If you want to update neither column:

_set = dict(
    id = 1,
    a = None,
    b = 20, b_update = 0
)

Output:

update foo
set
    a = coalesce(NULL, a), -- a is not nullable
    b = (array[b, 20])[0 + 1] -- b is nullable
where id = 1
Cohdwell answered 13/1, 2016 at 19:57 Comment(3)
Unfortunately I have to use a dynamic query, as my real table has 30+ columns and I only want to update those columns which have been updated (if any). And of course I have to take care of varchar columns, which should be quoted. But I cannot automatically quote every value, because for example a value for a nullable column would become 'None' or 'NULL'. – Prop
@Gabor: I think you do not understand how it works. I will reply later or tomorrow. – Cohdwell
OK, I'm looking for the solution. But I do not want to have the complete count of columns within the _set dict or within the update string. And of course I have to take care of the quoted columns, e.g. if it were a nullable varchar column. – Prop

An option without Python string formatting, using psycopg2's AsIs function for the column names (although that does not protect you from SQL injection through the column names). The dict is named data:

from psycopg2.extensions import AsIs

update_statement = 'UPDATE foo SET (%s) = %s WHERE id_column = %s'
columns = data.keys()
values = [data[column] for column in columns]
# AsIs splices the comma-separated column list in verbatim; the values and the id are adapted normally
query = cur.mogrify(update_statement, (AsIs(','.join(columns)), tuple(values), id_value))
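
The mogrified query is a complete statement (returned as a bytes string), so it can be executed as-is:

cur.execute(query)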
Silvanus answered 18/12, 2019 at 12:58 Comment(0)

Here's my solution, which lives in a generic DatabaseHandler class and provides a lot of flexibility when using a pd.DataFrame as the source.

    def update_data(
        self,
        table: str,
        df: pd.DataFrame,
        indexes: Optional[list] = None,
        column_map: Optional[dict] = None,
        commit: Optional[bool] = False,
    ) -> int:
        """Update data in the media database

        Args:
            table (str): the "tablename" or "namespace.tablename"
            df (pandas.DataFrame): dataframe containing the data to update
            indexes (list): the list of columns in the table that will be in the WHERE clause of the update statement.
                If not provided, will use df indexes.
            column_map (dict): dictionary mapping the columns in df to the columns in the table.
                Columns in the column_map that are also in indexes will not be updated.
                Key = df column.
                Value = table column.
            commit (bool): if True, the transaction will be committed (default=False)

            Notes:
                If using a column_map, only the columns in the column_map will be updated or used as indexes.
                Order does not matter. If not using a column_map, all columns in df must exist in the table.

        Returns:
            int : rows updated
        """
        try:
            if not indexes:
                # Use the dataframe index instead
                indexes = []
                for c in df.index.names:
                    if not c:
                        raise Exception(
                            f"Dataframe contains indexes without names. Unable to determine update where clause."
                        )
                    indexes.append(c)

            update_strings = []
            tdf = df.reset_index()
            if column_map:
                target_columns = [c for c in column_map.keys() if c not in indexes]
            else:
                column_map = {c: c for c in tdf.columns}
                target_columns = [c for c in df.columns if c not in indexes]

            for i, r in tdf.iterrows():
                upd_params = ", ".join(
                    [f"{column_map[c]} = %s" for c in target_columns]
                )
                upd_list = [r[c] if pd.notna(r[c]) else None for c in target_columns]
                upd_str = self._cur.mogrify(upd_params, upd_list).decode("utf-8")

                idx_params = " AND ".join([f"{column_map[c]} = %s" for c in indexes])
                idx_list = [r[c] if pd.notna(r[c]) else None for c in indexes]
                idx_str = self._cur.mogrify(idx_params, idx_list).decode("utf-8")

                update_strings.append(f"UPDATE {table} SET {upd_str} WHERE {idx_str};")
            full_update_string = "\n".join(update_strings)
            print(full_update_string)  # Debugging
            self._cur.execute(full_update_string)
            rowcount = self._cur.rowcount
            if commit:
                self.commit()
            return rowcount
        except Exception as e:
            self.rollback()
            raise e

Example usages:

>>> df = pd.DataFrame([
    {'a':1,'b':'asdf','c':datetime.datetime.now()}, 
    {'a':2,'b':'jklm','c':datetime.datetime.now()}
])

>>> cls.update_data('my_table', df, indexes = ['a'])
UPDATE my_table SET b = 'asdf', c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1;
UPDATE my_table SET b = 'jklm', c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2;

>>> cls.update_data('my_table', df, indexes = ['a','b'])
UPDATE my_table SET c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1 AND b = 'asdf';
UPDATE my_table SET c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2 AND b = 'jklm';

>>> cls.update_data('my_table', df.set_index('a'), column_map={'a':'db_a','b':'db_b','c':'db_c'} )
UPDATE my_table SET db_b = 'asdf', db_c = '2023-01-17T22:13:37.095245'::timestamp WHERE db_a = 1;
UPDATE my_table SET db_b = 'jklm', db_c = '2023-01-17T22:13:37.095250'::timestamp WHERE db_a = 2;

Note, however, that this is not safe from SQL injection, because the table and column names are interpolated directly into the generated statements.
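
One way to harden that part (a sketch, not part of the original class; it reuses the table, column_map, target_columns, indexes and r names from update_data above) is to compose the identifiers with psycopg2.sql and bind the values as parameters:

from psycopg2 import sql

stmt = sql.SQL("UPDATE {tbl} SET {sets} WHERE {conds}").format(
    tbl=sql.Identifier(*table.split(".")),  # handles both "tablename" and "namespace.tablename"
    sets=sql.SQL(", ").join(
        sql.SQL("{} = {}").format(sql.Identifier(column_map[c]), sql.Placeholder())
        for c in target_columns
    ),
    conds=sql.SQL(" AND ").join(
        sql.SQL("{} = {}").format(sql.Identifier(column_map[c]), sql.Placeholder())
        for c in indexes
    ),
)
params = [r[c] for c in target_columns] + [r[c] for c in indexes]
self._cur.execute(stmt, params)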

Wolgast answered 17/1, 2023 at 22:16 Comment(0)

Below is an update to @Gabor's answer for Python 3, where dict.values() returns a view rather than a list; without it you will get TypeError: 'dict_values' object does not support indexing:

create table foo (id integer, a integer, b integer, c varchar)

updates = {}
updates['a'] = 10
updates['b'] = None
updates['c'] = 'blah blah blah'
sql = "upgrade foo set %s where id = %s" % (', '.join("%s = %%s" % u for u in updates.keys()), 10)
params = list(updates.values())
print(cur.mogrify(sql, params))
cur.execute(sql, params)
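
(Since updates is not modified between the two calls, updates.keys() and updates.values() iterate in the same order, so each parameter lines up with its column.)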
Jettiejettison answered 15/1 at 7:9 Comment(0)
