Pandas and SQLAlchemy: renaming columns during join
Asked Answered
U

1

1

I have table A and table B. Both have a column id and a column name. When I use pd.read_sql() to convert the result of a SQLAlchemy query to a pandas DataFrame, the resulting DataFrame has two columns named id and two columns named name.

The join is executed on the id column, therefore, even if there are two id columns, there won't be any ambiguity since both columns contain the same values. I can simply drop one of the column.

The two columns named name represent an issue because they are not identical: column name of table A represents name of an entity A, while column name of table B represents name of an entity B. At this point I won't know for sure which of the two columns of the DataFrame comes from table A and which from table B. Is there any way to solve this by, for instance, adding a prefix to the column names? More in general, is there any way to exploit the practical pd.from_sql() in this situation?

my_dataframe = pd.read_sql(
            session.query(TableA, TableB)
                .join(TableB)
                .statement,
            session.bind)

Note: in this question I am trying to simplify the structure of a more complex preexisting Postgres database. Therefore, it won't be possible to alter the structure of the database.

Unwish answered 26/6, 2020 at 0:28 Comment(1)
Should be pretty simple to set up a sqlite database with the features you describe. Any chance we could get an MRE?Cloaca
U
1

The solution was actually really simple, but you have to rename each single field:

my_dataframe = pd.read_sql(
            session.query(TableA.field1.label('my_new_name1'),
                          TableA.field2.label('my_new_name2'),
                          TableB.field1.label('my_other_name2'))
                .join(TableB)
                .statement,
            session.bind)
Unwish answered 3/7, 2020 at 23:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.