How can pandas.read_sql_query() query a TEMP table?

Asked 9/10, 2014 at 19:37 Answered 5/3, 2020 at 21:24

Solved sql python-2.7 pandas sqlalchemy netezza

I'm in the process of converting Python code over to the new SQLAlchemy-based Pandas 0.14.1.

A common pattern we use is (generically):

connection = db.connect()  # open connection/session

sql = 'CREATE TEMP TABLE table1 AS SELECT ...'
connection.execute(sql)

... other sql that creates TEMP tables from various joins of previous TEMP tables ...

sql = 'CREATE TEMP TABLE tableN AS SELECT ...'
connection.execute(sql)

result = connection.query('SELECT * FROM tableN WHERE ...')

connection.close()

Now, once the connection is closed the TEMP tables are purged by the DB server. However, as the final select query is using the same connection/session, it can access the tables.

How can I achieve similar using SQLAlchemy and pd.read_sql_query() ?

For example:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql_query('SELECT * FROM tmptable WHERE ...', engine)

yields a DB error that the TEMP table tmptable doesn't exist. Presumably this is because passing the engine to the read_sql_query() requires it to open a new connection which has an independent session scope and hence can't see the TEMP table. Is that a reasonable assumption?

Is there a way to work around that? (passing the connection to read_sql_query() isn't supported)

(I know that I can concatenate the SQL into a single string with ; separating the statements, but this is a simplification of the actual situation where the TEMP tables are created by a multitude of functions which call others nesting 3-4 deep. So, to achieve that would require implementing a layer than can coalesce the SQL across multiple calls before issuing it, which I'd rather avoid implementing if there is a nicer way)

Using -
Pandas: 0.14.1
sqlalchemy: 0.9.7
pyodbc: 3.0.6
Win7 x86_64 Canopy Python distribution (Python 2.7.6)
Josh Kuhn's Netezza SQLAlchemy dialect from https://github.com/deontologician/netezza_sqlalchemy

Lexicography answered 9/10, 2014 at 19:37 Comment(4)

Can you open a github issue for this as an enhancement request? See github.com/pydata/pandas/issues – Bradski 10/10, 2014 at 11:21

See further github.com/pydata/pandas/issues/8533 – Bradski 10/10, 2014 at 17:27

Hi DavidJ, according to the accepted answer, do you know when the connection will be closed and thus the temp table deleted? Like, if I would like to use that temp table in multiple pd.read_sql() queries how would I do? Thanks! – Tackling 22/6, 2019 at 6:19

@Tackling The connection would need to be explicitly closed with something like: connection.close() so you should be able to query the temp table for as long as your session is still active. – Coagulate 6/5, 2020 at 21:43

You can now pass SQLAlchemy connectable to pandas.read_sql. From the docs:

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None)

...

con : SQLAlchemy connectable (engine/connection) or database string URI

or DBAPI2 connection (fallback mode)

Using SQLAlchemy makes it possible to use any DB supported by that > library. If a DBAPI2 object, only sqlite3 is supported.

So, this should work:

engine = sqlalchemy.create_engine('netezza://@mydsn')
connection = engine.connect()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
connection.execute(sql)

result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=connection)

Coagulate answered 28/12, 2018 at 0:26 Comment(0)

All you need to do is add 'SET NOCOUNT ON' at the beginning of your query, that way pandas read_sql will read everything as one statement.

sSQL = '''SET NOCOUNT ON
CREATE TABLE ...... '''

Carolynecarolynn answered 5/3, 2020 at 21:24 Comment(1)

Carlos, I use this all the time for complex multi-statement queries on MS SQL Server. Thanks for pointing out such a simple but effective setting. – Stercoricolous 30/7, 2021 at 14:48

You are using Python and Netezza, I was using R and SQL Server, so this might be different. In my script, I ran into a similar issue. sp_execute_external_script in T-SQL which allows for external code to run on the database only allows for select statements. This was burdensome for me because I wanted to run a stored procedure to create a temp table to select from. Alternatively, I could use common table expressions, unions, etc. It might be worth further investigation.

Dealfish answered 14/12, 2016 at 22:40 Comment(0)

I understand the issue, but is creating regular tables not working? You could come up with a convention such as CREATE TABLE TEMP_t1' etc., andDROP` them at the end of your session.

Shred answered 20/8, 2018 at 20:50 Comment(0)

As @ssharma says, you can now pass SQLAlchemy connectable to pandas.read_sql. If you create the session with the session maker, you need the connection object.

For reading the uncommited changes you have to use the same connection like this:

engine = sqlalchemy.create_engine('netezza://@mydsn')
session = sessionmaker(bind=self.engine)()

sql = 'CREATE TEMP TABLE tmptable AS SELECT ...'
session.execute(sql)

result = pd.read_sql('SELECT * FROM tmptable WHERE ...', con=session.connection())

Flieger answered 26/6, 2019 at 9:50 Comment(0)

Recommended topics

Hot tags