I have a Python process that uses SQLAlchemy to insert some data into a MS SQL Server DB. When the Python process runs, it hangs during the insert. I turned on SQLAlchemy logging to get more information and found that it hangs at the point where SQLAlchemy seems to be requesting table schema info about the entire DB:
2020-10-30 08:12:07 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1235) INFO: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
2020-10-30 08:12:07 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1240) INFO: ('dbo', 'BASE TABLE')
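(For reference, engine logging like the output above can be turned on with SQLAlchemy's standard Python loggers; this is only a minimal sketch, not necessarily the exact setup used here:)

import logging

# Log engine SQL plus result-set details (DEBUG level shows the Row lines).
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.DEBUG)

# Roughly equivalent alternative when creating the engine (connection_uri is a placeholder):
# engine = sqlalchemy.create_engine(connection_uri, echo="debug")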
I have other "stuff" going on in the DB at this time, including some open transactions and my guess is that for whatever reason querying [INFORMATION_SCHEMA].[TABLES]
creates some deadlock or blocks somehow.
I've also read (here) that [INFORMATION_SCHEMA].[TABLES]
is a view that cannot cause a deadlock which would contradict my guess of what is causing this issue.
My question is: Can I alter the configuration/settings of SQLAlchemy so that it does not make this query in the first place?
UPDATE 1: The Python code for the insert is like this:
import pandas as pd
import sqlalchemy

with sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params).connect() as connection:
    # df is a Pandas DataFrame; params is a URL-encoded ODBC connection string
    df.to_sql(name=my_table, con=connection, if_exists='append', index=False)
Note that the code works without any problems when I run the Python script at other times of the day, when those other DB transactions are not going on. In those cases, the log continues immediately, listing all the tables in the DB:
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1235) INFO: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) ORDER BY [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._execute_context(base.py:1240) INFO: ('dbo', 'BASE TABLE')
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine._init_metadata(result.py:810) DEBUG: Col ('TABLE_NAME',)
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine.process_rows(result.py:1260) DEBUG: Row ('t_table1',)
2020-10-30 08:13:03 [11444:6368] sqlalchemy.engine.base.Engine.process_rows(result.py:1260) DEBUG: Row ('t_table2',)
...
UPDATE 2: Apparently, when a table or other object is created in an open transaction that has not been committed yet, querying [INFORMATION_SCHEMA].[TABLES] will get blocked (source). Is anyone familiar enough with the internals of SQLAlchemy to suggest how to prevent it from making this query in the first place?
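To illustrate the blocking behavior described above, here is a rough sketch using plain pyodbc (the connection string and table name are made up); session 2 runs essentially the same metadata query that SQLAlchemy issues:

import pyodbc

conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"

# Session 1: create a table inside a transaction and do NOT commit.
conn1 = pyodbc.connect(conn_str, autocommit=False)
conn1.cursor().execute("CREATE TABLE dbo.t_new_table (id INT)")

# Session 2: under the default READ COMMITTED isolation level, this metadata
# query blocks until session 1 commits or rolls back.
conn2 = pyodbc.connect(conn_str, autocommit=True)
rows = conn2.cursor().execute(
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = 'dbo' AND TABLE_TYPE = 'BASE TABLE'"
).fetchall()

# conn1.rollback() (or conn1.commit()) releases the schema lock and unblocks session 2.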
UPDATE 3: After posting this issue on the SQLAlchemy GitHub (issue link), the SQLAlchemy devs confirmed that the query of [INFORMATION_SCHEMA].[TABLES] is in fact being caused by the Pandas function to_sql().

So, my new question is: does anyone know how to disable this behavior in the Pandas to_sql() function? I looked over the documentation and could not find anything that would seem to help.
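For what it's worth, one way to sidestep the check altogether (a sketch only; the column definitions below are made up and would have to match df) is to describe the existing table with SQLAlchemy Core and insert the DataFrame rows directly, so to_sql() and its [INFORMATION_SCHEMA].[TABLES] query are never used:

import sqlalchemy as sa

# Describe the already-existing target table explicitly (no reflection,
# so no INFORMATION_SCHEMA query is needed).
metadata = sa.MetaData()
target = sa.Table(
    "my_table", metadata,
    sa.Column("col_a", sa.Integer),    # made-up columns; must match df
    sa.Column("col_b", sa.String(50)),
)

engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
with engine.begin() as conn:
    # executemany-style insert of the DataFrame rows
    conn.execute(target.insert(), df.to_dict(orient="records"))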
Comments:

I've read that I could run ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON or ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON to not lock as aggressively by turning on snapshot isolation, but I'm concerned about making these changes on the database level. Do you know of a setting in SQL Server to make queries on [INFORMATION_SCHEMA].[TABLES] not block when an open transaction exists that has created a new table in that database? – Summertree

Try create_engine(connection_uri, isolation_level="READ_UNCOMMITTED") and see if that prevents the hang. – Achromatin

(My test code does not hang when I add isolation_level.) Perhaps try engine = create_engine(connection_uri), then omit the with block and call to_sql with engine as the second parameter. That works for me, too. – Achromatin

The hang is at table_names = [r[0] for r in connection.execute(s)] in get_table_names of sqlalchemy.dialects.mssql.base.py. Just to be clear: when you test this, do you have an open transaction on the same database in which you have created a new table? That's what I have, and as soon as I rollback or commit that transaction in SQL the Python code continues running again. – Summertree

Is there a way to verify that the isolation_level="READ_UNCOMMITTED" option is being used in general? – Summertree

SELECT transaction_isolation_level FROM sys.dm_exec_sessions WHERE session_id = @@SPID; returns the numeric value. 1 means READ_UNCOMMITTED. – Achromatin
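Based on the comments above, the two variants that avoided the hang in testing look roughly like this (connection_uri is a placeholder; df and my_table are as in the code from UPDATE 1):

import pandas as pd
import sqlalchemy

# Variant 1 (from the comments): run the connection at READ UNCOMMITTED so the
# metadata query is not blocked by the uncommitted CREATE TABLE elsewhere.
engine = sqlalchemy.create_engine(connection_uri, isolation_level="READ_UNCOMMITTED")
df.to_sql(name=my_table, con=engine, if_exists="append", index=False)

# Variant 2 (from the comments): pass the Engine itself instead of an open
# Connection (no explicit with ... connect() block) and let pandas manage
# its own connection.
engine = sqlalchemy.create_engine(connection_uri)
df.to_sql(name=my_table, con=engine, if_exists="append", index=False)

Note that variant 1 applies READ UNCOMMITTED to everything done on that engine's connections, not just the metadata query, so it may be too broad for some workloads.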