I'm populating a PostgreSQL table with ~11.000.000 rows that were previously selected from another database. I'm using Python and psycopg2. The whole process takes an estimated 1.5 hours to complete. However, after ~30 minutes I get a "connection closed unexpectedly" exception. The source code looks like this:
incursor = indb.cursor()
incursor.execute("SELECT ...")
indb.commit()  # (1) close transaction

outcursor = outdb.cursor()

rows = 0
for (col1, col2, col3) in incursor:  # incursor contains ~11.000.000 rows
    outcursor.execute("INSERT ...", (col1, col2, col3))  # This fails after ~30 minutes
    rows += 1
    if rows % 100 == 0:  # (2) Write data every 100 rows
        outcursor.close()
        outdb.commit()
        outcursor = outdb.cursor()

incursor.close()
outcursor.close()
outdb.commit()
I inserted (1) and (2) after the first attempts failed, assuming that an open transaction has an upper time limit of ~30 minutes or that a cursor has an upper limit of pending inserts. It seems that neither of these assumptions is true and the error lies somewhere else.
Both databases are stored on a VirtualBox machine that I connect to via port forwarding from the host. I run the program on the host machine.
Both databases are just for testing purposes and have no other connections to manage. Maybe I have to restructure the problem to get around this, but I need very time-consuming inserts elsewhere (running for days, approximately), so I'm very concerned about hidden time limits in psycopg2 or PostgreSQL.
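Since the connection goes through VirtualBox port forwarding, one thing I'm considering is enabling TCP keepalives on the client side, so that an otherwise quiet connection is not silently dropped while the copy runs. A rough sketch (host, port and credentials are placeholders for my setup):

import psycopg2

outdb = psycopg2.connect(
    host="localhost",        # forwarded from the VirtualBox guest
    port=5433,               # placeholder forwarded port
    dbname="outdb",
    user="me",
    password="secret",
    keepalives=1,            # libpq TCP keepalive parameters
    keepalives_idle=30,      # seconds of idle time before the first probe
    keepalives_interval=10,  # seconds between probes
    keepalives_count=5,      # failed probes before the connection counts as dead
)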
Use COPY or larger transactions. Committing every 100 records gives you about 110.000 transactions to complete the entire job. A single 7400rpm drive can only handle about 120 commits per second (unless it lies because of caching, which would make it unreliable). Your current problem sounds like a network issue. – Foran
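If I go the COPY route suggested above, I think it would look roughly like this with psycopg2's copy_expert (a sketch only: the target table and column names are made up, and the formatting assumes plain values without tabs, newlines or NULLs):

import io

def copy_batch(outdb, rows):
    # Load a batch of (col1, col2, col3) tuples with a single COPY ... FROM STDIN.
    buf = io.StringIO()
    for col1, col2, col3 in rows:
        buf.write("\t".join(map(str, (col1, col2, col3))) + "\n")
    buf.seek(0)
    cur = outdb.cursor()
    cur.copy_expert("COPY target (col1, col2, col3) FROM STDIN", buf)
    cur.close()

batch = []
for row in incursor:            # same source cursor as above
    batch.append(row)
    if len(batch) >= 10000:     # far fewer commits than one per 100 rows
        copy_batch(outdb, batch)
        outdb.commit()
        batch = []
if batch:                       # flush the final partial batch
    copy_batch(outdb, batch)
outdb.commit()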