PostgreSQL connection closes unexpectedly when doing a large insert
I'm populating a PostgreSQL table with ~11,000,000 rows that were previously selected from another database. I'm using Python and psycopg2. The whole process takes an estimated 1.5 hours to complete, but after ~30 minutes I get a "connection closed unexpectedly" exception. The source code looks like this:

incursor = indb.cursor()   # indb and outdb are open psycopg2 connections
incursor.execute("SELECT ...")
indb.commit() # (1) close the read transaction
outcursor = outdb.cursor()
rows = 0
for (col1, col2, col3) in incursor: # incursor contains ~11,000,000 rows
    outcursor.execute("INSERT ...", (col1, col2, col3)) # This fails after ~30 minutes
    rows += 1
    if rows % 100 == 0: # (2) Commit every 100 rows
        outcursor.close()
        outdb.commit()
        outcursor = outdb.cursor()
incursor.close()
outcursor.close()
outdb.commit()

I added (1) and (2) after the first attempts failed, assuming that an open transaction has an upper time limit of ~30 minutes or that a cursor has an upper limit of pending inserts. Neither assumption seems to be true, so the error must lie somewhere else.

Both databases are stored on a VirtualBox machine that I connect to via port forwarding from the host. I run the program on the host machine.

Both databases are just for testing and have no other connections to manage. Maybe I have to restructure the problem to work around this, but I will need very long-running inserts elsewhere (running for days), so I'm quite concerned about hidden time limits in psycopg2 or PostgreSQL.

Determiner answered 23/2, 2011 at 10:45 Comment(5)
I think the problem might be the work_mem variable in your configuration. AFAIK this variable sets the maximum memory allowed for one connection. Check the logs; there should be an entry about what's wrong. – Librium
But then the SELECT statement wouldn't work at all, would it? Yet it's the connection to outdb that I lose. – Determiner
Use COPY or larger transactions (see the sketch after these comments). Committing only 100 records per transaction gives you about 110,000 transactions to complete the entire job, and a single 7200rpm drive can only handle about 120 commits per second (unless it lies because of its cache, which would make it unreliable). Your current problem sounds like a network issue, though. – Foran
Are you using a connection pool? Some pools have a default configuration that closes a connection if it hasn't been returned after a certain amount of time, to avoid connection leaks. – Deaminate
work_mem does not set the maximum memory for one connection; the default is just 1MB. – Mane
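
To make the "larger transactions" suggestion concrete, here is a minimal reworking of the loop from the question with a much bigger commit interval. This is only a sketch: indb and outdb are the connections from the question, the batch size of 10,000 is an arbitrary choice, and the table and column names are placeholders.

# Sketch: same copy loop as in the question, but committing every 10,000 rows
# instead of every 100. indb and outdb are the open psycopg2 connections from
# the question; table and column names are placeholders.
BATCH = 10000

incursor = indb.cursor()
incursor.execute("SELECT col1, col2, col3 FROM source_table")

outcursor = outdb.cursor()
rows = 0
for (col1, col2, col3) in incursor:
    outcursor.execute("INSERT INTO target_table (col1, col2, col3) VALUES (%s, %s, %s)",
                      (col1, col2, col3))
    rows += 1
    if rows % BATCH == 0:
        outdb.commit() # roughly 1,100 commits in total instead of 110,000

outdb.commit()
incursor.close()
outcursor.close()

A server-side ("named") cursor on the source connection, created with indb.cursor("src"), would additionally avoid buffering all 11 million source rows in client memory at once.
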
I don't know of any such "hidden" timeout in PostgreSQL itself. PostgreSQL does have statement_timeout, but if you hit that you should get ERROR: canceling statement due to statement timeout in the server log (and it will log the cancelled statement too). I can't speak for psycopg2. Definitely check the server log for anything that looks relevant.
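
If you want to rule statement_timeout out from the client side, here is a quick sketch (assuming conn is an open psycopg2 connection to the server in question):

# Sketch: check the server's statement_timeout from the client.
# Assumes `conn` is an open psycopg2 connection to the PostgreSQL server.
cur = conn.cursor()
cur.execute("SHOW statement_timeout")
print("statement_timeout:", cur.fetchone()[0]) # '0' means no statement timeout is set
cur.close()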

Maybe it's a networking issue? A long-running statement means a TCP connection that stays idle for a long time. Perhaps your port forwarding purges connections that are idle for more than 30 minutes? Maybe your TCP connections aren't using keepalives. PostgreSQL has some settings for tuning TCP keepalives (tcp_keepalives_interval etc.), and you may also need to do some kernel/network configuration to make sure they are actually enabled.

e.g. I just tried connecting to my own machine here, and tcp_keepalives_interval defaults to 7200 seconds, which is 2 hours. If your port forwarding cuts off after 30 minutes, this default won't do. You can override the setting in the client connection string (assuming you can twiddle the conninfo string directly), or set the GUC variable in the user/database properties or in postgresql.conf.
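
On the client side, the keepalive parameters can also be passed when the connection is opened. A minimal sketch, assuming psycopg2 forwards these standard libpq keepalive parameters as connection keywords; the host, port and credentials are placeholders, not the poster's actual setup:

import psycopg2

# Sketch: ask libpq to send TCP keepalives well before a ~30-minute idle cutoff.
# keepalives* are standard libpq connection parameters; host, port, dbname,
# user and password are placeholders.
outdb = psycopg2.connect(
    host="localhost", port=5433,   # forwarded port of the VirtualBox guest (placeholder)
    dbname="target", user="me", password="secret",
    keepalives=1,                  # enable TCP keepalives on this connection
    keepalives_idle=300,           # start probing after 5 minutes of idle time
    keepalives_interval=60,        # then probe every 60 seconds
    keepalives_count=5,            # give up after 5 unanswered probes
)

The same values can go directly into a conninfo string instead, e.g. "host=localhost port=5433 dbname=target keepalives=1 keepalives_idle=300".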

Wynnie answered 23/2, 2011 at 11:11 Comment(0)
To insert millions of rows, I'd look through the official guide to populating a database and consider using COPY.
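
A rough sketch of what the COPY route could look like with psycopg2, loading the rows in chunks so the whole result set is never buffered in memory at once. The table and column names are placeholders, and incursor/outdb are assumed to be the source cursor and target connection from the question:

import csv
import io

# Sketch: load rows with COPY instead of row-by-row INSERTs, in chunks of
# 100,000 rows. Table and column names are placeholders; CSV subtleties such
# as NULL handling are glossed over here.
CHUNK = 100000

def copy_chunk(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    with outdb.cursor() as cur:
        cur.copy_expert(
            "COPY target_table (col1, col2, col3) FROM STDIN WITH (FORMAT csv)",
            buf,
        )

chunk = []
for row in incursor:
    chunk.append(row)
    if len(chunk) >= CHUNK:
        copy_chunk(chunk)
        chunk = []
if chunk:
    copy_chunk(chunk)
outdb.commit()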

Insert answered 23/2, 2011 at 11:14 Comment(0)
I have a Django admin command that updates thousands upon thousands of rows. After some time, I see the same error. I believe the memory usage exceeds the limit. I don't know how to manually control the transaction in management commands, though.
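
For what it's worth, in current Django a management command can control transactions explicitly and work in bounded batches. A rough sketch; MyModel, some_field, compute() and the batch size are made-up placeholders, not anything from the original command:

from django.core.management.base import BaseCommand
from django.db import transaction

# from myapp.models import MyModel   # placeholder import for whatever model is updated

BATCH = 1000   # arbitrary batch size

class Command(BaseCommand):
    help = "Update rows in batches, one transaction per batch"

    def handle(self, *args, **options):
        ids = list(MyModel.objects.values_list("pk", flat=True))
        for start in range(0, len(ids), BATCH):
            batch_ids = ids[start:start + BATCH]
            with transaction.atomic():   # one transaction per batch of updates
                for obj in MyModel.objects.filter(pk__in=batch_ids):
                    obj.some_field = compute(obj)   # placeholder for the real update
                    obj.save()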

Quemoy answered 30/7, 2011 at 17:38 Comment(0)
