Is it normal sqlite.fetchall() is slow?

Asked 26/4, 2012 at 15:24 Answered 30/10, 2022 at 20:11

I have an SQL query that selects from two inner joined tables. The select statement takes about 50 seconds. However, the fetchall() takes 788 seconds for only 981 results:

time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
                    "FROM spectrum AS s "+
                    "INNER JOIN feature AS f "+
                    "ON f.msrun_msrun_id = s.msrun_msrun_id "+
                    "INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax "+
                                 "FROM convexhull GROUP BY feature_feature_table_id) AS t "+
                    "ON t.feature_feature_table_id = f.feature_table_id "+
                    "WHERE s.msrun_msrun_id = ? "+
                    "AND s.scan_start_time >= t.rtMin "+
                    "AND s.scan_start_time <= t.rtMax "+
                    "AND base_peak_mz >= t.mzMin "+
                    "AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:',time.time()-time0,'seconds'
    
time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()      
print time.time()-time0,'seconds since to fetchall'

Is there a reason why fetchall() takes so long? Doing:

while 1:
    info = self.cursor.fetchone()
    if info:
        <do something>
    else:
        break

is just as slow as :

allInfo = self.cursor.fetchall()         
for info in allInfo:
    <do something>

Organzine answered 26/4, 2012 at 15:24 Comment(1)

I would try to move as many of the strict equality constraints as possible into the INNER JOIN's ON clause, because that would (hopefully) minimize the size of the heap hitting the WHERE section (inequalities usually get loaded to memory for a full heap scan, at which point your timescale is determined by heap size, not index -- this is annoying and perhaps unintuitive, but mostly unavoidable due to the non-deterministic nature of inequality query result size). – Dryasdust 14/5, 2012 at 18:59

By default fetchall() is as slow as looping over fetchone() due to the arraysize of the Cursor object being set to 1.

To speed things up you can loop over fetchmany(), but to see a performance gain, you need to provide it with a size parameter bigger than 1, otherwise it'll fetch "many" by batches of arraysize, i.e. 1.

It is quite possible that you can get the performance gain simply by raising the value of arraysize, but I have no experience doing this, so you may want to experiment with that first by doing something like:

>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> cu = conn.cursor()
>>> cu.arraysize
1
>>> cu.arraysize = 10
>>> cu.arraysize
10

More on the above here: http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.fetchmany

Housebreak answered 26/4, 2012 at 20:50 Comment(1)

I accepted this because it answer my question, but I can't get it to speed up using fetchmany. Thanks though. – Organzine 2/5, 2012 at 11:13

-1

If you are running a complex query, try to create a table using that complex query and then run a select on the created table so the performance is good.

Graveyard answered 30/10, 2022 at 20:11 Comment(0)

Recommended topics

Hot tags