How can I speed up fetching the results after running an sqlite query?
Asked Answered
O

1

2

As an answer on my question: Is it normal that sqlite.fetchall() is so slow? it seems that fetch-all and fetch-one can be incredibly slow for sqlite.

As I mentioned there, I have the following query:

time0 = time.time()
self.cursor.execute("SELECT spectrum_id, feature_table_id "+
                "FROM spectrum AS s "+
                "INNER JOIN feature AS f "+
                "ON f.msrun_msrun_id = s.msrun_msrun_id "+
                "INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax "+
                             "FROM convexhull GROUP BY feature_feature_table_id) AS t "+
                "ON t.feature_feature_table_id = f.feature_table_id "+
                "WHERE s.msrun_msrun_id = ? "+
                "AND s.scan_start_time >= t.rtMin "+
                "AND s.scan_start_time <= t.rtMax "+
                "AND base_peak_mz >= t.mzMin "+
                "AND base_peak_mz <= t.mzMax", spectrumFeature_InputValues)
print 'query took:',time.time()-time0,'seconds'

time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()      
print time.time()-time0,'seconds since to fetchall'

The execution of the select statement takes about 50 seconds (acceptable). However, the fetchall() takes 788 seconds, only fetching 981 results.

The way proposed to speed up the query given as answer to my question: Is it normal that sqlite.fetchall() is so slow? using fetchmany(), has not improved the speed of fetching the results.

How can I speed up fetching the results after running an sqlite query?


The sql exactly as I tried to execute it on command line:

sqlite> SELECT spectrum_id, feature_table_id
   ...> FROM spectrum AS s 
   ...> INNER JOIN feature AS f 
   ...> ON f.msrun_msrun_id = s.msrun_msrun_id 
   ...> INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax 
   ...> FROM convexhull GROUP BY feature_feature_table_id) AS t 
   ...> ON t.feature_feature_table_id = f.feature_table_id 
   ...> WHERE s.msrun_msrun_id = 1
   ...> AND s.scan_start_time >= t.rtMin
   ...> AND s.scan_start_time <= t.rtMax
   ...> AND base_peak_mz >= t.mzMin
   ...> AND base_peak_mz <= t.mzMax;

update:

So I started running the query on the commandline about 45 minutes ago, and it's still busy, so it's also very slow using the commandline.

Ortiz answered 2/5, 2012 at 11:14 Comment(14)
How much time does the same query take when executing it through the client?Hatten
Also, what sqlite3 python module are you using? What version? What is the sqlite3 version used by the module?Hatten
I'm using sqlite module: 2.6.3 and sqlite version: 3.7.10. I'm trying to execute the command through SQLite manager but it doesn't seem to be able to cope with it.Ortiz
Just use the client. Are you on Windows or UNIX/Linux?Hatten
I get an sql error: SQL error: malformed database schema - near "NO": syntax errorOrtiz
can you add that sql exactly as you tried to execute it?Hatten
just to be sure, what does sqlite> .version return?Hatten
sqlite> .version unknown command or invalid arguments: "version". Enter ".help" for helpOrtiz
When you start the client, what version does it print?Hatten
can you put your hands on a more recent client, perhaps the latest version (3.7.11)?Hatten
I'm running the 3.7.10 version with python by importing _sqlite.so (as described https://mcmap.net/q/340545/-force-python-to-forego-native-sqlite3-and-use-the-installed-latest-sqlite3-version). I have no idea how to run that via commandline, and I can't install a newer version in a different way.Ortiz
You don't have to install, you can just grab/compile some binaries from sqlite.org/download.html, and use them as-is in you own home directory. The same goes for using some other python module.Hatten
Ok got it now, I'm running the query will update when it's finishedOrtiz
got the same problem. I could speed up the selection using indexing. But fetching data remain desperately slow even using fetchmany, apsw, sqlite3 version 3.11. The last alternative was to store each column into a binary file, load it with numpy.fromfile, and select data in python.Kittiekittiwake
H
2

From reading this question, it sounds like you could benefit from using the APSW sqlite module. Somehow you may be victim of your sqlite module causing your query to be executed in some less performant manner.

I was curious so I tried using apsw myself. It wasn't too complicated. Why don't you give it a try?

To install it I had to:

  1. Extract the latest version.
  2. Have the installation package fetch the latest sqlite amalgamation.

    python setup.py fetch --sqlite
    
  3. Build and install.

    sudo python setup.py install
    
  4. Use it in place of the other sqlite module.

    import apsw
    <...>
    conn = apsw.Connection('foo.db')
    
Hatten answered 2/5, 2012 at 12:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.