SQLite: what are the practical limits?

Before you mark this question as a duplicate, PLEASE HEAR ME OUT!!
I have already read the questions asked here on how to improve performance, e.g. Improve INSERT-per-second performance of SQLite? and What are the performance characteristics of sqlite with very large database files?, to mention just a few.

I am struggling to make SQLite work with a database file size of 5 gigabytes, while there are people out there who claim that SQLite works 'great' for them even when the database size is as big as 160 GB. I have not tried it myself, but from the questions asked, I guess that all the benchmarking is perhaps done with only one table in the database.

I am using a database with:
- 20 or so tables
- Half of the tables have more than 15 columns
- Each of these 15-or-so-column tables has 6 or 7 foreign key columns
- A few of these tables have already grown to 27 million records within a month

The development machine that I am using is a 3 GHz quad-core machine with 4 GB of RAM, and yet it takes more than 3 minutes just to query the row count of these big tables.

I couldn't find any way to partition the data horizontally. The best shot I have is to split the data across multiple database files, one for each table. But in that case, as far as I know, foreign key constraints can't be enforced across files, so I would have to create self-sufficient tables (without any foreign keys).

So my questions are:
a) Am I using the wrong database for the job?
b) Where do you think I am going wrong?
c) I have not added indexes on the foreign keys yet, but if just a row count query takes four minutes, how are indexes on the foreign keys going to help me?

EDIT: To provide more information, even though no one has asked for it :) I am using SQLite version 3.7.9 with system.data.sqlite.dll version 1.0.77.0.

EDIT 2: I THINK where I differ from the 160 GB guys is that they may select an individual record or a small range of records, whereas I have to load all of the 27 million rows in my table, join them against other tables, group the records as asked by the user, and return the results. Any input on what's the best way to optimize the database for such queries?
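For concreteness, the shape of such a query is roughly the following; the table and column names are made up for illustration, not my real schema:

```sql
-- Illustrative only: scan the big table, join two referenced tables,
-- then group and aggregate by whatever the user asked for.
SELECT c.name,
       strftime('%Y-%m', r.recorded_at) AS month,
       COUNT(*)      AS row_count,
       SUM(r.amount) AS total_amount
FROM   readings  AS r                         -- the ~27-million-row table
JOIN   devices   AS d ON d.id = r.device_id   -- foreign key join
JOIN   customers AS c ON c.id = d.customer_id
GROUP  BY c.name, month
ORDER  BY c.name, month;
```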

I can't cache the results of a previous query, as that does not make sense in my case. The chances of hitting the cache are fairly low.

Deodorant answered 12/3, 2012 at 13:44 Comment(9)
If others are saying that 160 GB databases run just fine, clearly it must be something you're doing, but you're not telling us how you're doing things, except to say that you have no indexes on foreign keys. Have you tried indexing the foreign keys? – Justifier
As per the questions asked, I guess that the 160 GB databases were using only one table. No, I have not added indexes on the foreign keys yet, because even when I run a query where foreign keys are not involved, i.e. 'Select count(*) from some_table', SQLite takes minutes to return the result. I will add indexes to the foreign keys and get back. Please let me know what more information you would like. – Deodorant
Do you have indexes on the tables at all? – Justifier
Reading millions of rows from the hard disk could take a while, but subsequent readings should be a lot faster, depending on the available memory of the system. – Tendentious
Yes, the fields that are most commonly used in queries, e.g. date-time, are indexed. I didn't index everything in the first place, as it would slow down the inserts. – Deodorant
@Nick Dandoulakis: What I have noticed is that there are too many page faults going on for my process, even though its memory usage is not that high, and according to Task Manager I still have around 1 gigabyte of free memory. – Deodorant
The one thing you might try, which reduced the runtime of some SQLite queries for me, is PRAGMA cache_size=16000 (this took a process that was running over 20 minutes down to under 2 minutes). You could use a higher value if needed, but be warned that the value is in pages, so the memory used is cache_size (in pages) times page_size (in bytes). – Hoxie
I am already using the following PRAGMA directives: PRAGMA journal_mode=WAL; PRAGMA page_size=4096; PRAGMA cache_size=10000; PRAGMA locking_mode=EXCLUSIVE; PRAGMA synchronous=NORMAL; PRAGMA count_changes=OFF; PRAGMA temp_store=MEMORY;. The thing that really bugs me is that if I run a query on one of the bigger tables, the I/O reads for my process never come to a standstill. – Deodorant
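Collected into a single per-connection script, the directives from that comment (with the larger cache_size suggested above) would look like this; whether the bigger cache actually helps depends on the workload:

```sql
-- Settings listed in the comment above, run on each connection.
-- journal_mode=WAL is persistent in the database file, and page_size only
-- takes effect on a freshly created (or VACUUMed) database.
PRAGMA journal_mode  = WAL;
PRAGMA page_size     = 4096;
PRAGMA cache_size    = 16000;      -- value suggested above: 16000 pages * 4096 bytes ~ 65 MB
PRAGMA locking_mode  = EXCLUSIVE;
PRAGMA synchronous   = NORMAL;
PRAGMA count_changes = OFF;        -- deprecated in later SQLite versions
PRAGMA temp_store    = MEMORY;
```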
#14452124 | #2778454 – Erebus

There's a lot to consider here, but my first bit of advice would be not to take others' performance statistics at face value. Database performance depends on a lot of things, including how your database is structured, the complexity of your queries, which indexes you have defined (or not), and often just the sheer amount of data in them. A lot of reported performance numbers come from a lot of trial and error, and/or from matching the database to the job at hand. To say it another way, the performance you're going to get from any DBMS can't plainly be compared to another application's performance unless your datasets and structures are damn near identical - they are certainly a guide, and perhaps an ideal to strive for, but you're not necessarily going to get insane performance "out of the box."

I would, as a starting point, start indexing the data on those really large tables (it looks, from the comments, like you've got some of that), and see what happens. Granted, the count taking four minutes is a pretty long time, but don't stop there. Add some indexes, change them around, ask whether you're storing data that you don't need to store, and look at other database queries, not just the count query, to judge performance. Look for other apps and blog posts that use SQLite for large numbers of rows, and see what they've done to address it (which may include changing databases). Basically, try stuff and then make a judgement. Don't let the initial fear stop you, thinking that you're going down the wrong path. Maybe you are, maybe you aren't, but don't just stop with the COUNT query. Any way you slice it, 27 million records in a table is a crap ton.
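As a minimal sketch of that starting point, assuming a hypothetical readings table with a device_id foreign key and a recorded_at date column (the real names will differ):

```sql
-- Hypothetical names: index the foreign key column used in joins
-- and the date column used in most of the filters.
CREATE INDEX IF NOT EXISTS idx_readings_device_id   ON readings(device_id);
CREATE INDEX IF NOT EXISTS idx_readings_recorded_at ON readings(recorded_at);

-- Refresh the planner statistics after a bulk load or after adding indexes.
ANALYZE;

-- Check whether a representative query actually uses the new indexes.
EXPLAIN QUERY PLAN
SELECT COUNT(*) FROM readings
WHERE  recorded_at >= '2012-02-01' AND recorded_at < '2012-03-01';
```

If EXPLAIN QUERY PLAN still reports a full table scan for the queries you actually care about, that's the first thing to chase down.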

Finally, one specific piece of advice: in SQLite, don't split up the database into multiple files. I don't see that helping, because then you're going to have to do a lot of extra query work and manually join your separate tables after the results return from multiple queries. That is reinventing what the RDBMS does for you, and it's a crazy idea. You're not going to somehow figure out a way to do joins faster than the creators of the RDBMS did - you'd definitely be wasting time there.

Gimlet answered 12/3, 2012 at 14:7 Comment(2)
Can you please explain what you mean by slicing the table? As far as I know, SQLite doesn't inherently support any horizontal partitioning. – Deodorant
I wasn't referring to the database; it's just a figure of speech. When I say "any way you slice [this problem]...", I just mean "any way you approach this problem, 27 million records in a table is a lot." – Gimlet

SELECT count(*) in SQLite will always be slower compared to other DBMSs, because it does a table scan for that particular request; it doesn't have a statistics table to help out. That doesn't mean your application queries will be slow. You need to test your queries to really tell what you can expect.
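You can see this with EXPLAIN QUERY PLAN; the table and index names here are hypothetical:

```sql
-- With no index, counting means walking the whole table b-tree:
EXPLAIN QUERY PLAN SELECT COUNT(*) FROM readings;
--   e.g. "SCAN TABLE readings"

-- With an index available, SQLite can usually satisfy the count by scanning
-- the smallest index instead of the (wider) table rows, so fewer pages are
-- read, but it is still a full scan, not a stored statistic.
CREATE INDEX IF NOT EXISTS idx_readings_device_id ON readings(device_id);
EXPLAIN QUERY PLAN SELECT COUNT(*) FROM readings;
--   e.g. "SCAN TABLE readings USING COVERING INDEX idx_readings_device_id"
```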

Some general guidelines: Indexing is an absolute must, because navigating a subset of the data through a B-tree index is a lot faster than traversing an entire table when enormous sizes are involved. To help load time, you should sort your data on a unique index, and if you don't have a unique index, then on the largest index. If you can drop the indices before loading and put them back afterwards, it will be faster. If these techniques can't meet your operating and SLA parameters, then it's time to do horizontal partitioning and use ATTACH to span the data range you need; SQLite supports up to 10 attached databases by default. I know some say partitioning is the job of the tool, not the developer, but when you are facing physical limits you have to roll up your sleeves, or perhaps choose a commercial tool that does it under the covers for you.
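A minimal sketch of that ATTACH approach, with hypothetical file and table names (one partition file per month):

```sql
-- Each partition file holds a 'readings' table with the same schema.
ATTACH DATABASE 'readings_2012_01.db' AS p2012_01;
ATTACH DATABASE 'readings_2012_02.db' AS p2012_02;

-- Query only the partitions that cover the requested range,
-- stitching them together with UNION ALL before aggregating.
SELECT device_id, COUNT(*) AS row_count
FROM (
    SELECT device_id FROM p2012_01.readings
    UNION ALL
    SELECT device_id FROM p2012_02.readings
)
GROUP BY device_id;

DETACH DATABASE p2012_01;
DETACH DATABASE p2012_02;
```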

Dentition answered 6/10, 2014 at 20:3 Comment(0)

If you have a database of 50 MB or more deployed directly on the client side, it means you are doing something wrong. Try migrating to the server side while storing only the keys and the most important values at the client (just references). You will not have real time, but at least it will produce an appropriate solution. "Server side" is an answer to your question, that is, if you drop or relax the real-time requirements, because that's what you have (based on your description). In any case, SQLite can handle almost anything, but from personal experience, just try to keep things as simple as possible, even at the cost of real-time results.
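As a rough sketch of that idea (names invented here): the client keeps only identifiers and the few values it needs for display and for building server requests, while the full detail rows stay on the server:

```sql
-- Hypothetical client-side table: references and a few "hot" values only;
-- the full 27-million-row detail would live in the server-side database.
CREATE TABLE IF NOT EXISTS cached_reading_refs (
    reading_id  INTEGER PRIMARY KEY,   -- server-side key (reference only)
    device_id   INTEGER NOT NULL,
    recorded_at TEXT    NOT NULL,      -- enough to build/filter server queries
    synced_at   TEXT    NOT NULL       -- when this reference was last refreshed
);
```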

Boulanger answered 12/5, 2016 at 20:41 Comment(1)
One size does not fit all. There are plenty of scenarios where a large client-side database is perfectly reasonable. – Tryck
