I am using SQLite3 in one of my projects and I need to ensure that the rows that are inserted into a table are unique with regard to a combination of some of their columns. In most cases the rows inserted will differ in that respect, but in case of a match the new row must update/replace the existing one.
The obvious solution was to use a composite primary key, with a conflict clause to handle collisions. Therefore this:
CREATE TABLE Event (Id INTEGER, Fld0 TEXT, Fld1 INTEGER, Fld2 TEXT, Fld3 TEXT, Fld4 TEXT, Fld5 TEXT, Fld6 TEXT);
became this:
CREATE TABLE Event (Id INTEGER, Fld0 TEXT, Fld1 INTEGER, Fld2 TEXT, Fld3 TEXT, Fld4 TEXT, Fld5 TEXT, Fld6 TEXT, PRIMARY KEY (Fld0, Fld2, Fld3) ON CONFLICT REPLACE);
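For example, with this schema a second insert that repeats an existing (Fld0, Fld2, Fld3) triple silently replaces the earlier row (the values here are made up for illustration):

INSERT INTO Event VALUES (1, 'a', 10, 'b', 'c', 'old4', 'old5', 'old6');
INSERT INTO Event VALUES (2, 'a', 20, 'b', 'c', 'new4', 'new5', 'new6');
SELECT COUNT(*) FROM Event;  -- returns 1: only the second row remains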
This does indeed enforce the uniqueness constraint as I need it to. Unfortunately, this change also incurs a performance penalty that is way beyond what I expected. I ran a few tests using the sqlite3 command line utility to rule out a fault in the rest of my code. Each test inserts 100,000 rows, split into 1, 10 or 100 equally sized transactions (a sketch of one such batch follows the results table). I got the following results:
                                | 1 * 100,000   | 10 * 10,000   | 100 * 1,000   |
                                | Time  | CPU   | Time  | CPU   | Time  | CPU   |
                                | (sec) | (%)   | (sec) | (%)   | (sec) | (%)   |
--------------------------------|-------|-------|-------|-------|-------|-------|
No primary key                  |  2.33 |  80   |  3.73 |  50   |  15.1 |  15   |
Primary key: Fld3               |  5.19 |  84   |  23.6 |  21   | 226.2 |   3   |
Primary key: Fld2, Fld3         |  5.11 |  88   |  24.6 |  22   | 258.8 |   3   |
Primary key: Fld0, Fld2, Fld3   |  5.38 |  87   |  23.8 |  23   | 232.3 |   3   |
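For reference, each test batch was nothing more than plain INSERT statements wrapped in an explicit transaction; the 100 * 1,000 case looks roughly like this (the values shown are placeholders, not my real data), repeated 100 times:

BEGIN;
INSERT INTO Event VALUES (1, 'k0', 0, 'k2', 'k3', 'v4', 'v5', 'v6');
-- ...999 more INSERT statements per batch...
COMMIT;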
My application currently performs transactions of at most 1,000 rows, and I was surprised by the 15-fold drop in performance in that case. I had expected at most a 3-fold drop in throughput and a rise in CPU usage, as seen in the single 100,000-row transaction. My guess is that the indexing involved in maintaining the primary key constraint requires a significantly larger number of synchronous DB operations, making my hard drives the bottleneck in this case.
Using WAL mode does have some effect - a performance increase of about 15% - but that is not enough on its own. PRAGMA synchronous = NORMAL did not seem to have any effect.
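For completeness, these are the settings in question as I issued them; note that journal_mode = WAL is persistent (it is recorded in the database file), while synchronous = NORMAL has to be set on every new connection:

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;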
I might be able to recover some performance by increasing the transaction size, but I'd rather not do that, due to the increased memory usage and concerns about responsiveness and reliability.
The text fields in each row have variable lengths, averaging about 250 bytes. Query performance does not matter much, but insert performance is very important. My application code is in C and is (supposed to be) portable to at least Linux and Windows.
Is there a way to improve the insert performance without increasing the transaction size? Either through some SQLite setting (anything but permanently forcing the DB into asynchronous operation, that is) or programmatically in my application code? For example, is there a way to ensure row uniqueness without using an index?
BOUNTY:
By using the hashing/indexing method described in my own answer, I managed to moderate the performance drop to a point where it is probably acceptable for my application. It seems, however, that as the number of rows in the table increases, the presence of the index makes inserts slower and slower.
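For context, the sketch below shows the general shape of that idea - it is illustrative only, not the exact schema from my answer. KeyHash is assumed to be a hash of (Fld0, Fld2, Fld3) computed in the C code before each insert, so the uniqueness check runs against one small integer column instead of three wide text columns; note that a hash collision would incorrectly replace a distinct row, so real code needs to handle that case:

-- Illustrative only: a single hashed key column carries the uniqueness
-- constraint; the hash itself is computed in the application.
CREATE TABLE Event (Id INTEGER, KeyHash INTEGER UNIQUE ON CONFLICT REPLACE, Fld0 TEXT, Fld1 INTEGER, Fld2 TEXT, Fld3 TEXT, Fld4 TEXT, Fld5 TEXT, Fld6 TEXT);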
I am interested in any technique or fine-tuning setting that will increase performance in this particular use case, as long as it does not involve hacking the SQLite3 code or otherwise cause the project to become unmaintainable.