INTEGER PRIMARY KEY vs rowid in SQLite
Asked Answered
L

1

5

I am trying to import some spatial data (OSM) into an SQLite database. The SQLite reference states that an INTEGER PRIMARY KEY becomes an alias for the rowid (if WITHOUT ROWID is not specified). Just to be sure, I created my main table in two different manners:

CREATE TABLE points (tags BLOB NOT NULL,
                     lon INTEGER NOT NULL,
                     lat INTEGER NOT NULL)

vs.

CREATE TABLE points (id INTEGER PRIMARY KEY,
                     tags BLOB NOT NULL,
                     lon INTEGER NOT NULL,
                     lat INTEGER NOT NULL)

I expected the same results, but after running the application twice, my two database files clearly differ in size: The version with explicit primary key takes about 100 MB more of disk space (1.5 GB vs 1.4 GB). My insert statements are equal apart from the fact that one uses "id", the other one "rowid" as destination column for the point ID.

Does anyone have a clue where this massive difference in size comes from? Thanks in advance.

Leggat answered 4/6, 2018 at 17:53 Comment(7)
Are the id and rowid of the corresponding rows the same? Are the rows inserted in the same order?Keep
Yes, check this screenshot from the DB browser: On the left side, id is clearly an alias for the rowid. And the insert order is equal in both cases.Leggat
Try running VACUUM on both. If there is still a difference, show the output of sqlite3_analyzer for both.Keep
@CL just replicated above Using DROP TABLE IF EXISTS points; CREATE TABLE IF NOT EXISTS points (id INTEGER PRIMARY KEY, tags BLOB NOT NULL, lon INTEGER NOT NULL, lat INTEGER NOT NULL); WITH RECURSIVE counter(tags,lon,lat) AS (SELECT x'00000000', 0,0 UNION ALL SELECT tags, random() AS lon, random() AS lat FROM counter LIMIT 1000000) INSERT INTO points (tags,lon,lat) SELECT * FROM counter;. (first without id column obviously) and first is 29484Kb, latter with id is 30600kb. Also did VACUUM doesn't change sizes. So there does appear to an overhead for having an alias.Cynthia
perhaps When an SQL table includes an INTEGER PRIMARY KEY column (which aliases the rowid) then that column appears in the record as a NULL value. SQLite will always use the table b-tree key rather than the NULL value when referencing the INTEGER PRIMARY KEY column from Database File Format is the cause.Cynthia
Running VACUUM only makes a difference on the table without INTEGER PRIMARY KEY: It reassigns the rowids by forming a continuous sequence that starts at 1 and even raises the difference in storage size. But that doesn't help me as I need to preserve the exact IDs. So the INTEGER PRIMARY KEY seems the way to go. Maybe the diff in space is the price I have to pay for a constant identifier that is not changed by VACUUM or anything else?Leggat
@Cynthia That actually sounds reasonable. Thank you both for your input. I'll stick to the primary key to protect the column from unwanted changes.Leggat
C
7

It would appear that there is an overhead of having an alias for the rowid of a byte (I think) per row, which I believe is explained by :-

When an SQL table includes an INTEGER PRIMARY KEY column (which aliases the rowid) then that column appears in the record as a NULL value. SQLite will always use the table b-tree key rather than the NULL value when referencing the INTEGER PRIMARY KEY column. Database File Format - 2.3. Representation Of SQL Tables.

The 1 byte per row appears to be pretty close according to the following testing:-

Two databases were created with the two differing tables, loaded with 1,000,000 million rows using the following SQL :-

For the First :-

DROP TABLE IF EXISTS points;
CREATE TABLE IF NOT EXISTS points (tags BLOB NOT NULL, lon INTEGER NOT NULL, lat INTEGER NOT NULL);
WITH RECURSIVE counter(tags,lon,lat) AS (SELECT x'00000000', 0,0 UNION ALL SELECT tags, random() AS lon, random() AS lat FROM counter LIMIT 1000000)
INSERT INTO points (tags,lon,lat) SELECT * FROM counter;
SELECT * FROM points;
VACUUM

For the Second (with an alias of the rowid):-

DROP TABLE IF EXISTS points;
CREATE TABLE IF NOT EXISTS points (id INTEGER PRIMARY KEY, tags BLOB NOT NULL, lon INTEGER NOT NULL, lat INTEGER NOT NULL);
WITH RECURSIVE counter(tags,lon,lat) AS (SELECT x'00000000', 0,0 UNION ALL SELECT tags, random() AS lon, random() AS lat FROM counter LIMIT 1000000)
INSERT INTO points (tags,lon,lat) SELECT * FROM counter;
SELECT * FROM points;
VACUUM

The the resultant file sizes were 29484Kb and 30600Kb respectively.

That being a difference of 30600 - 29484 = 1,116, multiply this by 1024 = 1142784 (not that far off the 1,000,000 rows, pages and freespace probably accounting for the discrepancy ).

  • Note the VACUUM command made no difference (as they were new tables there was no expectation that they would.)
Cynthia answered 4/6, 2018 at 22:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.