How do I speed up the import of data from a CSV file into a SQLite table (in Windows)?

When I was searching for a tool to create and update SQLite databases for use in an Android application, I was recommended SQLite Database Browser. This has a Windows GUI and is reasonably powerful, offering in particular a menu option to import data into a new table from a CSV file.

This has proved perfectly capable for the initial creation of the database, and I have been using the CSV import option to update the database whenever I have new data to add.

When there were only a few records to import this worked well; however, as the volume of data has grown the process has become painfully slow. A data file of 11,000 records (800 kilobytes) takes about 10 minutes to import on my averagely slow laptop. Using SQLite Database Browser, the whole process of deleting the old table, running the import command and then correcting the data types of the new table created by the import takes the best part of 15 minutes.

How can the import be speeded up?

Wurtz answered 12/6, 2011 at 20:24 Comment(0)

You could use the built-in CSV import (using the sqlite3 command-line utility):

create table test (id integer, value text);
.separator ","
.import no_yes.csv test

Importing 10,000 records took less than 1 second on my laptop.

Sonata answered 12/6, 2011 at 20:34 Comment(3)
Thanks for your prompt response. I think that .import can give problems if the data includes commas, in which case it may be better to use tab-separated source data. Using .mode tabs instead of .separator "," seems to work. How would one specify tab-separation using the .separator command? – Wurtz
You can use .separator ' ' where the character between the single quotes is a literal tab character. – Certiorari
You can also use .separator "\t" – Cephalization

By googling I have found several people asking this question; however, I have not found the answer set out in one place in simple terms that I could understand. So, I hope the following will help.

The command-line utility sqlite3.exe offers a very simple solution. The reason the "import CSV" option in SQLite Database Browser is so slow is that it executes and commits to the database a separate SQL INSERT statement for each line in the CSV file. However, sqlite3.exe includes an ".import" command which processes the whole file in one go. What's more, this is done virtually instantaneously: my 11,000 records are imported in well under a second.

There is a slight drawback in that the .import command does not deal with commas in the same way as other programs such as Excel. For example, if cell A1 in Excel contains Joe Bloggs and cell B1 contains 123 Main Street, Anytown, the row is exported into a CSV file as: Joe Bloggs,"123 Main Street, Anytown". However, if you tried to import this using sqlite3 into a 2-column table, sqlite3 would report an error, because it treats each of the commas as a field separator and so would try to import Joe Bloggs, "123 Main Street and Anytown" as three separate fields.

Because it is unusual for text fields (especially in Excel) to include tabs, this problem can usually be avoided by using a file where the fields are delimited by tabs rather than by commas.
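To illustrate (the file, table and field names here are just examples, not part of my actual database): suppose people.tsv holds one record per line with a literal tab between the two fields. A short sqlite3.exe script along these lines should import it cleanly, because only the tab is treated as a field separator:

    -- people.tsv contains, for example, the single line:
    -- Joe Bloggs<tab>123 Main Street, Anytown
    create table tblPeople (fldName TEXT, fldAddress TEXT);
    .mode tabs
    .import people.tsv tblPeople
    -- the comma in the address is imported as ordinary text in fldAddress,
    -- not treated as a field separator

The comma inside the address survives intact, which is exactly what goes wrong with the comma-separated version of the same file.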

Since sqlite3.exe can execute any SQL statement plus a number of additional commands (like .import), it is very flexible. However, a routine job like my need to import a delimited data file into a database table can be automated by:

  • listing the SQL statements and sqlite3.exe commands in a small text file, which is fed to sqlite3.exe on the command line by input redirection

  • writing a short Windows (MS-DOS) batch file to run sqlite3.exe with the specified list of commands.

These are the steps I followed:

  1. Download and unzip sqlite3.exe
  2. Convert the raw data from comma separated values to tab separated values.
  3. Create a script file listing commands to be executed by sqlite3.exe as follows:

    drop table tblTableName;

    create table tblTableName(_id INTEGER PRIMARY KEY, fldField1 TEXT, fldField2 NUMERIC, .... );

    .mode tabs

    .import SubfolderName/DataToBeImported.tsv tblTableName

    (Note: SQL statements are followed by a semi-colon; sqlite3.exe commands are preceded by a full stop (period))

  4. Create a .bat file as follows:

    cd "c:\users\UserName\FolderWhereSqlite3DatabaseFileAndScriptFileAreStored"

    sqlite3 DatabaseName < textimportscript.txt

Having set this up, all I need to do whenever I have new data to add is run the batch file and the data is imported in an instant.
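As a possible refinement (I have not tested this on older releases): newer versions of sqlite3.exe also accept SQL statements and dot-commands as additional command-line arguments, run in order against the named database, so the separate script file can in principle be skipped. A rough sketch of the batch file in that style, using the same placeholder names as above, would be:

    cd "c:\users\UserName\FolderWhereSqlite3DatabaseFileAndScriptFileAreStored"
    sqlite3 DatabaseName ".mode tabs" ".import SubfolderName/DataToBeImported.tsv tblTableName"

The table would still need to exist in the database before the import runs.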

Wurtz answered 12/6, 2011 at 20:43 Comment(0)

If you are generating INSERT statements, enclose them in a single transaction as stated in the official SQLite FAQ:

BEGIN; -- or BEGIN TRANSACTION;

INSERT ...;
INSERT ...;

END; -- can be COMMIT TRANSACTION; also
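As a concrete sketch (the table and values are invented, reusing the placeholder names from the answer above), a batch of statements fed to sqlite3 would look like this; the single COMMIT at the end means the rows are written to disk once, rather than once per INSERT:

BEGIN TRANSACTION;
INSERT INTO tblTableName (_id, fldField1, fldField2) VALUES (1, 'Joe Bloggs', 42);
INSERT INTO tblTableName (_id, fldField1, fldField2) VALUES (2, 'Jane Doe', 17);
-- ...and so on for the remaining rows...
COMMIT;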
Certiorari answered 12/6, 2011 at 20:29 Comment(1)
Thanks for the speedy response. Being a relative beginner I haven't found the SQLite FAQ the friendliest of documentation, hence my attempt to write an idiot's guide to using the sqlite3 import command. Incidentally, if I were to follow your example how would I obtain the tens of thousands of INSERT statements? Would I have to write code to do this? Or is there a shortcut such as, for example, an Excel export option that will provide the data in this form? – Wurtz

Have you tried wrapping all of your updates into a transaction? I had a similar problem and doing that sped it up no end.

Assuming an Android device:

db.beginTransaction();
// YOUR CODE
db.setTransactionSuccessful();
db.endTransaction();

Try that :)

Detta answered 12/6, 2011 at 20:29 Comment(1)
Thanks for your speedy response. I'm not sure what you mean by "assuming an Android device". My starting point is that the source data is on a Windows PC and that the easiest way to get it into a database on my phone is to create an updated database file on the PC and then copy that to the phone. Is there a better way? (Also, I'm trying to avoid code, if possible.) – Wurtz
sqlite> PRAGMA journal_mode=WAL;
sqlite> PRAGMA synchronous = 0;
sqlite> PRAGMA journal_mode=MEMORY;
memory
sqlite> BEGIN IMMEDIATE;
sqlite> .import --csv blah.csv <tablename>
sqlite> COMMIT;

This turns off sync() on write and keeps the journal in memory (the final PRAGMA journal_mode=MEMORY overrides the earlier WAL setting), so it's not "safe"; but as long as you are doing this "offline", as it were, and are OK with re-creating the DB if the power goes out, the disk fills up, etc., then this will definitely speed up the import.
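The same idea can be applied non-interactively on Windows, in the style of the batch-file approach earlier on this page. As a sketch (fastimport.txt, the CSV file name and the table name are placeholders; the table is assumed to already exist and the CSV file to have no header row), put the pragmas and the import into a script file:

PRAGMA synchronous = 0;
PRAGMA journal_mode = MEMORY;
BEGIN IMMEDIATE;
.import --csv DataToBeImported.csv tblTableName
COMMIT;

and feed it in with sqlite3 DatabaseName < fastimport.txt from a batch file as before.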

Portingale answered 26/3, 2021 at 20:19 Comment(0)
