I am using Delphi XE8 with FireDAC to load a large SQLite database. To do so, I'm using the Array DML execution technique to efficiently insert a large number of records at once, like this:
FDQueryAddINDI.SQL.Text := 'insert into indi values ('
+ ':indikey, :hasdata, :gedcomnames, :sex, :birthdate, :died, '
+ ':deathdate, :changed, :eventlinesneedprocessing, :eventlines, '
+ ':famc, :fams, :linkinfo, :todo, :nextreportindi, :firstancestralloop'
+ ')';
FDQueryAddINDI.Params.Bindmode := pbByNumber; {more efficient than by name }
FDQueryAddINDI.Params.ArraySize := MaxParams; { large enough to load all of them }
NumParams := 0;
repeat
{ the code to determin IndiKey,... is not shown, but goes here }
FDQueryAddINDI.Params[0].AsStrings[NumParams] := IndiKey;
FDQueryAddINDI.Params[1].AsIntegers[NumParams] := HasData;
FDQueryAddINDI.Params[2].AsStrings[NumParams] := GedcomNames;
FDQueryAddINDI.Params[3].AsStrings[NumParams] := Sex;
FDQueryAddINDI.Params[4].AsStrings[NumParams] := Birthdate;
FDQueryAddINDI.Params[5].AsIntegers[NumParams] := Died;
FDQueryAddINDI.Params[6].AsStrings[NumParams] := Deathdate;
FDQueryAddINDI.Params[7].AsStrings[NumParams] := Changed;
FDQueryAddINDI.Params[8].AsIntegers[NumParams] := EventLinesNeedProcessing;
FDQueryAddINDI.Params[9].AsStrings[NumParams] := EventLines;
FDQueryAddINDI.Params[10].AsIntegers[NumParams] := FamC;
FDQueryAddINDI.Params[11].AsIntegers[NumParams] := FamS;
FDQueryAddINDI.Params[12].AsIntegers[NumParams] := Linkinfo;
FDQueryAddINDI.Params[13].AsIntegers[NumParams] := ToDo;
FDQueryAddINDI.Params[14].AsIntegers[NumParams] := NextReportIndi;
FDQueryAddINDI.Params[15].AsIntegers[NumParams] := FirstAncestralLoop;
inc(NumParams);
until done;
FDQueryAddINDI.Params.ArraySize := NumParams; { Reset to actual number }
FDQueryAddINDI.Execute(LogoAppForm.FDQueryAddINDI.Params.ArraySize);
The actual loading of the data into the SQLite database is very fast, and I have no problem with the speed of that.
What is slowing me down is the amount of time being taken in the repeat loop to assign all the values to the parameters.
The Params are built into FireDAC and are a TCollection. I do not have access to the source code, so I can't see what the AsStrings and AsIntegers methods are actually doing.
Assigning each value to each parameter for each insert does not appear to me to be a very efficient way to load this TCollection. Is there a faster way to load this? I'm thinking maybe a way to load a whole set of parameters at once, e.g. (IndiKey, HasData, ... FirstAncestralLoop) all as one. Or maybe to load my own TCollection as efficiently as I can, and then use the TCollection's Assign method to copy my TCollection into the the FireDAC's TCollection.
So my question is what would be the fastest way to load this TCollection of parameters that FireDAC requires?
Update: I'm including some timings for Arnaud.
As stated in Using SQLite with FireDAC (see its Array DML section):
Starting with v 3.7.11, SQLite supports the INSERT command with multiple VALUES. FireDAC uses this feature to implement Array DML, when Params.BindMode = pbByNumber. Otherwise, FireDAC emulates Array DML.
I've tested inserting 33,790 records changing the arraysize (number of records to load per execute), and timed the loading time with both pbByName (for emulation) and pbByNumber (using multiple values insert).
This was the timing:
Arraysize: 1, Executes: 33,790, Timing: 1530 ms (pbByName), 1449 ms (pbByNumber)
Arraysize: 10, Executes: 3,379, Timing: 1034 ms (pbByName), 782 ms (pbByNumber)
Arraysize: 100, Executes: 338, Timing: 946 ms (pbByName), 499 ms (pbByNumber)
Arraysize: 1000, Executes: 34, Timing: 890 ms (pbByName), 259 ms (pbByNumber)
Arraysize: 10000, Executes: 4, Timing: 849 ms (pbByName), 227 ms (pbByNumber)
Arraysize: 20000, Executes: 2, Timing: 594 ms (pbByName), 172 ms (pbByNumber)
Arraysize: 50000, Executes: 1, Timing: 94 ms (pbByName), 94 ms (pbByNumber)
Now the interesting thing about these timings is that the loading of those 33,790 records into the TCollection is taking a full 93 ms every single test run. It doesn't matter whether they are being added 1 at a time or 10000 at a time, this overhead of filling the TCollection of Params is always there.
For comparison, I did a larger test with 198,522 inserts just for pbByNumber:
Arraysize: 100, Executes: 1986, Timing: 2774 ms (pbByNumber)
Arraysize: 1000, Executes: 199, Timing: 1371 ms (pbByNumber)
Arraysize: 10000, Executes: 20, Timing: 1292 ms (pbByNumber)
Arraysize: 100000, Executes: 2, Timing: 894 ms (pbByNumber)
Arraysize: 1000000, Executes: 1, Timing: 506 ms (pbByNumber)
For all cases of this test, the overhead of loading the TCollection of Params takes about 503 ms.
So the loading of the TCollection seems to be at about 400,000 records per second. This is a significant portion of the insert time, and once I start working with large databases in the millions, this added time will be quite noticeable to the user of my program.
I would like to improve this, but I have not yet found a way to speed the loading of the Params.
Update 2: I was able to get about a 10% time improvement by putting all my code between a StartTransaction and a Commit, so that all blocks will be processed at once.
But I'm still looking for some way to load the TCollection of Params much faster.
One other idea:
What might work well and could be up to 16 times faster if it were possible would be something like the ParamValues method. This assigns multiple params at once and has the added advantage of directly supplying a variant array, and avoids the need to cast values.
It would work like this:
FDQueryAddINDI.Params.ParamValues['indikey;hasdata;gedcomnames;sex;birthdate;died;deathdate;changed;eventlinesneedprocessing;eventlines;famc;fams;linkinfo;todo;nextreportindi;firstancestralloop']
:= VarArrayOf([Indikey, 0, ' ', ' ', ' ', 0, ' ', ' ', 1, ' ', -1, -1, -1, -1, -1, -1]);
However, ParamValues will only assign to the first set of Params, i.e. where NumIndiParms = 0.
Is there a way to do this for each Index in the loop, i.e., every instance of NumIndiParms?
Bounty: I really want to speed up the loading of the Params. I am now offering a bounty for someone to help find me a way to speed up the loading of the Params array TCollection as implemented in FireDAC.