The truly terrible way to do it is to execute each INSERT
statement as its own batch:
Batch 1:
INSERT INTO Entries (id, name) VALUES (1, 'Ian Boyd);
Batch 2:
INSERT INTO Entries (id, name) VALUES (2, 'Bottlenecked);
Batch 3:
INSERT INTO Entries (id, name) VALUES (3, 'Marek Grzenkowicz);
Batch 4:
INSERT INTO Entries (id, name) VALUES (4, 'Giorgi);
Batch 5:
INSERT INTO Entries (id, name) VALUES (5, 'AMissico);
Note: Parameterization, error checking, and any other nit-picks elided for expoistory purposes.
This is truly, horrible, terrible way to do things. It gives truely awful performance, because you suffer the network round-trip-time every time.
A much better solution is to batch all the INSERT
statements into one batch:
Batch 1:
INSERT INTO Entries (id, name) VALUES (1, 'Ian Boyd');
INSERT INTO Entries (id, name) VALUES (2, 'Bottlenecked');
INSERT INTO Entries (id, name) VALUES (3, 'Marek Grzenkowicz');
INSERT INTO Entries (id, name) VALUES (4, 'Giorgi');
INSERT INTO Entries (id, name) VALUES (5, 'AMissico');
This way you only suffer one-round trip. This version has huge performance wins; on the order of 5x faster.
Even better is to use the VALUES
clause:
INSERT INTO Entries (id, name)
VALUES
(1, 'Ian Boyd'),
(2, 'Bottlenecked'),
(3, 'Marek Grzenkowicz'),
(4, 'Giorgi'),
(5, 'AMissico');
This gives you some performance improvements over the 5 separate INSERT
s version; it lets the server do what it's good at: operating on sets:
- each trigger only has to operate once
- foreign keys are only checked once
- unique constraints are only checked once
SQL Sever loves to operate on sets of data; it's where it's a viking!
Parameter limit
The above T-SQL examples have all the parameteriztion stuff removed for clarity. But in reality you want to parameterize queries
- Not because you want to avoid SQL injection; because you're already a good developer who's using
QuotedString(firstName)
- Not because you want the performance bonus of saving the server from having to compile each T-SQL batch (Although, during a high-speed bulk-import, saving the parsing time really adds up)
- but because you want to avoid flooding the server's query plan cache with gibibytes upon gibibytes of ad-hoc query plans. (I've seen SQL Server's working set, i.e. RAM usage, not memory usage, be 2 GB of just unparameterized SQL query plans)
But Bruno has an important point; SQL Server's driver only lets you include 2,100 parameters in a batch. The above query has two values:
@id, @name
If you import 1,051 rows in a single batch, that's 2,102 parameters - you'll get the error:
Too many parameters were provided in this RPC request
That is why i generally insert 5 or 10 rows at a time. Adding more rows per batch doesn't improve performance; there's diminishing returns.
It keeps the number of parameters low, so it doesn't get anywhere near the T-SQL batch size limit. There's also the fact that a VALUES
clause is limited to 1000 tuples anyway.
Implementing it
Your first approach is good, but you do have the issues of:
- parameter name collisions
- unbounded number of rows (possibly hitting the 2100 parameter limit)
So the goal is to generate a string such as:
INSERT INTO Entries (id, name) VALUES
(@p1, @p2),
(@p3, @p4),
(@p5, @p6),
(@p7, @p8),
(@p9, @p10)
I'll change your code by the seat of my pants
IEnumerable<Entry> entries = GetStuffToInsert();
SqlCommand cmd = new SqlCommand();
StringBuilder sql = new StringBuilder();
Int32 batchSize = 0; //how many rows we have build up so far
Int32 p = 1; //the current paramter name (i.e. "@p1") we're going to use
foreach(var entry in entries)
{
//Build the names of the parameters
String pId = String.Format("@p{0}", p); //the "Id" parameter name (i.e. "p1")
String pName = String.Format("@p{0}", p+1); //the "Name" parameter name (i.e. "p2")
p += 2;
//Build a single "(p1, p2)" row
String row = String.Format("({0}, {1})", pId, pName); //a single values tuple
//Add the row to our running SQL batch
if (batchSize > 0)
sb.AppendLine(",");
sb.Append(row);
batchSize += 1;
//Add the parameter values for this row
cmd.Parameters.Add(pID, System.Data.SqlDbType.Int ).Value = entry.Id;
cmd.Parameters.Add(pName, System.Data.SqlDbType.String).Value = entry.Name;
if (batchSize >= 5)
{
String sql = "INSERT INTO Entries (id, name) VALUES"+"\r\n"+
sb.ToString();
cmd.CommandText = sql;
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
sb.Clear();
batchSize = 0;
p = 1;
}
}
//handle the last few stragglers
if (batchSize > 0)
{
String sql = "INSERT INTO Entries (id, name) VALUES"+"\r\n"+
sb.ToString();
cmd.CommandText = sql;
cmd.ExecuteNonQuery();
}