I am trying to insert a large(-ish) number of elements in the shortest time possible and I tried these two alternatives:
1) Pipelining:
List<Task> addTasks = new List<Task>();
for (int i = 0; i < table.Rows.Count; i++)
{
DataRow row = table.Rows[i];
Task<bool> addAsync = redisDB.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
addTasks.Add(addAsync);
}
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);
2) Batching:
List<Task> addTasks = new List<Task>();
IBatch batch = redisDB.CreateBatch();
for (int i = 0; i < table.Rows.Count; i++)
{
DataRow row = table.Rows[i];
Task<bool> addAsync = batch.SetAddAsync(string.Format(keyFormat, row.Field<int>("Id")), row.Field<int>("Value"));
addTasks.Add(addAsync);
}
batch.Execute();
Task[] tasks = addTasks.ToArray();
Task.WaitAll(tasks);
I am not noticing any significant time difference (actually I expected the batch method to be faster): for approx 250K inserts I get approx 7 sec for pipelining vs approx 8 sec for batching.
Reading from the documentation on pipelining,
"Using pipelining allows us to get both requests onto the network immediately, eliminating most of the latency. Additionally, it also helps reduce packet fragmentation: 20 requests sent individually (waiting for each response) will require at least 20 packets, but 20 requests sent in a pipeline could fit into much fewer packets (perhaps even just one)."
To me, this sounds a lot like the a batching behaviour. I wonder if behind the scenes there's any big difference between the two because at a simple check with procmon
I see almost the same number of TCP Send
s on both versions.
SetAddAsync + FireAndForget
+ finalPing
. Excluding case of unknown transient errors, would the Sets be guaranteed to be added to by the timePing
completes? Or could they arrive out of order? – Isotonic