I have some basic Azure tables that I've been querying serially:
var query = new TableQuery<DynamicTableEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey",
        QueryComparisons.Equal, myPartitionKey));
foreach (DynamicTableEntity entity in myTable.ExecuteQuery(query)) {
    // Process entity here.
}
To speed this up, I parallelized this like so:
Parallel.ForEach(myTable.ExecuteQuery(query), (entity, loopState) => {
    // Process entity here in a thread-safe manner.
    // Edited to add: Details of the loop body below:
    // This is the essence of the fixed loop body:
    lock (myLock) {
        DataRow myRow = myDataTable.NewRow();
        // [Add entity data to myRow.]
        myDataTable.Rows.Add(myRow);
    }
    // Old code (apparently not thread-safe, though NewRow() is supposed to create
    // a DataRow based on the table's schema without changing the table state):
    /*
    DataRow myRow = myDataTable.NewRow();
    lock (myLock) {
        // [Add entity data to myRow.]
        myDataTable.Rows.Add(myRow);
    }
    */
});
This produces a significant speedup, but the results differ slightly between runs: the number of entities returned is always exactly the same, but some of the entities occasionally differ.
From this and some web searching, I conclude that the enumerator above is not always thread-safe. The documentation appears to suggest that thread safety is guaranteed only if the table objects are public static, but that hasn't made a difference for me.
Could someone suggest how to resolve this? Is there a standard pattern for parallelizing Azure table queries?
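For illustration, here is a minimal sketch of one pattern that might avoid sharing the DataTable across threads: do the per-entity work in parallel into a thread-safe collection, then write to the DataTable from a single thread. The Entity class, the Load method, the two columns, and the doubling step are hypothetical stand-ins so the sketch is self-contained. In the real code the input would be myTable.ExecuteQuery(query), and the processing would be whatever the loop body does.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelTableLoad
{
    // Hypothetical stand-in for the DynamicTableEntity objects from the query.
    public sealed class Entity
    {
        public Entity(string rowKey, int value) { RowKey = rowKey; Value = value; }
        public string RowKey { get; }
        public int Value { get; }
    }

    public static DataTable Load(IEnumerable<Entity> entities)
    {
        // Assumed schema, for illustration only.
        var table = new DataTable();
        table.Columns.Add("RowKey", typeof(string));
        table.Columns.Add("Value", typeof(int));

        // Parallel phase: build plain value arrays; the DataTable is not touched here.
        var rows = new ConcurrentBag<object[]>();
        Parallel.ForEach(entities, entity =>
        {
            // Placeholder for the real per-entity processing.
            rows.Add(new object[] { entity.RowKey, entity.Value * 2 });
        });

        // Serial phase: every DataTable write happens on this one thread, so no lock is needed.
        foreach (object[] values in rows)
        {
            table.Rows.Add(values);
        }
        return table;
    }
}
```

Note that ConcurrentBag does not preserve the order in which entities were enumerated, so the row order can vary between runs even though the set of rows should not.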
Parallel.ForEach() can handle that. A problem could be if the entities shared some state. – Rn
DataTable.NewRow() call inside my critical section. I don't see why this is necessary, since that call is supposed only to create a new row based on the table's schema without affecting any table state (.NET DataTable, not Azure table). Thus, I'm not sure the problem is truly solved, but the code has always worked thus far. – Rawlinson