I have been using PLINQ recently to perform some data handling.
I have about 4000 time series (instances of Dictionary<DateTime,T>), which I store in a list called timeSeries.
To perform my operation, I simply do:
timeSeries.AsParallel().ForAll(x=>myOperation(x))
When I look at what is happening on my cores, I see that at first all of them are in use, and the console (where I output some logs) shows several time series being processed at the same time.
However, the process is lengthy, and after about 45 minutes, the logging clearly indicates that there is only one thread working. Why is that?
I gave it some thought and realized that the instances at the beginning and the end of timeSeries are simpler to process from myOperation's point of view. So I wondered whether the algorithm PLINQ uses splits the 4000 instances across, say, 4 cores, giving each of them 1000. Then, when a core finishes its allocation of work, it goes back to idle. This would mean that one of the cores may be facing a much heavier workload than the others.
Is my theory correct or is there another possible explanation?
Should I shuffle my list before running it, or is there some kind of parallelism parameter I can use to fix this problem?
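
To make the second option concrete, here is a minimal sketch of the kind of parameter I have in mind. As far as I understand, Partitioner.Create(list, true) from System.Collections.Concurrent asks for load-balanced partitioning instead of fixed ranges, but I have not confirmed that it fixes the behavior described above. The element type double, the dummy data, and the body of MyOperation are placeholders for illustration only; they are not my real code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class Program
{
    static int processed = 0;

    // Placeholder for the real myOperation: just sums the values.
    static void MyOperation(Dictionary<DateTime, double> series)
    {
        double sum = 0;
        foreach (var value in series.Values)
        {
            sum += value;
        }
        Interlocked.Increment(ref processed);
    }

    static void Main()
    {
        // Dummy data standing in for the ~4000 real time series.
        var timeSeries = new List<Dictionary<DateTime, double>>();
        for (int i = 0; i < 4000; i++)
        {
            timeSeries.Add(new Dictionary<DateTime, double>
            {
                { DateTime.Today.AddDays(i), i }
            });
        }

        // Second argument is loadBalance: request dynamic, load-balanced
        // partitioning rather than one fixed range of the list per worker.
        Partitioner.Create(timeSeries, true)
            .AsParallel()
            .ForAll(x => MyOperation(x));

        Console.WriteLine("Processed " + processed + " series");
    }
}
```

The only change compared with my current call is wrapping the list in Partitioner.Create before AsParallel(); everything else stays the same.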