Parallel.ForEach not spinning up new threads

Hello all, we have a very IO-intensive operation that we wrote using Parallel.ForEach from Microsoft's Parallel Extensions for the .NET Framework. We need to delete a large number of files, and we represent the files to be deleted as a list of lists. Each nested list has 1000 messages in it, and we have 50 of these lists. The issue here is that when I look in the logs afterwards, I only see one thread executing inside of our Parallel.ForEach block.

Here's what the code looks like:

List<List<Message>> expiredMessagesLists = GetNestedListOfMessages();
foreach (List<Message> subList in expiredMessagesLists)
{
    Parallel.ForEach(subList, msg =>
    {
        try
        {
            Logger.LogEvent(TraceEventType.Information, "Purging Message {0} on Thread {1}", msg.MessageID, msg.ExtensionID, Thread.CurrentThread.Name);

            DeleteMessageFiles(msg);
        }
        catch (Exception ex)
        {
            Logger.LogException(TraceEventType.Error, ex);
        }
    });
}

I wrote some sample code with a simpler data structure and no IO logic, and I could see several different threads executing within the Parallel.ForEach block. Are we doing something incorrect with Parallel.ForEach in the code above? Could it be the list of lists that's tripping it up, or is there some sort of threading limitation for IO operations?

Radiometer answered 7/12, 2009 at 17:33 Comment(8)
How much do you gain by deleting files in parallel? Is it a significant amount? Does your underlying hardware configuration support such gains? - Eaddy
What is the code for DeleteMessageFiles? - Snowonthemountain
Do you get the same results if you use Thread.ManagedThreadId instead of Thread.CurrentThread.Name? For threadpool threads, the names often look the same, even if they're not... - Exurbanite
@roygbiv: That's a very good point, IF the "Delete file" is a local delete. Disk IO may be slower if done in a multithreaded fashion. Without knowing what "DeleteMessageFiles" does, it's difficult to tell, though. It may do other, substantial "work", in which case it'd be a good opportunity for concurrency. - Exurbanite
@roygbiv I can't yet say how much we gain by deleting in parallel, as we haven't been able to successfully do it yet. The code I posted above will need to delete a few million files from network storage, so there's a great deal of blocking here. In terms of hardware configuration, I can only speculate that our hardware will support such gains, as this is a heavy duty NAS. - Radiometer
@Snowonthemountain DeleteMessageFiles gets a FileInfo on the file we're deleting, removes the readonly attribute on the file, deletes the file via File.Delete, and makes a couple of stored procedure calls to reflect in the db that the file has been deleted. - Radiometer
@Reed Copsey I see the same results with ManagedThreadId. - Radiometer
@codypo: Did you check to see the lengths of your inner lists? Have you tried parallelizing the outer loop? - Exurbanite

There are a couple of possibilities.

First off, in most cases, Parallel.ForEach will not spawn a new thread. It uses the .NET 4 ThreadPool (all of the TPL does), and will reuse ThreadPool threads.

That being said, Parallel.ForEach uses a partitioning strategy based on the size of the List being passed to it. My first guess is that your "outer" list has many messages, but the inner list only has one Message instance, so the ForEach partitioner is only using a single thread. With one element, Parallel is smart enough to just use the main thread, and not spin work onto a background thread.

Normally, in situations like this, it's better to parallelize the outer loop, not the inner loop. That will usually give you better performance (since you'll have larger work items), although it's difficult to know without having a good sense of the loop sizes plus the size of the Unit of Work. You could also, potentially, parallelize both the inner and outer loops, but without profiling, it'd be difficult to tell what would be the best option.
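As a rough sketch of what parallelizing the outer loop could look like: the Message class, the PurgeAll and BuildSample helpers, and the deleteMessageFiles delegate below are all stand-ins invented for illustration, not the code from the question. Each sublist becomes one work item, and the messages inside it are processed sequentially by whichever thread picks that sublist up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the Message type in the question.
public sealed class Message
{
    public int MessageID { get; set; }
}

public static class OuterLoopSketch
{
    // Parallelizes over the sublists; each sublist is walked sequentially
    // by whichever thread picks it up. Returns the number of messages
    // handled without an exception.
    public static int PurgeAll(List<List<Message>> lists, Action<Message> deleteMessageFiles)
    {
        int purged = 0;
        Parallel.ForEach(lists, subList =>
        {
            foreach (Message msg in subList)
            {
                try
                {
                    deleteMessageFiles(msg);
                    Interlocked.Increment(ref purged); // thread-safe counter
                }
                catch (Exception ex)
                {
                    Console.Error.WriteLine("Message {0} failed: {1}", msg.MessageID, ex.Message);
                }
            }
        });
        return purged;
    }

    // Builds listCount sublists of perList messages each (sample data only).
    public static List<List<Message>> BuildSample(int listCount, int perList)
    {
        return Enumerable.Range(0, listCount)
            .Select(i => Enumerable.Range(0, perList)
                .Select(j => new Message { MessageID = i * perList + j })
                .ToList())
            .ToList();
    }

    public static void Main()
    {
        List<List<Message>> lists = BuildSample(50, 1000);
        // The delegate does nothing here; the real DeleteMessageFiles would go in its place.
        int purged = PurgeAll(lists, msg => { });
        Console.WriteLine("Purged {0} messages", purged);
    }
}
```

Whether this actually beats the inner-loop version depends on how long each delete takes and how the storage behaves under concurrent requests, so it is worth timing both.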

One other possibility:

Try using Thread.CurrentThread.ManagedThreadId instead of Thread.CurrentThread.Name for your logging. Since Parallel uses ThreadPool threads, the "Name" is often identical across multiple threads. You may think you're only using a single thread when you're in fact using more than one.
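One way to check this independently of the logger is to collect the managed thread IDs directly. The sketch below is illustrative only: ObserveThreadIds is a made-up helper, and Thread.Sleep stands in for the blocking delete. Also note that if Logger.LogEvent follows String.Format semantics (an assumption; its implementation isn't shown), only the arguments referenced by a placeholder are printed, so the thread ID needs to be the argument that matches its placeholder index. The call in the question passes three arguments to a format string that only has {0} and {1}.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadIdSketch
{
    // Runs itemCount dummy work items through Parallel.ForEach and returns
    // the distinct managed thread IDs that executed at least one of them.
    public static int[] ObserveThreadIds(int itemCount, int simulatedWorkMs)
    {
        // Key: managed thread ID. Value: whether that thread is a ThreadPool thread.
        var seen = new ConcurrentDictionary<int, bool>();

        Parallel.ForEach(Enumerable.Range(0, itemCount), i =>
        {
            Thread thread = Thread.CurrentThread;
            seen.TryAdd(thread.ManagedThreadId, thread.IsThreadPoolThread);
            Thread.Sleep(simulatedWorkMs); // stand-in for blocking IO
        });

        return seen.Keys.OrderBy(id => id).ToArray();
    }

    public static void Main()
    {
        int[] ids = ObserveThreadIds(200, 5);
        Console.WriteLine("Distinct ManagedThreadIds: {0}", string.Join(", ", ids));
    }
}
```

How many IDs show up depends on the machine and on how busy the ThreadPool is, so treat the result as a diagnostic rather than a fixed number.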

Exurbanite answered 7/12, 2009 at 17:51 Comment(0)

The assumption underlying your code is that it is possible to delete files in parallel. I'm not saying it isn't (I'm no expert on the matter), but I wouldn't be surprised if that is simply not possible for most hardware. You are, after all, performing an operation with a physical object (your hard disk) when you do this.

Suppose you had a class, Person, with a method called RaiseArm(). You could always try shooting off RaiseArm() on 100 different threads, but the Person is only ever going to be able to raise two at a time...

Like I said, I could be wrong. This is just my suspicion.

Genetics answered 7/12, 2009 at 17:47 Comment(0)
