Parallel.ForEach keeps spawning new threads
Asked Answered
T

3

16

While I was using Parallel.ForEach in my program, I found that some threads never seemed to finish. In fact, it kept spawning new threads over and over, a behaviour that I wasn't expecting and definitely don't want.

I was able to reproduce this behaviour with the following code which, just like my 'real' program, both uses processor and memory a lot (.NET 4.0 code):

public class Node
{
    public Node Previous { get; private set; }

    public Node(Node previous)
    {
        Previous = previous;
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        DateTime startMoment = DateTime.Now;
        int concurrentThreads = 0;

        var jobs = Enumerable.Range(0, 2000);
        Parallel.ForEach(jobs, delegate(int jobNr)
        {
            Interlocked.Increment(ref concurrentThreads);

            int heavyness = jobNr % 9;

            //Give the processor and the garbage collector something to do...
            List<Node> nodes = new List<Node>();
            Node current = null;
            for (int y = 0; y < 1024 * 1024 * heavyness; y++)
            {
                current = new Node(current);
                nodes.Add(current);
            }

            TimeSpan elapsed = DateTime.Now - startMoment;
            int threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
            Console.WriteLine("[{0:mm\\:ss}] Job {1,4} complete. {2} threads remaining.",
                elapsed, jobNr, threadsRemaining);
        });
    }
}

When run on my quad-core, it initially starts of with 4 concurrent threads, just as you would expect. However, over time more and more threads are being created. Eventually, this program then throws an OutOfMemoryException:

[00:00] Job    0 complete. 3 threads remaining.
[00:01] Job    1 complete. 4 threads remaining.
[00:01] Job    2 complete. 4 threads remaining.
[00:02] Job    3 complete. 4 threads remaining.
[00:05] Job    9 complete. 5 threads remaining.
[00:05] Job    4 complete. 5 threads remaining.
[00:05] Job    5 complete. 5 threads remaining.
[00:05] Job   10 complete. 5 threads remaining.
[00:08] Job   11 complete. 5 threads remaining.
[00:08] Job    6 complete. 5 threads remaining.
...
[00:55] Job   67 complete. 7 threads remaining.
[00:56] Job   81 complete. 8 threads remaining.
...
[01:54] Job  107 complete. 11 threads remaining.
[02:00] Job  121 complete. 12 threads remaining.
..
[02:55] Job  115 complete. 19 threads remaining.
[03:02] Job  166 complete. 21 threads remaining.
...
[03:41] Job  113 complete. 28 threads remaining.
<OutOfMemoryException>

The memory usage graph for the experiment above is as follows:

Processor and memory usage

(The screenshot is in Dutch; the top part represents processor usage, the bottom part memory usage.) As you can see, it looks like a new thread is being spawned almost every time the garbage collector gets in the way (as can be seen in the dips of memory usage).

Can anyone explain why this is happening, and what I can do about it? I just want .NET to stop spawning new threads, and finish the existing threads first...

Taylor answered 26/12, 2012 at 10:11 Comment(1)
I've shooted a follow up question "How to count the amount of concurrent threads in .NET application?" If to ount the threads directly c they are mostly only increasing (very rarely and insignificantly decreasing)Overstudy
H
18

You can limit the maximum number of threads that get created by specifying a ParallelOptions instance with the MaxDegreeOfParallelism property set:

var jobs = Enumerable.Range(0, 2000);
ParallelOptions po = new ParallelOptions
{ 
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(jobs, po, jobNr =>
{
    // ...
});

As to why you're getting the behaviour you're observing: The TPL (which underlies PLINQ) is, by default, at liberty to guess the optimal number of threads to use. Whenever a parallel task blocks, the task scheduler may create a new thread in order to maintain progress. In your case, the blocking might be happening implicitly; for example, through the Console.WriteLine call, or (as you observed) during garbage collection.

From Concurrency Levels Tuning with Task Parallel Library (How Many Threads to Use?):

Since the TPL default policy is to use one thread per processor, we can conclude that TPL initially assumes that the workload of a task is ~100% working and 0% waiting, and if the initial assumption fails and the task enters a waiting state (i.e. starts blocking) - TPL with take the liberty to add threads as appropriate.

Hemmer answered 26/12, 2012 at 10:16 Comment(5)
Clarification: Parallel.ForEach() is not part of PLINQ.Teledu
Thanks for your answer. It makes sense that when a thread blocks the TPL creates a new thread to keep the progress going, I just didn't think that the garbage collector cleaning up would count as a thread being blocked. Also, thanks for the link you posted, it clarifies why you may actually want more threads than cores, I never thought of that.Taylor
Really, it does not limit but makes it a little bit less promiscuousOverstudy
"You can limit the maximum number of threads that get created by specifying a ParallelOptions instance with the MaxDegreeOfParallelism property set:" Does it really limit the number of threads? What happens if you specify MaxDegree = 8, but one of those 8 threads blocks for some time. May it create a 9th thread?Swor
@NiklasPeter: "The thread injection logic doesn't distinguish between a thread that's blocked and a thread that's performing a lengthy, processor-intensive operation." The number of threads processing the Parallel.ForEach is capped by the MaxDegreeOfParallelism value. If this value is close to the size of the thread pool, then a small number of new threads might be created to replenish the thread pool (such that one worker thread is available per core), but these new threads cannot be used for the Parallel.ForEach tasks.Hemmer
K
8

You should probably read a bit about the how the task scheduler works.

Parallel Programming with Microsoft .NET - Parallel Tasks (latter half of the page)

"The .NET thread pool automatically manages the number of worker threads in the pool. It adds and removes threads according to built-in heuristics. The .NET thread pool has two main mechanisms for injecting threads: a starvation-avoidance mechanism that adds worker threads if it sees no progress being made on queued items and a hill-climbing heuristic that tries to maximize throughput while using as few threads as possible.

The goal of starvation avoidance is to prevent deadlock. This kind of deadlock can occur when a worker thread waits for a synchronization event that can only be satisfied by a work item that is still pending in the thread pool's global or local queues. If there were a fixed number of worker threads, and all of those threads were similarly blocked, the system would be unable to ever make further progress. Adding a new worker thread resolves the problem.

A goal of the hill-climbing heuristic is to improve the utilization of cores when threads are blocked by I/O or other wait conditions that stall the processor. By default, the managed thread pool has one worker thread per core. If one of these worker threads becomes blocked, there's a chance that a core might be underutilized, depending on the computer's overall workload. The thread injection logic doesn't distinguish between a thread that's blocked and a thread that's performing a lengthy, processor-intensive operation. Therefore, whenever the thread pool's global or local queues contain pending work items, active work items that take a long time to run (more than a half second) can trigger the creation of new thread pool worker threads."

You can mark a task as LongRunning but this has the side effect of allocating a thread for it from outside the thread pool which means that the task cannot be inlined.

Remember that the ParallelFor treats the work it is given as blocks so even if the work in one loop is fairly small the overall work done by the task invoked by the look may appear longer to the scheduler.

Most calls to the GC in and of them selves aren't blocking (it runs on a separate thread) but if you wait for GC to complete then this does block. Remember also that the GC is rearranging memory so this may have some side effects (and blocking) if you are trying to allocate memory while running GC. I don't have specifics here but I know the PPL has some memory allocation features specifically for concurrent memory management for this reason.

Looking at your code's output it seems that things are running for many seconds. So I'm not surprised that you are seeing thread injection. However I seem to remember that the default thread pool size is roughly 30 threads (probably depending on the number of cores on your system). A thread takes up roughly a MB of memory before your code allocates any more so I'm not clear why you could get an out of memory exception here.

Kershaw answered 16/1, 2013 at 20:41 Comment(2)
Thanks for the background information. As you can see each task allocates between 0 and 8 million Nodes and so takes up a fair amount of memory. As more tasks are started memory becomes scarce and the GC needs to free up memory more frequently to accommodate the new tasks. I suspect that the scheduler cannot tell if a thread is blocked because it is waiting for memory or because of something else, and starts a new thread in both cases. Obviously, if the blocked thread was waiting for memory, this only worsens the effect ...Taylor
I'm seeing the same effect, and I'm a bit amazed that this is not handled better by the TPL. I'm loading chunks of data from the web from 3M+ different locations, and does some processing on each chunk after it is loaded in memory. While tasks are waiting for data to load from the web, the TPL sees the web load blocking, and schedules new tasks to load even more data. After a while, there are so many tasks loading chunks of data that eventually I end up getting OOM exceptions.Limiting Parallel.ForEach to use a fixed value MaxDegreeOfParallelism solved the problem.Thersathersites
O
1

I've posted the follow-up question "How to count the amount of concurrent threads in .NET application?"

If to count the threads directly, their number in Parallel.For() mostly ((very rarely and insignificantly decreasing) only increases and is not releleased after loop completion.

Checked this in both Release and Debug mode, with

ParallelOptions po = new ParallelOptions
{
  MaxDegreeOfParallelism = Environment.ProcessorCount
};

and without

The digits vary but conclusions are the same.

Here is the ready code I was using, if someone wants to play with:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Edit4Posting
{
public class Node
{

  public Node Previous { get; private set; }
  public Node(Node previous)
  {
    Previous = previous;
    }
  }
  public class Edit4Posting
  {

    public static void Main(string[] args)
    {
      int concurrentThreads = 0;
      int directThreadsCount = 0;
      int diagThreadCount = 0;

      var jobs = Enumerable.Range(0, 160);
      ParallelOptions po = new ParallelOptions
      {
        MaxDegreeOfParallelism = Environment.ProcessorCount
      };
      Parallel.ForEach(jobs, po, delegate(int jobNr)
      //Parallel.ForEach(jobs, delegate(int jobNr)
      {
        int threadsRemaining = Interlocked.Increment(ref concurrentThreads);

        int heavyness = jobNr % 9;

        //Give the processor and the garbage collector something to do...
        List<Node> nodes = new List<Node>();
        Node current = null;
        //for (int y = 0; y < 1024 * 1024 * heavyness; y++)
        for (int y = 0; y < 1024 * 24 * heavyness; y++)
        {
          current = new Node(current);
          nodes.Add(current);
        }
        //*******************************
        directThreadsCount = Process.GetCurrentProcess().Threads.Count;
        //*******************************
        threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
        Console.WriteLine("[Job {0} complete. {1} threads remaining but directThreadsCount == {2}",
          jobNr, threadsRemaining, directThreadsCount);
      });
      Console.WriteLine("FINISHED");
      Console.ReadLine();
    }
  }
}
Overstudy answered 15/3, 2013 at 6:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.