Garbage Collection and Parallel.ForEach Issue After VS2015 Upgrade
Asked Answered
B

3

34

I have some code to process several million data rows in my own R-like C# DataFrame class. There's a number of Parallel.ForEach calls for iterating over the data rows in parallel. This code has been running for over a year using VS2013 and .NET 4.5 without issues.

I have two dev machines (A and B) and recently upgraded machine A to VS2015. I started noticing a strange intermittent freeze in my code about half the time. Letting it run for a long time, it turns out that the code does eventually finish. It just takes 15-120 minutes instead of 1-2 minutes.

Attempts to Break All using the VS2015 debugger keep failing for some reason. So I inserted a bunch of log statements. It turns out that this freeze occurs when there is a Gen2 collection during a Parallel.ForEach loop (comparing the collection count before and after each Parallel.ForEach loop). The entire extra 13-118 minutes is spent inside whichever Parallel.ForEach loop call happens to overlap with a Gen2 collection (if any). If there are no Gen2 collections during any Parallel.ForEach loops (about 50% of the time when I run it), then everything finishes fine in 1-2 minutes.

When I run the same code in VS2013 on Machine A, I get the same freezes. When I run the code in VS2013 on Machine B (which was never upgraded), it works perfectly. It ran dozens of time overnight with no freezing.

Some things I've noticed / tried:

  • The freezes happen with or without the debugger attached on Machine A (I figured it was something with the VS2015 debugger at first)
  • The freezes happen whether I build in Debug or Release mode
  • The freezes happen if I target .NET 4.5 or .NET 4.6
  • I tried disabling RyuJIT but that did not affect the freezes

I'm not changing the default GC settings at all. According to GCSettings, all runs are happening with LatencyMode Interactive and IsServerGC as false.

I could just switch to LowLatency before every call to Parallel.ForEach, but I'd really prefer to understand what's going on.

Has anyone else seen strange freezes in Parallel.ForEach after the VS2015 upgrade? Any ideas on what a good next step would be?

UPDATE 1: Adding some sample code to the nebulous explanation above...

Here is some sample code that I hope will demonstrate this issue. This code runs in 10-12 seconds on B machine, consistently. It encounters a number of Gen2 collections, but they take almost no time at all. If I uncomment the two GC settings lines, I can force it to have no Gen2 collections. It's somewhat slower then at 30-50 seconds.

Now on my A machine, the code takes a random amount of time. Seems to be between 5 and 30 minutes. And it seems to get worse, the more Gen2 collections it encounters. If I uncomment the two GC settings lines, it takes 30-50 seconds on Machine A also (same as Machine B).

It might take some tweaking in terms of the number of rows and array size for this to show up on another machine.

using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Linq;
using System.Runtime;    

public class MyDataRow
{
    public int Id { get; set; }
    public double Value { get; set; }
    public double DerivedValuesSum { get; set; }
    public double[] DerivedValues { get; set; }
}

class Program
{
    static void Example()
    {
        const int numRows = 2000000;
        const int tempArraySize = 250;

        var r = new Random();
        var dataFrame = new List<MyDataRow>(numRows);

        for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { Id = i, Value = r.NextDouble() });

        Stopwatch stw = Stopwatch.StartNew();

        int gcs0Initial = GC.CollectionCount(0);
        int gcs1Initial = GC.CollectionCount(1);
        int gcs2Initial = GC.CollectionCount(2);

        //GCSettings.LatencyMode = GCLatencyMode.LowLatency;

        Parallel.ForEach(dataFrame, dr =>
        {
            double[] tempArray = new double[tempArraySize];
            for (int j = 0; j < tempArraySize; j++) tempArray[j] = Math.Pow(dr.Value, j);
            dr.DerivedValuesSum = tempArray.Sum();
            dr.DerivedValues = tempArray.ToArray();
        });

        int gcs0Final = GC.CollectionCount(0);
        int gcs1Final = GC.CollectionCount(1);
        int gcs2Final = GC.CollectionCount(2);

        stw.Stop();

        //GCSettings.LatencyMode = GCLatencyMode.Interactive;

        Console.Out.WriteLine("ElapsedTime = {0} Seconds ({1} Minutes)", stw.Elapsed.TotalSeconds, stw.Elapsed.TotalMinutes);

        Console.Out.WriteLine("Gcs0 = {0} = {1} - {2}", gcs0Final - gcs0Initial, gcs0Final, gcs0Initial);
        Console.Out.WriteLine("Gcs1 = {0} = {1} - {2}", gcs1Final - gcs1Initial, gcs1Final, gcs1Initial);
        Console.Out.WriteLine("Gcs2 = {0} = {1} - {2}", gcs2Final - gcs2Initial, gcs2Final, gcs2Initial);

        Console.Out.WriteLine("Press Any Key To Exit...");
        Console.In.ReadLine();
    }

    static void Main(string[] args)
    {
        Example();
    }
}

UPDATE 2: Just to move things out of the comments for future readers...

This hotfix: https://support.microsoft.com/en-us/kb/3088957 totally fixes the issue. I'm not seeing any slowness issues at all after applying.

It turned out not to have anything to do with Parallel.ForEach I believe based on this: http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx though the hotfix does mention Parallel.ForEach for some reason.

Ballot answered 31/7, 2015 at 13:57 Comment(4)
The next step would be to post an MCVE so we can try reproducing this on our machine and see if we experience the same behavior. Was this built to run as a x86 or x64 process?Ailee
x64. Understood, working on one. But its hard to get the GCs to work out just right. Was hoping that I was missing some obvious.Ballot
@MichaelCovelli What happens when you force GC in the loop using GC.Collect()?Chromolithography
This hotfix: support.microsoft.com/en-us/kb/3088957 totally fixes the issue. I'm not seeing any slowness issues at all after applying.Ballot
C
5

It looks like the problem has been addressed now, see http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx

Communal answered 13/8, 2015 at 22:20 Comment(3)
This hotfix: support.microsoft.com/en-us/kb/3088957 was just released and totally fixes the issue.Ballot
The hotfix version is different depending on the version of Windows. Based on a comment from blogs.msdn.com/b/maoni/archive/2015/08/12/…, I believe we have the following. For Windows Vista, Windows 7, Windows Server 2008, and Windows Server 2008 R2: 3088957. For Windows 8 and Windows Server 2012: 3088955. For Windows 8.1 and Windows Server 2012 R2: 3088956. For Windows 10: No hotfix available.Ballot
According to Lee Coward in the comments of the link above, the fix for Windows 10 is part of the following hotfix: support.microsoft.com/en-us/kb/3093266Communal
H
29

This indeed performs excessively poorly, the background GC is not doing you favor here. First thing I noted is that Parallel.ForEach() is using too many tasks. The threadpool manager misinterprets the thread behavior as "bogged down by I/O" and starts extra threads. This makes the problem worse. Workaround for that is:

var options = new ParallelOptions();
options.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(dataFrame, options, dr => {
    // etc..
}

This gives better insight in what ails the program from the new diagnostics hub in VS2015. It doesn't take long for only a single core doing any work, easy to tell from the CPU usage. With occasional spikes, they don't last very long, coinciding with an orange GC mark. When you take a closer look at the GC mark you see it is a gen #1 collection. Taking a very long time, about 6 seconds on my machine.

A gen #1 collection of course doesn't take that long, what you see happening here is the gen #1 collection waiting for the background GC to finish its job. In other words, it is actually the background GC that's taking 6 seconds. Background GC can only be effective if the space in the gen #0 and gen #1 segments is large enough to not require a gen #2 collection while the background GC is trundling. Not the way this app works, it eats memory at a very high rate. The little spike you see is multiple tasks getting unblocked, being able to allocate arrays again. Quickly grinding to a halt when a gen #1 collection has to wait for the background GC again.

Notable is that the allocation pattern of this code is very unfriendly to the GC. It interleaves long-lived arrays (dr.DerivedValues) with short-lived arrays (tempArray). Giving the GC lots of work when it compacts the heap, every single allocated array is going to end up getting moved.

The apparent flaw in the .NET 4.6 GC is that the background collection never seems to effectively compact the heap. It looks like it does the job over and over again, as though the previous collection didn't compact at all. Whether this is by design or a bug is hard to tell, I don't have a clean 4.5 machine anymore. I'm certainly leaning towards bug. You should report this problem at connect.microsoft.com to have Microsoft take a look at it.


A workaround is very easy to come by, all you have to do is prevent the awkward inter-leaving of long- and short-lived objects. Which you do by pre-allocating them:

    for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { 
        Id = i, Value = r.NextDouble(), 
        DerivedValues = new double[tempArraySize] });

    ...
    Parallel.ForEach(dataFrame, options, dr => {
        var array = dr.DerivedValues;
        for (int j = 0; j < array.Length; j++) array[j] = Math.Pow(dr.Value, j);
        dr.DerivedValuesSum = array.Sum();
    });

And of course by disabling background GC completely.


UPDATE: GC bug confirmed in this blog post. Fix coming soon.


UPDATE: a hotfix was released.


UPDATE: fixed in .NET 4.6.1

Hernardo answered 31/7, 2015 at 13:57 Comment(2)
Thanks for taking a look. Will mark as the answer if there's no others after a day or two. I agree that this instance is easy to optimize away. I was just playing around with unnecessary allocations until I got something to demonstrate what I was seeing in my code. The difference between .NET 4.5 and 4.6 is what surprises me the most here. Will report the problem at connect.microsoft.com. Thanks!Ballot
@MichaelCovelli Please post the Microsoft Connect link here once you reported it so that we can track the issue too.Overblown
H
10

We (and other users) have encountered a similar problem. We worked around it by disabling background GC in the application's app.config. Please see discussion in comments of https://connect.microsoft.com/VisualStudio/Feedback/Details/1594775.

app.config for gcConcurrent (non-concurrent workstation GC)

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.1" />
    </startup>
<runtime>
    <gcConcurrent enabled="false" />
</runtime>

You can also switch to the server GC, although this approach seems to use more memory (on an unsaturated machine?).

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <startup> 
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5.1" />
    </startup>
<runtime>
    <gcServer enabled="true" />
</runtime>
</configuration>
Harkey answered 3/8, 2015 at 20:36 Comment(1)
Both of these workaround fix it. Switching to server GC uses more memory but brings the execution time down to 5 seconds on my machine. Setting gcConcurrent to false makes the app take about 10 seconds - the same amount that it used to take in .NET 4.5 in VS2013.Ballot
C
5

It looks like the problem has been addressed now, see http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx

Communal answered 13/8, 2015 at 22:20 Comment(3)
This hotfix: support.microsoft.com/en-us/kb/3088957 was just released and totally fixes the issue.Ballot
The hotfix version is different depending on the version of Windows. Based on a comment from blogs.msdn.com/b/maoni/archive/2015/08/12/…, I believe we have the following. For Windows Vista, Windows 7, Windows Server 2008, and Windows Server 2008 R2: 3088957. For Windows 8 and Windows Server 2012: 3088955. For Windows 8.1 and Windows Server 2012 R2: 3088956. For Windows 10: No hotfix available.Ballot
According to Lee Coward in the comments of the link above, the fix for Windows 10 is part of the following hotfix: support.microsoft.com/en-us/kb/3093266Communal

© 2022 - 2024 — McMap. All rights reserved.