Why does Interlocked.Increment give an incorrect result in a Parallel.ForEach loop?
Asked Answered
B

1

10

I have a migration job and I need to validate the target data when done. To notify the admin of the success/failure of validations, I use a counter to compare the number of rows from table Foo in Database1 to the number of rows from table Foo in Database2.

Each row from Database2 is validated against the corresponding row in Database1. To speed up the process, I use a Parallel.ForEach loop.

My initial problem was that the count was always different from what I expected. I later found that the += and -= operations are not thread-safe (not atomic). To correct the problem, I updated the code to use Interlocked.Increment on the counter variable. This code prints a count that is closer to the actual count, but still, it seems to be different on each execution and it doesn't give the result I expect :

Private countObjects As Integer

Private Sub MyMainFunction()
    Dim objects As List(Of MyObject)

    'Query with Dapper, unrelevant to the problem.
    Using connection As New System.Data.SqlClient.SqlConnection("aConnectionString")
        objects = connection.Query("SELECT * FROM Foo") 'Returns around 81000 rows.
    End Using

    Parallel.ForEach(objects, Sub(u) MyParallelFunction(u))

    Console.WriteLine(String.Format("Count : {0}", countObjects)) 'Prints "Count : 80035" or another incorrect count, which seems to differ on each execution of MyMainFunction.
End Sub

Private Sub MyParallelFunction(obj As MyObject)
    Interlocked.Increment(countObjects) 'Breakpoint Hit Count is at around 81300 or another incorrect number when done.

    'Continues executing unrelated code using obj...
End Sub

After some experiments with other ways of making the increment thread-safe, I found that wrapping the increment in a SyncLock on a dummy reference object gives the expected result :

Private countObjects As Integer
Private locker As SomeType

Private Sub MyMainFunction()
    locker = New SomeType()
    Dim objects As List(Of MyObject)

    'Query with Dapper, unrelevant to the problem.
    Using connection As New System.Data.SqlClient.SqlConnection("aConnectionString")
        objects = connection.Query("SELECT * FROM Foo") 'Returns around 81000 rows.
    End Using

    Parallel.ForEach(objects, Sub(u) MyParallelFunction(u))

    Console.WriteLine(String.Format("Count : {0}", countObjects)) 'Prints "Count : 81000".
End Sub

Private Sub MyParallelFunction(obj As MyObject)
    SyncLock locker
        countObjects += 1 'Breakpoint Hit Count is 81000 when done.
    End SyncLock

    'Continues executing unrelated code using obj...
End Sub

Why doesn't the first code snippet work as expected? The most confusing thing is the Breakpoint Hit Count giving unexpected results.

Is my understanding of Interlocked.Increment or of atomic operations flawed? I would prefer not to use SyncLock on a dummy object, and I hope there's a way to do it cleanly.

Update:

  • I run the example in Debug mode on Any CPU.
  • I use ThreadPool.SetMaxThreads(60, 60) upper in the stack, because I'm querying an Access database at some point. Could this cause a problem?
  • Could the call to Increment mess with the Parallel.ForEach loop, forcing it to exit before all tasks are done?

Update 2 (Methodology):

  • My tests are executed with code as close as possible to what is displayed here, with the exception of object types and query string.
  • The query always give the same number of results, and I always verify objects.Count on a breakpoint before continuing to Parallel.ForEach.
  • The only code that changes in between the executions is Interlocked.Increment replaced by SyncLock locker and countObjects += 1.

Update 3

I created a SSCCE by copying my code in a new console app and replacing external classes and code.

This is the Main method of the console app:

Sub Main()
    Dim oClass1 As New Class1
    oClass1.MyMainFunction()
End Sub

This is the definition of Class1:

Imports System.Threading

Public Class Class1

    Public Class Dummy
        Public Sub New()
        End Sub
    End Class

    Public Class MyObject
        Public Property Id As Integer

        Public Sub New(p_Id As Integer)
            Id = p_Id
        End Sub
    End Class

    Public Property countObjects As Integer
    Private locker As Dummy

    Public Sub MyMainFunction()
        locker = New Dummy()
        Dim objects As New List(Of MyObject)

        For i As Integer = 1 To 81000
            objects.Add(New MyObject(i))
        Next

        Parallel.ForEach(objects, Sub(u As MyObject)
                                      MyParallelFunction(u)
                                  End Sub)

        Console.WriteLine(String.Format("Count : {0}", countObjects)) 'Interlock prints an incorrect count, different in each execution. SyncLock prints the correct count.
        Console.ReadLine()
    End Sub

    'Interlocked
    Private Sub MyParallelFunction(ByVal obj As MyObject)
        Interlocked.Increment(countObjects)
    End Sub

    'SyncLock
    'Private Sub MyParallelFunction(ByVal obj As MyObject)
    '    SyncLock locker
    '        countObjects += 1
    '    End SyncLock
    'End Sub

End Class

I still note the same behavior when switching MyParallelFunction from Interlocked.Increment to SyncLock.

Backdate answered 16/12, 2013 at 19:45 Comment(14)
The Interlocked version should be working. I just gave it a try and indeed - it is working. Are you sure the same number of rows is returned from the database in both cases?Lindgren
The code snippet you provided will work correctly. Are you sure that provided code reflects the real problem what happens?Binnacle
lock will slow down code and may delay/remove failures to update DB due too many operations happening at once... Any chance you are ignoring all exceptions in your code and part of iteration just does not happen?Hearty
Interlocked.Increment() works, and works as you understand it. The problem is most likely in the code that we cannot see. An SSCCE is what is needed.Topography
Also, what is the connection.Query()?Binnacle
The only change in code between both snippets is that Interlocked.Increment is replaced by a SyncLock on a dummy object. I commented/uncommented and executed both multiple times before posting the question, as I couldn't believe it myself. The query gives the same number of results (the query never changes and the database is isolated). The SyncLock approach always give the correct result, while Interlocked.Increment blows my mind.Backdate
connection.Query is Dapper. Pretty much irrelevant to the problem.Backdate
Things irrelevant to the problem should not be here. You will get a good answer when you provide the SSCCE.Topography
Does access to countObjects in MyMainFunction also need to be protected to ensure that the Console.WriteLine call isn't dumping a stale value? Some type of memory barrier needed there or a volatile read?Certificate
I provided the SSCCE, tested it in an new console app, and I still noted the same behavior. As it seems we all agree on the documented behavior of Interlocked.Increment, I can't see why it doesn't work as expected. I think the question should be reopened, because the problem is still there and the SSCCE might allow others to reproduce it.Backdate
most of the time the SSCCE returns 81000. I've made three modifications to it now, and seem to be able to reproduce the issue - 1) Wrap the call to MyMainFunction in Main with a While True loop, 2) Set countObjects to 0 before the Parallel.ForEach and 3) only do the console code If countObjects <> 81000. Those modifications may help anyone else trying to diagnose the issue. (it'll repeatedly run the code until the issue shows up)Socioeconomic
Actually, the SSCCE is different to your original code. I can make it never fail if I make countObjects a field rather than a property. TBH, I'm surprised that you're allowed to call Interlocked.Increment on a property without any kind of warning from the compiler. Why did you make it a property in the SSCCE?Socioeconomic
The SSCCE reflects the actual code. The first code snippet was written from scratch to show the issue, but I mistakenly put a field instead of a property (not that I would have thought it would make any difference anyway). The actual code uses a property. Is there a reason why Interlocked.Increment would have a different behavior on a property?Backdate
@Damien_The_Unbeliever: Interlocked.Increment takes a ref to an int or a long. Not sure how this could compileOutdoors
S
10

Interlocked.Increment on a property will always be broken. Effectively, the VB compiler re-writes it as:

Value = <value from Property>
Interlocked.Increment(Value)
<Property> = Value

Thus breaking any threading guarantees provided by Increment. Change it to be a field. VB will re-write any property passed as a ByRef parameter to code that resembles the above.

Socioeconomic answered 17/12, 2013 at 15:2 Comment(6)
So, is this specificity either missing from the documentation, a bug in the .Net framework, or is it implied that a property shouldn't be used for atomic operations somewhere?Backdate
@BenjaminBeaulieu: VB always let programmers do stupid things and apparently still does.Outdoors
@BenjaminBeaulieu - I'd describe it as an unfortunate confluence of several features. VB tries to make it easy to use Properties where you would normally need to use a variable or field. You cannot write the same code in C# without explicitly doing the re-write I show above.Socioeconomic
@Benjamin: It should be obvious that properties cannot be used atomically. They have a setter and a getter, so any update to a property involves, at minimum, two distinct operations.Belew
@BenVoigt: Since VB wraps properties to provide almost the same behavior as fields, and since I'm normally using them this way, without really thinking about it like I would in C#, I couldn't put my finger on it. I agree with others that the compiler should at least give a warning.Backdate
@BenjaminBeaulieu: Yes, I didn't mean to say that the wrongness of that code is glaringly obvious. But once you ask the question "Can property updates be atomic?", the answer to that should be obvious.Belew

© 2022 - 2024 — McMap. All rights reserved.