Parallel LINQ: AsParallel().ForAll() nulls some objects

So, I've got a very weird situation here where a ForAll() PLINQ query seems to drop some of my custom objects and, to be honest, I have no clue why.

var myArticles = data.FilterCustomerArticles([params]).ToList(); //always returns 201 articles

result.Articles = new List<ArticleMinimal>();

try
{
    myArticles.AsParallel().ForAll(article =>
                    {
                        result.Articles.Add(new ArticleMinimal()
                        {
                            ArticleNumber = article.ArticleNumber,
                            Description = article.Description,
                            IsMaterial = false,
                            Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
                            Quantity = 1,
                            ValidFrom = new DateTime(1900, 1, 1),
                            ValidTo = new DateTime(2222, 1, 1)
                        });
                    });

}
catch (Exception ex)
{
    ...
}

The code above returns a different result count nearly every time I call it. It should return 201 ArticleMinimal objects. Instead, it returns 200, 189, 19x... and, from time to time, 201. No exception occurs, nothing. It just returns fewer objects than it should.

After changing the code to a "good ol'" classic foreach loop, I always get the expected 201 objects.

Working Code:

var myArticles = data.FilterCustomerArticles([params]).ToList(); //always returns 201 articles

result.Articles = new List<ArticleMinimal>();

try
{
    foreach (var article in myArticles) { 
        result.Articles.Add(new ArticleMinimal()
                        {
                            ArticleNumber = article.ArticleNumber,
                            Description = article.Description,
                            IsMaterial = false,
                            Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
                            Quantity = 1,
                            ValidFrom = new DateTime(1900, 1, 1),
                            ValidTo = new DateTime(2222, 1, 1)
                        });
    }

}
catch (Exception ex)
{
    ...
}

Additionally, a few lines of code later, I have another ForAll like this:

try
{
    result.Articles.AsParallel().ForAll(article =>
                {
                    if (article.Weight != null){
                        ...
                    }
                });
}
catch (Exception)
{
    ...
}

Using the list built by the first ForAll, this throws a NullReferenceException - in my opinion because it expects 201 objects, but some list entries are null.

Now my actual question is: why does the first ForAll return fewer objects than it should?! The only clue I could think of is the inline declaration of new ArticleMinimal(){ ... }); - but even if that's the cause, it seems weird to me. Is it not possible to do this while using PLINQ? I'm just guessing here.

Hope you can help.

Best regards, Dom

Jone answered 8/1, 2016 at 10:9 Comment(4)
You're manipulating a shared object (the result.Articles collection) from multiple threads; if this object is not thread-safe, this will likely corrupt it.Countrywoman
Now, having commented that and left my own answer: do you really need to do this in parallel? 201 articles, and thus 201 simple object constructions, should not take much time at all, so why complicate the matter? Personally, I would just use the code from my answer, remove .AsParallel(), and leave the rest as is until you think this is a performance problem. There could be things happening in the constructor of ArticleMinimal that take time, but then perhaps the name Minimal is incorrect?Countrywoman
Thanks for your answer! The 201 articles are just my dev-test data; we're talking about higher numbers for the actual live system. But valid question :)Jone
OK, there is no way for anyone here to say whether you should or should not use parallelism here, so if you're aware of the difference and the possibilities, that's the only thing that matters.Countrywoman

You cannot manipulate result.Articles from multiple threads, as this will likely corrupt its internals, which is exactly what you are observing.

Instead, turn your parallel workflow into a pipeline that returns the created objects:

result.Articles.AddRange(myArticles.AsParallel().Select(article =>
    new ArticleMinimal()
    {
        ArticleNumber = article.ArticleNumber,
        Description = article.Description,
        IsMaterial = false,
        Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
        Quantity = 1,
        ValidFrom = new DateTime(1900, 1, 1),
        ValidTo = new DateTime(2222, 1, 1)
    })
);

The .Select here, since it is executed on the ParallelQuery returned by .AsParallel(), will run in parallel over the items.

The .AddRange, however, will ask for ParallelQuery.GetEnumerator(), which returns the items collected into one long sequence, giving you what you want.

The fundamental difference is that .AddRange() will likely not add anything until all the parallel tasks have started to complete, whereas your way, if you add the appropriate locking, will add items to the collection as they are produced. However, unless you want to observe items flowing into the collection as they are produced, this is unlikely to matter in your case.
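
For reference, a minimal sketch of that locking variant, assuming the same ArticleMinimal shape as in the question; the object construction still runs in parallel, only the Add is serialized:

myArticles.AsParallel().ForAll(article =>
{
    var minimal = new ArticleMinimal()
    {
        ArticleNumber = article.ArticleNumber,
        Description = article.Description,
        IsMaterial = false,
        Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
        Quantity = 1,
        ValidFrom = new DateTime(1900, 1, 1),
        ValidTo = new DateTime(2222, 1, 1)
    };

    // Only the Add is synchronized; building the object happens in parallel.
    lock (result.Articles)
    {
        result.Articles.Add(minimal);
    }
});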

Countrywoman answered 8/1, 2016 at 10:15 Comment(2)
See my second comment to your question though.Countrywoman
Also see @SchlaWiener's answer for an alternative way if you need it.Countrywoman

List<T>.Add is not thread-safe. Please refer to https://mcmap.net/q/1922667/-correct-way-to-guarantee-thread-safety-when-adding-to-a-list-using-parallel-library

Either use a lock:

lock (result.Articles)
{
    result.Articles.Add(...);
}

or a thread-safe collection. I would use a temporary collection and call result.Articles.AddRange(...) at the end.
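
For example, a rough sketch of that temporary-collection approach, using ConcurrentBag<T> as the thread-safe collection (names follow the question's code):

var temp = new System.Collections.Concurrent.ConcurrentBag<ArticleMinimal>();

myArticles.AsParallel().ForAll(article =>
{
    // ConcurrentBag<T>.Add is thread-safe, so no lock is needed here.
    temp.Add(new ArticleMinimal()
    {
        ArticleNumber = article.ArticleNumber,
        Description = article.Description,
        IsMaterial = false,
        Price = article.PortionPrice.HasValue ? article.PortionPrice.Value : decimal.Zero,
        Quantity = 1,
        ValidFrom = new DateTime(1900, 1, 1),
        ValidTo = new DateTime(2222, 1, 1)
    });
});

// Copy the results into the target list once all parallel work has finished.
result.Articles.AddRange(temp);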

Chevron answered 8/1, 2016 at 10:16 Comment(0)
