Finding symmetric difference with LINQ
Asked Answered
A

3

24

I have two collections a and b. I would like to compute the set of items in either a or b, but not in both (a logical exclusive or). With LINQ, I can come up with this:

IEnumerable<T> Delta<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    return a.Except (b).Union (b.Except (a));
}

I wonder if there are other more efficient or more compact ways of producing the difference between the two collections.

Edit 1: Jon Skeet posted a first solution which does not preserve the order of the items by relying on a HashSet. I wonder if there are other approaches which would preserve the order of a and b in the output.

Abnormity answered 26/5, 2010 at 5:31 Comment(2)
What if a or b contain duplicates?Steeplejack
In my case, a and b do not contain duplicates, so this is not a concern for me.Abnormity
H
30

Use HashSet<T> directly - it has a SymmetricExceptWith method:

HashSet<T> data = new HashSet<T>(a);
data.SymmetricExceptWith(b);

EDIT: If you want to maintain the order, here's an alternative:

HashSet<T> data = new HashSet<T>(a);
data.IntersectWith(b);
foreach (T t in a.Concat(b))
{
    if (!data.Contains(t))
    {
        yield return t;
    }
}

This has the following important differences:

  • Both a and b are iterated over twice. In some cases that could be a very bad thing - you could call ToList on each of them to start with to retain a buffer.
  • If there are duplicates in either a or b, they will be yielded multiple times. If you wanted to avoid this you could keep a set of already-yielded values. At this point, it would be equivalent to:

    a.Concat(b).Except(a.Intersect(b))
    

That's still only two set operations instead of the three in your original code though.

Hewlett answered 26/5, 2010 at 5:39 Comment(3)
Thanks Jon for your quick reply. HashSet work fine as long as you are not interested in the original order of the items. What if I want to keep the order of the items in a and b in the difference?Abnormity
@Pierre: I've edited my answer with another couple of options.Hewlett
Thanks a lot for your time. In my case, a and b do not contain duplicates, so this is not a concern. The LINQ expression you propose is much more readable (and therefore maintainable) than the piece of code involving the HashSet. I like it!Abnormity
S
6

Given a.Except(b) and b.Except(a) are disjoint, you can use concat instead of union, saving a set operator (and concat is more efficient).

return a.Except (b).Concat (b.Except (a));

This still runs through each list twice.

Steeplejack answered 26/5, 2010 at 6:26 Comment(1)
Thank you; you are right, Concat will be faster than Union since my inputs are disjoint; I had overlooked that point.Abnormity
I
1

We had a similar need for a project in my company, so we wrote this extension:

public class EnumerablePair<T> : IReadOnlyCollection<T>
{
    private IReadOnlyCollection<T> _Left;
    private IReadOnlyCollection<T> _Right;
    private IEnumerable<T> _Union;
    private int _Count;
    public EnumerablePair(IEnumerable<T> left, IEnumerable<T> right)
    {
        _Left = left?.ToList() ?? Enumerable.Empty<T>().ToList();
        _Right = right?.ToList() ?? Enumerable.Empty<T>().ToList();
        _Count = Left.Count + Right.Count;
        _Union = Left.Union(Right);
    }

    public int Count => _Count;
    public IReadOnlyCollection<T> Left { get => _Left; }
    public IReadOnlyCollection<T> Right { get => _Right; }

    public IEnumerator<T> GetEnumerator()
    {
        return _Union.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return _Union.GetEnumerator();
    }
}

public static class EnumerableExtension
{
    public static EnumerablePair<T> ExclusiveDisjunction<T>(this IEnumerable<T> leftOperand, IEnumerable<T> rightOperand, IEqualityComparer<T> comparer = null)
    {
        if (leftOperand == null)
            throw new ArgumentNullException(nameof(leftOperand), $"{nameof(leftOperand)} is null.");
        if (rightOperand == null)
            throw new ArgumentNullException(nameof(rightOperand), $"{nameof(rightOperand)} is null.");

        // TODO : Can be optimized if one of the IEnumerable parameters is empty.

        bool leftIsBigger = leftOperand.Count() > rightOperand.Count();
        var biggestOperand = leftIsBigger ? leftOperand.ToList() : rightOperand.ToList();
        var smallestOperand = leftIsBigger ? rightOperand.ToList() : leftOperand.ToList();

        var except1 = biggestOperand.ToList();
        var except2 = Enumerable.Empty<T>().ToList();

        Func<T, T, bool> areEquals;
        if (comparer != null)
            areEquals = (one, theOther) => comparer.Equals(one, theOther);
        else
            areEquals = (one, theOther) => one?.Equals(theOther) ?? theOther == null;

        foreach (T t in smallestOperand)
            if (except1.RemoveAll(item => areEquals(item, t)) == 0)
                except2.Add(t);

        if (leftIsBigger)
            return new EnumerablePair<T>(except1, except2);
        return new EnumerablePair<T>(except2, except1);
    }
}

It compares elements of two collections (using an IEqualityComparer or not, at your choice).

  • The returned object, an EnumerablePair<T>, contains objects that are in leftOperand or rightOperand, but not both (XOR).
  • EnumerablePair<T>.Left contains objects that are in leftOperand but not in rightOperand.
  • EnumerablePair<T>.Right contains objects that are in rightOperand but not in leftOperand.

You can use the extension like this :

var xorList = list1.ExclusiveDisjunction(list2);
var leftXor = xorList.Left;
var rightXor = xorList.Right;

xorList, leftXor and rightXor are IEnumerable<T>.

Insolvency answered 3/8, 2017 at 9:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.