Comparing two collections for equality irrespective of the order of items in them
Asked Answered
V

21

186

I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.

I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.

In my case, two collections would be equal if they both contain the same items (no matter the order).

Example:

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1 == collection2; // true

What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).

if (collection1.Count != collection2.Count)
    return false; // the collections are not equal

foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}

foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}

return true; // the collections are equal

However, this is not entirely correct, and it's probably not the most efficient way to do compare two collections for equality.

An example I can think of that would be wrong is:

collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}

Which would be equal with my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?


The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.

Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).

Vanegas answered 8/9, 2008 at 16:36 Comment(3)
How about algorithm? All answer related by compare something, generic lists compare linq etc. Really did we promised to someone that we will never use algorithm as an old fashioned programmer?Michaelmas
You are not checking for Equality you are checking for Equivalence. It's nitpicky but an important distinction. And a long time ago. This is a good Q+A.Schaerbeek
You may be interested in this post, which discusses a tuned version of the dictionary-based method described below. One issue with most simple dictionary approaches is that they don't handle nulls properly because .NET's Dictionary class doesn't permit null keys.Jude
C
129

It turns out Microsoft already has this covered in its testing framework: CollectionAssert.AreEquivalent

Remarks

Two collections are equivalent if they have the same elements in the same quantity, but in any order. Elements are equal if their values are equal, not if they refer to the same object.

Using reflector, I modified the code behind AreEquivalent() to create a corresponding equality comparer. It is more complete than existing answers, since it takes nulls into account, implements IEqualityComparer and has some efficiency and edge case checks. plus, it's Microsoft :)

public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    private readonly IEqualityComparer<T> m_comparer;
    public MultiSetComparer(IEqualityComparer<T> comparer = null)
    {
        m_comparer = comparer ?? EqualityComparer<T>.Default;
    }

    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == null)
            return second == null;

        if (second == null)
            return false;

        if (ReferenceEquals(first, second))
            return true;

        if (first is ICollection<T> firstCollection && second is ICollection<T> secondCollection)
        {
            if (firstCollection.Count != secondCollection.Count)
                return false;

            if (firstCollection.Count == 0)
                return true;
        }

        return !HaveMismatchedElement(first, second);
    }

    private bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
    {
        int firstNullCount;
        int secondNullCount;

        var firstElementCounts = GetElementCounts(first, out firstNullCount);
        var secondElementCounts = GetElementCounts(second, out secondNullCount);

        if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
            return true;

        foreach (var kvp in firstElementCounts)
        {
            var firstElementCount = kvp.Value;
            int secondElementCount;
            secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);

            if (firstElementCount != secondElementCount)
                return true;
        }

        return false;
    }

    private Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
    {
        var dictionary = new Dictionary<T, int>(m_comparer);
        nullCount = 0;

        foreach (T element in enumerable)
        {
            if (element == null)
            {
                nullCount++;
            }
            else
            {
                int num;
                dictionary.TryGetValue(element, out num);
                num++;
                dictionary[element] = num;
            }
        }

        return dictionary;
    }

    public int GetHashCode(IEnumerable<T> enumerable)
    {
        if (enumerable == null) throw new 
            ArgumentNullException(nameof(enumerable));

        int hash = 17;

        foreach (T val in enumerable)
            hash ^= (val == null ? 42 : m_comparer.GetHashCode(val));

        return hash;
    }
}

Sample usage:

var set = new HashSet<IEnumerable<int>>(new[] {new[]{1,2,3}}, new MultiSetComparer<int>());
Console.WriteLine(set.Contains(new [] {3,2,1})); //true
Console.WriteLine(set.Contains(new [] {1, 2, 3, 3})); //false

Or if you just want to compare two collections directly:

var comp = new MultiSetComparer<string>();
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","c","b"})); //true
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","b"})); //false

Finally, you can use your an equality comparer of your choice:

var strcomp = new MultiSetComparer<string>(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(strcomp.Equals(new[] {"a", "b"}, new []{"B", "A"})); //true
Corso answered 24/9, 2010 at 20:10 Comment(24)
Thanks for the answer, I didn't know Microsoft had it covered. For the matter, I actually use a variant of the answer on top which allows me to define how elements are compared for equality and checks for nulls.Vanegas
No problem. It's also easy to add an IEqualityComparer<T> parameter to the above implementation in order to support a custom equality definition for T - In GetElementCounts(), simply use the Dictionary ctor that accepts IEqualityComparer<T>.Corso
NUnit also has a CollectionAssert.AreEquivalent() method. I'm curious which one came first, MS's or NUnit's.Qatar
MS's appears to date back to visual studio 2005... don't know about NUnitCorso
@Portman - Sorry, I completed forgot about your query ! The most straightforwad use would be comparing two collections yourself: new CollectionComparer<int>().Equals(intList1, intList2). There are also many collections that take an IEqualityComparer as a ctor parameter, used to define what equality means in the scope of that collection. For example, see: msdn.microsoft.com/en-us/library/ms132072.aspxCorso
This is nice because it works on IEnumerable. With CollectionAssert.AreEqual and CollectionAssert.AreEquivalent, you may have to convert an IEnumerable into an ICollection. For instance, CollectionAssert.AreEquivalent( new HashSet<int>(new [] {1, 2, 3}), new HashSet<int>(new [] {1, 2, 3})); won't compile, but CollectionAssert.AreEquivalent( new HashSet<int>(new [] {1, 2, 3}).ToList(), new HashSet<int>(new [] {1, 2, 3}).ToList()); will.Changeable
I'm not 100% sure but I think your answer violates Microsoft's terms of use against reverse engineering.Juryman
This isnt the correct solution . This solution compares hash codes of objects , but PLEASE NOTICE that when implementing an object hash code override method , you base your hash only the non-changing fields (such as ID) and not on all fields , for that you have equals . So this method is not good for situtation where the collections contains objects with several fields .Gazo
@JamesRoeiter please read Eric Lippert's Guidelines and rules for GetHashCode: blogs.msdn.com/b/ericlippert/archive/2011/02/28/… - specifically the section "Rule: equal items have equal hashes"Corso
Hello Ohad, Please read the following long debate in the topic , #371828 If you change object hashcode , while its in a hashset it will interrupt with the hashset proper action and might cause an exception . The rule is as following : If two objects are equals - they must have same hash code. If two objects have same hashcode - it isnt a must for them to be equal. Hashcode must stay same for entire object's lifetime! Thats why you impelment ICompareable and IEqualrity .Gazo
And also , in Eric's article , he holds the same statement as I am : "If two objects are equal then they must have the same hash code; or, equivalently, if two objects have different hash codes then they must be unequal." But it does not say that if they have same hash codes they must be equal , equalirtiy is relative , it might be same entity but not same state. Your answer is used by the test framework in end case where all objects state is final , it is not a good solution for runtime. Regards , JamesGazo
@JamesRoeiter Perhaps my comment was misleading. When a dictionary encounters a hashcode it already contains, it checks for actual equality with an EqualityComparer (either the one you supplied or EqualityComparer.Default, you can check Reflector or the reference source to verify this). True, if objects change (and specifically thier hashcode changes) while this method is running then the results are unexpected, but that just means this method is not thread safe in this context.Corso
I have to disagree , I think , it means that in order for that method to work you have to implement get hashcode in the wrong way -- based on all the object's fields because the purpose of this method is to check if two sequences are equal in-depth. So if you used this method you had to impelment gethashcode in a way that will cause failures in your system. While the correct way to implement gethashcode is only based on the non-changing object's fields such as entity ID which will have no use when comparing in-depth equality of two sequences .Gazo
@JamesRoeiter Suppose x and y are two objects we want to compare. If they have different hashcodes, we know they are different (because equal items have equal hashcodes), and the above implementation is correct. If they have the same hashcode, the dictionary implementation will check for actual equality using the specified EqualityComparer (or EqualityComparer.Default if none was specified) and again the implementation is correct.Corso
If the collections contain the same values but in a different order then they are not equal - they are equivalent - if the method name was public bool AreEquivalent(IEnumerable<T> first, IEnumerable<T> second) then there is no debate.Schaerbeek
@CADbloke the method has to be named Equals because of the IEqualityComparer<T> interface. What you should be looking at is the name of the comparer itself. In this case it's MultiSetComparer which makes sense.Corso
Ah, yes, sorry. I overlooked the Interface implementation requiring you to call it Equals. Thanks.Schaerbeek
The GetHashCode by Microsoft here is optimized for collision rates and not performance of GetHashCode call itself (by ordering the enumerable in the GetHashCode method it's bound to be on the slower side). You should always think of your data and decide for yourself. If the ordering logic is going to be slower then just list.Sum(x => x.GetHashCode()) is good to go (though results in higher collisions since summing is not a good hash code). I say test it for your data.Haslam
The GetHashCode implementation contains a bug: it will fail if val is null (i.e. if the collection contains a null element).Dendrology
As user nawfal indicated, you might simply sum the hash codes, as addition is commutative. It seems you want the extra hash collision protection provided by your own hash method, but then you also might want to include nulls in the hash calculation. For example, hash = hash * 23 + (val != null ? val.GetHashCode() : 42).Dendrology
@OhadSchneider thank you kindly for your answer. but how would i actually use your answer to to compare two lists of custom objects (ignoring order). I have already created an IEqualityComparer for these two objects. (in my case they are 3dPoints with an x,y, and z value). clarification much appreciated.Emeliaemelin
@BKSpurgeon I added a couple of samples to the answer, hope that clears up your question. I also added a constructor that allows you to pass in your IEqualityComparer<T> (see the last sample). Alternatively, you could have your class implement IEquatable<T> (or less preferably Equals and GetHashCode) so that when the default generic equality comparer is used by the internal dictionary (msdn.microsoft.com/en-us/library/x525za90(v=vs.110).aspx), your implementation will be used (msdn.microsoft.com/en-us/library/ms224763(v=vs.110).aspx).Corso
GetHashCode is still not correct. The OrderBy attempts to avoid the element ordering problem, but it causes two other problems: it assumes that there is an order (example), and it assumes that the default order is compatible with the custom item equality comparer (example). You'd need a commutative hash that delegates to the custom equality comparer hashes to properly fix GetHashCode.Inglorious
@StephenCleary good points, fixed as you suggested (using xor as the commutative hash).Corso
E
111

A simple and fairly efficient solution is to sort both collections and then compare them for equality:

bool equal = collection1.OrderBy(i => i).SequenceEqual(
                 collection2.OrderBy(i => i));

This algorithm is O(N*logN), while your solution above is O(N^2).

If the collections have certain properties, you may be able to implement a faster solution. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains some element is very fast. In that case an algorithm similar to yours would likely be fastest.

Eisk answered 8/9, 2008 at 17:7 Comment(5)
You just have to add a using System.Linq; first to make it workNomenclator
if this code is within a loop and collection1 gets updated and collection2 remains untouched, notice even when both collections have the same object, debugger would show false for this "equal" variable.Nomenclator
@Chaulky - I believe the OrderBy is needed. See: dotnetfiddle.net/jA8iwEWycliffite
Which was the other answer referred to as "above"? Possibly https://mcmap.net/q/102085/-comparing-two-collections-for-equality-irrespective-of-the-order-of-items-in-them ?Pyrometer
Take care to chose the comparison carefully. Elements may not be re-ordered at all if e.g. two elements are comparably the same (but are different objects).Prurigo
P
34

Create a Dictionary "dict" and then for each member in the first collection, do dict[member]++;

Then, loop over the second collection in the same way, but for each member do dict[member]--.

At the end, loop over all of the members in the dictionary:

    private bool SetEqual (List<int> left, List<int> right) {

        if (left.Count != right.Count)
            return false;

        Dictionary<int, int> dict = new Dictionary<int, int>();

        foreach (int member in left) {
            if (dict.ContainsKey(member) == false)
                dict[member] = 1;
            else
                dict[member]++;
        }

        foreach (int member in right) {
            if (dict.ContainsKey(member) == false)
                return false;
            else
                dict[member]--;
        }

        foreach (KeyValuePair<int, int> kvp in dict) {
            if (kvp.Value != 0)
                return false;
        }

        return true;

    }

Edit: As far as I can tell this is on the same order as the most efficient algorithm. This algorithm is O(N), assuming that the Dictionary uses O(1) lookups.

Pyuria answered 8/9, 2008 at 17:0 Comment(6)
This is almost what I want. However, I'd like to be able to do this even if I am not using integers. I'd like to use reference objects, but they do not behave properly as keys in dictionaries.Vanegas
Mono, your question is moot if your Items are not comparable. If they cannot be used as keys in Dictionary, there is no solution available.Kan
I think Mono meant the keys are not sortable. But Daniel's solution is clearly intended to be implemented with a hashtable, not a tree, and will work as long as there's an equivalence test and a hash function.Depone
Upvoted of course for the help, but not accepted since it's missing an important point (which I cover in my answer).Vanegas
FWIW, you can simplify your last foreach loop and return statement with this: return dict.All(kvp => kvp.Value == 0);Insufficient
If the collection contains null elements the dictionary throws an exception, because dictionary keys cannot be null.Chemosh
V
17

This is my (heavily influenced by D.Jennings) generic implementation of the comparison method (in C#):

/// <summary>
/// Represents a service used to compare two collections for equality.
/// </summary>
/// <typeparam name="T">The type of the items in the collections.</typeparam>
public class CollectionComparer<T>
{
    /// <summary>
    /// Compares the content of two collections for equality.
    /// </summary>
    /// <param name="foo">The first collection.</param>
    /// <param name="bar">The second collection.</param>
    /// <returns>True if both collections have the same content, false otherwise.</returns>
    public bool Execute(ICollection<T> foo, ICollection<T> bar)
    {
        // Declare a dictionary to count the occurence of the items in the collection
        Dictionary<T, int> itemCounts = new Dictionary<T,int>();

        // Increase the count for each occurence of the item in the first collection
        foreach (T item in foo)
        {
            if (itemCounts.ContainsKey(item))
            {
                itemCounts[item]++;
            }
            else
            {
                itemCounts[item] = 1;
            }
        }

        // Wrap the keys in a searchable list
        List<T> keys = new List<T>(itemCounts.Keys);

        // Decrease the count for each occurence of the item in the second collection
        foreach (T item in bar)
        {
            // Try to find a key for the item
            // The keys of a dictionary are compared by reference, so we have to
            // find the original key that is equivalent to the "item"
            // You may want to override ".Equals" to define what it means for
            // two "T" objects to be equal
            T key = keys.Find(
                delegate(T listKey)
                {
                    return listKey.Equals(item);
                });

            // Check if a key was found
            if(key != null)
            {
                itemCounts[key]--;
            }
            else
            {
                // There was no occurence of this item in the first collection, thus the collections are not equal
                return false;
            }
        }

        // The count of each item should be 0 if the contents of the collections are equal
        foreach (int value in itemCounts.Values)
        {
            if (value != 0)
            {
                return false;
            }
        }

        // The collections are equal
        return true;
    }
}
Vanegas answered 8/9, 2008 at 19:18 Comment(2)
Nice job, but Note: 1. In contrast to Daniel Jennings solution, This is not O(N) but rather O(N^2), because of the find function inside the foreach loop on the bar collection; 2. You can generalize the method to accept IEnumerable<T> instead of ICollection<T> with no further modification to the codeCorso
The keys of a dictionary are compared by reference, so we have to find the original key that is equivalent to the "item" - this is not true. The algorithm is based on wrong assumptions and while works, it is terribly inefficient.Taxaceous
S
13

You could use a Hashset. Look at the SetEquals method.

Skiver answered 4/10, 2008 at 4:38 Comment(2)
of course, using a HashSet assumes no duplicates but if so HashSet is the best way to goBuchalter
As ToHashSet() is now built into Linq, the SetEquals() solution can be written as a very compact, efficient one-liner: collection1.ToHashSet().SetEquals(collection2). That doesn't support duplicates, but it's easily the shortest answer, it requires no external libraries, and it even runs in amortized O(n) time.Rancourt
M
8
static bool SetsContainSameElements<T>(IEnumerable<T> set1, IEnumerable<T> set2) {
    var setXOR = new HashSet<T>(set1);
    setXOR.SymmetricExceptWith(set2);
    return (setXOR.Count == 0);
}

Solution requires .NET 3.5 and the System.Collections.Generic namespace. According to Microsoft, SymmetricExceptWith is an O(n + m) operation, with n representing the number of elements in the first set and m representing the number of elements in the second. You could always add an equality comparer to this function if necessary.

Mellophone answered 24/6, 2016 at 0:18 Comment(2)
Interesting and rare fact. Thanks for the knowledgeDaddylonglegs
Best answer here, concise, correct and fast. Should be upvoted.Turin
F
8

If you use Shouldly, you can use ShouldAllBe with Contains.

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1.ShouldAllBe(item=>collection2.Contains(item)); // true

And finally, you can write an extension.

public static class ShouldlyIEnumerableExtensions
{
    public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        list.ShouldAllBe(l => equivalent.Contains(l));
    }
}

UPDATE

A optional parameter exists on ShouldBe method.

collection1.ShouldBe(collection2, ignoreOrder: true); // true
Fortnightly answered 14/11, 2017 at 15:9 Comment(2)
I've just found on latest version that there is a parameter bool ignoreOrder on ShouldBe method.Fortnightly
Fantastic reference to Shouldly.Thaumatology
T
7

EDIT: I realized as soon as I posed that this really only works for sets -- it will not properly deal with collections that have duplicate items. For example { 1, 1, 2 } and { 2, 2, 1 } will be considered equal from this algorithm's perspective. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.

The solution I use is:

return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;

Linq does the dictionary thing under the covers, so this is also O(N). (Note, it's O(1) if the collections aren't the same size).

I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor and O(N) for mine and Daniel's.

I think the simplicity of the Linq intersect code makes it the preferable solution.

__Test Latency(ms)__
N, SetEquals, OrderBy, Intersect    
1024, 0, 0, 0    
2048, 0, 0, 0    
4096, 31.2468, 0, 0    
8192, 62.4936, 0, 0    
16384, 156.234, 15.6234, 0    
32768, 312.468, 15.6234, 46.8702    
65536, 640.5594, 46.8702, 31.2468    
131072, 1312.3656, 93.7404, 203.1042    
262144, 3765.2394, 187.4808, 187.4808    
524288, 5718.1644, 374.9616, 406.2084    
1048576, 11420.7054, 734.2998, 718.6764    
2097152, 35090.1564, 1515.4698, 1484.223
Theomania answered 18/6, 2009 at 23:20 Comment(3)
The only issue with this code is that it only works when comparing value types or comparing the pointers to reference types. I could have two different instances of the same object in the collections, so I need to be able to specify how to compare each. Can you pass a comparison delegate to the intersect method?Vanegas
Sure, you can pass a comparer delegate. But, note the above limitation regarding sets that I added, which puts a significant limit on its applicability.Theomania
The Intersect method returns a distinct collection. Given a = {1,1,2} and b ={2,2,1}, a.Intersect(b).Count() != a.Count, which causes your expression to correctly return false. {1,2}.Count != {1,1,2}.Count See link[/link] (Note that both sides are made distinct before comparison.)Undercarriage
C
5

In the case of no repeats and no order, the following EqualityComparer can be used to allow collections as dictionary keys:

public class SetComparer<T> : IEqualityComparer<IEnumerable<T>> 
where T:IComparable<T>
{
    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == second)
            return true;
        if ((first == null) || (second == null))
            return false;
        return first.ToHashSet().SetEquals(second);
    }

    public int GetHashCode(IEnumerable<T> enumerable)
    {
        int hash = 17;

        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + val.GetHashCode();

        return hash;
    }
}

Here is the ToHashSet() implementation I used. The hash code algorithm comes from Effective Java (by way of Jon Skeet).

Corso answered 1/4, 2010 at 23:51 Comment(2)
What is the point of Serializable for Comparer class? :o Also you can change the input to ISet<T> to express it is meant for sets (ie no duplicates).Haslam
@Haslam thanks, don't know what I was thinking when I marked it Serializable... As for ISet, the idea here was to treat the IEnumerable as a set (because you got an IEnumerable to begin with), though considering the 0 upvotes in over 5 years that may not have been the sharpest idea :PCorso
S
4

Why not use .Except()

// Create the IEnumerable data sources.
string[] names1 = System.IO.File.ReadAllLines(@"../../../names1.txt");
string[] names2 = System.IO.File.ReadAllLines(@"../../../names2.txt");
// Create the query. Note that method syntax must be used here.
IEnumerable<string> differenceQuery =   names1.Except(names2);
// Execute the query.
Console.WriteLine("The following lines are in names1.txt but not names2.txt");
foreach (string s in differenceQuery)
     Console.WriteLine(s);

http://msdn.microsoft.com/en-us/library/bb397894.aspx

Simulated answered 8/8, 2011 at 16:6 Comment(3)
Except won't work for counting duplicate items. It will return true for sets {1,2,2} and {1,1,2}.Qatar
@CristiDiaconescu you could do a ".Distinct()" first to remove any duplicatesSimulated
The OP is asking for [1, 1, 2] != [1, 2, 2] . Using Distinct would make them look equal.Qatar
E
2

A duplicate post of sorts, but check out my solution for comparing collections. It's pretty simple:

This will perform an equality comparison regardless of order:

var list1 = new[] { "Bill", "Bob", "Sally" };
var list2 = new[] { "Bob", "Bill", "Sally" };
bool isequal = list1.Compare(list2).IsSame;

This will check to see if items were added / removed:

var list1 = new[] { "Billy", "Bob" };
var list2 = new[] { "Bob", "Sally" };
var diff = list1.Compare(list2);
var onlyinlist1 = diff.Removed; //Billy
var onlyinlist2 = diff.Added;   //Sally
var inbothlists = diff.Equal;   //Bob

This will see what items in the dictionary changed:

var original = new Dictionary<int, string>() { { 1, "a" }, { 2, "b" } };
var changed = new Dictionary<int, string>() { { 1, "aaa" }, { 2, "b" } };
var diff = original.Compare(changed, (x, y) => x.Value == y.Value, (x, y) => x.Value == y.Value);
foreach (var item in diff.Different)
  Console.Write("{0} changed to {1}", item.Key.Value, item.Value.Value);
//Will output: a changed to aaa

Original post here.

Engelbert answered 29/4, 2010 at 20:35 Comment(0)
C
2

This simple solution forces the IEnumerable's generic type to implement IComparable. Because of OrderBy's definition.

If you don't want to make such an assumption but still want use this solution, you can use the following piece of code :

bool equal = collection1.OrderBy(i => i?.GetHashCode())
   .SequenceEqual(collection2.OrderBy(i => i?.GetHashCode()));
Cheder answered 19/12, 2018 at 19:20 Comment(0)
F
1

Here's my extension method variant of ohadsc's answer, in case it's useful to someone

static public class EnumerableExtensions 
{
    static public bool IsEquivalentTo<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        if ((first == null) != (second == null))
            return false;

        if (!object.ReferenceEquals(first, second) && (first != null))
        {
            if (first.Count() != second.Count())
                return false;

            if ((first.Count() != 0) && HaveMismatchedElement<T>(first, second))
                return false;
        }

        return true;
    }

    private static bool HaveMismatchedElement<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        int firstCount;
        int secondCount;

        var firstElementCounts = GetElementCounts<T>(first, out firstCount);
        var secondElementCounts = GetElementCounts<T>(second, out secondCount);

        if (firstCount != secondCount)
            return true;

        foreach (var kvp in firstElementCounts)
        {
            firstCount = kvp.Value;
            secondElementCounts.TryGetValue(kvp.Key, out secondCount);

            if (firstCount != secondCount)
                return true;
        }

        return false;
    }

    private static Dictionary<T, int> GetElementCounts<T>(IEnumerable<T> enumerable, out int nullCount)
    {
        var dictionary = new Dictionary<T, int>();
        nullCount = 0;

        foreach (T element in enumerable)
        {
            if (element == null)
            {
                nullCount++;
            }
            else
            {
                int num;
                dictionary.TryGetValue(element, out num);
                num++;
                dictionary[element] = num;
            }
        }

        return dictionary;
    }

    static private int GetHashCode<T>(IEnumerable<T> enumerable)
    {
        int hash = 17;

        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + val.GetHashCode();

        return hash;
    }
}
Foretop answered 11/3, 2012 at 21:1 Comment(3)
How well does this perform, any ideas?Haslam
I only use this for small collections, so have not thought about Big-O complexity or done benchmarking. HaveMismatchedElements alone is O(M*N) so it may not perform well for large collections.Foretop
If IEnumerable<T>s are queries then calling Count() is not a good idea. Ohad's original answer's approach of checking if they are ICollection<T> is the better idea.Haslam
B
1

Here is a solution which is an improvement over this one.

public static bool HasSameElementsAs<T>(
        this IEnumerable<T> first, 
        IEnumerable<T> second, 
        IEqualityComparer<T> comparer = null)
    {
        var firstMap = first
            .GroupBy(x => x, comparer)
            .ToDictionary(x => x.Key, x => x.Count(), comparer);

        var secondMap = second
            .GroupBy(x => x, comparer)
            .ToDictionary(x => x.Key, x => x.Count(), comparer);

        if (firstMap.Keys.Count != secondMap.Keys.Count)
            return false;

        if (firstMap.Keys.Any(k1 => !secondMap.ContainsKey(k1)))
            return false;

        return firstMap.Keys.All(x => firstMap[x] == secondMap[x]);
    }
Bistort answered 29/5, 2017 at 18:27 Comment(0)
I
1

Based on this answer of a duplicate question, and the comments below the answer, and @brian-genisio answer I came up with these:

        public static bool AreEquivalentIgnoringDuplicates<T>(this IEnumerable<T> items, IEnumerable<T> otherItems)
        {
            var itemList = items.ToList();
            var otherItemList = otherItems.ToList();
            var except = itemList.Except(otherItemList);
            return itemList.Count == otherItemList.Count && except.IsEmpty();
        }

        public static bool AreEquivalent<T>(this IEnumerable<T> items, IEnumerable<T> otherItems)
        {
            var itemList = items.ToList();
            var otherItemList = otherItems.ToList();
            var except = itemList.Except(otherItemList);
            return itemList.Distinct().Count() == otherItemList.Count && except.IsEmpty();
        }

Tests for these two:

        [Test]
        public void collection_with_duplicates_are_equivalent()
        {
            var a = new[] {1, 5, 5};
            var b = new[] {1, 1, 5};

            a.AreEquivalentIgnoringDuplicates(b).ShouldBe(true); 
        }

        [Test]
        public void collection_with_duplicates_are_not_equivalent()
        {
            var a = new[] {1, 5, 5};
            var b = new[] {1, 1, 5};

            a.AreEquivalent(b).ShouldBe(false); 
        }
Ianteen answered 11/5, 2021 at 11:49 Comment(0)
C
0

erickson is almost right: since you want to match on counts of duplicates, you want a Bag. In Java, this looks something like:

(new HashBag(collection1)).equals(new HashBag(collection2))

I'm sure C# has a built-in Set implementation. I would use that first; if performance is a problem, you could always use a different Set implementation, but use the same Set interface.

Cordiform answered 8/9, 2008 at 17:5 Comment(0)
D
0

There are many solutions to this problem. If you don't care about duplicates, you don't have to sort both. First make sure that they have the same number of items. After that sort one of the collections. Then binsearch each item from the second collection in the sorted collection. If you don't find a given item stop and return false. The complexity of this: - sorting the first collection: NLog(N) - searching each item from second into the first: NLOG(N) so you end up with 2*N*LOG(N) assuming that they match and you look up everything. This is similar to the complexity of sorting both. Also this gives you the benefit to stop earlier if there's a difference. However, keep in mind that if both are sorted before you step into this comparison and you try sorting by use something like a qsort, the sorting will be more expensive. There are optimizations for this. Another alternative, which is great for small collections where you know the range of the elements is to use a bitmask index. This will give you a O(n) performance. Another alternative is to use a hash and look it up. For small collections it is usually a lot better to do the sorting or the bitmask index. Hashtable have the disadvantage of worse locality so keep that in mind. Again, that's only if you don't care about duplicates. If you want to account for duplicates go with sorting both.

Dory answered 11/10, 2008 at 20:20 Comment(0)
G
0

In many cases the only suitable answer is the one of Igor Ostrovsky , other answers are based on objects hash code. But when you generate an hash code for an object you do so only based on his IMMUTABLE fields - such as object Id field (in case of a database entity) - Why is it important to override GetHashCode when Equals method is overridden?

This means , that if you compare two collections , the result might be true of the compare method even though the fields of the different items are non-equal . To deep compare collections , you need to use Igor's method and implement IEqualirity .

Please read the comments of me and mr.Schnider's on his most voted post.

James

Gazo answered 30/8, 2013 at 23:30 Comment(0)
W
0

Allowing for duplicates in the IEnumerable<T> (if sets are not desirable\possible) and "ignoring order" you should be able to use a .GroupBy().

I'm not an expert on the complexity measurements, but my rudimentary understanding is that this should be O(n). I understand O(n^2) as coming from performing an O(n) operation inside another O(n) operation like ListA.Where(a => ListB.Contains(a)).ToList(). Every item in ListB is evaluated for equality against each item in ListA.

Like I said, my understanding on complexity is limited, so correct me on this if I'm wrong.

public static bool IsSameAs<T, TKey>(this IEnumerable<T> source, IEnumerable<T> target, Expression<Func<T, TKey>> keySelectorExpression)
    {
        // check the object
        if (source == null && target == null) return true;
        if (source == null || target == null) return false;

        var sourceList = source.ToList();
        var targetList = target.ToList();

        // check the list count :: { 1,1,1 } != { 1,1,1,1 }
        if (sourceList.Count != targetList.Count) return false;

        var keySelector = keySelectorExpression.Compile();
        var groupedSourceList = sourceList.GroupBy(keySelector).ToList();
        var groupedTargetList = targetList.GroupBy(keySelector).ToList();

        // check that the number of grouptings match :: { 1,1,2,3,4 } != { 1,1,2,3,4,5 }
        var groupCountIsSame = groupedSourceList.Count == groupedTargetList.Count;
        if (!groupCountIsSame) return false;

        // check that the count of each group in source has the same count in target :: for values { 1,1,2,3,4 } & { 1,1,1,2,3,4 }
        // key:count
        // { 1:2, 2:1, 3:1, 4:1 } != { 1:3, 2:1, 3:1, 4:1 }
        var countsMissmatch = groupedSourceList.Any(sourceGroup =>
                                                        {
                                                            var targetGroup = groupedTargetList.Single(y => y.Key.Equals(sourceGroup.Key));
                                                            return sourceGroup.Count() != targetGroup.Count();
                                                        });
        return !countsMissmatch;
    }
Wilkens answered 9/8, 2018 at 20:7 Comment(0)
T
0

If comparing for the purpose of Unit Testing Assertions, it may make sense to throw some efficiency out the window and simply convert each list to a string representation (csv) before doing the comparison. That way, the default test Assertion message will display the differences within the error message.

Usage:

using Microsoft.VisualStudio.TestTools.UnitTesting;

// define collection1, collection2, ...

Assert.Equal(collection1.OrderBy(c=>c).ToCsv(), collection2.OrderBy(c=>c).ToCsv());

Helper Extension Method:

public static string ToCsv<T>(
    this IEnumerable<T> values,
    Func<T, string> selector,
    string joinSeparator = ",")
{
    if (selector == null)
    {
        if (typeof(T) == typeof(Int16) ||
            typeof(T) == typeof(Int32) ||
            typeof(T) == typeof(Int64))
        {
            selector = (v) => Convert.ToInt64(v).ToStringInvariant();
        }
        else if (typeof(T) == typeof(decimal))
        {
            selector = (v) => Convert.ToDecimal(v).ToStringInvariant();
        }
        else if (typeof(T) == typeof(float) ||
                typeof(T) == typeof(double))
        {
            selector = (v) => Convert.ToDouble(v).ToString(CultureInfo.InvariantCulture);
        }
        else
        {
            selector = (v) => v.ToString();
        }
    }

    return String.Join(joinSeparator, values.Select(v => selector(v)));
}
Tangible answered 26/8, 2019 at 20:29 Comment(0)
I
0

Here's my stab at the problem. It's based on this strategy but also borrows some ideas from the accepted answer.

public static class EnumerableExtensions
{
    public static bool SequenceEqualUnordered<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> second)
    {
        return SequenceEqualUnordered(source, second, EqualityComparer<TSource>.Default);
    }
   
    public static bool SequenceEqualUnordered<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        if (second == null)
            throw new ArgumentNullException(nameof(second));

        if (source.TryGetCount(out int firstCount) && second.TryGetCount(out int secondCount))
        {
            if (firstCount != secondCount)
                return false;

            if (firstCount == 0)
                return true;
        }

        IEqualityComparer<ValueTuple<TSource>> wrapperComparer = comparer != null ? new WrappedItemComparer<TSource>(comparer) : null;

        Dictionary<ValueTuple<TSource>, int> counters;
        ValueTuple<TSource> key;
        int counter;

        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext())
                return !second.Any();

            counters = new Dictionary<ValueTuple<TSource>, int>(wrapperComparer);

            do
            {
                key = new ValueTuple<TSource>(enumerator.Current);

                if (counters.TryGetValue(key, out counter))
                    counters[key] = counter + 1;
                else
                    counters.Add(key, 1);
            }
            while (enumerator.MoveNext());
        }

        foreach (TSource item in second)
        {
            key = new ValueTuple<TSource>(item);

            if (counters.TryGetValue(key, out counter))
            {
                if (counter <= 0)
                    return false;

                counters[key] = counter - 1;
            }
            else
                return false;
        }

        return counters.Values.All(cnt => cnt == 0);
    }

    private static bool TryGetCount<TSource>(this IEnumerable<TSource> source, out int count)
    {
        switch (source)
        {
            case ICollection<TSource> collection:
                count = collection.Count;
                return true;
            case IReadOnlyCollection<TSource> readOnlyCollection:
                count = readOnlyCollection.Count;
                return true;
            case ICollection nonGenericCollection:
                count = nonGenericCollection.Count;
                return true;
            default:
                count = default;
                return false;
        }
    }

    private sealed class WrappedItemComparer<TSource> : IEqualityComparer<ValueTuple<TSource>>
    {
        private readonly IEqualityComparer<TSource> _comparer;

        public WrappedItemComparer(IEqualityComparer<TSource> comparer)
        {
            _comparer = comparer;
        }

        public bool Equals(ValueTuple<TSource> x, ValueTuple<TSource> y) => _comparer.Equals(x.Item1, y.Item1);

        public int GetHashCode(ValueTuple<TSource> obj) => _comparer.GetHashCode(obj.Item1);
    }
}

Improvements on the MS solution:

  • Doesn't take the ReferenceEquals(first, second) shortcut because it's kind of debatable. For example, consider a custom IEnumerable<T> which has an implementation like this: public IEnumerator<T> GetEnumerator() => Enumerable.Repeat(default(T), new Random().Next(10)).GetEnumerator().
  • Takes possible shortcuts when both enumerable is a collection but checks not only for ICollection<T> but also for other collection interfaces.
  • Handles null values properly. Counting null values separately from the other (non-null) values also doesn't look 100% fail-safe. Consider a custom equality comparer which handles null values in a non-standard way.

This solution is also available in my utility NuGet package.

Intern answered 29/6, 2022 at 12:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.