How to generate a unique hash for a collection of objects independent of their order [duplicate]

Let's say I have a class

public class MyClass
{
    public string Type { get; set; }
    public int Id { get; set; }
}

and I have a collection class that is simply a strongly typed List

public class MyClassList : List<MyClass>
{
    public MyClassList(IEnumerable<MyClass> enumerable) : base (enumerable) {}
}

I want MyClassList to be able to generate a unique hash-code for MyClassList based on the contents. The hash-code of MyClass should be based on both properties. The hash-code of MyClassList should be the same even if the order of the objects is different.

To handle the ordering issue I was thinking I could order the list before generating the hash-code, but I'm not sure how to generate the hash-code of the list.

Deltoro answered 22/10, 2013 at 15:1 Comment(11)
If you are just looking for a unique ID you can use Guid.NewGuid(); - Annalisaannalise
@Annalisaannalise Note the word hash. - Highams
Your comment about sorting implies that you want the hash code to be the same regardless of the sequence of elements in the list. Is this true? - Kalynkam
Assuming uniqueness of MyClass.Id, you could sort on that. For the hash calculation itself, you'll need to treat all entities in your list as a single input. You could expose a byte[] or stream from MyClass and use that to calculate the hash? - Robertoroberts
@Kalynkam That's right, so if I have 2 collections and the only difference is that the order is different, the hash should be the same for both of them. - Deltoro
Does the hash code have to be unique? Typically hash codes are only used as a quick first test for equality (followed by a second, more thorough test) and don't have to be unique. - Bourse
@DStanley they should probably be unique, but let's say for the moment that they don't have to be. Is the solution simpler if they have to be unique (using the Guid solution outlined earlier)? - Deltoro
No - it will be harder to ensure uniqueness because you have to consider what combinations of elements could generate the same hash code. - Bourse
I think the most important question to ask here is "Why?" Why do you want to do this? - Glochidium
The hash code can't be completely unique, by the pigeonhole principle: there are 2^32 possible hashes, and (2^32)^n possible ways to have n MyClass.Ids (not even counting the strings). - Discoverer
Remember that the hash codes of objects containing the same values must be the same. If you create 2 lists and put the same values in them, you have to make sure the hash codes are the same for those 2 lists. In this case, even if the order is different. Not sure how a GUID would help you much in that circumstance. - Kalynkam

For optimal performance I would avoid iterating the whole collection every time GetHashCode is called. The point of GetHashCode is to be cheaper than comparing every element. So I would maintain the hash code incrementally as elements in the list are added, removed, or replaced, like this:

using System;
using System.Collections.Generic;

class Program
{
  static void Main(string[] args)
  {
     MyClassList l = new MyClassList() { new MyClass() {Type="Bob", Id=1}, new MyClass() {Type="Jones", Id=2}};
     MyClassList l2 = new MyClassList() { new MyClass() { Type = "Jones", Id = 2 }, new MyClass() { Type = "Bob", Id = 1 } };
     MyClassList l3 = new MyClassList() { new MyClass() { Type = "Jones", Id = 2 }};
     Console.WriteLine("{0} {1} {2}", l.GetHashCode(), l2.GetHashCode(), l3.GetHashCode());
     l3.Add(new MyClass() { Type = "Bob", Id = 1 });
     Console.WriteLine("{0}", l3.GetHashCode());
  }
}

public class MyClass
{
  public string Type { get; set; }
  public int Id { get; set; }
  public override int GetHashCode()
  {
     // Low 16 bits come from Type, high 16 bits from the low 16 bits of Id.
     int typeHash = Type == null ? 0 : Type.GetHashCode();
     return (typeHash & 0xFFFF) | (Id << 16);
  }
}

public class MyClassList : IList<MyClass>
{
  List<MyClass> internalList;
  int hashCode = 0;

  public MyClassList()
  {
     internalList = new List<MyClass>();
  }

  private void IncludeInHash(MyClass item)
  {
     hashCode ^= item.GetHashCode();
  }

  private void ExcludeFromHash(MyClass item)
  {
     // XOR is its own inverse, so including an item a second time removes it.
     IncludeInHash(item);
  }

  public override int GetHashCode()
  {
     return hashCode;
  }

  public int IndexOf(MyClass item)
  {
     return internalList.IndexOf(item);
  }

  public void Insert(int index, MyClass item)
  {
     internalList.Insert(index, item);
     // Make sure Insert is successful (doesn't throw an exception) before affecting the hash
     IncludeInHash(item);
  }

  public void RemoveAt(int index)
  {
     MyClass reduce = internalList[index];
     internalList.RemoveAt(index);
     // Make sure RemoveAt is successful before affecting the hash
     ExcludeFromHash(reduce);
  }

  public MyClass this[int index]
  {
     get
     {
        return internalList[index];
     }
     set
     {
        MyClass reduce = internalList[index];
        internalList[index] = value;
        // Make sure these happen atomically; don't allow exceptions to prevent these from being accurate.
        ExcludeFromHash(reduce);
        IncludeInHash(value);
     }
  }

  public void Add(MyClass item)
  {
     internalList.Add(item);
     IncludeInHash(item);
  }

  public void Clear()
  {
     internalList.Clear();
     hashCode = 0;
  }

  public bool Contains(MyClass item)
  {
     return internalList.Contains(item);
  }

  public void CopyTo(MyClass[] array, int arrayIndex)
  {
     internalList.CopyTo(array, arrayIndex);
  }

  public int Count
  {
     get { return internalList.Count; }
  }

  public bool IsReadOnly
  {
     get { return false; }
  }

  public bool Remove(MyClass item)
  {
     if (internalList.Remove(item))
     {
        ExcludeFromHash(item);
        return true;
     }
     else
        return false;
  }

  public IEnumerator<MyClass> GetEnumerator()
  {
     return internalList.AsReadOnly().GetEnumerator();
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
     return GetEnumerator();
  }
}
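
One caveat with an XOR accumulator like the one above: XOR-ing the same hash twice cancels out, so a list holding two equal items hashes the same as a list holding neither. If duplicates matter, a minimal sketch of an alternative is to accumulate with wrapping addition and undo it with subtraction. `AdditiveHash` and `AdditiveHashDemo` are hypothetical names used only for illustration; they are not part of the code above.

```csharp
using System;

// Hypothetical helper: an order-independent accumulator that, unlike XOR,
// does not cancel out when the same item hash is included twice.
public sealed class AdditiveHash
{
    private int hash;

    // Wrapping (unchecked) addition is commutative, so order does not matter.
    public void Include(int itemHash) { unchecked { hash += itemHash; } }

    // Subtraction is the exact inverse of Include under wrapping arithmetic.
    public void Exclude(int itemHash) { unchecked { hash -= itemHash; } }

    public int Value { get { return hash; } }
}

public static class AdditiveHashDemo
{
    public static void Main()
    {
        var a = new AdditiveHash();
        a.Include(10); a.Include(20); a.Include(20);

        var b = new AdditiveHash();
        b.Include(20); b.Include(10); b.Include(20);

        Console.WriteLine(a.Value == b.Value); // True: same items, different order

        b.Exclude(20);
        Console.WriteLine(a.Value == b.Value); // False: one duplicate removed
    }
}
```

In the list class, IncludeInHash and ExcludeFromHash would call the equivalent of Include and Exclude with item.GetHashCode(). Either way, the cached value goes stale if an item's Type or Id is changed after it has been added.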
Kalynkam answered 22/10, 2013 at 15:41 Comment(0)

The solution given by clto works. Here is an alternative: sort the list by some total ordering (any ordering will do, as long as it is unambiguous). Then you can calculate the hash code using any normal means. You don't need order-independence. You could even use a cryptographic hash function.
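
A minimal sketch of the sort-then-hash idea, assuming the MyClass shape from the question and an ordinal ordering on Type followed by Id. `SortedHash` and its `Compute` method are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public string Type { get; set; }
    public int Id { get; set; }
}

public static class SortedHash
{
    // Hypothetical helper: puts the items into one fixed (total) order,
    // then folds them into a single hash in that order.
    public static int Compute(IEnumerable<MyClass> items)
    {
        var ordered = items
            .OrderBy(x => x.Type, StringComparer.Ordinal)
            .ThenBy(x => x.Id);

        int hash = 17;
        foreach (var x in ordered)
        {
            unchecked // let the arithmetic wrap on overflow
            {
                hash = hash * 23 + (x.Type == null ? 0 : x.Type.GetHashCode());
                hash = hash * 23 + x.Id;
            }
        }
        return hash;
    }

    public static void Main()
    {
        var first = new[] { new MyClass { Type = "Bob", Id = 1 }, new MyClass { Type = "Jones", Id = 2 } };
        var second = new[] { new MyClass { Type = "Jones", Id = 2 }, new MyClass { Type = "Bob", Id = 1 } };

        Console.WriteLine(Compute(first) == Compute(second)); // True
    }
}
```

This costs a sort on every call, so it fits best where the hash is computed rarely. Note that string.GetHashCode is not guaranteed to be stable across processes, so a hash that must be persisted would need a different per-item function, such as the cryptographic hash mentioned above.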

Badman answered 22/10, 2013 at 15:26 Comment(0)

I propose this solution (I didn't implement the Equals method):

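using System.Collections.Generic;
using System.Linq;

public class MyClass
{
    public string Type { get; set; }
    public int Id { get; set; }

    public override int GetHashCode()
    {
        // Combine both properties; unchecked lets the arithmetic wrap on overflow.
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Type == null ? 0 : Type.GetHashCode());
            hash = hash * 23 + Id.GetHashCode();
            return hash;
        }
    }
}

public class MyClassList : List<MyClass>
{
    public MyClassList(IEnumerable<MyClass> enumerable) : base(enumerable) { }

    public override int GetHashCode()
    {
        // Sort the item hashes first so the result does not depend on list order.
        return this.Select(item => item.GetHashCode())
                   .OrderBy(h => h)
                   .Aggregate(17, (state, h) => unchecked(state * 23 + h));
    }
}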

The hash computation is inspired by the method Microsoft uses to compute hash values for anonymous objects.

Biochemistry answered 22/10, 2013 at 15:18 Comment(4)
That's an order-dependent comparer, not an order-independent comparer. - Spier
And he wants the hash code of MyClass to be based on both fields. - Threedimensional
Oops, I was too quick to answer... I edited my answer to take into account clcto's comment. With my solution, the GetHashCode method should create a temporary sorted list, and compute the hash on it. - Biochemistry
Evaluating the entire list doesn't give GetHashCode much of an advantage over actually comparing all the individual values. - Kalynkam

If the order isn't important then you should use a collection that inherently is a set, rather than a list.

Also, it's generally best to not inherit from collections; use composition instead.

So for a collection you can use a HashSet, as it will have set semantics.

To have MyClass use both properties as its identity, just override its Equals and GetHashCode implementations, or create an IEqualityComparer<MyClass> if you can't or don't want to do that.

public class MyClass:IEquatable<MyClass>
{
    public string Type { get; set; }
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        return Equals(obj as MyClass);
    }

    public bool Equals(MyClass other)
    {
        if (other == null)
            return false;

        return Type == other.Type &&
            Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Type.GetHashCode() * 79 + Id;
    }
}

Then your collection is as simple as:

HashSet<MyClass> set = new HashSet<MyClass>();

And if you want to compare various sets just use:

HashSet<MyClass>.CreateSetComparer();
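
As a rough sketch of how that comparer could be used to get an order-independent hash: the comparer returned by CreateSetComparer exposes both Equals and GetHashCode for whole sets. Strings stand in for MyClass here only to keep the example short, and `SetComparerDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SetComparerDemo
{
    public static void Main()
    {
        // Same elements, added in a different order.
        var a = new HashSet<string> { "Bob:1", "Jones:2" };
        var b = new HashSet<string> { "Jones:2", "Bob:1" };

        IEqualityComparer<HashSet<string>> comparer = HashSet<string>.CreateSetComparer();

        // Sets with the same elements compare equal and hash alike.
        Console.WriteLine(comparer.Equals(a, b));                               // True
        Console.WriteLine(comparer.GetHashCode(a) == comparer.GetHashCode(b)); // True
    }
}
```

The same comparer can be passed to a Dictionary or another HashSet whose keys are themselves sets.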
Spier answered 22/10, 2013 at 15:30 Comment(2)
Order not mattering and not allowing duplicates are not the same. - Threedimensional
@Threedimensional While it's conceivable he wants a bag instead, given that there is no library implementation of a bag I'd wait to see if a set is in fact unacceptable before trying to avoid using it. If he isn't going to have duplicate values then it will work much more effectively than a list-based solution. - Spier
