Why can't I retrieve an item from a HashSet without enumeration?
Asked Answered
Q

12

36

I'm looking for insight into the heads of HashSet designers. As far as I am aware, my question applies to both Java and C# HashSets, making me think there must be some good reason for it, though I can't think of any myself.

After I have inserted an item into a HashSet, why is it impossible to retrieve that item without enumeration, hardly an efficient operation? Especially since a HashSet is explicitly built in a way which supports efficient retrieval.

It would often be useful to me to have Remove(x) and Contains(x) return the actual item that is being removed or contained. This is not necessarily the item I pass into the Remove(x) or Contains(x) function. Sure, I guess I could achieve the same effect through a HashMap but why waste all that space and effort when it should be perfectly possible to do this with a set?

I can appreciate that there may be some design concerns that adding this functionality would allows uses of HashSet which are not consistent with their role or future role in the framework, but if this is so, what are these design concerns?

Edit

To answer some more questions, here are more details:

I am using an immutable reference type with overridden hashcode, equals, etc to emulate a value type in C#. Let's say the type has members A, B, and C. Hashcode, equals, etc depend only on A and B. Given some A and B I want to be able to retrieve that equivalent item from a hashset and get it's C. I won't be able to use HashSet for this it appears, but I would at least like to know if there is any good reason for this. Pseudo code follows:

public sealed class X{
 object A;
 object B;
 object extra;

 public int HashCode(){
  return A.hashCode() + B.hashCode();
 }

 public bool Equals(X obj){
  return obj.A == A && obj.B == B;
 }
}

hashset.insert(new X(1,2, extra1));
hashset.contains(new X(1,2)); //returns true, but I can't retrieve extra
Quietude answered 29/9, 2009 at 20:41 Comment(5)
"why is it impossible to retrieve that item without enumeration" Could you clarify what you mean here, do you mean get(), contains() is O(n) in your case?Croesus
Sure :) I meant that I cannot retrieve the exact reference I put into the set without enumeration. There is no get() operator for HashSet, and contains() takes an argument which presumably evaluates to equal to the reference you put in, but may not be the exact reference you put in. Hope that clears it up.Quietude
In that case, you can implement your equals() as return this == obj - I.e. to check for only the same reference. Without object creation control, this is a high price to pay. And object creation control would may resolve the issue alone.Croesus
You are missing that the hash is not necessarily unique. It is just and indexing tool.Laicize
This is added to .NET at least (v4.7.2).Fag
S
9

How were you proposing to retrieve the item from the hash set? A set is by definition not ordered in any way and therefore, there is no index with which to use to retrieve the object in question.

Sets, as a concept, are used to test inclusion, i.e. whether or not the element in question is in the hash data set. If you're looking to retrieve a value from a data source using a key value or index, I would suggest looking into either a Map or a List.

EDIT: Additional answer based on the Edit to the original question

Soonil, based on your new information, it looks like you might be interested in implementing your data as a Java Enum, something similar to this:

 public enum SoonilsDataType {
      A, B, C;

      // Just an example of what's possible
      public static SoonilsDataType getCompositeValue(SoonilsDataType item1,
           SoonilsDataType item2) {
           if (item1.equals(A) && 
                     item2.equals(B)) {
                return C;
           }
      }
 }

Enum's automatically inherit values() which returns the list of all values in the enum's "set", which you can use to test inclusion against in the same way as the Set. Also, because its a full class, you can define new static methods to do the composite logic (like I was trying to allude to in the example code). The only thing about the Enum is that you can't add new instances at runtime, which may not be what you want (though if the set's data size isn't going to grow at runtime, the Enum is what you want).

Sondra answered 29/9, 2009 at 20:57 Comment(10)
See my comment to penpen for your answer.Quietude
@Soonil If I understand your comment to penpen, you're using not only object equality (for existence in the Set) but also a identifier (or some other numerical index) of your own devising as a secondary equals and you want this identifier to also be used as a index into the hash set? If this is correct, you seem to be working orthogonal to the nature of hash set and I once again suggest you look into a Map (probably HashMap) as your data source.Sondra
Sure Peter, I've added some details to the end of my question, let me know if that helps.Quietude
Peter, TBH, I don't entirely understand what you are getting at with the enums, but I think our understanding has diverged somewhere :) +1 for the effort though!Quietude
actually I have a use case where it is pretty clear which element we would want to return from the set, and that is the case where there is only one element there. obviously this is probably too specific to include support for it in the framework.Gunflint
@Gunflint I know this is really late, but if you know you have only one element in an enumerable collection (HashSet included) then the extension methods First() and/or Single() are exactly what you want/need.Archaeo
I don't even remember the specific situation where I needed this, but thanks anyway :)Gunflint
@quetzalcoatl if you know you have only one element, his comment was not directed at OP.Squier
@Sondra I am still wondering what you hoped to accomplish with this SoonilsDataType. Especially since the posted code fails, as getCompositeValue does not have a return statement outside of the if statement.Squier
@Wolfzoon: I wrote this comment basing on the title, namely hashset-without-enumeration part, probably I didn't notice that andresp!=sooniln, thanks! :)Humber
S
11

In .Net, what you are probably looking for is KeyedCollection http://msdn.microsoft.com/en-us/library/ms132438.aspx

You can get around the nastiness of re-implementing this abstract class each time with some "generic" cleverness. (See IKeyedObject`1.)

Note: Any data transfer object which implements IKeyedObject`1 should have an overridden GetHashCode method simply returning this.Key.GetHashCode(); and same goes for equals...

My Base Class Library usually ends up with something like this in it:

public class KeyedCollection<TItem> : System.Collections.ObjectModel.KeyedCollection<TItem, TItem>
    where TItem : class
{
    public KeyedCollection() : base()
    {
    }

    public KeyedCollection(IEqualityComparer<TItem> comparer) : base(comparer)
    {
    }

    protected override TItem GetKeyForItem(TItem item)
    {
        return item;
    }
}

public class KeyedObjectCollection<TKey, TItem> : System.Collections.ObjectModel.KeyedCollection<TKey, TItem>
    where TItem : class, IKeyedObject<TKey>
    where TKey : struct
{
    public KeyedCollection() : base()
    {
    }

    protected override TItem GetKeyForItem(TItem item)
    {
        return item.Key;
    }
}

///<summary>
/// I almost always implement this explicitly so the only
/// classes that have access without some rigmarole
/// are generic collections built to be aware that an object
/// is keyed.
///</summary>
public interface IKeyedObject<TKey>
{
    TKey Key { get; }
}
Senter answered 26/1, 2011 at 15:56 Comment(1)
I'm not sure if KeyedCollection<TItem, TItem> should be useful in real.Fag
S
9

How were you proposing to retrieve the item from the hash set? A set is by definition not ordered in any way and therefore, there is no index with which to use to retrieve the object in question.

Sets, as a concept, are used to test inclusion, i.e. whether or not the element in question is in the hash data set. If you're looking to retrieve a value from a data source using a key value or index, I would suggest looking into either a Map or a List.

EDIT: Additional answer based on the Edit to the original question

Soonil, based on your new information, it looks like you might be interested in implementing your data as a Java Enum, something similar to this:

 public enum SoonilsDataType {
      A, B, C;

      // Just an example of what's possible
      public static SoonilsDataType getCompositeValue(SoonilsDataType item1,
           SoonilsDataType item2) {
           if (item1.equals(A) && 
                     item2.equals(B)) {
                return C;
           }
      }
 }

Enum's automatically inherit values() which returns the list of all values in the enum's "set", which you can use to test inclusion against in the same way as the Set. Also, because its a full class, you can define new static methods to do the composite logic (like I was trying to allude to in the example code). The only thing about the Enum is that you can't add new instances at runtime, which may not be what you want (though if the set's data size isn't going to grow at runtime, the Enum is what you want).

Sondra answered 29/9, 2009 at 20:57 Comment(10)
See my comment to penpen for your answer.Quietude
@Soonil If I understand your comment to penpen, you're using not only object equality (for existence in the Set) but also a identifier (or some other numerical index) of your own devising as a secondary equals and you want this identifier to also be used as a index into the hash set? If this is correct, you seem to be working orthogonal to the nature of hash set and I once again suggest you look into a Map (probably HashMap) as your data source.Sondra
Sure Peter, I've added some details to the end of my question, let me know if that helps.Quietude
Peter, TBH, I don't entirely understand what you are getting at with the enums, but I think our understanding has diverged somewhere :) +1 for the effort though!Quietude
actually I have a use case where it is pretty clear which element we would want to return from the set, and that is the case where there is only one element there. obviously this is probably too specific to include support for it in the framework.Gunflint
@Gunflint I know this is really late, but if you know you have only one element in an enumerable collection (HashSet included) then the extension methods First() and/or Single() are exactly what you want/need.Archaeo
I don't even remember the specific situation where I needed this, but thanks anyway :)Gunflint
@quetzalcoatl if you know you have only one element, his comment was not directed at OP.Squier
@Sondra I am still wondering what you hoped to accomplish with this SoonilsDataType. Especially since the posted code fails, as getCompositeValue does not have a return statement outside of the if statement.Squier
@Wolfzoon: I wrote this comment basing on the title, namely hashset-without-enumeration part, probably I didn't notice that andresp!=sooniln, thanks! :)Humber
O
4

If you change an object after it has been inserted, it's hash may have changed (this is especially likely if hashCode() has been overridden). If the hash changes, a lookup of it in the set will fail, as you will be attempting to lookup an object that is hashed at a different location than it is stored in.

Also, you need to make sure you have overridden hashCode and equals in your object if you want to lookup equal objects that are different instances.

Note that this is all for Java - I am assuming C# has something similar, but as it has been several years since I used C#, I will let others speak to it's capabilities.

Outrank answered 29/9, 2009 at 20:46 Comment(1)
This is true and all, but it is perfectly possible at the moment to break HashSet (and all components with this invariant) by hanging onto a reference after you add it to the set. You can use that reference to mutate and invalidate the invariant as you wish. The API already relies on the user to fulfill this contract, returning the object reference would not change this.Quietude
S
3

I imagine the designers of the Set interface and HashSet class wanted to ensure that the remove(Object) method defined on the Collection interface was also applicable to Set; this method returns a boolean denoting whether the object was successfully removed. If the designers wanted to provide functionality whereby remove(Object) returned the "equal" object already in the Set this would mean a different method signature.

Also, given that the object being removed is logically equal to the object passed to remove(Object) it is arguable about the value added in returning the contained object. However, I have had this problem myself before and have used a Map to solve the problem.

Note that in Java, a HashSet uses a HashMap internally and so there isn't additional storage overhead in using a HashMap instead.

Sergeant answered 29/9, 2009 at 21:1 Comment(1)
You are correct about Java :) alas, C# does not use a HashMap to implement a HashSet, and there are some space/time benefits I would like to keep if possible.Quietude
T
3

Why not just use a HashMap<X,X>? This does exactly what you want. Just do .put(x,x) every time and then you can just get the stored element equal to x with .get(x).

Tracheotomy answered 13/9, 2012 at 22:18 Comment(0)
U
3

This was an oversight from the library designers. As I mentioned under another answer, this method has been added to .NET Framework 4.7.2 (and .NET Core 2.0 before it); see HashSet<T>.TryGetValue. Citing the source:

/// <summary>
/// Searches the set for a given value and returns the equal value it finds, if any.
/// </summary>
/// <param name="equalValue">The value to search for.
/// </param>
/// <param name="actualValue">
/// The value from the set that the search found, or the default value
/// of <typeparamref name="T"/> when the search yielded no match.</param>
/// <returns>A value indicating whether the search was successful.</returns>
/// <remarks>
/// This can be useful when you want to reuse a previously stored reference instead of 
/// a newly constructed one (so that more sharing of references can occur) or to look up
/// a value that has more complete data than the value you currently have, although their
/// comparer functions indicate they are equal.
/// </remarks>
public bool TryGetValue(T equalValue, out T actualValue)
Urticaceous answered 7/7, 2018 at 7:53 Comment(0)
R
2

SOLVED. Wishing to find an element seems perfectly valid to me, because the representative used for the search may differ from the found element. This is especially true if elements contain key and value information, and a custom equality comparer compares the key part only. See the code example. The code contains a comparer that implements a custom search and that captures the element found. This requires an instance of the comparer. Clear the reference to the found element. Perform a search by means of Contains. Access the found element. Be aware of multithread issues when sharing the comparer instance.

using System;
using System.Collections.Generic;

namespace ConsoleApplication1 {

class Box
{
    public int Id;
    public string Name;
    public Box(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

class BoxEq: IEqualityComparer<Box>
{
    public Box Element;

    public bool Equals(Box element, Box representative)
    {
        bool found = element.Id == representative.Id;
        if (found)
        {
            Element = element;
        }
        return found;
    }

    public int GetHashCode(Box box)
    {
        return box.Id.GetHashCode();
    }
}

class Program
{
    static void Main()
    {
        var boxEq = new BoxEq();
        var hashSet = new HashSet<Box>(boxEq);
        hashSet.Add(new Box(3, "Element 3"));
        var box5 = new Box(5, "Element 5");
        hashSet.Add(box5);
        var representative = new Box(5, "Representative 5");
        boxEq.Element = null;
        Console.WriteLine("Contains {0}: {1}", representative.Id, hashSet.Contains(representative));
        Console.WriteLine("Found id: {0}, name: {1}", boxEq.Element.Id, boxEq.Element.Name);
        Console.WriteLine("Press enter");
        Console.ReadLine();
    }
}

} // namespace
Renegado answered 11/3, 2016 at 7:30 Comment(0)
H
1

Looks to me like you're actually looking for a Map<X,Y>, where Y is the type of extra1.


(rant below)

The equals and hashCode methods define meaningful object equality. The HashSet class assumes that if two objects are equal as defined by Object.equals(Object) there is no difference between these two objects.

I'd go as far as to say that if the object extra is meaningful, your design is not ideal.

Howrah answered 30/9, 2009 at 17:54 Comment(1)
Agreed with your rant :) The map would work, but I've decided against the extra overhead and will probably roll my own set (since this isn't production code, heh) +1Quietude
P
0

Set objects in those languages were mostly designed as set of value, not for mutable objects. They check that object put in them are unique by using equals. That is why contains and remove returns boolean, not the object: they check for or remove the value you pass to them.

And actually, if you do a contains(X) on a set, and expect to get a different object Y, that would means X and Y are equals (ie X.equals(Y) => true), but somewhat different, which seems wrong.

Patsis answered 29/9, 2009 at 20:55 Comment(2)
Sets are unique according to the comparison method you specify. Simply because I wish to use comparison method A in my set does not mean comparison method B has no value do me when I consider the same objects. Addressing your comments on mutability, see my answer to aperkins.Quietude
Even if objects are immutable, there may still be significant utility to returning a stored instance. Among other things, suppose one has a large number of immutable nested data structures (e.g. parsed from XML documents) and one wishes to replace references to identical-but-unshared data structures with references to shared data structures. A HashMap<T> with a look-up operation that could return the passed-in item would be ideal.Spaceport
Q
0

I was given an interesting suggestion as to a way to use a Map, by having my own objects define themselves as KeyValuePairs. While a good concept, unfortunately KeyValuePair is not an interface (why not?) and is a struct, which shoots that plan out of the air. In the end I will roll my own Set, as my constraints allow me this option.

Quietude answered 1/10, 2009 at 18:54 Comment(0)
L
0

Short answer; because the items cannot be guaranteed to be immutable.

I've hit the exact problem you describe, where the HashCode is based on fixed fields within the member class, but the class holds additional information that can be updated without changing the hash.

My solution was to implement a generic MyHashSet<T> based on ICollection<T> but wrapped round a Dictionary<int, List<T>> to provide the required lookup efficiency, where the int key is the HashCode of T. However, this shows that if the HashCode of the member objects can change then the dictionary lookup followed by equality comparison of items in the list will never find the changed items. There is no mechanism for forcing the members to be immutable so the only solution is to enumerate the lot.

Langsyne answered 14/3, 2012 at 12:53 Comment(0)
A
0

After wondering the same thing, and finely being able to look at the source code:

source: http://referencesource.microsoft.com/#System.Core/System/Collections/Generic/HashSet.cs

A set is a collection of unique items (objects or values). In the .net implementation an item is the same as another item (not unique) if the Equals method of the comparer returns true for the two items. Not if the two items have the same hash code. so a check of the existence of an item is a two step process. first using the hashset to minimize the number of items to compere, then the compression itself.

If you wish to retrieve an item, you must be able to supply the retrieving function with a unique identifier. you might know the hash code of the item you want. but that is not enough. as more than one item can have that same hash. you will also need to supply the item itself so that the Equal method can be called. and clearly if you have the item there is no reason to get it.

One could create a data structure that demands that no two unique items ever return the same hash code. and than you could get an item from it. it will be faster of adding*, and retrieving will be possible if you know the hash. if two items that are not equal but return the same hash are put into it the first will be overwritten. as far as I know this Type doesn't exist in .net , and no this is not the same as a dictionary.

*given that the GetHash method is the same.

Acidforming answered 15/11, 2014 at 7:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.