Inserting into an unordered_set with custom hash function
Asked Answered
G

2

33

I have the following code to make an unordered_set<Interval>. This compiles fine.

struct Interval {
  unsigned int begin;
  unsigned int end;
  bool updated;   //true if concat.  initially false
  int patternIndex;  //pattern index. valid for single pattern
  int proteinIndex;   //protein index.  for retrieving the pattern
};

struct Hash {
  size_t operator()(const Interval &interval);
};

size_t Hash::operator()(const Interval &interval){
  string temp = to_string(interval.begin) + to_string(interval.end) + to_string(interval.proteinIndex);
  return hash<string>()(temp);
}

unordered_set<Interval, string, Hash> test;

However, I cannot compile when I try to insert using this code:

for(list<Interval>::iterator i = concat.begin(); i != concat.end(); ++i){
  test.insert((*i));
}

Moreover, I cannot determine what the problem is from the error messages, for example:

note: candidate is:
note: size_t Hash::operator()(const Interval&)
note:   candidate expects 1 argument, 2 provided  

I thought I only provided 1 argument...

What is the problem with my insertion code?


Here's the new instantiation code: unordered_set<Interval, Hash> test; However, I'm still receiving a slew of error messages, for example:

note: candidate is:
note: size_t Hash::operator()(const Interval&) <near match>
note:   no known conversion for implicit ‘this’ parameter from ‘const Hash*’ to ‘Hash*’
Guardado answered 7/4, 2013 at 23:24 Comment(2)
#17016675Ongoing
@CiroSantilli刘晓波死六四事件法轮功 did you even read the question before commenting that? No you did not.Plautus
G
48

First problem:

You are passing string as the second template argument for your instantiation of the unordered_set<> class template. The second argument should be the type of your hasher functor, and std::string is not a callable object.

Perhaps meant to write:

unordered_set<Interval, /* string */ Hash> test;
//                      ^^^^^^^^^^^^
//                      Why this?

Also, I would suggest using names other than begin and end for your (member) variables, since those are names of algorithms of the C++ Standard Library.

Second problem:

You should keep in mind, that the hasher function should be qualified as const, so your functor should be:

struct Hash {
   size_t operator() (const Interval &interval) const {
   //                                           ^^^^^
   //                                           Don't forget this!
     string temp = to_string(interval.b) + 
                   to_string(interval.e) + 
                   to_string(interval.proteinIndex);
     return (temp.length());
   }
};

Third problem:

Finally, if you want std::unordered_set to be able to work with objects of type Interval, you need to define an equality operator consistent with your hash function. By default, if you do not specify any type argument as the third parameter of the std::unordered_set class template, operator == will be used.

You currently do not have any overload of operator == for your class Interval, so you should provide one. For example:

inline bool operator == (Interval const& lhs, Interval const& rhs)
{
    return (lhs.b == rhs.b) && 
           (lhs.e == rhs.e) && 
           (lhs.proteinIndex == rhs.proteinIndex); 
}

Conclusion:

After all the above modifications, your code becomes:

#include <string>
#include <unordered_set>
#include <list>

using namespace std;

struct Interval {
  unsigned int b;
  unsigned int e;
  bool updated;   //true if concat.  initially false
  int patternIndex;  //pattern index. valid for single pattern
  int proteinIndex;   //protein index.  for retrieving the pattern
};

bool operator == (Interval const& lhs, Interval const& rhs)
{
    return (lhs.b == rhs.b) && (lhs.e == rhs.e) && (lhs.proteinIndex == rhs.proteinIndex); 
}

struct Hash {
   size_t operator()(const Interval &interval) const {
     string temp = to_string(interval.b) + to_string(interval.e) + to_string(interval.proteinIndex);
     return (temp.length());
   }
};

int main()
{
   unordered_set<Interval, Hash> test;
  
  list<Interval> concat;
  for(list<Interval>::iterator i = concat.begin(); i != concat.end(); ++i){
    test.insert(*i);
  }

}
Grillo answered 7/4, 2013 at 23:30 Comment(7)
Is there a way to wrap operator == in a struct? As it is, I get errors about multiple definitions of ==. If I declare it in a struct in my header file and define it in my .cpp file (as I'm currently doing for other operators), it gives me errors about it requiring exactly one argument. Ex: bool IntervalEquality::operator==(const Interval&, const Interval&)’ must take exactly one argumentGuardado
@user2052561: The problem with multiple definition is probably happens because you have that function definition in one header and you are importing it in multiple translation units (i.e. #includeing it from several .cpp files). Just use the inline keyword to suppress this problem (as I show in my updated answer).Grillo
@user2052561: Alternatively, if you want to overload it as a member function, then you should remove the lhs parameter, which is implicitly. The this pointer will refer to the left hand Interval object.Grillo
It's worth noting that this hash function can produce lots of easily-avoidable collisions – an Interval with b == 12 and e == 3 will produce the same hash as an Interval with b == 1 and e == 23.Embolden
@ildjarn: Sure, that's true. My hash function does not have any ambition to be a good hash function - I just wanted to point out what was missing conceptually and provide a minimal working filler.Grillo
Understood – I guess my comment was directed more towards the OP than your answer, sorry for posting it in the wrong place. :-PEmbolden
@AndyProwl, I am not very sure whether the hash function and equal function are defined in the customer's class as what java does. Is there a better way that makes these functions related with the class closer ?Ragout
C
3

I think, Andy Prowl perfectly fixed the problems with your code. However, I would add the following member function to your Interval, which describes what makes two intervals identical:

std::string getID() const { return std::to_string(b) + " " + std::to_string(e) + " " + std::to_string(proteinIndex); }

Please note, that I also followed Andy Prowl's suggestion and renamed the members begin to b and end to e. Next, you can easily define the hash and comparison functions by using lambda expressions. As a result, you can define your unordered_set as follows:

auto hash = [](const Interval& i){ return std::hash<std::string>()(i.getID()); };
auto equal = [](const Interval& i1, const Interval& i2){ return i1.getID() == i2.getID(); };
std::unordered_set<Interval, decltype(hash), decltype(equal)> test(8, hash, equal);

Finally, for reasons of readability, I converted your for loop into a range-based for loop:

std::list<Interval> concat {{1, 2, false, 3, 4}, {2, 3, false, 4, 5}, {1, 2, true, 7, 4}};

for (auto const &i : concat)
    test.insert(i);

for (auto const &i : test)
    std::cout << i.b << ", " << i.e << ", " << i.updated << std::endl;

Output (I just printed first three members of each Interval):

2, 3, 0
1, 2, 0

As you can see, there are only two intervals printed. The third one ({1, 2, true, 7, 4}) was not inserted to test, because its b, e, and proteinIndexare equal to that of the first interval ({1, 2, false, 3, 4}).

Code on Ideone

Chemisette answered 8/2, 2019 at 13:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.