Python custom set intersection
Asked Answered
F

6

5

So, there exists an easy way to calculate the intersection of two sets via set.intersection(). However, I have the following problem:

class Person(Object):                    
    def __init__(self, name, age):                                                      
        self.name = name                                                                
        self.age = age                                                                  

l1 = [Person("Foo", 21), Person("Bar", 22)]                                             
l2 = [Person("Foo", 21), Person("Bar", 24)]                                             

union_list = list(set(l1).union(l2))                                           
# [Person("Foo", 21), Person("Bar", 22), Person("Bar", 24)]

(Object is a base-class provided by my ORM that implements basic __hash__ and __eq__ functionality, which essentially adds every member of the class to the hash. In other words, the __hash__ returned will be a hash of every element of the class)

At this stage, I would like to run a set intersection operation by .name only, to find, say, Person('Bar', -1).intersection(union_list) #= [Person("Bar", -1), Person("Bar", 22), Person("Bar", 24)]. (the typical .intersection() at this point would not give me anything, I can't override __hash__ or __eq__ on the Person class, as this would override the original set union (I think)

What's the best way to do this in Python 2.x?

EDIT: Note that the solution doesn't have to rely on a set. However, I need to find unions and then intersections, so it feels like this is amenable to a set (but I'm willing to accept solutions that use whatever magic you deem worthy, so long as it solves my problem!)

Foundling answered 30/5, 2012 at 8:48 Comment(11)
I don't understand your desired result. Could you please explain what the result should contain?Effeminize
Err crap, that should be .union, not .intersection. I've updated the question -- let me know if this is clearer?Foundling
I'm still a bit confused since the example code does not have the result you claim.Effeminize
Your example is incorrect - the sets don't work like you think because you didn't define hashing and equality methods on your class.Murther
Oh, of course -- sorry, in my actual code this is a database class and thus the ORM takes care of hashing and whatnot. I'll update my example to reflect thisFoundling
Your other examples don't make sense either. The default intersection would not give [Person("Bar", 24)], it would give l2. And I don't understand why you expect to get [Person("Bar", 22), Person("Bar", 24)] from your operation.Murther
That's what I would like for it to give; I would like to use python's set intersection to intersect on only one member of a class (when the __hash__ and __eq__ functions are already overridden (and I require the behaviour they already have))Foundling
But why would you expect that result? Why wouldn't Person("Foo", 21) be part of the result? I don't understand what your "intersect by name" operation means.Murther
Right, I see where the question is ill-specified. I'll fix this (I swear I'll get this right eventually). - EDIT: Updated.Foundling
Better, but now I don't see why Person('Bar', -1) is not part of the result..Murther
Which ORM is this? Couldn't you do this using your ORM/database?Shantel
F
2

I hate answering my own questions, so I'll hold off on marking this as the 'answer' for a little while yet.

Turns out the way to do this is as follows:

import types
p = Person("Bar", -1)
new_hash_method = lambda obj: hash(obj.name)
p.__hash__ = types.MethodType(new_hash_method, p)
for i in xrange(0, len(union_list)):
    union_list[i].__hash__ = types.MethodType(new_hash_method, union_list[i])
set(union_list).intersection(p)

It's certainly dirty and it relies on types.MethodType, but it's less intensive than the best solution proposed so far (glglgl's solution) as my actual union_list can contain potentially in the order of thousands of items, so this will save me re-creating objects every time I run this intersection procedure.

Foundling answered 31/5, 2012 at 6:58 Comment(2)
Does this actually work though? The documentation indicates that magic methods like __hash__ are looked up on the class, not the instance. docs.python.org/3/reference/datamodel.html#special-lookupCareerism
Actually, looks like it does work for old style classes, but not for new style classes: docs.python.org/2/reference/…Careerism
P
7

Sounds like

>>> class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age = age
...     def __eq__(self, other):
...         return self.name == other.name
...     def __hash__(self):
...         return hash(self.name)
...     def __str__(self):
...         return self.name
...
>>> l1 = [Person("Foo", 21), Person("Bar", 22)]
>>> l2 = [Person("Foo", 21), Person("Bar", 24)]
>>> union_list = list(set(l1).union(l2))
>>> [str(l) for l in union_list]
['Foo', 'Bar']

is what you want, since name is your unique key?

Potiche answered 30/5, 2012 at 9:18 Comment(2)
Ah, no, the ORM I'm using already provides a eq and hash method (and, as such, set.union() already produces 'sane' results). I'm looking for a way to do an intersection operation that only uses one of the class's members as the key, and as such can't override __hash__ or __eq__.Foundling
I see, then perhaps glglgl's solution would be suitable?Gnni
F
2

I hate answering my own questions, so I'll hold off on marking this as the 'answer' for a little while yet.

Turns out the way to do this is as follows:

import types
p = Person("Bar", -1)
new_hash_method = lambda obj: hash(obj.name)
p.__hash__ = types.MethodType(new_hash_method, p)
for i in xrange(0, len(union_list)):
    union_list[i].__hash__ = types.MethodType(new_hash_method, union_list[i])
set(union_list).intersection(p)

It's certainly dirty and it relies on types.MethodType, but it's less intensive than the best solution proposed so far (glglgl's solution) as my actual union_list can contain potentially in the order of thousands of items, so this will save me re-creating objects every time I run this intersection procedure.

Foundling answered 31/5, 2012 at 6:58 Comment(2)
Does this actually work though? The documentation indicates that magic methods like __hash__ are looked up on the class, not the instance. docs.python.org/3/reference/datamodel.html#special-lookupCareerism
Actually, looks like it does work for old style classes, but not for new style classes: docs.python.org/2/reference/…Careerism
S
2

How about:

d1 = {p.name:p for p in l1}
d2 = {p.name:p for p in l2}

intersectnames = set(d1.keys()).intersection(d2.keys)
intersect = [d1[k] for k in intersectnames]

It might be faster to throw intersectnames at your ORM, in which case you wouldn't build dictionaries, just collect names in lists.

Shantel answered 31/5, 2012 at 7:15 Comment(0)
M
1

You'll have to override __hash__ and the comparision methods if you want to use sets like this.

If you don't, then

Person("Foo", 21) == Person("Foo", 21)

will always be false.

If your objects are managed by an ORM, then you'll have to check how it compares objects. Usually it only looks at the objects id and comparision only works if both objects are managed. If you try to compare an object you got from the ORM with an instance you created yourself before it's persisted to the db, then they are likely to be different. Anyway, an ORM shouldn't have problems with you supplying your own comparision logic.

But if for some reasons you can't override __hash__ and __eq__, then you can't use sets for intersection and union with the original objects. You could:

  • calculate the intersection/union yourself
  • create a wrapper class which is comparable:

    class Person:                    
        def __init__(self, name, age):                                                      
            self.name = name                                                                
            self.age = age                                                                  
    
    l1 = [Person("Foo", 21), Person("Bar", 22)]                                             
    l2 = [Person("Foo", 21), Person("Bar", 24)]                                             
    
    class ComparablePerson:
        def __init__(self, person):
            self.person = person
    
        def __hash__(self):
            return hash(self.person.name) + 31*hash(self.person.age)
    
        def __eq__(self, other):
            return (self.person.name == other.person.name and
                    self.person.age == other.person.age)
        def __repr__(self):
            return "<%s - %d>" % (self.person.name, self.person.age)
    
    c1 = set(ComparablePerson(p) for p in l1)
    c2 = set(ComparablePerson(p) for p in l2)
    
    print c1
    print c2
    print c1.union(c2)
    print c2.intersection(c1)
    
Manuel answered 30/5, 2012 at 8:59 Comment(1)
See my comment (on the original question); the override is already dealt with by an ORM. I'll update the question to reflect this.Foundling
Z
1

This is clunky, but...

set(p for p in union_list for q in l2 if p.name == q.name and p.age != q.age) | (set(p for p in l2 for q in union_list if p.name == q.name and p.age != q.age))
# {person(name='Bar', age=22), person(name='Bar', age=24)}
Zonda answered 30/5, 2012 at 9:9 Comment(0)
T
1

If you want the age to be irrelevant with respect to comparing, you should override __hash__() and __eq__() in Person although you have it in your Object.

If you need this behaviour only in this (and similiar) contexts, you could create a wrapper object which contains the Person and behaves differently, such as

class PersonWrapper(Object):
    def __init__(self, person):
        self.person = person
    def __eq__(self, other):
        if hasattr(other, 'person'):
            return self.person.name == other.person.name
        else:
            return self.person.name == other.name
    def __hash__(self):
        return hash(self.person.name)

and then do

union_list = list(set(PersonWrapper(i) for i in l1).union(PersonWrapper(i) for i in l2))
# [Person("Foo", 21), Person("Bar", 22), Person("Bar", 24)]

(untested)

Thielen answered 30/5, 2012 at 9:50 Comment(4)
The issue is I need the __hash__ and __eq__ methods the way they are, otherwise .union() won't work the way it does.Foundling
Hmm, interesting. So there's no way to do this without reconstructing objects? I know C++ gives me the option to pass a custom comparator; Python doesn't have the same ability?Foundling
There is a way to do so with functions like sorted() where you can give a cmp function as well as a key function, but not with sets, alas...Thielen
Damn. I've added an edit to the question to point out that the solution doesn't have to rely on a set, however I get the feeling this isn't going to change anything and I'm still going to have to use list comprehensions or genexps or something.Foundling

© 2022 - 2024 — McMap. All rights reserved.