Test if set is a subset, considering the number (multiplicity) of each element in the set

Asked 4/3, 2013 at 18:24 Answered 22/8, 2019 at 11:37

Solved python set subset multiset multiplicity

I know I can test if set1 is a subset of set2 with:

{'a','b','c'} <= {'a','b','c','d','e'} # True

But the following is also True:

{'a','a','b','c'} <= {'a','b','c','d','e'} # True

How do I have it consider the number of times an element in the set occurs so that:

{'a','b','c'}     <= {'a','b','c','d','e'}      # True
{'a','a','b','c'} <= {'a','b','c','d','e'}      # False since 'a' is in set1 twice but set2 only once
{'a','a','b','c'} <= {'a','a','b','c','d','e'}  # True because both sets have two 'a' elements

I know I could do something like:

A, B, C = ['a','a','b','c'], ['a','b','c','d','e'], ['a','a','b','c','d','e']
all([A.count(i) == B.count(i) for i in A]) # False
all([A.count(i) == C.count(i) for i in A]) # True

But I was wondering if there was something more succinct like set(A).issubset(B,count=True) or a way to stay from list comprehensions. Thanks!

Someday answered 4/3, 2013 at 18:24 Comment(4)

{'a','a','b','c'} is exactly the same as {'a','b','c'}. That is the point of sets. – Academicism 4/3, 2013 at 18:26

In Python, set cannot have duplicate items. You may wan;t to use collection.Counter which can be thought as a multiset – Tab 4/3, 2013 at 18:26

Were you looking for a multi-set instead? Then use collections.Counter(). – Academicism 4/3, 2013 at 18:26

Counter doesn't do it since Counter('aabc') <= Counter('aaabcd') gives True but so does Counter('abce') <= Counter('aabcd'). There is clearly no 'e' in the second string. – Someday 4/3, 2013 at 18:40

As @DSM deleted his solution , I will take the opportunity to provide a prototype based on which you can expand

>>> class Multi_set(Counter):
    def __le__(self, rhs):
        return all(v == rhs[k] for k,v in self.items())


>>> Multi_set(['a','b','c']) <= Multi_set(['a','b','c','d','e'])
True
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','b','c','d','e'])
False
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','a','b','c','d','e'])
True
>>>

Tab answered 4/3, 2013 at 18:46 Comment(3)

== does not seem to match __le__? – Uncleanly 4/3, 2013 at 19:40

In my opinion there should be <= instead of ==. – Hunker 12/11, 2015 at 18:4

This isn't multiset <=. That would be all(v <= rhs[k] for k,v in self.items()). – Fellmonger 10/3, 2016 at 12:16

Since Python 3.10, "Counters support rich comparison operators for equality, subset, and superset":

def issubset(X, Y):
    return Counter(X) <= Counter(Y)

Old: As stated in the comments, a possible solution using Counter:

from collections import Counter

def issubset(X, Y):
    return len(Counter(X)-Counter(Y)) == 0

Bronwynbronx answered 4/3, 2013 at 18:49 Comment(1)

This is the best answer. It's simple and doesn't require the creation of a second class. – Frederickfredericka 15/4, 2017 at 21:8

The short answer to your question is there is no set operation that does this, because the definition of a set does not provide those operations. IE defining the functionality you're looking for would make the data type not a set.

Sets by definition have unique, unordered, members:

>>> print {'a', 'a', 'b', 'c'}
set(['a', 'c', 'b'])
>>> {'a', 'a', 'b', 'c'} == {'a', 'b', 'c'}
True

Brodsky answered 4/3, 2013 at 18:39 Comment(2)

So then if I was to test the characters in a string instead, is there a way to test membership of each string1 character in string2 taking into account the number of times it occurs? Is there an escape from list comprehensions here? – Someday 4/3, 2013 at 18:44

I'm not sure what you mean by "taking into account the number of times it occurs". Do you mean if 'a' occurs x times in string1, then 'a' should occur > x times in string2? And list comprehensions are a critical part of the Python language. Embrace them :) (You can start by writing them as a for loop if you do not understand them) – Brodsky 4/3, 2013 at 19:5

As @DSM deleted his solution , I will take the opportunity to provide a prototype based on which you can expand

>>> class Multi_set(Counter):
    def __le__(self, rhs):
        return all(v == rhs[k] for k,v in self.items())


>>> Multi_set(['a','b','c']) <= Multi_set(['a','b','c','d','e'])
True
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','b','c','d','e'])
False
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','a','b','c','d','e'])
True
>>>

Tab answered 4/3, 2013 at 18:46 Comment(3)

== does not seem to match __le__? – Uncleanly 4/3, 2013 at 19:40

In my opinion there should be <= instead of ==. – Hunker 12/11, 2015 at 18:4

This isn't multiset <=. That would be all(v <= rhs[k] for k,v in self.items()). – Fellmonger 10/3, 2016 at 12:16

Combining previous answers gives a solution which is as clean and fast as possible:

def issubset(X, Y):
    return all(v <= Y[k] for k, v in X.items())

No instances created instead of 3 in @A.Rodas version (both arguments must already be of type Counter, since this is the Pythonic way to handle multisets).
Early return (short-circuit) as soon as predicate is falsified.

Deafanddumb answered 20/6, 2017 at 19:52 Comment(0)

For those that are interested in the usual notion of multiset inclusion, the easiest way to test for multiset inclusion is to use intersection of multisets:

from collections import Counter

def issubset(X, Y):
    return X & Y == X

issubset(Counter("ab"), Counter("aab"))  # returns True
issubset(Counter("abc"), Counter("aab")) # returns False

This is a standard idea used in idempotent semirings.

Nuts answered 22/8, 2019 at 11:37 Comment(0)

Recommended topics

Hot tags