Test if set is a subset, considering the number (multiplicity) of each element in the set
Asked Answered
S

5

5

I know I can test if set1 is a subset of set2 with:

{'a','b','c'} <= {'a','b','c','d','e'} # True

But the following is also True:

{'a','a','b','c'} <= {'a','b','c','d','e'} # True

How do I have it consider the number of times an element in the set occurs so that:

{'a','b','c'}     <= {'a','b','c','d','e'}      # True
{'a','a','b','c'} <= {'a','b','c','d','e'}      # False since 'a' is in set1 twice but set2 only once
{'a','a','b','c'} <= {'a','a','b','c','d','e'}  # True because both sets have two 'a' elements

I know I could do something like:

A, B, C = ['a','a','b','c'], ['a','b','c','d','e'], ['a','a','b','c','d','e']
all([A.count(i) == B.count(i) for i in A]) # False
all([A.count(i) == C.count(i) for i in A]) # True

But I was wondering if there was something more succinct like set(A).issubset(B,count=True) or a way to stay from list comprehensions. Thanks!

Someday answered 4/3, 2013 at 18:24 Comment(4)
{'a','a','b','c'} is exactly the same as {'a','b','c'}. That is the point of sets.Academicism
In Python, set cannot have duplicate items. You may wan;t to use collection.Counter which can be thought as a multisetTab
Were you looking for a multi-set instead? Then use collections.Counter().Academicism
Counter doesn't do it since Counter('aabc') <= Counter('aaabcd') gives True but so does Counter('abce') <= Counter('aabcd'). There is clearly no 'e' in the second string.Someday
T
1

As @DSM deleted his solution , I will take the opportunity to provide a prototype based on which you can expand

>>> class Multi_set(Counter):
    def __le__(self, rhs):
        return all(v == rhs[k] for k,v in self.items())


>>> Multi_set(['a','b','c']) <= Multi_set(['a','b','c','d','e'])
True
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','b','c','d','e'])
False
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','a','b','c','d','e'])
True
>>> 
Tab answered 4/3, 2013 at 18:46 Comment(3)
== does not seem to match __le__?Uncleanly
In my opinion there should be <= instead of ==.Hunker
This isn't multiset <=. That would be all(v <= rhs[k] for k,v in self.items()).Fellmonger
B
13

Since Python 3.10, "Counters support rich comparison operators for equality, subset, and superset":

def issubset(X, Y):
    return Counter(X) <= Counter(Y)

Old: As stated in the comments, a possible solution using Counter:

from collections import Counter

def issubset(X, Y):
    return len(Counter(X)-Counter(Y)) == 0
Bronwynbronx answered 4/3, 2013 at 18:49 Comment(1)
This is the best answer. It's simple and doesn't require the creation of a second class.Frederickfredericka
B
3

The short answer to your question is there is no set operation that does this, because the definition of a set does not provide those operations. IE defining the functionality you're looking for would make the data type not a set.

Sets by definition have unique, unordered, members:

>>> print {'a', 'a', 'b', 'c'}
set(['a', 'c', 'b'])
>>> {'a', 'a', 'b', 'c'} == {'a', 'b', 'c'}
True
Brodsky answered 4/3, 2013 at 18:39 Comment(2)
So then if I was to test the characters in a string instead, is there a way to test membership of each string1 character in string2 taking into account the number of times it occurs? Is there an escape from list comprehensions here?Someday
I'm not sure what you mean by "taking into account the number of times it occurs". Do you mean if 'a' occurs x times in string1, then 'a' should occur > x times in string2? And list comprehensions are a critical part of the Python language. Embrace them :) (You can start by writing them as a for loop if you do not understand them)Brodsky
T
1

As @DSM deleted his solution , I will take the opportunity to provide a prototype based on which you can expand

>>> class Multi_set(Counter):
    def __le__(self, rhs):
        return all(v == rhs[k] for k,v in self.items())


>>> Multi_set(['a','b','c']) <= Multi_set(['a','b','c','d','e'])
True
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','b','c','d','e'])
False
>>> Multi_set(['a','a','b','c']) <= Multi_set(['a','a','b','c','d','e'])
True
>>> 
Tab answered 4/3, 2013 at 18:46 Comment(3)
== does not seem to match __le__?Uncleanly
In my opinion there should be <= instead of ==.Hunker
This isn't multiset <=. That would be all(v <= rhs[k] for k,v in self.items()).Fellmonger
D
1

Combining previous answers gives a solution which is as clean and fast as possible:

def issubset(X, Y):
    return all(v <= Y[k] for k, v in X.items())
  • No instances created instead of 3 in @A.Rodas version (both arguments must already be of type Counter, since this is the Pythonic way to handle multisets).
  • Early return (short-circuit) as soon as predicate is falsified.
Deafanddumb answered 20/6, 2017 at 19:52 Comment(0)
N
0

For those that are interested in the usual notion of multiset inclusion, the easiest way to test for multiset inclusion is to use intersection of multisets:

from collections import Counter

def issubset(X, Y):
    return X & Y == X

issubset(Counter("ab"), Counter("aab"))  # returns True
issubset(Counter("abc"), Counter("aab")) # returns False

This is a standard idea used in idempotent semirings.

Nuts answered 22/8, 2019 at 11:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.