Check whether an array is a subset of another
Asked Answered
S

10

170

Any idea on how to check whether that list is a subset of another?

Specifically, I have

List<double> t1 = new List<double> { 1, 3, 5 };
List<double> t2 = new List<double> { 1, 5 };

How to check that t2 is a subset of t1, using LINQ?

Sleekit answered 2/12, 2008 at 3:42 Comment(1)
If the lists are sorted (as in your example), this should be possible in O(n+m) time.Fornication
L
284
bool isSubset = !t2.Except(t1).Any();
Lasser answered 2/12, 2008 at 4:12 Comment(5)
I've created extension method geekswithblogs.net/mnf/archive/2011/05/13/…Rowland
@Bul Ikana Working of this code is simple, extension method internally calls the Equals and GetHashCode of the overridden object class methods if there's no IEqualityComparer provided for the job.Arellano
If the lists are length n and m, what's the time complexity of this algorithm?Fornication
Would be nice if this was boiled down to a linq method called ContainsAllArdine
I wonder if an intersection would be faster - t1.Intersect(t2).Count() == t2.Count()Hippogriff
M
71

Use HashSet instead of List if working with sets. Then you can simply use IsSubsetOf()

HashSet<double> t1 = new HashSet<double>{1,3,5};
HashSet<double> t2 = new HashSet<double>{1,5};

bool isSubset = t2.IsSubsetOf(t1);

Sorry that it doesn't use LINQ. :-(

If you need to use lists, then @Jared's solution works with the caveat that you will need to remove any repeated elements that exist.

Myronmyrrh answered 2/12, 2008 at 3:57 Comment(4)
Exactly. You want a set operation, use the class designed for them. Cameron's solution is creative, but not as clear/expressive as the HashSet.Elfreda
Um I disagree because the question specifically says "use LINQ".Donal
@JaredPar: So what? Is it not better to show someone the right way than the way they want to go?Cle
A list maintains its order but a set does not. If the order is important this would give incorrect results.Kalil
I
23

This is a significantly more efficient solution than the others posted here, especially the top solution:

bool isSubset = t2.All(elem => t1.Contains(elem));

If you can find a single element in t2 that isn't in t1, then you know that t2 is not a subset of t1. The advantage of this method is that it is done all in-place, without allocating additional space, unlike the solutions using .Except or .Intersect. Furthermore, this solution is able to break as soon as it finds a single element that violates the subset condition, while the others continue searching. Below is the optimal long form of the solution, which is only marginally faster in my tests than the above shorthand solution.

bool isSubset = true;
foreach (var element in t2) {
    if (!t1.Contains(element)) {
        isSubset = false;
        break;
    }
}

I did some rudimentary performance analysis of all the solutions, and the results are drastic. These two solutions are about 100x faster than the .Except() and .Intersect() solutions, and use no additional memory.

Intractable answered 2/11, 2014 at 7:44 Comment(8)
That is exactly what !t2.Except(t1).Any() is doing. Linq is working back to forth. Any() is asking an IEnumerable if there is at least one element. In this scenario t2.Except(t1) is only emitting the first element of t2 which is not in t1. If the first element of t2 is not in t1 it finishes fastest, if all elements of t2 are in t1 it runs the longest.Glinys
While playing around with some sort of benchmark, I found out, when you take t1={1,2,3,...9999} and t2={9999,9998,99997...9000}, you get the following measurements: !t2.Except(t1).Any(): 1ms -> t2.All(e => t1.Contains(e)): 702ms. And it get's worse the bigger the range.Glinys
That is incorrect. t2.Except(t1) is a set subtraction operation, and returns ALL the remaining elements of t2 minus t1. It creates an entire new collection, and requires additional memory to do so. It is not emitting only the first element of t2 which is not in t1, it is emitting ALL elements of t2 that are not in t1. In order to return the set of ALL elements not in t1, it must traverse the entire set. For example, if t2={1,2,3,4,5,6,7,8} and t1={2,4,6,8}, then t2.Except(t1) returns {1,3,5,7}.Intractable
This is not the way Linq works. t2.Except (t1) is returning an IEnumerable not a Collection. It only emits all of the possible items if you iterate completely over it, for example by ToArray () or ToList () or use foreach without breaking inside. Search for linq deferred execution to read more about that concept.Glinys
From the documentation of Except (): This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic. msdn.microsoft.com/en-us/library/bb300779(v=vs.110).aspxGlinys
I'm fully aware of how deferred execution works in Linq. You can defer the execution all you want, but when you want to determine if t2 is a subset of t1, you'll need to iterate the entire list to figure it out. There is no getting around that fact.Intractable
Lets take the example from your comment t2={1,2,3,4,5,6,7,8} t1={2,4,6,8} t2.Except(t1) => first element of t2 = 1 => difference of 1 to t1 is 1 (checked against {2,4,6,8}) => Except() emits first element 1 => Any() gets an element => Any() results in true => no further check of elements in t2.Glinys
This is a good answer, but tarnished by the misconception outlined by @abto. I suggest an edit to focus on the actual technique independently of the alternatives.Neurosurgery
D
13

If you are unit-testing you can also utilize the CollectionAssert.IsSubsetOf method :

CollectionAssert.IsSubsetOf(subset, superset);

In the above case this would mean:

CollectionAssert.IsSubsetOf(t2, t1);
Damaging answered 27/8, 2014 at 11:40 Comment(0)
B
9

I have spent some time benchmarking 3 working solutions in this answer with BenchmarkDotNet.

  • !t2.Except(t1).Any() called Except().Any()
  • t2.IsSubsetOf(t1) called HashSet
  • t2.All(elem => t1.Contains(elem)) called All(=>Contains)

I have varied 3 params:

  • count of supersets
  • length of subset
  • variability in items

All supersets were random length and contained items in range [0; Variability). Subsets had fixed length and contained items in range [0; Variability).

The winner is All(=>Contains) method - it is sometimes up to 30x times faster than using HashSets and 50x faster than Except().Any().

Code can be found on github

UPDATE As mentioned in comments by @Gert Arnold I had a bug in my benchmark. I called ToHashSet() inside every iteration. So i build also a collection of prebuilt hashsets and added a pure-hashset test named prebuilt HashSet. It is not an absolute winner and sometimes it looses All(=>Contains) algorithm. It seems to me the point is in variability (and duplicates) in supersets. When variability in values is low - hashset wins, as it removes it from data.

The final table goes here:

|             Method | SupersetsInIteration | SubsetLength | Variability |          Mean |        Error |       StdDev |        Median |
|------------------- |--------------------- |------------- |------------ |--------------:|-------------:|-------------:|--------------:|
|     Except().Any() |                 1000 |           10 |         100 |   1,485.89 μs |     7.363 μs |     6.887 μs |   1,488.36 μs |
|        ToHashSet() |                 1000 |           10 |         100 |   1,015.59 μs |     5.099 μs |     4.770 μs |   1,015.43 μs |
| prebuilt HashSet   |                 1000 |           10 |         100 |      38.76 μs |     0.065 μs |     0.054 μs |      38.78 μs |
|    All(=>Contains) |                 1000 |           10 |         100 |     105.46 μs |     0.320 μs |     0.267 μs |     105.38 μs |
|     Except().Any() |                 1000 |           10 |       10000 |   1,912.17 μs |    38.180 μs |    87.725 μs |   1,890.72 μs |
|        ToHashSet() |                 1000 |           10 |       10000 |   1,038.70 μs |    20.028 μs |    40.459 μs |   1,019.35 μs |
| prebuilt HashSet   |                 1000 |           10 |       10000 |      28.22 μs |     0.165 μs |     0.155 μs |      28.24 μs |
|    All(=>Contains) |                 1000 |           10 |       10000 |      81.47 μs |     0.117 μs |     0.109 μs |      81.45 μs |
|     Except().Any() |                 1000 |           50 |         100 |   4,888.22 μs |    81.268 μs |    76.019 μs |   4,854.42 μs |
|        ToHashSet() |                 1000 |           50 |         100 |   4,323.23 μs |    21.424 μs |    18.992 μs |   4,315.16 μs |
| prebuilt HashSet   |                 1000 |           50 |         100 |     186.53 μs |     1.257 μs |     1.176 μs |     186.35 μs |
|    All(=>Contains) |                 1000 |           50 |         100 |   1,173.37 μs |     2.667 μs |     2.227 μs |   1,173.08 μs |
|     Except().Any() |                 1000 |           50 |       10000 |   7,148.22 μs |    20.545 μs |    19.218 μs |   7,138.22 μs |
|        ToHashSet() |                 1000 |           50 |       10000 |   4,576.69 μs |    20.955 μs |    17.499 μs |   4,574.34 μs |
| prebuilt HashSet   |                 1000 |           50 |       10000 |      33.87 μs |     0.160 μs |     0.142 μs |      33.85 μs |
|    All(=>Contains) |                 1000 |           50 |       10000 |     131.34 μs |     0.569 μs |     0.475 μs |     131.24 μs |
|     Except().Any() |                10000 |           10 |         100 |  14,798.42 μs |   120.423 μs |   112.643 μs |  14,775.43 μs |
|        ToHashSet() |                10000 |           10 |         100 |  10,263.52 μs |    64.082 μs |    59.942 μs |  10,265.58 μs |
| prebuilt HashSet   |                10000 |           10 |         100 |   1,241.19 μs |     4.248 μs |     3.973 μs |   1,241.75 μs |
|    All(=>Contains) |                10000 |           10 |         100 |   1,058.41 μs |     6.766 μs |     6.329 μs |   1,059.22 μs |
|     Except().Any() |                10000 |           10 |       10000 |  16,318.65 μs |    97.878 μs |    91.555 μs |  16,310.02 μs |
|        ToHashSet() |                10000 |           10 |       10000 |  10,393.23 μs |    68.236 μs |    63.828 μs |  10,386.27 μs |
| prebuilt HashSet   |                10000 |           10 |       10000 |   1,087.21 μs |     2.812 μs |     2.631 μs |   1,085.89 μs |
|    All(=>Contains) |                10000 |           10 |       10000 |     847.88 μs |     1.536 μs |     1.436 μs |     847.34 μs |
|     Except().Any() |                10000 |           50 |         100 |  48,257.76 μs |   232.573 μs |   181.578 μs |  48,236.31 μs |
|        ToHashSet() |                10000 |           50 |         100 |  43,938.46 μs |   994.200 μs | 2,687.877 μs |  42,877.97 μs |
| prebuilt HashSet   |                10000 |           50 |         100 |   4,634.98 μs |    16.757 μs |    15.675 μs |   4,643.17 μs |
|    All(=>Contains) |                10000 |           50 |         100 |  10,256.62 μs |    26.440 μs |    24.732 μs |  10,243.34 μs |
|     Except().Any() |                10000 |           50 |       10000 |  73,192.15 μs |   479.584 μs |   425.139 μs |  73,077.26 μs |
|        ToHashSet() |                10000 |           50 |       10000 |  45,880.72 μs |   141.497 μs |   125.433 μs |  45,860.50 μs |
| prebuilt HashSet   |                10000 |           50 |       10000 |   1,620.61 μs |     3.507 μs |     3.280 μs |   1,620.52 μs |
|    All(=>Contains) |                10000 |           50 |       10000 |   1,460.01 μs |     1.819 μs |     1.702 μs |   1,459.49 μs |
|     Except().Any() |               100000 |           10 |         100 | 149,047.91 μs | 1,696.388 μs | 1,586.803 μs | 149,063.20 μs |
|        ToHashSet() |               100000 |           10 |         100 | 100,657.74 μs |   150.890 μs |   117.805 μs | 100,654.39 μs |
| prebuilt HashSet   |               100000 |           10 |         100 |  12,753.33 μs |    17.257 μs |    15.298 μs |  12,749.85 μs |
|    All(=>Contains) |               100000 |           10 |         100 |  11,238.79 μs |    54.228 μs |    50.725 μs |  11,247.03 μs |
|     Except().Any() |               100000 |           10 |       10000 | 163,277.55 μs | 1,096.107 μs | 1,025.299 μs | 163,556.98 μs |
|        ToHashSet() |               100000 |           10 |       10000 |  99,927.78 μs |   403.811 μs |   337.201 μs |  99,812.12 μs |
| prebuilt HashSet   |               100000 |           10 |       10000 |  11,671.99 μs |     6.753 μs |     5.986 μs |  11,672.28 μs |
|    All(=>Contains) |               100000 |           10 |       10000 |   8,217.51 μs |    67.959 μs |    56.749 μs |   8,225.85 μs |
|     Except().Any() |               100000 |           50 |         100 | 493,925.76 μs | 2,169.048 μs | 1,922.805 μs | 493,386.70 μs |
|        ToHashSet() |               100000 |           50 |         100 | 432,214.15 μs | 1,261.673 μs | 1,180.169 μs | 431,624.50 μs |
| prebuilt HashSet   |               100000 |           50 |         100 |  49,593.29 μs |    75.300 μs |    66.751 μs |  49,598.45 μs |
|    All(=>Contains) |               100000 |           50 |         100 |  98,662.71 μs |   119.057 μs |   111.366 μs |  98,656.00 μs |
|     Except().Any() |               100000 |           50 |       10000 | 733,526.81 μs | 8,728.516 μs | 8,164.659 μs | 733,455.20 μs |
|        ToHashSet() |               100000 |           50 |       10000 | 460,166.27 μs | 7,227.011 μs | 6,760.150 μs | 457,359.70 μs |
| prebuilt HashSet   |               100000 |           50 |       10000 |  17,443.96 μs |    10.839 μs |     9.608 μs |  17,443.40 μs |
|    All(=>Contains) |               100000 |           50 |       10000 |  14,222.31 μs |    47.090 μs |    44.048 μs |  14,217.94 μs |

Birdie answered 17/10, 2020 at 10:7 Comment(2)
The comparison isn't fair. You include ToHashSet() in the measured code. The results would be quite different if you'd create hashsets and compare only IsSubsetOf. No doubt HashSet will win easily.Harpy
Good point, thank you! I will rerun tests on prepared hashsets in global setup. I have included it into benchmark code as we do no usually use this specific data structure in common business code, but i will check it.Birdie
M
6

@Cameron's solution as an extension method:

public static bool IsSubsetOf<T>(this IEnumerable<T> a, IEnumerable<T> b)
{
    return !a.Except(b).Any();
}

Usage:

bool isSubset = t2.IsSubsetOf(t1);

(This is similar, but not quite the same as the one posted on @Michael's blog)

Mizell answered 24/2, 2012 at 18:2 Comment(0)
C
1

Building on the answers from @Cameron and @Neil I wrote an extension method that uses the same terminology as the Enumerable class.

/// <summary>
/// Determines whether a sequence contains the specified elements by using the default equality comparer.
/// </summary>
/// <typeparam name="TSource">The type of the elements of source.</typeparam>
/// <param name="source">A sequence in which to locate the values.</param>
/// <param name="values">The values to locate in the sequence.</param>
/// <returns>true if the source sequence contains elements that have the specified values; otherwise, false.</returns>
public static bool ContainsAll<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> values)
{
    return !values.Except(source).Any();
}
Custody answered 23/10, 2017 at 8:41 Comment(0)
H
0

Given then we have 2 arrays, array1 and array2 and we want to check that all array1 values exists in array2:

array1.All(ar1 => array2.Any(ar2 => ar2.Equals(ar1)));

Note: ar2.Equals(ar1) can be replaced if equality is described in other way.

Hotshot answered 4/11, 2020 at 14:16 Comment(0)
D
-1

Try this

static bool IsSubSet<A>(A[] set, A[] toCheck) {
  return set.Length == (toCheck.Intersect(set)).Count();
}

The idea here is that Intersect will only return the values that are in both Arrays. At this point if the length of the resulting set is the same as the original set, then all elements in "set" are also in "check" and therefore "set" is a subset of "toCheck"

Note: My solution does not work if "set" has duplicates. I'm not changing it because I don't want to steal other people's votes.

Hint: I voted for Cameron's answer.

Donal answered 2/12, 2008 at 3:44 Comment(2)
This works if they are indeed sets, but not if the second "set" contains repeated elements since it is really a list. You may want to use HashSet<double> to ensure that it has set semantics.Myronmyrrh
doesn't work when both Arrays have elements, which are not in the other Array.Panorama
U
-1

Here we check that if there is any element present in the child list(i.e t2) which is not contained by the parent list(i.e t1).If none such exists then the list is subset of the other

eg:

bool isSubset = !(t2.Any(x => !t1.Contains(x)));
Unpeopled answered 7/6, 2018 at 4:11 Comment(1)
not Any not is the same as plain All.Ironbound

© 2022 - 2024 — McMap. All rights reserved.