Any idea on how to check whether that list is a subset of another?
Specifically, I have
List<double> t1 = new List<double> { 1, 3, 5 };
List<double> t2 = new List<double> { 1, 5 };
How to check that t2 is a subset of t1, using LINQ?
Any idea on how to check whether that list is a subset of another?
Specifically, I have
List<double> t1 = new List<double> { 1, 3, 5 };
List<double> t2 = new List<double> { 1, 5 };
How to check that t2 is a subset of t1, using LINQ?
bool isSubset = !t2.Except(t1).Any();
t1.Intersect(t2).Count() == t2.Count()
–
Hippogriff Use HashSet instead of List if working with sets. Then you can simply use IsSubsetOf()
HashSet<double> t1 = new HashSet<double>{1,3,5};
HashSet<double> t2 = new HashSet<double>{1,5};
bool isSubset = t2.IsSubsetOf(t1);
Sorry that it doesn't use LINQ. :-(
If you need to use lists, then @Jared's solution works with the caveat that you will need to remove any repeated elements that exist.
This is a significantly more efficient solution than the others posted here, especially the top solution:
bool isSubset = t2.All(elem => t1.Contains(elem));
If you can find a single element in t2 that isn't in t1, then you know that t2 is not a subset of t1. The advantage of this method is that it is done all in-place, without allocating additional space, unlike the solutions using .Except or .Intersect. Furthermore, this solution is able to break as soon as it finds a single element that violates the subset condition, while the others continue searching. Below is the optimal long form of the solution, which is only marginally faster in my tests than the above shorthand solution.
bool isSubset = true;
foreach (var element in t2) {
if (!t1.Contains(element)) {
isSubset = false;
break;
}
}
I did some rudimentary performance analysis of all the solutions, and the results are drastic. These two solutions are about 100x faster than the .Except() and .Intersect() solutions, and use no additional memory.
!t2.Except(t1).Any()
is doing. Linq is working back to forth. Any()
is asking an IEnumerable
if there is at least one element. In this scenario t2.Except(t1)
is only emitting the first element of t2
which is not in t1
. If the first element of t2
is not in t1
it finishes fastest, if all elements of t2
are in t1
it runs the longest. –
Glinys t1={1,2,3,...9999}
and t2={9999,9998,99997...9000}
, you get the following measurements: !t2.Except(t1).Any(): 1ms -> t2.All(e => t1.Contains(e)): 702ms
. And it get's worse the bigger the range. –
Glinys t2.Except (t1)
is returning an IEnumerable
not a Collection
. It only emits all of the possible items if you iterate completely over it, for example by ToArray ()
or ToList ()
or use foreach
without breaking inside. Search for linq deferred execution to read more about that concept. –
Glinys Except ()
: This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic. msdn.microsoft.com/en-us/library/bb300779(v=vs.110).aspx –
Glinys t2={1,2,3,4,5,6,7,8}
t1={2,4,6,8}
t2.Except(t1)
=> first element of t2 = 1 => difference of 1 to t1 is 1 (checked against {2,4,6,8}) => Except()
emits first element 1 => Any()
gets an element => Any()
results in true => no further check of elements in t2. –
Glinys If you are unit-testing you can also utilize the CollectionAssert.IsSubsetOf method :
CollectionAssert.IsSubsetOf(subset, superset);
In the above case this would mean:
CollectionAssert.IsSubsetOf(t2, t1);
I have spent some time benchmarking 3 working solutions in this answer with BenchmarkDotNet.
!t2.Except(t1).Any()
called Except().Any()t2.IsSubsetOf(t1)
called HashSett2.All(elem => t1.Contains(elem))
called All(=>Contains)I have varied 3 params:
All supersets were random length and contained items in range [0; Variability)
. Subsets had fixed length and contained items in range [0; Variability)
.
The winner is All(=>Contains) method - it is sometimes up to 30x times faster than using HashSets and 50x faster than Except().Any().
Code can be found on github
UPDATE
As mentioned in comments by @Gert Arnold I had a bug in my benchmark. I called ToHashSet()
inside every iteration.
So i build also a collection of prebuilt hashsets and added a pure-hashset test named prebuilt HashSet
.
It is not an absolute winner and sometimes it looses All(=>Contains)
algorithm. It seems to me the point is in variability (and duplicates) in supersets. When variability in values is low - hashset wins, as it removes it from data.
The final table goes here:
| Method | SupersetsInIteration | SubsetLength | Variability | Mean | Error | StdDev | Median |
|------------------- |--------------------- |------------- |------------ |--------------:|-------------:|-------------:|--------------:|
| Except().Any() | 1000 | 10 | 100 | 1,485.89 μs | 7.363 μs | 6.887 μs | 1,488.36 μs |
| ToHashSet() | 1000 | 10 | 100 | 1,015.59 μs | 5.099 μs | 4.770 μs | 1,015.43 μs |
| prebuilt HashSet | 1000 | 10 | 100 | 38.76 μs | 0.065 μs | 0.054 μs | 38.78 μs |
| All(=>Contains) | 1000 | 10 | 100 | 105.46 μs | 0.320 μs | 0.267 μs | 105.38 μs |
| Except().Any() | 1000 | 10 | 10000 | 1,912.17 μs | 38.180 μs | 87.725 μs | 1,890.72 μs |
| ToHashSet() | 1000 | 10 | 10000 | 1,038.70 μs | 20.028 μs | 40.459 μs | 1,019.35 μs |
| prebuilt HashSet | 1000 | 10 | 10000 | 28.22 μs | 0.165 μs | 0.155 μs | 28.24 μs |
| All(=>Contains) | 1000 | 10 | 10000 | 81.47 μs | 0.117 μs | 0.109 μs | 81.45 μs |
| Except().Any() | 1000 | 50 | 100 | 4,888.22 μs | 81.268 μs | 76.019 μs | 4,854.42 μs |
| ToHashSet() | 1000 | 50 | 100 | 4,323.23 μs | 21.424 μs | 18.992 μs | 4,315.16 μs |
| prebuilt HashSet | 1000 | 50 | 100 | 186.53 μs | 1.257 μs | 1.176 μs | 186.35 μs |
| All(=>Contains) | 1000 | 50 | 100 | 1,173.37 μs | 2.667 μs | 2.227 μs | 1,173.08 μs |
| Except().Any() | 1000 | 50 | 10000 | 7,148.22 μs | 20.545 μs | 19.218 μs | 7,138.22 μs |
| ToHashSet() | 1000 | 50 | 10000 | 4,576.69 μs | 20.955 μs | 17.499 μs | 4,574.34 μs |
| prebuilt HashSet | 1000 | 50 | 10000 | 33.87 μs | 0.160 μs | 0.142 μs | 33.85 μs |
| All(=>Contains) | 1000 | 50 | 10000 | 131.34 μs | 0.569 μs | 0.475 μs | 131.24 μs |
| Except().Any() | 10000 | 10 | 100 | 14,798.42 μs | 120.423 μs | 112.643 μs | 14,775.43 μs |
| ToHashSet() | 10000 | 10 | 100 | 10,263.52 μs | 64.082 μs | 59.942 μs | 10,265.58 μs |
| prebuilt HashSet | 10000 | 10 | 100 | 1,241.19 μs | 4.248 μs | 3.973 μs | 1,241.75 μs |
| All(=>Contains) | 10000 | 10 | 100 | 1,058.41 μs | 6.766 μs | 6.329 μs | 1,059.22 μs |
| Except().Any() | 10000 | 10 | 10000 | 16,318.65 μs | 97.878 μs | 91.555 μs | 16,310.02 μs |
| ToHashSet() | 10000 | 10 | 10000 | 10,393.23 μs | 68.236 μs | 63.828 μs | 10,386.27 μs |
| prebuilt HashSet | 10000 | 10 | 10000 | 1,087.21 μs | 2.812 μs | 2.631 μs | 1,085.89 μs |
| All(=>Contains) | 10000 | 10 | 10000 | 847.88 μs | 1.536 μs | 1.436 μs | 847.34 μs |
| Except().Any() | 10000 | 50 | 100 | 48,257.76 μs | 232.573 μs | 181.578 μs | 48,236.31 μs |
| ToHashSet() | 10000 | 50 | 100 | 43,938.46 μs | 994.200 μs | 2,687.877 μs | 42,877.97 μs |
| prebuilt HashSet | 10000 | 50 | 100 | 4,634.98 μs | 16.757 μs | 15.675 μs | 4,643.17 μs |
| All(=>Contains) | 10000 | 50 | 100 | 10,256.62 μs | 26.440 μs | 24.732 μs | 10,243.34 μs |
| Except().Any() | 10000 | 50 | 10000 | 73,192.15 μs | 479.584 μs | 425.139 μs | 73,077.26 μs |
| ToHashSet() | 10000 | 50 | 10000 | 45,880.72 μs | 141.497 μs | 125.433 μs | 45,860.50 μs |
| prebuilt HashSet | 10000 | 50 | 10000 | 1,620.61 μs | 3.507 μs | 3.280 μs | 1,620.52 μs |
| All(=>Contains) | 10000 | 50 | 10000 | 1,460.01 μs | 1.819 μs | 1.702 μs | 1,459.49 μs |
| Except().Any() | 100000 | 10 | 100 | 149,047.91 μs | 1,696.388 μs | 1,586.803 μs | 149,063.20 μs |
| ToHashSet() | 100000 | 10 | 100 | 100,657.74 μs | 150.890 μs | 117.805 μs | 100,654.39 μs |
| prebuilt HashSet | 100000 | 10 | 100 | 12,753.33 μs | 17.257 μs | 15.298 μs | 12,749.85 μs |
| All(=>Contains) | 100000 | 10 | 100 | 11,238.79 μs | 54.228 μs | 50.725 μs | 11,247.03 μs |
| Except().Any() | 100000 | 10 | 10000 | 163,277.55 μs | 1,096.107 μs | 1,025.299 μs | 163,556.98 μs |
| ToHashSet() | 100000 | 10 | 10000 | 99,927.78 μs | 403.811 μs | 337.201 μs | 99,812.12 μs |
| prebuilt HashSet | 100000 | 10 | 10000 | 11,671.99 μs | 6.753 μs | 5.986 μs | 11,672.28 μs |
| All(=>Contains) | 100000 | 10 | 10000 | 8,217.51 μs | 67.959 μs | 56.749 μs | 8,225.85 μs |
| Except().Any() | 100000 | 50 | 100 | 493,925.76 μs | 2,169.048 μs | 1,922.805 μs | 493,386.70 μs |
| ToHashSet() | 100000 | 50 | 100 | 432,214.15 μs | 1,261.673 μs | 1,180.169 μs | 431,624.50 μs |
| prebuilt HashSet | 100000 | 50 | 100 | 49,593.29 μs | 75.300 μs | 66.751 μs | 49,598.45 μs |
| All(=>Contains) | 100000 | 50 | 100 | 98,662.71 μs | 119.057 μs | 111.366 μs | 98,656.00 μs |
| Except().Any() | 100000 | 50 | 10000 | 733,526.81 μs | 8,728.516 μs | 8,164.659 μs | 733,455.20 μs |
| ToHashSet() | 100000 | 50 | 10000 | 460,166.27 μs | 7,227.011 μs | 6,760.150 μs | 457,359.70 μs |
| prebuilt HashSet | 100000 | 50 | 10000 | 17,443.96 μs | 10.839 μs | 9.608 μs | 17,443.40 μs |
| All(=>Contains) | 100000 | 50 | 10000 | 14,222.31 μs | 47.090 μs | 44.048 μs | 14,217.94 μs |
ToHashSet()
in the measured code. The results would be quite different if you'd create hashsets and compare only IsSubsetOf
. No doubt HashSet
will win easily. –
Harpy @Cameron's solution as an extension method:
public static bool IsSubsetOf<T>(this IEnumerable<T> a, IEnumerable<T> b)
{
return !a.Except(b).Any();
}
Usage:
bool isSubset = t2.IsSubsetOf(t1);
(This is similar, but not quite the same as the one posted on @Michael's blog)
Building on the answers from @Cameron and @Neil I wrote an extension method that uses the same terminology as the Enumerable class.
/// <summary>
/// Determines whether a sequence contains the specified elements by using the default equality comparer.
/// </summary>
/// <typeparam name="TSource">The type of the elements of source.</typeparam>
/// <param name="source">A sequence in which to locate the values.</param>
/// <param name="values">The values to locate in the sequence.</param>
/// <returns>true if the source sequence contains elements that have the specified values; otherwise, false.</returns>
public static bool ContainsAll<TSource>(this IEnumerable<TSource> source, IEnumerable<TSource> values)
{
return !values.Except(source).Any();
}
Given then we have 2 arrays, array1
and array2
and we want to check that all array1
values exists in array2
:
array1.All(ar1 => array2.Any(ar2 => ar2.Equals(ar1)));
Note: ar2.Equals(ar1)
can be replaced if equality is described in other way.
Try this
static bool IsSubSet<A>(A[] set, A[] toCheck) {
return set.Length == (toCheck.Intersect(set)).Count();
}
The idea here is that Intersect will only return the values that are in both Arrays. At this point if the length of the resulting set is the same as the original set, then all elements in "set" are also in "check" and therefore "set" is a subset of "toCheck"
Note: My solution does not work if "set" has duplicates. I'm not changing it because I don't want to steal other people's votes.
Hint: I voted for Cameron's answer.
Here we check that if there is any element present in the child list(i.e
t2
) which is not contained by the parent list(i.et1
).If none such exists then the list is subset of the other
eg:
bool isSubset = !(t2.Any(x => !t1.Contains(x)));
not Any not
is the same as plain All
. –
Ironbound © 2022 - 2024 — McMap. All rights reserved.