Compare-Object can find which elements are missing from one collection relative to the other, vice versa, or both.
It can be slow, though, and since you mention large lists, it sounds like you're looking for a solution that performs well.
That said, collections with 1,000 items are unlikely to be a problem in practice.
Something like the following may therefore be sufficient to get all entries in $BunchoEmail that aren't also in $GoogleUsers (substitute => for <= to reverse the logic):
(Compare-Object -PassThru $BunchoEmail $GoogleUsers).
  Where({ $_.SideIndicator -eq '<=' })
Getting those entries that aren't in both collections (that are unique to either collection) is even easier:
Compare-Object -PassThru $BunchoEmail $GoogleUsers
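As a minimal sketch of how the side indicators behave, the following uses small, made-up sample arrays (the example.com addresses are hypothetical placeholders, not data from the question):

```powershell
# Hypothetical sample data standing in for the real collections.
$BunchoEmail = 'a@example.com', 'b@example.com', 'c@example.com'
$GoogleUsers = 'b@example.com', 'c@example.com', 'd@example.com'

# With -PassThru, the differing input objects are output directly,
# each decorated with a .SideIndicator property.
$diff = Compare-Object -PassThru $BunchoEmail $GoogleUsers

# '<=' marks entries found only in the first (reference) collection.
$onlyInBuncho = $diff.Where({ $_.SideIndicator -eq '<=' })

# '=>' marks entries found only in the second (difference) collection.
$onlyInGoogle = $diff.Where({ $_.SideIndicator -eq '=>' })

$onlyInBuncho  # a@example.com
$onlyInGoogle  # d@example.com
```

Entries present in both collections (b@example.com and c@example.com here) are omitted by default, which is why no -IncludeEqual filtering is needed.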
As for improving performance:
Combining type [System.Collections.Generic.HashSet`1] with LINQ enables a fast and concise solution:
Note:
Use of HashSet implies that the results are reported in no particular order; to get them in sorted order, use [System.Collections.Generic.SortedSet[string]] instead. (There is no built-in type for maintaining the insertion order as of .NET 6.)
The solutions below are true set operations, i.e. they report distinct differences, unlike Compare-Object. E.g., if a given email address is present twice in a collection, the solutions below report it only once, whereas Compare-Object reports both instances.
Unlike Compare-Object, the HashSet and SortedSet types are case-sensitive by default; for case-insensitive behavior, pass a comparer to the constructor. The System.StringComparer instances work for both types, since they serve as both equality comparers (HashSet) and ordering comparers (SortedSet); e.g.:
[System.Collections.Generic.HashSet[string]]::new(
  [string[]] ('foo', 'FOO'),
  [System.StringComparer]::InvariantCultureIgnoreCase
)
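To make the notes above concrete, here is a small sketch contrasting Compare-Object with a set-based difference on data that contains a duplicate, followed by the effect of the comparer on case handling. The sample addresses and the variable names ($left, $right, $viaCompareObject, $viaSets, and so on) are made up for illustration:

```powershell
# Hypothetical sample data; typed as [string[]] so the casts below are unambiguous.
[string[]] $left  = 'dup@example.com', 'dup@example.com', 'same@example.com'
[string[]] $right = @('same@example.com')

# Compare-Object reports each instance of the duplicated entry.
$viaCompareObject = @(Compare-Object -PassThru $left $right)
$viaCompareObject.Count  # 2

# A set-based difference reports the duplicated entry only once.
$viaSets = @([Linq.Enumerable]::Except(
  [System.Collections.Generic.HashSet[string]] $left,
  [System.Collections.Generic.HashSet[string]] $right
))
$viaSets.Count  # 1

# Case sensitivity: a default hash set treats 'foo' and 'FOO' as distinct,
# whereas one constructed with a case-insensitive comparer does not.
$caseSensitive = [System.Collections.Generic.HashSet[string]]::new(
  [string[]] ('foo', 'FOO')
)
$caseInsensitive = [System.Collections.Generic.HashSet[string]]::new(
  [string[]] ('foo', 'FOO'),
  [System.StringComparer]::InvariantCultureIgnoreCase
)
$caseSensitive.Count    # 2
$caseInsensitive.Count  # 1
```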
To get all entries in $BunchoEmail that aren't also in $GoogleUsers, use [System.Linq.Enumerable]::Except() (reverse the operands for the inverse solution):
[Linq.Enumerable]::Except(
  [System.Collections.Generic.HashSet[string]] $BunchoEmail,
  [System.Collections.Generic.HashSet[string]] $GoogleUsers
)
Note: You could also use a hash set's .ExceptWith() method, but that requires storing one of the hash sets in an auxiliary variable, which is then updated in place, analogous to the .SymmetricExceptWith() solution below.
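For completeness, a minimal sketch of the .ExceptWith() variant, using hypothetical sample data in place of the real collections:

```powershell
# Hypothetical sample data standing in for the real collections.
[string[]] $BunchoEmail = 'a@example.com', 'b@example.com', 'c@example.com'
[string[]] $GoogleUsers = 'b@example.com', 'c@example.com', 'd@example.com'

# Load the collection to subtract from into an auxiliary hash set.
$auxHashSet = [System.Collections.Generic.HashSet[string]] $BunchoEmail

# Remove all entries that are also in $GoogleUsers.
# The method has no return value; it updates $auxHashSet in place.
$auxHashSet.ExceptWith(
  [System.Collections.Generic.HashSet[string]] $GoogleUsers
)

# Output the result
$auxHashSet  # a@example.com
```

Because the set is modified in place, keep a separate copy if the original contents are still needed afterwards.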
Getting those entries that aren't in both collections (that are unique to either collection, called the symmetric difference in set terms) requires a bit more effort, using a hash set's .SymmetricExceptWith() method:
# Load one of the collections into an auxiliary hash set.
$auxHashSet = [System.Collections.Generic.HashSet[string]] $BunchoEmail
# Determine the symmetric difference between the two sets, which
# updates the calling set in place.
$auxHashSet.SymmetricExceptWith(
  [System.Collections.Generic.HashSet[string]] $GoogleUsers
)
# Output the result
$auxHashSet
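If sorted output is wanted, the same approach should carry over to SortedSet, as mentioned in the notes. The following is a sketch under that assumption, again with made-up sample data; $auxSortedSet is a hypothetical variable name:

```powershell
# Hypothetical sample data, deliberately unsorted.
[string[]] $BunchoEmail = 'c@example.com', 'a@example.com', 'b@example.com'
[string[]] $GoogleUsers = 'd@example.com', 'b@example.com', 'c@example.com'

# Load one of the collections into an auxiliary sorted set.
$auxSortedSet = [System.Collections.Generic.SortedSet[string]] $BunchoEmail

# Determine the symmetric difference, which updates the calling set in place.
# The argument only needs to be enumerable, so the array can be passed as-is.
$auxSortedSet.SymmetricExceptWith($GoogleUsers)

# Output the result, which is enumerated in sorted order:
# a@example.com, then d@example.com
$auxSortedSet
```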
[System.Collections.Generic.HashSet[string]] can also handle duplicated items in the collection without throwing an exception, which is very nice. – Isolde