Is there a way to check if a Stream contains all collection elements?
Asked Answered
E

3

14

For example, I need something like:

Collection<String> collection = /* ... */;
Stream<Object> stream = /* ... */;
boolean containsAll = stream.map(Object::toString).containsAll(collection);

Of course, I could accumulate all elements of the stream into another Collection using collect() method and the call Collection.containsAll(), but what if the stream is too big and it is inefficient to process all its elements?

Eph answered 7/10, 2019 at 12:33 Comment(2)
Personally I think it's actually better to go with Set<String> temp = source.stream().map(Object::toString).collect(toSet()); boolean containsAll = temp.containsAll(collection);Undersigned
@OusmaneD. What if the stream is too big as OP assumes? Imagine the case when the stream is lazily generated and not all elements are persisted in the memory. E.g. using Files::lines you can process huge files even though they could not fit in your memory. Collecting into a set will cause OutOfMemoryError in such a case.Nonparticipation
N
11

This should do the trick:

Set<String> set = new HashSet<>(collection);
boolean containsAll = set.isEmpty() || stream.map(Object::toString)
                                             .anyMatch(s -> set.remove(s) && set.isEmpty());

The solution might look confusing, but the idea is straightforward:

  1. In order to prevent multiple iterations over collection we wrap it into a HashSet. (In case your stream is a parallel one, then you will have to use a concurrent hash set. See this post for more details)
  2. If the collection (or set) is empty then we return true without processing the stream
  3. For each entry of stream we try to remove it from set. In case the result of Set::remove is true (hence it was contained by set) and the set is empty after removal, we can conclude that stream contained all the elements of initial collection.
  4. The terminal operation Stream::anyMatch is a short-circuiting one. So it will stop iterating over stream once the set is empty. In worst case we will process the entire stream.

Perhaps this is a bit more readable form:

Set<String> set = new HashSet<>(collection);
boolean containsAll = set.isEmpty() || stream.map(Object::toString)
                                             .filter(set::remove)
                                             .anyMatch(__ -> set.isEmpty());

If the collection can contain duplicates and there is a requirement to check if stream contains all of them, then we will need to maintain a concurrent map of counters.

Map<String, AtomicLong> map = new ConcurrentHashMap<>();
collection.forEach(s -> map.computeIfAbsent(s, __ -> new AtomicLong()).incrementAndGet());
boolean containsAll = map.isEmpty() || stream.map(Object::toString)
                                             .filter(map::containsKey)
                                             .filter(s -> map.get(s).decrementAndGet() == 0)
                                             .filter(s -> map.remove(s) != null)
                                             .anyMatch(__ -> map.isEmpty());

The code slightly changed but the idea is the same.

Nonparticipation answered 7/10, 2019 at 15:55 Comment(5)
Seems like you could also wrap it in a List to support checking if the stream contains multiple instances of an element. List.remove would return true until all copies are removed.Equidistant
@SeanVanGorder Actually, this is a very interesting case. For the sake of simplicity I ignored that in my solution. But using List we'll loose the performance boost HashSet offers.Nonparticipation
You could just check if the original collection is a Set and wrap it in HashSet instead of ArrayList in that case. This map count preprocessing seems like a trade-off, it might be faster for huge non-Set collections but probably worse for small ones.Equidistant
This makes sense. Programming is all about trade-offs. Different data structures are suitable for different cases. It is still up to software engineer to pick the one which fits better the task he/she tries to resolve.Nonparticipation
This blew my mind. I never thought if using stream like this. Thanks a lot. From a college studentVite
T
4

Regardless of how big the Stream is, you will have to process all of its elements if it does not contain all the elements of the Collection.

You could save processing time if a small prefix of the Stream contains all the elements of Collection, and the Collection is much smaller than the Stream.

boolean containsAll = 
    stream.map(Object::toString)
          .filter(s -> collection.contains(s)) // it would be wise to convert collection to a Set
          .limit(collection.size())
          .count() == collection.size();

Note that if the Stream may contain multiple copies of the same element of the Collection, you may have to add a .distinct() operation after the filter().

Troublous answered 7/10, 2019 at 12:39 Comment(5)
I don't really understand the benefit of not using collect over any other operation(count) in this case. Since the stream processing for the complete elements would anyway take place(worst case), is there a possible optimization to get away with that much of memory being used?Laughry
@Laughry would it process all the elements if, for example, the stream has 1000000 elements but all the elements of the Collection appear in the first 50 elements of the Stream? Of course, if not all the elements of the Collection appear in the Stream (or if the last of them appears near the end of the Stream), this solution wouldn't help.Troublous
What if the stream has some null entries? If the stream and collection are sorted, will it be faster to process containsAll?Hebephrenia
@Troublous By containsAll I meant your method, finding if a collection contains all elements of a stream. Thanks for clarification.Hebephrenia
@Hebephrenia I misunderstood your question. If the Stream and Collection are sorted, it should be possible to make an efficient solution (that doesn't process the entire Stream) even if the Stream doesn't contain all elements of the Collection, since we can identify the point in the Stream after which there can be no more elements that belong to the Collection. This can be done with takeWhile.Troublous
P
3

Create a Set from Collection<String> to make search operation faster O(1)

Set<String> set = new HashSet<>(collection);

Then use allMatch to check every item in stream contains in set or not

boolean containsAll = stream.map(Object::toString)
                            .allMatch(s -> set.contains(s));

Another way :

Filter by not contained in set and use limit(1) for optimization

boolean isContains = stream.map(Object::toString)
                           .filter(s -> !set.contains(s))
                           .limit(1)
                           .count() > 0;
Padova answered 29/6, 2020 at 20:4 Comment(1)
This is not what OP asked for. If set contains ("foo", "baa"), and your stream contains 100 x "foo", your function returns the wrong solution.Ribbentrop

© 2022 - 2024 — McMap. All rights reserved.