In Java 8, what's the best way to check if a List contains any duplicates?
My idea was something like:
list.size() != list.stream().distinct().count()
Is it the best way?
Your code needs to iterate over all elements. If you want to make sure the list has no duplicates, a simple method like

public static <T> boolean areAllUnique(List<T> list) {
    Set<T> set = new HashSet<>();
    for (T t : list) {
        if (!set.add(t))
            return false;
    }
    return true;
}

would be more efficient, since it can return false immediately when the first non-unique element is found.
This method can also be rewritten using Stream#allMatch, which is likewise short-circuiting (it returns false immediately for the first element that doesn't fulfill the provided condition), assuming a non-parallel stream and a thread-safe environment:
public static <T> boolean areAllUnique(List<T> list) {
    Set<T> set = new HashSet<>();
    return list.stream().allMatch(t -> set.add(t));
}
which can be further shortened, as @Holger pointed out in the comments:
public static <T> boolean areAllUnique(List<T> list) {
    return list.stream().allMatch(new HashSet<>()::add);
}
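A quick sketch of this version in action (the class name AllUniqueDemo is mine): the HashSet behind the method reference is created exactly once per call, and allMatch stops consuming elements at the first duplicate.

```java
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AllUniqueDemo {

    public static <T> boolean areAllUnique(List<T> list) {
        // new HashSet<>() is evaluated once, when the Predicate is created;
        // every element of the stream is tested against that same set.
        return list.stream().allMatch(new HashSet<>()::add);
    }

    // Counts how many elements the stream actually consumes before allMatch
    // reaches its verdict (peek is used purely for instrumentation here).
    public static <T> int consumedBeforeVerdict(List<T> list) {
        AtomicInteger seen = new AtomicInteger();
        list.stream()
            .peek(e -> seen.incrementAndGet())
            .allMatch(new HashSet<>()::add);
        return seen.get();
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 2, 3, 4, 5);
        System.out.println(areAllUnique(list));          // false
        System.out.println(consumedBeforeVerdict(list)); // 3: stops at the first duplicate
    }
}
```

Only three of the six elements are pulled through the pipeline before the verdict, which is the short-circuiting behavior described above.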
return list.stream().allMatch(new HashSet<>()::add); – Sisal

…allMatch solutions, or only allMatch(new HashSet<>()::add)? – Abseil

…Stream, it's acceptable to rely on the programmer to recognize whether that stream use is multi-threaded or not. Of course, if in doubt, using a ConcurrentMap might help. And a stateful predicate might not be the best thing, but it is sometimes unavoidable, distinct tests being the common example. Maybe you remember this one ;^) – Sisal

stream.sequential().allMatch(new HashSet<>()::add)? Will it always be correct? And I guess no overhead? – Mines

new HashSet<>(list) will iterate over the entire list, while .allMatch(new HashSet<>()::add) is short-circuiting: it will return false at the first occurrence of a duplicate element. – Abseil

anyMatch would mean the Predicate should return false for every successful addition to the set and true otherwise. So we can't use the nice new HashSet<>()::add if we want to add negation there (unless we explicitly cast it to Predicate<T> to be able to use the negate() method, like list.stream().anyMatch(((Predicate<T>) new HashSet<>()::add).negate())). Without that, we would need to create the Set before the stream and use a lambda like list.stream().anyMatch(elem -> !set.add(elem)). – Abseil

…the final HashSet the predicate refers to, but how come the elements are being added to the same HashSet with the current way? – Edana

Instead of list.stream().allMatch(t -> set.add(t)) you can also do list.stream().allMatch(set::add). – Edana

expression::name will evaluate expression and capture its result when the instance of the functional interface (here, a Predicate) is created, and that result is then reused throughout the entire lifetime of that object. See "What is the equivalent lambda expression for System.out::println" or this Q&A… – Sisal

The allMatch solution isn't guaranteed to work (though it might for current stream implementations), since the API makes no guarantee that sequential streams will perform all operations in the same thread (though current implementations might), and HashSet.add is not thread safe. –
Seasoning

Out of the following three one-liners:

return list.size() == new HashSet<>(list).size();                  // #1
return list.size() == list.stream().distinct().count();            // #2
return list.stream().allMatch(ConcurrentHashMap.newKeySet()::add); // #3

the last one (#3) has the benefit of short-circuiting: it stops at the first duplicate found instead of always processing the whole list, and, thanks to the concurrent set, it stays correct even if the stream is executed in parallel.
Additional notes:
• ConcurrentHashMap.newKeySet() is a way to create a concurrent hash set.
• A previous version of this answer suggested return list.stream().sequential().allMatch(new HashSet<>()::add); but, as M. Justin commented, calling BaseStream#sequential() doesn't guarantee that the operations will be executed on the same thread and doesn't eliminate the need for synchronization.
• The performance benefit from short-circuiting won't usually matter, because we typically use such code in assertions and other correctness checks, where items are expected to be unique; in contexts where duplicates are a real possibility, we typically not only check for their presence but also find and handle them.
• M. Justin provides a clean (side-effect-free) short-circuiting solution for Java 22, but it isn't a one-liner.
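A minimal sketch of option #3 in action (the class name is mine): the concurrent key set keeps the stateful predicate correct even when the stream runs in parallel.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ConcurrentUniqueDemo {

    static <T> boolean areAllUnique(List<T> list) {
        // ConcurrentHashMap.newKeySet() creates a thread-safe Set view,
        // so the predicate's side effect is safe under parallel execution.
        return list.stream().parallel().allMatch(ConcurrentHashMap.newKeySet()::add);
    }

    public static void main(String[] args) {
        List<Integer> unique = IntStream.range(0, 100_000).boxed().toList();
        System.out.println(areAllUnique(unique));           // true
        System.out.println(areAllUnique(List.of(1, 2, 1))); // false
    }
}
```

Each element is added to the shared concurrent set exactly once, so a failed add() reliably signals a duplicate regardless of which worker thread sees it.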
…HashSet.add is not thread safe. – Seasoning

…forEach call with side effects, which, by design, is most of them. You had better be able to safely assume List.stream() is run on the initiating thread, or the entire Java ecosystem would crumble. There's a lot of code out there that makes this assumption. –
Halfbeak

You can use the counting collector.

Stream.of(1, 3, 4, 6, 7, 5, 6)
      .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
      .entrySet().stream()
      .anyMatch(e -> e.getValue() > 1)
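Wrapped into a complete helper (class and method names are mine), the same frequency map can both answer the yes/no question and report which elements repeat:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountingCollectorDemo {

    static <T> boolean hasDuplicate(List<T> list) {
        // Build an element -> occurrence-count map, then look for any count > 1.
        return list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values().stream()
                .anyMatch(count -> count > 1);
    }

    static <T> List<T> duplicates(List<T> list) {
        // The same map also identifies the repeating elements themselves.
        return list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(List.of(1, 3, 4, 6, 7, 5, 6))); // true
        System.out.println(duplicates(List.of(1, 3, 4, 6, 7, 5, 6)));   // [6]
    }
}
```

Note that, unlike the allMatch approaches, this always processes the entire stream: the grouping collector must see every element before the duplicate check runs.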
Given an array arr,

arr.length != Arrays.stream(arr).distinct().count()

will check for duplicates.
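This works for primitive arrays too, since Arrays.stream(int[]) yields an IntStream with its own distinct(). A brief sketch (class and method names are mine):

```java
import java.util.Arrays;

public class ArrayDuplicateDemo {

    static boolean hasDuplicate(int[] arr) {
        // IntStream also supports distinct(), so the same length comparison applies.
        return arr.length != Arrays.stream(arr).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(new int[] {1, 2, 3})); // false
        System.out.println(hasDuplicate(new int[] {1, 2, 2})); // true
    }
}
```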
Started this class as a StreamTool, but I think there must be an even better way with reduce or similar:

public class StreamTool {

    /**
     * Whether stream records are unique in that stream.
     * @param <T> type of records
     * @param records the stream to check
     * @return true if there are no duplicates, false otherwise
     */
    public static <T> boolean isUnique(Stream<T> records) {
        return records.allMatch(new HashSet<>()::add);
    }
}
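One caveat worth noting when taking a Stream parameter: allMatch is a terminal operation, so the passed-in stream is consumed by the check and cannot be reused by the caller. A usage sketch (the demo class name is mine):

```java
import java.util.HashSet;
import java.util.stream.Stream;

public class StreamToolDemo {

    public static <T> boolean isUnique(Stream<T> records) {
        // Terminal operation: the passed-in stream cannot be traversed again.
        return records.allMatch(new HashSet<>()::add);
    }

    public static void main(String[] args) {
        System.out.println(isUnique(Stream.of(1, 2, 3)));       // true
        System.out.println(isUnique(Stream.of("a", "b", "a"))); // false
    }
}
```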
Here is a short-circuiting solution built on a custom gatherer, using the JEP 461: Stream Gatherers preview language feature from Java 22:
List<Integer> list = List.of(1, 2, 4, 1, 3, 4, 4, 1);
// A parallel stream must be used until JDK-8328316 is fixed
boolean hasDuplicate =
list.stream().parallel().gather(hasDuplicate()).findAny().orElseThrow();
static <T> Gatherer<T, ?, Boolean> hasDuplicate() {
    class State {
        boolean hasDuplicate = false;
        final Set<T> elements = new HashSet<>();
    }

    Gatherer.Integrator<State, T, Boolean> integrator =
            (state, element, downstream) -> {
                if (state.elements.add(element)) {
                    return true;
                } else {
                    state.hasDuplicate = true;
                    return false;
                }
            };

    return Gatherer.ofSequential(
            State::new,
            integrator,
            (state, downstream) -> downstream.push(state.hasDuplicate)
    );
}
This custom gatherer produces a stream with a single Boolean
element indicating whether a duplicate was found. It keeps track of each element encountered in a Set
, and whether a duplicate has been found as a boolean
. It short-circuits immediately with true
when a duplicate element is encountered, or returns false
if the end of the stream is reached without encountering any duplicates.
Note that this doesn't work on sequential streams in Java 22 due to JDK-8328316, but it does work in Java 23 starting with b16.
This gatherer can be made parallel (at the cost of some increased complexity) by adding a combiner which can merge two states together, e.g.:
BinaryOperator<State> combiner = (s1, s2) -> {
    if (s1.hasDuplicate || s2.hasDuplicate) {
        s1.elements.clear(); // No further need to keep these in memory
        s1.hasDuplicate = true;
    } else {
        int totalSize = s1.elements.size() + s2.elements.size();
        s1.elements.addAll(s2.elements);
        s1.hasDuplicate = s1.elements.size() != totalSize;
    }
    return s1;
};

return Gatherer.of(
        State::new,
        integrator,
        combiner,
        (state, downstream) -> downstream.push(state.hasDuplicate)
);
From the Gatherer API documentation:

An intermediate operation that transforms a stream of input elements into a stream of output elements, optionally applying a final action when the end of the upstream is reached. […]

[…] There are many examples of gathering operations, including but not limited to: grouping elements into batches (windowing functions); de-duplicating consecutively similar elements; incremental accumulation functions (prefix scan); incremental reordering functions, etc. The class Gatherers provides implementations of common gathering operations.

API Note: A Gatherer is specified by four functions that work together to process input elements, optionally using intermediate state, and optionally perform a final action at the end of input. They are:
- creating a new, potentially mutable, state (initializer())
- integrating a new input element (integrator())
- combining two states into one (combiner())
- performing an optional final action (finisher())

Stream.gather(Gatherer<? super T,?,R> gatherer):
Returns a stream consisting of the results of applying the given gatherer to the elements of this stream.

Gatherer.ofSequential(initializer, integrator, finisher):
Returns a new, sequential, Gatherer described by the given initializer, integrator, and finisher.

Gatherer.of(initializer, integrator, combiner, finisher):
Returns a new, parallelizable, Gatherer described by the given initializer, integrator, combiner, and finisher.
…false emitted. I'd probably define a duplicates-Gatherer like so:

```
static <T> Gatherer<T, ?, T> duplicates() {
    return Gatherer.ofSequential(
            HashSet::new,
            (state, element, downstream) -> state.add(element) || downstream.push(element)
    );
}
```

And then do:

```
var hasDuplicates = stream.gather(duplicates()).findFirst().isPresent();
```
–
Pallbearer

…Collector (but not Stream.toList): list.stream().gather(hasDuplicate()).collect(Collectors.toList()).getFirst(). –
Seasoning

Use set.add(); it is faster.

Set<T> items = new HashSet<>();
list.stream().filter(n -> !items.add(n))
    .collect(Collectors.toSet());
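For completeness, here is that filter-based idea as a self-contained helper (class and method names are mine); it yields the set of elements that occur more than once, rather than just a boolean:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DuplicateFinder {

    static <T> Set<T> findDuplicates(List<T> list) {
        Set<T> seen = new HashSet<>();
        // filter keeps exactly the elements whose add() fails, i.e. the repeats.
        return list.stream()
                .filter(e -> !seen.add(e))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(List.of(1, 2, 2, 3, 3, 3))); // contains 2 and 3
    }
}
```

Note that this relies on a stateful lambda, so the thread-safety caveats discussed in the comments above apply if the stream is made parallel.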
…new HashSet and the other via Collectors.toSet()? Also, the above code could be reduced to new HashSet<>(list), but the question is not about removing duplicates; it is about checking whether duplicate elements exist. –
Abseil