Why there is no ConcurrentHashSet against ConcurrentHashMap
Asked Answered
B

11

698

HashSet is based on HashMap.

If we look at HashSet<E> implementation, everything is been managed under HashMap<E,Object>.

<E> is used as a key of HashMap.

And we know that HashMap is not thread safe. That is why we have ConcurrentHashMap in Java.

Based on this, I am confused that why we don't have a ConcurrentHashSet which should be based on the ConcurrentHashMap?

Is there anything else that I am missing? I need to use Set in a multi-threaded environment.

Also, If I want to create my own ConcurrentHashSet can I achieve it by just replacing the HashMap to ConcurrentHashMap and leaving the rest as is?

Bainbridge answered 9/8, 2011 at 7:14 Comment(7)
After looking at the API, if I were to guess I would say that it seems to come down to 2 factors, (1) avoiding having to create a class in Java API for every little bit of functionality needed (2) Providing convenience classes for more frequently used objects. I personally prefer LinkedHashMap and LinkedHashSet since they guarantee order is the same as insertion order, the only reason for using a set is to avoid duplication, often I still want to maintain insertion order.Erysipelas
@Ali, I personally prefer LinkedHashMap and LinkedHashSet you will go far :)Colombi
A bit old question, but as it is the first result in Google, may be useful to know that ConcurrentSkipListSet already has the implementation of ConcurrentHashMap. See docs.oracle.com/javase/7/docs/api/java/util/concurrent/…Dania
What I saw from Java source ConcurrentSkipListSet is built on ConcurrentSkipListMap, which implements ConcurrentNavigableMap and ConcurrentMap.Bainbridge
possible duplicate of is Java HashSet thread-safe for read only?Rejection
Very good point of view. Perhaps the guys at java thought that always is good to associate things.. so they promote fiercely this idea!!Vizcacha
Now I need this feature too, and I have the same question as you have. I used ConcurrentSkipListSet for sorting as well, the bottom storage is a map, but unfortunately not a Hash one. I don't think this has an excuse to not support it, maybe comes in future edition.Kerril
V
745

There's no built in type for ConcurrentHashSet because you can always derive a set from a map. Since there are many types of maps, you use a method to produce a set from a given map (or map class).

Prior to Java 8, you produce a concurrent hash set backed by a concurrent hash map, by using Collections.newSetFromMap(map)

In Java 8 (pointed out by @Matt), you can get a concurrent hash set view via ConcurrentHashMap.newKeySet(). This is a bit simpler than the old newSetFromMap which required you to pass in an empty map object. But it is specific to ConcurrentHashMap.

Anyway, the Java designers could have created a new set interface every time a new map interface was created, but that pattern would be impossible to enforce when third parties create their own maps. It is better to have the static methods that derive new sets; that approach always works, even when you create your own map implementations.

Vinculum answered 9/8, 2011 at 7:17 Comment(19)
Am I right to say that if you create the set this way from ConcurrentHashMap, you lose the benefits you'd get from ConcurrentHashMap ?Unbreathed
There are no benefits to lose. newSetFromMap's implementation is found starting on line 3841 in docjar.com/html/api/java/util/Collections.java.html. It's just a wrapper....Vinculum
@zneak, isn't the add operation on a Set equivalent to what putIfAbsent does? The other operations in ConcurrentMap don't offer any functionality that doesn't exist in Set. Set already has a remove method and I don't see how a ConcurrentMap-like replace operation would make sense on a Set object.Tinsel
@Andrew, I think the motivation behind using a "ConcurrentSet" stems from not the API but rather the implementation - thread safety but without a universal lock - multiple concurrent reads for instance.Photoelectron
@UstamanSangat: hmmm, I think the comment I was responding to was deleted at some point. I vaguely remember it asking about why there wasn't a ConcurrentSet interface (and my argument was that it wouldn't provide any new methods; it'd only be a marker interface, like RandomAccess. Admittedly Set itself is basically a marker interface since it doesn't provide new methods; just new constraints).Tinsel
If your elements are Comparable, why not simply use ConcurrentSkipListSet?Selestina
ConcurrentSkipList has lots of (size) overhead and the lookups are slower.Chism
My complaint is that having to go this route, you lose the HashSet(Collection<? extends E> c) constructor, with its nice syntactic sugar for instance converting a List to a Set. Or am I wrong?Ressler
@rogerdpack: Good point. I just ran into this now. newSetFromMap could do that work for us.Pennipennie
One thing I found with the ConcurrentHashMap.newKeySet (and the Collections alternative route) is that it doesn't cater for null keys and will NPE on a remove call (which HashSet will not do).Caulk
The real problem when the JDK not providing a true ConcurrentSet interface is that you'll be missing some atomic operations on that Set, namely addIfAbsent, computeIfAbsent, etc. With JDK8 you can create those yourself because ConcurrentHashMap.KeySetView gives you access to the underlying ConcurrentMap. With JDK7 you're stuck and may have to resort to creating your own wrapper.Jut
take care when using this approach, since some methods are not implemented correctly. Just follow the links: Collections.newSetFromMap creates a SetFromMap. e.g. the SetFromMap.removeAll method delegates to the KeySetView.removeAll, that inherits from ConcurrentHashMap$CollectionView.removeAll. This method is highly inefficient in bulk removing elements. imagine removeAll(Collections.emptySet()) traverses all elements in the Map without doing anything. Having a ConcurrentHashSet that is corretly implemented will be better in most cases.Demott
To follow up on @Demott comment, HashSet gets it's removeAll from AbstractSet, which will at least iterate on the smaller of the two collections... so it's not super terrible how the CollectionView works... especially if you look a bit closer at the perf characteristics of concurrent map... by always using it's own iterator, it can use Iterator.remove to detach an item, whereas the AbstractSet version might iterate the argument and then call .remove(), which causes a complete hash table lookup (versus Iterator.remove, which can avoid the expensive checks / graph transversals).Enfleurage
Basically, if the set/map knows it is expensive to do a "cold remove()", it can make the (right) choice to avoid pathological performance when both collections are large, at the expense of (maybe) poorer performance when collections are small or empty. Especially empty, since you, the caller, can optimize by filtering out empty queries before you make them.Enfleurage
@Enfleurage my comment relates to Java8 and before. they have adressed this with Java9+, so the method indeed makes these checks now.Demott
Ah, cool. many thanks for the update. My company hasn't made the leap to java9+ just yet, so always good to pick up tidbit for when we have to bite the bullet (soon). :-)Enfleurage
Then why ConcurrentSkipListSet exists, when you can derive a set from a ConcurrentSkipListMap?Risner
That's a good question for the author, Doug Lea. He definitely decided it was a good idea to build ConcurrentSkipListSet by wrapping a ConcurrentSkipListMap -- see line 65 of fuseyism.com/classpath/doc/java/util/concurrent/…. I suppose one could figure out why he did that by comparing the source code of the map and set (perhaps in this case there was a benefit to wrapping the map?) but it might be easier to ask Doug directly. I suspect it has something to do with skiplists being sorted sets as opposed to unrestricted sets that can use hashing.Vinculum
@NickNick that’s a question we could expand to the entire Collection API. HashSet is just a wrapper around HashMap, LinkedHashSet a wrapper around LinkedHashMap, TreeSet is just a wrapper around TreeMapWearable
E
129

Prior to Java 8:

Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

Java 8 or greater:

ConcurrentHashMap.KeySetView<String, Boolean> mySet = ConcurrentHashMap.newKeySet();
Eudocia answered 23/6, 2014 at 9:21 Comment(3)
As Ray Toal points out in his answer, the preferred way to this in 2021 is definitely ConcurrentHashMap.newKeySet() (which does a similar thing behind the scenes), as it's more succinct.Digged
this is thread-safe right?Gettings
yes, it is thread safeEudocia
B
103

With Guava 15 you can also simply use:

Set s = Sets.newConcurrentHashSet();
Bergerac answered 1/2, 2015 at 20:35 Comment(4)
This is always a nightmare. If you have a set or a map that does not indicate whether or not something is thread safe you find all kind of hazards and desasters happen in maintaince. I always would want a type that indicates thread safety for collections (or not).Calpac
The method description is literally "Creates a thread-safe set backed by a hash map"Bergerac
As I said, there is a ConcurrentSet<E> missing. ConcurrentHashMap comes along with a ConcurrentMap interface to indicate this. This is the very same reason I always add this ConcurrentSet interface as well.Calpac
The ConcurrentMap interface is much more than a marker interface - it contains concurrency-specific methods such as putIfAbsent, remove and replace.Pinter
C
86

Like Ray Toal mentioned it is as easy as:

Set<String> myConcurrentSet = ConcurrentHashMap.newKeySet();
Chronometry answered 7/7, 2016 at 17:28 Comment(1)
This seems to require Java 8. Looking at implementation, this also seems to be just a wrapper of ConcurrentHashMap.Papandreou
R
23

It looks like Java provides a concurrent Set implementation with its ConcurrentSkipListSet. A SkipList Set is just a special kind of set implementation. It still implements the Serializable, Cloneable, Iterable, Collection, NavigableSet, Set, SortedSet interfaces. This might work for you if you only need the Set interface.

Regardant answered 2/10, 2013 at 16:17 Comment(4)
Note that ConcurrentSkipListSet's elements should be ComparableSelestina
If you need to extend from a Set that is concurrent, this is the only solution here that will work.Capacity
ConcurrentSkipListMap adds unnecessary performance penalty of having tree as base data structure, instead of using HashTable, even when you do not need sorting/navigation functionality.Aforesaid
don't use ConcurrentSkipListSet unless you want a SortedSet. A usual operation like add or remove should be O(1) for a HashSet, but O(log(n)) for a SortedSet.Demott
R
22

As pointed by this the best way to obtain a concurrency-able HashSet is by means of Collections.synchronizedSet()

Set s = Collections.synchronizedSet(new HashSet(...));

This worked for me and I haven't seen anybody really pointing to it.

EDIT This is less efficient than the currently aproved solution, as Eugene points out, since it just wraps your set into a synchronized decorator, while a ConcurrentHashMap actually implements low-level concurrency and it can back your Set just as fine. So thanks to Mr. Stepanenkov for making that clear.

http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#synchronizedSet-java.util.Set-

Rejection answered 5/11, 2014 at 18:56 Comment(1)
the synchronizedSet method just creates the decorator under Collection to wrap methods that could be thread-safe by synchronization the whole collection. But ConcurrentHashMap is implemented using non-blocking algorithms and "low-level" synchronisations without any locks of the whole collection. So wrapers from Collections.synchronized... is worse in multi-threads environments for performance reasons.Woodring
P
14

You can use guava's Sets.newSetFromMap(map) to get one. Java 6 also has that method in java.util.Collections

Poinciana answered 9/8, 2011 at 7:17 Comment(6)
it's available in java.utll.Collections and set of CHM is usually a bad thing anyways.Colombi
yeah, I noticed it is added in Java 6, so added it to the answerPoinciana
The main this is that if it is ThreadSafe, and I really doubt that.Bainbridge
@Talha, it's thread safe, however thread safety alone means nothingColombi
Sometimes it means everything. Its shaldom a performance problem unless it is part of an algorithm which are usually implemented in a way that the need for concurrent mapping is minimized.Calpac
Indeed, I have faced visibility issues with Sets.newSetFromMap(map), and since then switched to ConcurrentHashMap.newKeySet().Bedford
W
8
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;


public class ConcurrentHashSet<E> extends AbstractSet<E> implements Set<E>{
   private final ConcurrentMap<E, Object> theMap;

   private static final Object dummy = new Object();

   public ConcurrentHashSet(){
      theMap = new ConcurrentHashMap<E, Object>();
   }

   @Override
   public int size() {
      return theMap.size();
   }

   @Override
   public Iterator<E> iterator(){
      return theMap.keySet().iterator();
   }

   @Override
   public boolean isEmpty(){
      return theMap.isEmpty();
   }

   @Override
   public boolean add(final E o){
      return theMap.put(o, ConcurrentHashSet.dummy) == null;
   }

   @Override
   public boolean contains(final Object o){
      return theMap.containsKey(o);
   }

   @Override
   public void clear(){
      theMap.clear();
   }

   @Override
   public boolean remove(final Object o){
      return theMap.remove(o) == ConcurrentHashSet.dummy;
   }

   public boolean addIfAbsent(final E o){
      Object obj = theMap.putIfAbsent(o, ConcurrentHashSet.dummy);
      return obj == null;
   }
}
Wingspread answered 25/9, 2014 at 5:53 Comment(2)
I like the idea to use Boolean.TRUE instead of an dummy object. It is a little bit more elegant. Also using NULL is also possible since it would be available in the key set even if mapped to null.Calpac
@MartinKersten fyi, ConcurrentHashMap doesn't allow null valuesCrossroad
S
2

Why not use: CopyOnWriteArraySet from java.util.concurrent?

Schurman answered 7/10, 2016 at 14:56 Comment(3)
Because CopyOnWriteArraySet copies the entire collection on any state mutation, which is not always wanted due to the performance impact. It's designed to work only in special cases.Eberhard
Additionally CopyOnWriteArraySet.contains() has a run-time of O(n) (has to check ever entry) where as HashSet/HashMap has O(1).Hannus
because this works only one way: if you're reading data owned/managed by another thread. if you need to pass data to that thread, you'd have to additionally update its atomic ref to updated array - but that makes little senseGimlet
H
0

We have ConcurrentHashSet in io.vertx which can be used against java.util ConcurrentHashMap. https://www.javadoc.io/static/io.vertx/vertx-core/3.0.0/io/vertx/core/impl/ConcurrentHashSet.html

Halfmast answered 17/1, 2023 at 8:33 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Secularity
A
0

Imho what we really need is a common ConcurrentSet interface which should be on par with ConcurrentMap interface.

All existing code should align on this new interface to clarify a bit when we have a Set that is concurrent (thread-safe) or not...

for example:

ConcurrentSet<String> set = ConcurrentHashMap.newKeySet();
ConcurrentSet<String> set = Collections.synchronizedSet(new HashSet<>());
ConcurrentSet<String> set = new ConcurrentSkipListSet<>();

Alexandro answered 1/3 at 12:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.