Java virtual threads and ConcurrentHashMap
I stumbled upon a problem with Virtual threads and ConcurrentHashMap.

As the code below demonstrates, if you acquire a lock within computeIfAbsent, the virtual thread may never wake up. It depends on how the virtual threads are scheduled on carrier threads, but I can reproduce this easily.

I'm aware that ConcurrentHashMap uses synchronized internally, and that virtual threads don't play well with synchronized, but nowhere does it say to avoid ConcurrentHashMap, as that would be quite disappointing.

import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class Scratch {
    public static void main(String[] args) throws InterruptedException {
        var lock = new ReentrantLock();
        var map = new ConcurrentHashMap<Integer, String>();

        var callables = new ArrayList<Callable<Void>>();
        for (int i = 0; i < 32 /* more than threads in FJ pool */; i++) {
            callables.add(() -> {
                System.out.println("Spawned thread " + Thread.currentThread().threadId());
                map.computeIfAbsent(1, k -> {
                    System.out.println("on lock() " + Thread.currentThread().threadId());
                    lock.lock();
                    try {
                        System.out.println("within lock() " + Thread.currentThread().threadId());
                    } finally {
                        lock.unlock();
                    }
                    return "";
                });
                System.out.println("Exiting " + Thread.currentThread().threadId());
                return null;
            });
        }

        new Thread(() -> {
            System.out.println("locking " + Thread.currentThread().threadId());
            lock.lock();
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            System.out.println("unlocking " + Thread.currentThread().threadId());
            lock.unlock();
            System.out.println("unlocked " + Thread.currentThread().threadId());
        }).start();

        while (!lock.isLocked()) {
            Thread.sleep(10);
        }

        // it works with Executors.newCachedThreadPool()
        try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
            es.invokeAll(callables);
        }
        System.out.println("done");
    }
}
Dagda answered 22/1 at 16:55 Comment(6)
Note that System.out.println is synchronized, so all threads are contending for the very same monitor. Try commenting out the println inside the computeIfAbsent.Disremember
Exactly! Virtually all logging frameworks have some sort of locking, and it's very common to have some logging inside computeIfAbsent(). Any non-trivial Java software has that. Where do VTs fit in non-trivial software?Dagda
I don't think ConcurrentHashMap itself is much of a problem but performing long tasks in locking operations like computeIfAbsent is.Sturgeon
Fix spelling in title.Flay
I don't get the point of the lock. If you remove it the example seems to block in the same manner.Critta
computeIfAbsent: “Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple.” Performing operations acquiring a lock within the function surely does not qualify. And it defeats the entire purpose of a ConcurrentHashMap.Postage
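One way to honor that "short and simple" contract is to do any blocking or lock-acquiring work outside the mapping function and publish the result with putIfAbsent afterwards. A minimal sketch (the names getOrCompute and expensiveCompute are illustrative, not from the question):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ShortMappingDemo {
    // Compute the value outside the map's internal bin lock, then publish it.
    static String getOrCompute(ConcurrentMap<Integer, String> map, int key) {
        String existing = map.get(key);          // fast path: CHM reads take no monitors
        if (existing != null) return existing;
        String computed = expensiveCompute(key); // may block or lock; runs outside the map
        String raced = map.putIfAbsent(key, computed); // first writer wins the race
        return raced != null ? raced : computed;
    }

    // Stand-in for work that might take a lock or block.
    static String expensiveCompute(int key) {
        return "value-" + key;
    }

    public static void main(String[] args) {
        var map = new ConcurrentHashMap<Integer, String>();
        System.out.println(getOrCompute(map, 1));
        System.out.println(getOrCompute(map, 1));
    }
}
```

The trade-off versus computeIfAbsent is that two threads may both run the computation; that is acceptable when the function is idempotent and side-effect free.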
I reproduced this problem with a simpler example.

import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

class Scratch {
    public static void main(String[] args) throws InterruptedException {
        Object a = new Object();
        var callables = new ArrayList<Callable<Long>>();
        for (int i = 0; i < 32 /* more than threads in FJ pool */; i++) {
            callables.add(() -> {
                Long id = Thread.currentThread().threadId();
                System.out.println( "starting " + id);
                synchronized(a){
                    System.out.println( "grabbed " + id );
                }
                System.out.println("Exiting " + id);
                return id;
            });
        }

        System.out.println("invoking!");
        try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
            es.invokeAll(callables);
        }
        System.out.println("done");
    }
}

This will cause a deadlock with virtual threads, but not with a fixed thread pool or a cached thread pool. The issue appears to be caused by the interplay between System.out.println and synchronized.

Can this be avoided by synchronizing on a shared object before calling println?

E.g. if you want to log while holding the monitor to the map (i.e. inside map.computeIfAbsent), make sure you always hold the same monitor when you log.

Object logLock = new Object(); //declared at the same level as your lock.

Then inside of your callables.

synchronized(logLock){
    System.out.println("Spawned thread " + Thread.currentThread().threadId());
}

synchronized(logLock){
    map.computeIfAbsent(...)
}

From testing your example, just doing the first synchronized(logLock) solves the problem and I never get the deadlock. It does seem like a bug.
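For reference, here is the simpler example from above assembled into a runnable form with the logLock fix applied, so the monitors are always taken in a consistent order (this is my assembly of the answer's fragments, not code from the original post):

```java
import java.util.ArrayList;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

class ScratchFixed {
    public static void main(String[] args) throws InterruptedException {
        Object a = new Object();
        Object logLock = new Object(); // declared at the same level as the lock

        var callables = new ArrayList<Callable<Long>>();
        for (int i = 0; i < 32; i++) {
            callables.add(() -> {
                long id = Thread.currentThread().threadId();
                // Every println is guarded by logLock, so logLock is always
                // acquired before the println monitor and before `a`.
                synchronized (logLock) {
                    System.out.println("starting " + id);
                }
                synchronized (logLock) {
                    synchronized (a) {
                        System.out.println("grabbed " + id);
                    }
                }
                return id;
            });
        }

        try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
            es.invokeAll(callables);
        }
        System.out.println("done");
    }
}
```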

Critta answered 23/1 at 12:22 Comment(5)
Without the logLock, the code is effectively synch(out); synch(a) {synch(out); } and likely to deadlock.Disremember
@Disremember That was my original thought: if the VT version somehow wasn't releasing out, you get the double-locking deadlock. You can avoid the deadlock by making sure you always acquire the locks in the same order.Critta
I am able to reproduce this in 90% of the attempts, and every time the test hangs there is a pinned thread that never reaches "starting" point. Can this situation be correctly called a "deadlock"? It seems to me no, in the sense of Dining Philosophers.Inbred
Interestingly, such an example was found half a year ago already. Performing computeIfAbsent inside a synchronized block defeats the entire purpose of ConcurrentHashMap. But the function passed to computeIfAbsent should not perform blocking operations.Postage
Pinning is going to happen only in native code, soon.Dagda
The JDK core libs in general do not specify, or even footnote, implementation details in their javadoc for all sorts of highly useful things regarding threading: they rarely document their 'color', and indeed don't document their locks very well. They should; those details directly impact a library caller's code and therefore ought to be documented. The point is, even if you think 'this is a bug, can it be fixed?', the 'fix' would be: "Now the javadoc is updated to tell you that, indeed, it uses synchronized".

Situation: No need to modify it

If the map is fully formed before you start the first job (i.e. no method is called on that instance of ConcurrentHashMap that modifies it: no put, no computeIfAbsent, no clear, and so on), then you can just continue to use CHM: its read-only methods (get, getOrDefault, size(), that sort of thing) do not use synchronized () {} on anything.

The answer in this scenario would be: "Nope, you're good. Don't worry that synchronized is in the source code of ConcurrentHashMap.java - it will not affect you at all here".
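A minimal sketch of that pattern, assuming the map is populated once on the main thread and the virtual threads only read it (the class and method names here are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

class ReadOnlyChmDemo {
    // Virtual threads only call map.get(); CHM reads take no monitors,
    // so there is nothing here for a VT to pin its carrier on.
    static long sumLookups(Map<Integer, Integer> map, int jobs) throws InterruptedException {
        var total = new LongAdder();
        try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < jobs; i++) {
                int key = i % map.size();
                es.submit(() -> total.add(map.get(key)));
            }
        } // close() waits for all submitted tasks
        return total.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        var map = new ConcurrentHashMap<Integer, Integer>();
        for (int i = 0; i < 10; i++) map.put(i, i); // fully formed before any job starts
        System.out.println(sumLookups(map, 1000));
    }
}
```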

Situation: Modify during job

If a job (or some other process, after the first job starts) does need to modify the map, then fundamentally you have to set up synchronization. You can't just tell the system to make a few million jobs and run them however it pleases (such as: make a few million VTs, toss them all in a queue, and have a bunch of carriers muddle through the lot), because the job at hand just cannot be done that way. That's what synchronized is really about: if you need to send data from one job to another, that is actually quite complicated, and requires either an externally synced system (such as a DB or a message queue), or synchronization primitives that tell the CPU to do the work (synchronized, volatile, or primitive constructs such as compare-and-set, used by things like AtomicReference and available in various core APIs).
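The compare-and-set route mentioned above can be sketched in a few lines; AtomicReference publishes a value across threads without a monitor, so a virtual thread updating it never blocks, let alone pins its carrier:

```java
import java.util.concurrent.atomic.AtomicReference;

class CasDemo {
    public static void main(String[] args) {
        var ref = new AtomicReference<String>("old");
        boolean swapped = ref.compareAndSet("old", "new");  // succeeds: value was "old"
        boolean again = ref.compareAndSet("old", "newer");  // fails: value is now "new"
        System.out.println(swapped + " " + again + " " + ref.get());
    }
}
```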

All those primitives have a significant cost associated to them. This should be obvious: The amount of time a CPU needs to read some data from one of its L1 cache blocks is on the order of magnitude of a single cycle of a single pipeline, whereas attempting to send any data to another core is hundreds of such cycles. The sheer distance between cores already means it cannot, by way of the speed of light, be done in a single cycle.

CHM can't magic that away.

VTs are indeed 'not great' in this situation, to put it mildly. Simply don't use them here. That sounds like useless advice, but perhaps not quite:

Solution 1 - split up a single job

Can you isolate the part of the code that needs to interact with that map (which is, by definition, or we wouldn't be in this section of the answer, a map shared among the VTs), extracting a whole bunch of code that can be put in map-reduce terms? That is: the code gets the data it needs to do its job as parameters, and reports its result by returning it. It does not need to modify any fields, nor read any fields being modified by anything outside the job's code, i.e. it does not need that map at all.

If that's possible, then do that: have a separate handler that extracts what it needs from the CHM, registers a job, and processes job-now-done reports, re-integrating each job's result back into the CHM. At that point it may be worthwhile to reduce that part of the total task (the code that interacts with the map) to a single thread, as it will likely run at mostly single-threaded speed anyway due to all the synchronization going on, and then just use a plain jane HashMap.
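A minimal sketch of that structure, assuming the per-job work can be expressed as a pure function (the Result record and the "v" + i payload are placeholders for real job output):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

class CoordinatorDemo {
    record Result(int key, String value) {}

    // Only the coordinating thread touches the map, so a plain HashMap suffices.
    static Map<Integer, String> run(int jobs) throws Exception {
        Map<Integer, String> map = new HashMap<>(); // never shared with the jobs
        try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = IntStream.range(0, jobs)
                    .mapToObj(i -> es.submit(() -> new Result(i, "v" + i))) // pure job
                    .toList();
            for (var f : futures) {
                Result r = f.get();       // coordinator re-integrates each result
                map.put(r.key(), r.value());
            }
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(100).size());
    }
}
```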

Solution 2 - different data structures

If that's not an option, you need a different data concept than what CHM provides. Can this stuff be put in terms of database accesses? Databases like Postgres can be tasked to apply optimistic locking. That may be a disaster here (if you do a lot of interaction with that map, in which case it's probably fundamentally impossible to meaningfully speed it up with concurrency!), but if the job isn't primarily reading and writing that map, it might offer faster performance than CHM.

Solution 3 - different job delineation

If there are 1000 jobs and it is possible to group them into, say, 8 subsets with the property that no job in subset A affects what subset B has to do (it might modify your CHM, but not in a way that has any effect whatsoever on jobs from subsets B through H), then you can create 8 jobs instead of a thousand, each fully isolated (no need for a CHM: the job makes a map, fills it, and returns it as its final product), then simply combine your 8 sub-results into one larger result.
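A sketch of that delineation, with the per-item work (i * i here) standing in for whatever the real jobs compute:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionDemo {
    // Each chunk builds its own private HashMap; nothing is shared while running.
    static Map<Integer, Integer> run(int items, int chunks) throws Exception {
        var futures = new ArrayList<Future<Map<Integer, Integer>>>();
        try (var es = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int c = 0; c < chunks; c++) {
                int chunk = c;
                futures.add(es.submit(() -> {
                    Map<Integer, Integer> local = new HashMap<>(); // job-private
                    for (int i = chunk; i < items; i += chunks) {
                        local.put(i, i * i); // illustrative per-item work
                    }
                    return local;
                }));
            }
        } // executor closed: all chunks are done
        Map<Integer, Integer> merged = new HashMap<>();
        for (var f : futures) merged.putAll(f.get()); // combine the sub-results
        return merged;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1000, 8).size());
    }
}
```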

Solution 4 - put the hammer away

If none of the above solutions are an answer, then the job just fundamentally is what it is. VTs aren't magic 'go fast' juice. They are a way to take a job that can be done quickly on current hardware, but where, before VTs were introduced, it was annoying and complicated to write the Java code such that it would hit the performance peak the problem could in theory reach, and make that code much, much easier to write.

They don't open doors to letting you write code that you never could before. This is more generally true of all language features (languages are Turing-complete; you can do everything a computer can do in them. It's just that some jobs are needlessly convoluted to write in a language, and a new language feature might address that).

If you already had code that worked without VTs, then maybe that was already near peak performance, and refactoring it to use VTs neither makes it go any faster, nor makes it easier to maintain/understand.

Solution 5 - don't worry about it

Perhaps you are using CHM not at all for its 'concurrency' aspects, but solely for its API: CHM has various methods that normal maps do not have, that really have nothing to do with 'concurrent' specifically. For example, chm.keySet(defaultValue) gives you a set which has an implementation of add: Adding things to the keyset adds them to the map, with as 'value' part that default value. HashMap does not have this method. I'm not sure why not.

Similarly, there's stuff like reduceEntriesToInt, which you may like. Probably you can replace that by using a plain jane HashMap and using hashMap.entrySet().stream().reduce instead.
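To make that concrete, here is a sketch showing CHM's reduceEntriesToInt next to the equivalent stream pipeline on a plain map (a Long.MAX_VALUE parallelism threshold keeps the CHM reduction sequential; the sum-of-values operation is just an example):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReduceDemo {
    // CHM's bulk-reduction method: transformer, identity basis, and reducer.
    static int sumValuesChm(ConcurrentHashMap<String, Integer> chm) {
        return chm.reduceEntriesToInt(Long.MAX_VALUE, Map.Entry::getValue, 0, Integer::sum);
    }

    // The same computation on any plain Map via streams.
    static int sumValuesPlain(Map<String, Integer> map) {
        return map.entrySet().stream().mapToInt(Map.Entry::getValue).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        var chm = new ConcurrentHashMap<String, Integer>(Map.of("a", 1, "b", 2, "c", 3));
        System.out.println(sumValuesChm(chm));
        System.out.println(sumValuesPlain(chm));
    }
}
```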

However, if this is what you're doing:

  1. Each job makes a CHM. This object is never shared with any other job. It's made by the method, never assigned to a field, just used and ditched.
  2. It's not a plain HashMap only because you need those methods CHM has that HM doesn't.

Then CHM ends up doing 'useless synchronized': being the only thread that ever syncs on an object, with no other thread even having a reference to the monitor, let alone attempting to synchronized() on it. Hotspot should be quite good at eliminating those monitor acquisitions. Do you have an actual performance problem (as in, you measured it, and it's slower than you think it should be), or are you merely afraid that the maxim 'VTs and synchronized don't play nice with each other' means using job-local CHMs is a problem? That maxim is an oversimplification; the actual maxim is 'VTs and waiting to acquire monitors don't play nice with each other', and that waiting wouldn't happen if each CHM is job-local.

Run your profiler to make sure, but, the JVM does lots of 'useless synchronizing', and Hotspot is all about optimizing common patterns. It's good news:

  1. useless sync is a common pattern.
  2. useless sync is easy to optimize (just... don't acquire anything)
  3. useless sync is easy enough to detect.
  4. Therefore, JVM hotspot engines are good at eliminating that cost and will do it, and do it cheaply.

Either #4 is correct, or 'JVM hotspot engineers are incompetent' is correct. Could be, but that's an extraordinary claim with no facts in evidence to support it.

Metal answered 22/1 at 17:34 Comment(2)
Thank you. I needed to hear #4. VT executors are not a drop-in replacement for ordinary thread-pool-executors. And I think JVM hotspot management is incompetent.Dagda
+1 for the link to the article; that was a great read.Illegitimacy

© 2022 - 2024 — McMap. All rights reserved.