What is the name of this locking technique?
Asked Answered
L

6

5

I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method shall return true. The threads are doing heavy number crunching and I noticed that there was some contention due to the following method (it's just an example, my actual code is bit different):

synchronized boolean containsSpecial() {
   return troveMap.contains(key);
}

Note that it's an "append only" map: once a key is added, is stays in there forever (which is important for what comes next I think).

I noticed that by changing the above to:

boolean containsSpecial() {
    if ( troveMap.contains(key) ) {
        // most of the time (>90%) we shall pass here, dodging lock-acquisition
        return true;
    }
    synchronized (this) {
        return troveMap.contains(key);
    }
}

I get a 20% speedup on my number crunching (verified on lots of runs, running during long times etc.).

Does this optimization look correct (knowing that once a key is there it shall stay there forever)?

What is the name for this technique?

EDIT

The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):

synchronized void addSpecialKeyValue( key, value ) {
    ....
}
Linn answered 7/12, 2011 at 17:54 Comment(5)
what does the code that updates the map look like?Rapture
@Dmitry Beransky: it's fully synchronized, I'll edit the question : )Linn
It looks really similar to the double-checked locking. In this case the checking is the operation you're after.Audiogenic
If this code works correct (which depends on contains working correctly while an insert is in progress (which I doubt by the way)), your synchronized block is really pointless. This is because synchronized may ensure that the returned value is correct, but there is no gurantee it hasn't changed when the return value is used. So the only reason to synchronize this method would be if troveMap.contains(key) is not guranteed to work while an update is in progress, in which case you can't call it without locking.Rhabdomancy
This is not safe. According to Rob Eden (Trove developer) retrievals are not guaranteed to preserve the internal structure of the map. See my edited answer.Actin
F
7

This code is not correct.

Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.

If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.

In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.

Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.

Floorboard answered 7/12, 2011 at 18:23 Comment(5)
Read-write lock would be more correct, but wouldn't it incur essentially the same locking penalty (with the exception that now multiple reads can happen at the same time?)Mathers
@MarkPeters I think (from a quick look at the code) that ReentrantReadWriteLock is nonblocking for reads if the read lock is available. Should be pretty speedy.Miletus
@erickson: +1... I wasn't aware of the potential infinite loop or runtime exception of seemingly innocent code! Oh well, time to read on the read-write locks : )Linn
@CedricMartin Please see my edit; it will be important to profile synchronization versus a read-write lock because Trove may be fast relative to locking.Floorboard
Besides the inner workings of the particular map implementation which might be perceived in an inconsistent state, the lack of synchronization applies to the key instances as well, on which the map may invoke equals. This incorrect code does not only require the keys to be thread safe, but also immune to unsafe publication, which may be true for immutable classes, but often fails with classes using locks or synchronized, as they usually do not guard their constructor.Bord
P
6

It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.

Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.

So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.

Additionally, what happens when contains() sees a inconsistent memory of the troveMap data structure?

Lookup the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.

Phonation answered 7/12, 2011 at 18:2 Comment(27)
but my case is special... Once true, it shall always stay true. Are you sure the code is broken here? In addition to that the write/update are fully synchronized. Once a key is in there, it stays for good. My idea was that either I'd read true and know the key is in there for sure (once it's there, it cannot leave) or the key ain't there, so I acquire a full lock and see the updated value if there's been a write (write being fully synchronized).Linn
First, the double-checked locking problem is a thing of the past (not that you don't have to worry about it; it is just easy to avoid now). Second, he's not reassigning troveMap and it's append-only. How would it fail, even if there were thread-local copies?Mathers
@CedricMartin: Of what type is your troveMap?Phonation
@jmg: TObjectInt and TIntInt, I did this in two different spots. Why does it matter which type the Trove map is? (I'm very interested : )Linn
@MarkPeters: How, is it a thing of the past? In the past people believed it to be correct. Today, we know that it requires at least a volatile modifier, if the field in question is primitive. But there is no such thing as a volatile for a whole object structure, i.e. that what troveMap points to.Phonation
This is not DCL, so I don't think the scenario you are referring to can happen. That's why I asked for the other piece of the code. Additions into the map are done under a synchronized function, so the new state is being copied back into global memoryRapture
In your answer, you say therefore, it never synchronizes and it never gets the updated memory. Isn't that categorically false in @Cedric's example? If the first lookup says the key doesn't exist, it immediately reaches a synchronization point and checks again. How does that not make it safe?Mathers
@jmg: You don't need volatile since it's not being reassigned.Mathers
@DmitryBeransky: But is it copied back from the global memory into the local memory of the reader?Phonation
@MarkPeters: But surely reassigments happen when inserting objects. It is true that every insertion is going to be fine. But readers which never synchronize might never see updates or they might see inconsistent states.Phonation
@DmitryBeransky: Exactly that is my point. When they enter a synchronize block, but what if they never enter a synchronize block. The OP said the contains check should almost always result in true.Phonation
@jmg: You're completely missing the point of Cedric's code. If the reader doesn't see the update, it immediately goes into a synchronize block which then eliminates memory copy problems. Yes, it might not see the update the first time around, but Cedric has crafted the code so that that is harmless.Mathers
@jmg: they never enter the synchronize block only when the the local memory has already been updated with true.Rapture
@MarkPeters: Ok, then I have to ask more. Where can I find that trove map? Is it impossible that contains() return a false positive because it was running on an inconsistent view of the state of the map?Phonation
@jmg: That's a relevant point. Certain things (e.g. assignment) are atomic in Java so unless TroveMap goes fully out of its way to screw with you, it should never have a false positive. It's not likely, but it is a concern.Mathers
@Phonation Assuming it's based on a regular map, a false positive is not possible. No matter how out-of-date the map is, it can never return true if the value doesn't exist. Just because something isn't synchronized doesn't mean everything goes to hell (unless the map hasn't been initialized yet) - it just means you get an earlier version.Confiscable
@MarkPeters: One assignment of an int is atomic in Java, true. But how many assignments happen for one addition of an element to the map? They have to be consistent as a whole, or contains() has to be written such that it works correctly after every partial execution of a JIT-compiled version of the add method.Phonation
@MarkPeters - Assignments of doubles and longs are not necessarily atomic. If the map is using long keys, OP's code is definitely not safe.Actin
@MarkPeters: You said: a false positive is not likely. So, it is very likely when you run your program long enough on enough machines. And then it might silently corrupt your result. If you are fine with that do it.Phonation
@jmg: What I was trying to say was that it was unlikely that the implementation would allow false positives, not that it was unlikely to occur in practice. I'm for one removing my downvote, I think there was some valid confusion but the main point of what you're trying to say lives on. (Err... well I will if/when you edit your answer; vote is locked in).Mathers
An append might completely change the internal structure of the map. The contains code might be iterating through entries while another thread modifies the chain of entries. SO, IMHO, it's not thread-safe.Rossetti
@JBNizet has it right. The big problem here isn't the false negatives or false positives, it's that you may be getting some plain-ol-fashioned race conditions that will break the internals of the map.Miletus
@KristianAntonsen You can get a version that "never existed," (as far as the thread that made the modification is concerned) which is pretty close to everything going to hell. Operations can be reordered such that threads see things the code would seem to imply are impossible.Miletus
@yshavit: By never existed, you mean that troveMap hasn't been initialized, right?Confiscable
@KristianAntonsen no, I mean it's in a state that it was never in according to the modifying thread. Very trivial example would be if a method did something like this.a = 5; this.b = 3 (assuming both fields start at 0). According to that method, there was never any time in which this.a = 0 && this.b == 3, but without correct synchronization/barriers, another thread could see such a state.Miletus
Just to complete my thought, imagine it's a tad more complex: this.conditionA = isConditionA(); if (conditionA) { this.conditionB = isConditionB(); } (assume both fields start false). According to this code, if conditionB is true, then you can assume conditionA is also true, and some other method may indeed make that assumption. But improper synchronization can make a thread see conditionB && (!conditionA), and now you're in big trouble.Miletus
@KristianAntonsen: yshavit is right. Such surprising states can arise due to reordering of statements by the compiler (JIT or not) and because of the complexity of cache hierarchies.Phonation
M
2

This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.

Regarding the memory visibility, the problem with that isn't false negatives (you use the synchronized double-check for that), but that trove's invariants may be violated. For instance, if they have a counter, and they require that counter == someInternalArray.length at all times, the lack of synchronization may be violating that.

My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:

synchronized (this) {
    troveMap.put(key, value);
    troveMap = troveMap;
}

That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.

Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?

Miletus answered 7/12, 2011 at 18:34 Comment(1)
Erm, never mind on the bloom filter. I was thinking the common case is that the key is not in the map, but re-reading your comment, the common case is that it is in the map. A Bloom filter wouldn't help here.Miletus
T
1

I think you would be better off with a ConcurrentHashMap which doesn't need explicit locking and allows concurrent reads

boolean containsSpecial() {
    return troveMap.contains(key);
}

void addSpecialKeyValue( key, value ) {
    troveMap.putIfAbsent(key,value);
}

another option is using a ReadWriteLock which allows concurrent reads but no concurrent writes

ReadWriteLock rwlock = new ReentrantReadWriteLock();
boolean containsSpecial() {
    rwlock.readLock().lock();
    try{
        return troveMap.contains(key);
    }finally{
        rwlock.readLock().release();
    }
}

void addSpecialKeyValue( key, value ) {
    rwlock.writeLock().lock();
    try{
        //...
        troveMap.put(key,value);
    }finally{
        rwlock.writeLock().release();
    }
}
Tufted answered 7/12, 2011 at 18:9 Comment(3)
I use Trove because the map is too gigantic and I need a map backed by primitives. The (un)boxing / object wrappers make it too costly both memory and speedwise.Linn
@CedricMartin check the edit for another suggestion which will likely be faster than full locking (as you have now)Tufted
Trove isn't concurrent; it doesn't have putIfAbsent() or equivalent.Floorboard
A
1

Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.

You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:

Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.

If your map implementation uses object references as keys, then I don't see how you can get in trouble.

EDIT

The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:

Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.

So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.

Actin answered 7/12, 2011 at 18:19 Comment(1)
Even if it doesn't modify its internal structure on retrieval, it surely does on insertion, and that could happen concurrent to a retrieval. It's just not safe to use from multiple threads without external locking.Floorboard
V
0

Why you reinvent the wheel? Simply use ConcurrentHashMap.putIfAbsent

Vincenty answered 7/12, 2011 at 18:7 Comment(2)
because, as I wrote, the map is gigantic and I'm using Trove because both speedwise and performance-wise it runs circles around the default maps (which are backed by amazingly innefficient Object/wrappers constantly (un)boxing).Linn
Sorry for my fast judgement. Does addSpecialKeyValue( key, value ) invokes containsSpecial() again?Vincenty

© 2022 - 2024 — McMap. All rights reserved.