In Java, how can I ensure safe and consistent concurrent usage of a boolean flag while minimizing time performance impact?

Asked 17/6, 2020 at 19:46 Answered 21/7, 2020 at 11:35

In my scenario, I have DirtyArray objects that are basically primitive array wrappers that set a boolean "dirty" flag when a write access happens.

public class DirtyArray {
    private byte[] data;

    public DirtyArray(byte[] data) {
        this.data = data;
    }

    private boolean dirty = false;

    public void setValue(int index, byte value) {
        dirty = true;
        data[index] = value;
    }

    public boolean isDirty() {
        return dirty;
    }
}

The dirty flag only ever goes from false to true.

I need to make this safe for concurrent use: There are one or more threads that may modify the array (setValue). There are one or more threads that catch DirtyArray before it is GCed and are supposed to write it off to disk if it has been modified (isDirty).

Now, if I understand correctly, it is not safe to do this like above: Effectively, from the point of view of the isDirty thread, the data[index]=value store could be reordered before the dirty=true store. Therefore, seeing isDirty()==false does not guarantee that data has not been modified.

Is this correct?

Assuming that yes, then making the dirty flag volatile should fix this problem. However, in the following benchmark, I see a ~50x-100x slowdown when doing that.

@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void touchAll()
{
    for (int i = 0; i < numEntities; i++)
        bytes.setValue(i, ( byte ) 2);
}

Using instead AtomicBoolean with the memory ordering get/set variants introduced in Java 9, I have this variant:

public class DirtyArray {
    private byte[] data;

    public DirtyArray(byte[] data) {
        this.data = data;
    }

    private AtomicBoolean dirty = new AtomicBoolean();

    public void setValue(int index, byte value) {
        if (!dirty.getPlain())
            dirty.setRelease(true);
        data[index] = value;
    }

    public boolean isDirty() {
        return dirty.getAcquire();
    }
}

which has the same performance (in the above benchmark) as the original non-volatile version.

Is this safe to do? That is, does it guarantee that when data is modified I will see isDirty()==true? (In my understanding it should be, but only because dirty only ever goes from false to true, never back.)

Are there other variation to achieve this guarantee, possibly even allowing to reset dirty to false, and ideally without negatively impacting performance?

Update

I agree with the general assessment of the answers so far that the only way to guarantee consistency between changed data array and dirty flag is to synchronize both setValue and isDirty. The race conditions that Pak Uula pointed out are the real problem, not so much getting the dirty flag visible. So basically the question I asked above is the wrong question...

For more context: this is about storing pixels for transparently cached images in https://github.com/imglib. It is used in very tight loops and taking the hit from synchronization is not really an option. The typical usage scenario is:

Multiple threads modify the image (which is backed by many DirtyArrays).
The isDirty() check happens on another thread that catches DirtyArray before it is garbage-collected (PhantomReference on the holder of the DirtyArray) and if it is dirty, write it to disk.

My opinion now is that this should be approached on a coarser level than individual setValue() calls. There are kind of "natural" synchronization points that happen because threads switch between DirtyArrays by getting them from a ConcurrentHashMap (glossing over the details of course), threads are in a thread pool and take jobs from a shared queue, or threads in some other way wait for each other. At these synchronization points, effects of earlier (in program order) setValue()s must become visible. So I tend to just use the plain unsynchronized version and rely on synchronization on the coarser level.

The only thing that gives me slight headaches is that the clean-up is triggered by garbage-collection and I have to make sure that (the holder of) DirtyArray is not collected before coarse level synchronization point. But I think I can make sure of that by keeping strong references and adding reachability fences if necessary.

Hyperopia answered 17/6, 2020 at 19:46 Comment(5)

"I see a ~50x-100x slowdown when doing that." --> that would only happen in a very contended scenario where multiple threads keep trying to write to the volatile field. In a more "normal" scenario where you have less contention and most threads only read from the variable, the difference would be much less. In your first snippet, with a volatile variable, using if (!dirty) dirty = true; may improve performance as you will have only one write to the variable. – Kiefer 23/6, 2020 at 21:19

I'd consider redesigning the app into transactional model. The worker threads set locks on their arrays for the duration of massive updates, and the backup thread can copy data only when those locks are released. For the beckup thread you can use 'pull' model, where it proactively studies the locks and looks for a dirty array, or waiting 'push' model, where working threads notify it to do backup. Or you can go on Plan B and agree with some inconhency in data - ignore the races for the sake of speed. The data might get slightly corrupted, but if is tolerable - go this way. It is simple and fast – Bourgogne 25/6, 2020 at 23:52

the release must be done in reverse: data[index] = value; dirty.setRelease..., you are releasing the writes, that will be guaranteed to be visible when the acquire happens. The problem is that release is not atomic, afaik, i.e. you need to observe the acquire part, to be sure. The volatile check with if (!dirty) dirty = true which @Kiefer provided, should help, a lot. especially on x86. otherwise, showing the entire JMH tests would help – Stabilize 30/6, 2020 at 3:36

Did I understand correctly that the only thread ever checking for isDirty is one that only triggers once, when the PhantomReference is about to be collected? If yes, do you really have a synchronization problem here? Because once that trigger happens, it is ensured that no other thread holds a reference to that DirtyArray which also means that you won't ever have a concurrent access at that point – Worrell 1/7, 2020 at 10:21

@codeflush.dev it's not about concurrent access, but about visibility. When one thread sets dirty = true it is not guaranteed that another thread sees that write unless there is some memory fence in between. – Hyperopia 1/7, 2020 at 11:2

AtomicBoolean (or any other of the atomic family) does not guarantee synchronization with a different variable. so no. the code does not guarantee that when data is modified, you will get isDirty()==true. The only guarantee you have is that all threads always see the same value of isDirty(). in fact, neither of the options listed make that guarantee.

the only way to make the guarantee is to have exclusive lock on the whole code block inside the set method: the if statement together with the assignment. this can be achieved with synchronized keyword (either on the method or in a code block) or use one of the locking mechanisms in java.util.concurrency

Heterosexual answered 17/6, 2020 at 19:55 Comment(3)

The java.util.concurrency.lock.* classes are no faster than primitive locks. Their utility is that they offer broader functionality than a primitive lock ... though not functionality that will help here. (AFAIK) – Mallard 24/6, 2020 at 8:40

I agree that synchronizing the whole block (as well as the read) is the only way to make setValue atomic. However, "AtomicBoolean does not guarantee synchronization with a different variable", this statement is wrong in my opinion (or I misunderstood). The JMM implies: If volatile write of variable X in thread A happens-before volatile read of X in thread B, then the effects of all actions in thread A before (in program order) the volatile write are visible to all actions after (in program order) the volatile read in thread B. The AtomicBoolean.get()/set() are volatile reads/writes. – Hyperopia 25/6, 2020 at 8:8

@tpietzsch, but the code, the setting of the atomic variable is typed before the setting of the array, so thread A can set the atomic bool, then go to sleep, and thread B will see isDirty() but the array did not change yet. if you try to reverse the order, you still have a problem of thread A setting the array but going to sleep before set the atomic bool. – Heterosexual 25/6, 2020 at 9:21

Several notes about your solution with AtomicBoolean.

What kind of thread-safety do you look for? I don't see why do you care that much about read-write ordering of dirty flag. Get/set operations for this flag might be perfectly ordered in time and still result in races.

Race condition

In both your examples I see a race condition between setValue and isDirty:

Thread 1 calls setValue, updates dirty flag and preempts to Thread 2 before setting data[index] = value.
Thread 2 calls isDirty, it returns true, but the array data is not updated yet, because Thread 1 was preempted.
If Thread 2 proceeds with actions on dirty array, e.g. copy it to another location, the copy would be non-coherent with the array in memory after Thread 1 resumes execution with data[index] = value.

Changing the order of operations solves this race, but introduces another one.

    public void setValue(int index, byte value) {
        data[index] = value;
        dirty = true;
    }

Consider the following sequence:

Thread 1 sets value at index 0. Flag dirty became true.
Thread 1 started setting value at index 1. data[1] = value got executed, and Thread 1 is preempted just before dirty = true.
Thread 2 check the flag and syncs the array to disk.
Thread 1 resumes execution, and sets the dirty flag.

The race is that the array is coherent with an external copy, but marked as dirty. This race might lead to excessive copy operations.

AtomicBoolean operation

Your use of AtomicBoolean is non-atomic. A thread could be preempted just between conditional and then statement. The atomic equivalent of the if construct used by you is dirty.compareAndSet(false, true);.

Actually, you don't need it this way. dirty.set(true) is sufficient since you uncoditionally set the dirty flag upon update any value.

But the sad story is that even AtomicBoolean.set doesn't save from the race condition. Thread could be preempted just between setting the dirty flag and updating the data. The race condition described above doesn't depent on atomicity of flag update.

Synchronized

Since the very beginning Java provides a pattern exactly for this case: synchronized.

public class SyncronizedDirtyArray {
    private byte[] data;

    public SyncronizedDirtyArray (byte[] data) {
        this.data = data;
    }

    private boolean dirty = false;

    public synchronized void setValue(int index, byte value) {
        dirty = true;
        data[index] = value;
    }

    public synchronized boolean isDirty() {
        return dirty;
    }
}

synchronized ensures that no thread can interfere between dirty = true; and data[index] = value;.

It might be slower compared to unsynchronized solution, but saves you from the race conditions. Try this solution with your benchmarks.

UPDATE: Benchmark

I made my own benchmark, very simple, but revealing.

I impemented several variants of DirtyArray with different sync primitives.

DirtyArrayNoSync: the plain implementation without any sync
DirtyArraySync: using synchronized
DirtyArrayLock: using ReentrantLock
DirtyArrayVolatile: using volatile boolean
DirtyArrayAtomic: the implementation from the topic starter
DirtyArrayAtomicSet: using AtomicBoolean.set and AtomicBoolean.get

The benchmar was to call setValue for every element in the array with 100 mln elements. The output is duration in milliseconds.

Configuration: OpenJDK 11.0.6, Core i7, Windows 10 Home

org.example.sync.DirtyArrayNoSync: 112
org.example.sync.DirtyArraySync: 2222
org.example.sync.DirtyArrayLock: 16752
org.example.sync.DirtyArrayVolatile: 7555
org.example.sync.DirtyArrayAtomic: 7591
org.example.sync.DirtyArrayAtomicSet: 3066

Yes, syncronization makes it 20 times slower, but it is the second fastest solution. All others, even with AtomicBoolean, are slower.

UPDATE 2.

Benchmark for jdk-14.0.1 showed practically same results:

org.example.sync.DirtyArrayNoSync: 102
org.example.sync.DirtyArraySync: 2323
org.example.sync.DirtyArrayLock: 16801
org.example.sync.DirtyArrayVolatile: 7942
org.example.sync.DirtyArrayAtomic: 7984
org.example.sync.DirtyArrayAtomicSet: 3320

Bourgogne answered 23/6, 2020 at 16:5 Comment(6)

"might be slower" is an understatement. this is why java v1 Hashtable (all synchronized Hashmap) and Vector (all synchronized ArrayList) were decommissioned. the problem with this "all synchronized" solution is that every access to isDirty() requires lock, even if setValue() is never called. the performance degradation is significant – Heterosexual 23/6, 2020 at 18:2

@SharonBenAsher, I added a benchmark. synchronized is slow, but faster than all the rest, including AtomicBoolean. – Bourgogne 24/6, 2020 at 1:19

Can you post your benchmark. And how many threads did you use? if you use a single thread in combination with synchronized, the system will make use of biased locking and perhaps even some adaptive spinning. But if you use multiple threads, having a non blocking solution is probably faster and less problematic. – Analyse 27/6, 2020 at 8:12

I will second the previous comment. How did you measure? without code that shows that, these are useless numbers. – Stabilize 30/6, 2020 at 3:0

@Analyse The benchmark was single threaded sequential loop. The objective was to measure the overhead from syncing. I'll upload it tomorrow from my office box. – Bourgogne 30/6, 2020 at 15:3

Single threaded benchmark.. hence biased locking makes synchronization very cheap. Repeat benchmark but with multiple threads – Analyse 30/6, 2020 at 16:59

I think that Sharon's answer has nailed it. All attempts to solve this using volatile or Atomic* classes will lead to race conditions where either isDirty() returns true before the array updates can be seen OR array updates can be see when isDirty() returns false.

The solution is to use locking. (In this case, primitive locking should be fastest. Pak's benchmarking seems to confirm this.)

But what I really wanted to say that you may well be worrying too much about the concurrency overheads of locking versus atomic types / volatile. In real applications, locking is a concern if the lock is contended, either because their are lots of threads trying to use it, or because the lock is held for a long time.

In this example, the second does not apply, a lock is likely to be held for 10 or less machine instructions. But the flipside is that your benchmark does not look remotely realistic.

If you were going to use the DirtyArray class like that, you would be better off with a setValue method that sets multiple values in one call.
If not ... you are most likely triggering far more cache flushing, etc than you would see in real life.

Another alternative is that you could relax your requirements. For example, does your application actually require isDirty() to correct at all times? If not, your initial solution may be good enough.

Finally, one thing that bugs be about your problem is how your application could make use of the strong consistency properties of the isDirty flag:

Suppose that you want to know if the array is dirty so that you can do something with the updated values in their current state. That implies that you need to stop the updates, but there is nothing in your class that will do that.
Suppose that you want to know if the array is dirty to determine if it (still) needs to be updated. Consider this attempt at doing this:
```
if (!da.isDirty()) {
    da.setValue(...);
}
```
There is a race condition in the above. Some other thread could jump in between the test and the setValue and make is own setValue call. So you would end up with two changes to the array.

Maybe the real version of this class has a more complicated API and/or other use-cases. But if that is so, I am not convinced that lessons learned from this question will translate into the more complicated problem.

Mallard answered 24/6, 2020 at 9:19 Comment(0)

You could try to use AtomicReferenceArray (see docs here). This array provides you guarantees with respect concurrent access.

I found an example of usage here which I copy-paste here for you to try:

import java.util.concurrent.atomic.AtomicReferenceArray;

public class TestThread {
   private static String[] source = new String[10];
   private static AtomicReferenceArray<String> atomicReferenceArray 
      = new AtomicReferenceArray<String>(source);

   public static void main(final String[] arguments) throws InterruptedException {

      for (int i = 0; i<atomicReferenceArray.length(); i++) {
         atomicReferenceArray.set(i, "item-2");
      }

      Thread t1 = new Thread(new Increment());
      Thread t2 = new Thread(new Compare());
      t1.start();
      t2.start();

      t1.join();
      t2.join();        
   }  

   static class Increment implements Runnable {
      
      public void run() {
         
         for(int i = 0; i<atomicReferenceArray.length(); i++) {
            String add = atomicReferenceArray.getAndSet(i,"item-"+ (i+1));
            System.out.println("Thread " + Thread.currentThread().getId() 
               + ", index " +i + ", value: "+ add);
         }
      }
   }

   static class Compare implements Runnable {
      
      public void run() {
         
         for(int i = 0; i<atomicReferenceArray.length(); i++) {
            System.out.println("Thread " + Thread.currentThread().getId() 
               + ", index " +i + ", value: "+ atomicReferenceArray.get(i));
            boolean swapped = atomicReferenceArray.compareAndSet(i, "item-2", "updated-item-2");
            System.out.println("Item swapped: " + swapped);
            
            if(swapped) {
               System.out.println("Thread " + Thread.currentThread().getId() 
                  + ", index " +i + ", updated-item-2");
            }
         }
      }
   }
}

Output:

Thread 9, index 0, value: item-2
Thread 10, index 0, value: item-1
Item swapped: false
Thread 10, index 1, value: item-2
Item swapped: true
Thread 9, index 1, value: updated-item-2
Thread 10, index 1, updated-item-2
Thread 10, index 2, value: item-3
Item swapped: false
Thread 10, index 3, value: item-2
Item swapped: true
Thread 10, index 3, updated-item-2
Thread 10, index 4, value: item-2
Item swapped: true
Thread 10, index 4, updated-item-2
Thread 10, index 5, value: item-2
Item swapped: true
Thread 10, index 5, updated-item-2
Thread 10, index 6, value: item-2
Thread 9, index 2, value: item-2
Item swapped: true
Thread 9, index 3, value: updated-item-2
Thread 10, index 6, updated-item-2
Thread 10, index 7, value: item-2
Thread 9, index 4, value: updated-item-2
Item swapped: true
Thread 9, index 5, value: updated-item-2
Thread 10, index 7, updated-item-2
Thread 9, index 6, value: updated-item-2
Thread 10, index 8, value: item-2
Thread 9, index 7, value: updated-item-2
Item swapped: true
Thread 9, index 8, value: updated-item-2
Thread 10, index 8, updated-item-2
Thread 9, index 9, value: item-2
Thread 10, index 9, value: item-10
Item swapped: false

Wicker answered 23/6, 2020 at 8:54 Comment(1)

I don't see how this solves the OP's problem. The problem is how to atomically set both an array element and a flag. Your code doesn't do that. More generally, I can't think of any way that you could use AtomicReferenceArray to solve the problem. That class gives you just atomic operations on individual array elements. – Mallard 24/6, 2020 at 8:37

I'd consider other options.

First, you should't use finalizers, cleaners or reference queues to implement such application requirements. For instance, although unlikely, someone can use a no-gc JVM instance. On a more practical matter, there's no guarantee that finalizers, cleaners and reference queues will run when the application is exiting.

You should use other mechanisms, such as parallel streams and/or completable futures, to handle this. You can actually use both approaches.

Then, you're locking or atomically updating on every pixel operation, which will without a doubt add a significant overhead. You should definitely make it more coarse and handle a bunch of pixels at a time. Although at first it seems like many of the embarrassingly-parallelizable algorithms could be run in parallel on each pixel, the truth is that doing so may add even more overhead.

You could have something in your library that allows configuring about how few pixels are worth the overhead of parallelizing work. Perhaps even have a benchmarking method that would try a binary search of time taken to process pixels sequentially and in parallel, until it finds an amount where the total parallel time is lower than sequential, and perhaps the overhead is low enough to justify, given a degree of parallelism.

In this particular case, if you target the same array, you may be victim of false sharing, where you might see worse performance due to frequent cache line invalidation. So, your benchmark may hopefully find a value very close to the cache line size.

You can't get a better than linear speedup. An n-core machine can't run better than 1/n the original sequential time, due at least to synchronization and cache invalidation, and generally to everything else that the machine is doing. Perhaps an overhead of about 1% to 5% might be good enough, for instance, a 2-core machine running the parallel approach in 52% of the sequential time.

Anyway, I believe you should definitely rethink this approach in these 2 points: not relying on GC to store your data, and having coarser operations properly synchronized. I'm assuming your operations are totally independent from each other, because otherwise this isn't about embarrassingly-parallel algorithms.

Ashcan answered 26/6, 2020 at 14:57 Comment(0)

It seems to me as if the dirty flag is only changed from false to true. If it is set to true, it will never be reset.

If you, as you suggest in your question, believe to manage the visibility of changes to other threads with other means than synchronization and volatility, would it not suffice if you synchronize around the state change of the dirty flag with something like this:

public void setValue(int index, byte value) {
    if (dirty) {
        data[index] = value;
    }
    else {
        synchronized(this) {
            dirty = true;
            data[index] = value;
        }
    }
}

This way, the synchronized overhead will only apply for the first method call and following method calls will be much faster. Just to make sure that I have pointed it out: With this code, modifications of the data array are not immediately visible to other threads.

If you want to make sure that changes to the data array are visible to other threads, you might consider further access methods, which are not only updating a single byte, but e.g. filling ranges with a given value or copying ranges from a source array like:

void fill(byte value, int offset, int length);
void setValue(byte[] source, int offset, int length);

With such methods, you might be able to properly synchronize and propagate changes without too much extra impact on the runtime.

You should also consider if this level of synchronization is good enough for other purposes as well. Your benchmark is perhaps not representative for real usage, but if you are updating the content of a DirtyArray by setting one byte at a time and you usually update several bytes at once, do you want other threads to access the object and see a partially updated byte array?

Rathenau answered 28/6, 2020 at 19:59 Comment(0)

So as you might know from the definition of the volatile variable it values is always visible to all threads.

Why volatile usually used wrong: what people usually mess up with a volatile, that they can't update it atomically without race conditions.

Let's take a counter as an example:

class Counter {
  volatile int counter = 0;

  void incrementCounter() {
    counter = counter + 1; 
  } 
}

This wouldn't work, because there are 2 different operations that are happening with a counter: read and write. And if there is something happen to the value between reading and write counter will lose that value:

read counter inside the function | value is 100
counter is changed in other thread/place | value is 105
increment the counter value and write it back | value is 101

Volatile in your code

You DON'T have read and write operations over the flag in one place, so your code in setValue is already atomic.

Also, there are no issues with concurrency between writing and reading from volatile variables in your code. According to the JMM, writing and reading from volatile variables are sententially consistent, so you don't need to do any actions to synchronize it.

Fixing performance

As you noticed volatile might be slower than nonvolatile.

Because volatile is synchronized between threads, JVM has to do some extra work. But the good part that it's done during the write operation. So if you add the read check before modification it should make it faster.

Disconnect answered 24/6, 2020 at 2:0 Comment(0)

I'll keep it short and concise.

You can go for either (1) synchronized methods to modify and/or read your boolean field, or (2) just use AtomicBoolean, which is inherently thread-safe.

Silverside answered 29/6, 2020 at 17:24 Comment(0)

You could use an approach similar to what AtomicMarkableReference does.

public class DirtyArray {
    private final Cell[] cells;

    public DirtyArray(byte[] data) {
        cells = new Cell[data.length];
        Arrays.setAll(cells, i -> new Cell(data[i], false));
    }

    public void setValue(int index, byte value) {
        Cell cell = cells[index];
        if (cell.value != value || !cell.dirty) {
            cells[index] = new Cell(value, true);
        }
    }

    public boolean isDirty() {
        for (Cell cell : cells) {
            if (cell.dirty) {
                return true;
            }
        }
        return false;
    }

    private static class Cell {
        final byte value;
        final boolean dirty;

        Cell(byte value, boolean dirty) {
            this.value = value;
            this.dirty = dirty;
        }
    }
}

Belita answered 21/7, 2020 at 11:35 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++