In my scenario, I have DirtyArray
objects that are basically primitive array wrappers that set a boolean "dirty" flag when a write access happens.
public class DirtyArray {
private byte[] data;
public DirtyArray(byte[] data) {
this.data = data;
}
private boolean dirty = false;
public void setValue(int index, byte value) {
dirty = true;
data[index] = value;
}
public boolean isDirty() {
return dirty;
}
}
The dirty flag only ever goes from false
to true
.
I need to make this safe for concurrent use:
There are one or more threads that may modify the array (setValue
).
There are one or more threads that catch DirtyArray
before it is GCed and are supposed to write it off to disk if it has been modified (isDirty
).
Now, if I understand correctly, it is not safe to do this like above:
Effectively, from the point of view of the isDirty
thread, the data[index]=value
store could be reordered before the dirty=true
store.
Therefore, seeing isDirty()==false
does not guarantee that data
has not been modified.
Is this correct?
Assuming that yes, then making the dirty
flag volatile
should fix this problem.
However, in the following benchmark, I see a ~50x-100x slowdown when doing that.
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void touchAll()
{
for (int i = 0; i < numEntities; i++)
bytes.setValue(i, ( byte ) 2);
}
Using instead AtomicBoolean with the memory ordering get/set variants introduced in Java 9, I have this variant:
public class DirtyArray {
private byte[] data;
public DirtyArray(byte[] data) {
this.data = data;
}
private AtomicBoolean dirty = new AtomicBoolean();
public void setValue(int index, byte value) {
if (!dirty.getPlain())
dirty.setRelease(true);
data[index] = value;
}
public boolean isDirty() {
return dirty.getAcquire();
}
}
which has the same performance (in the above benchmark) as the original non-volatile version.
Is this safe to do? That is, does it guarantee that when data
is modified I will see isDirty()==true
?
(In my understanding it should be, but only because dirty
only ever goes from false
to true
, never back.)
Are there other variation to achieve this guarantee, possibly even allowing to reset dirty
to false
,
and ideally without negatively impacting performance?
Update
I agree with the general assessment of the answers so far that the only way to guarantee consistency between changed data
array and dirty
flag is to synchronize both setValue
and isDirty
. The race conditions that Pak Uula pointed out are the real problem, not so much getting the dirty flag visible. So basically the question I asked above is the wrong question...
For more context: this is about storing pixels for transparently cached images in https://github.com/imglib. It is used in very tight loops and taking the hit from synchronization is not really an option. The typical usage scenario is:
- Multiple threads modify the image (which is backed by many
DirtyArrays
). - The
isDirty()
check happens on another thread that catchesDirtyArray
before it is garbage-collected (PhantomReference
on the holder of theDirtyArray
) and if it is dirty, write it to disk.
My opinion now is that this should be approached on a coarser level than individual setValue()
calls. There are kind of "natural" synchronization points that happen because threads switch between DirtyArray
s by getting them from a ConcurrentHashMap
(glossing over the details of course), threads are in a thread pool and take jobs from a shared queue, or threads in some other way wait for each other. At these synchronization points, effects of earlier (in program order) setValue()
s must become visible. So I tend to just use the plain unsynchronized version and rely on synchronization on the coarser level.
The only thing that gives me slight headaches is that the clean-up is triggered by garbage-collection and I have to make sure that (the holder of) DirtyArray
is not collected before coarse level synchronization point. But I think I can make sure of that by keeping strong references and adding reachability fences if necessary.
if (!dirty) dirty = true;
may improve performance as you will have only one write to the variable. – Kieferdata[index] = value; dirty.setRelease...
, you are releasing the writes, that will be guaranteed to be visible when theacquire
happens. The problem is thatrelease
is not atomic, afaik, i.e. you need to observe the acquire part, to be sure. Thevolatile
check withif (!dirty) dirty = true
which @Kiefer provided, should help, a lot. especially on x86. otherwise, showing the entire JMH tests would help – StabilizeisDirty
is one that only triggers once, when the PhantomReference is about to be collected? If yes, do you really have a synchronization problem here? Because once that trigger happens, it is ensured that no other thread holds a reference to thatDirtyArray
which also means that you won't ever have a concurrent access at that point – Worrelldirty = true
it is not guaranteed that another thread sees that write unless there is some memory fence in between. – Hyperopia