Java: Make all fields either final or volatile?
If I have an object which is shared between threads, it seems to me that every field should be either final or volatile, with the following reasoning:

  • if the field should be changed (point to another object, update the primitive value), then the field should be volatile so that all other threads operate on the new value. Merely a synchronization on the methods which access said field is insufficient because they might return a cached value.

  • if the field should never change, make it final.

However, I could not find anything about this, so I wonder whether this logic is flawed or just too obvious?

EDIT: Of course, instead of volatile one might use a final AtomicReference or similar.

EDIT: For example, see Is getter method an alternative to volatile in Java?

EDIT: To avoid confusion: this question is about cache invalidation! If two threads operate on the same object, the fields of the object can be cached (per thread) if they are not declared volatile. How can I guarantee that the cache is invalidated properly?

FINAL EDIT: Thanks to @Peter Lawrey, who pointed me to JLS §17 (the Java memory model). As far as I can see, it states that synchronization establishes a happens-before relation between operations, so that a thread sees the updates from another thread if those updates "happened-before", e.g. if the getter and setter for a non-volatile field are synchronized.
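A minimal sketch of such a synchronized accessor pair (class and field names are illustrative): the unlock at the end of setValue happens-before the lock at the start of getValue on the same monitor, so the write is visible without volatile.

```java
// Illustrative holder class: both accessors lock the same monitor
// (this), so a value written under the lock is visible to any thread
// that later reads under the same lock, even though the field is
// neither final nor volatile.
class SharedHolder {
    private int value;

    public synchronized void setValue(int value) {
        this.value = value;
    }

    public synchronized int getValue() {
        return value;
    }
}
```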

Boggart answered 12/12, 2018 at 8:12
"Merely a synchronization on the methods which access said field is insufficient" - no, synchronization is "stronger" than volatile. It means you won't even enter the method if another thread is in it, so you only enter after the other one has exited and finished changing the value. – Melisandra
The problem is that each thread can have its own cached version, so thread 2 may still see the old version after thread 1 has updated it. tutorials.jenkov.com/java-concurrency/volatile.html – Boggart
No, when properly using private fields, so that all access has to be done via synchronized methods, the visibility of the most recent values is already guaranteed. As @Melisandra correctly pointed out, it will be even stronger than declaring a variable volatile. As the article you've linked points out, volatile is not always enough. – Schaaf
The notion that "volatile" is about "cache invalidation" is bad reasoning; it presupposes a specific memory model implementation, namely that memory is implemented with processor caches that copy the real memory, or that variables are implemented with registers that cache values. Reason about correctness within the abstract memory model that Java gives you, not any particular implementation of it. – Dispel
To that point: volatile is not magic dust that you sprinkle on your program to make threading issues go away. Volatile reads and writes may still be re-ordered with respect to each other, and this can still produce unexpected race conditions. Do not allow volatile fields to give you an unearned sense of security about the correctness of your program. Again, you need to reason about the correctness of the program in the context of the memory model of the language. – Dispel
@Boggart The date class example using volatile in that tutorial is quite uncommon, non-idiomatic, hard to reason about, not recommended, unwarranted, and ill advised. – Upheaval
@Melisandra "no, synchronization is 'stronger' than volatile" ...only if all methods are synchronized and all fields are accessed via these methods. – Upheaval
@EricLippert Reads and writes of volatile variables effectively establish synchronization between the threads accessing that variable, without an explicit lock/monitor. Of course, this does not imply any correctness or processing order, but thread 2 will certainly not see an "old" value if it is physically executed after thread 1, which would be permitted if the variable were not volatile. – Boggart
@Moritz: The notion that threads "physically execute" in an "order" with respect to each other is difficult to understand in a world where my machine has 32 hyperthreaded CPUs in it. In what "order" do those 32 threads "physically execute" with respect to each other? Are we guaranteed that the "order" observed by every thread is consistent with the order observed by every other thread? C# explicitly does not make this guarantee; I would be surprised if Java's memory model took the performance hit to guarantee that. – Dispel
@Moritz: But more fundamentally, again your attitude is specifically the one I'm calling out as a bad attitude. Do not reason about processors and physical execution and all of that stuff which is subject to change based on the whims of hardware designers -- or, for that matter, virtual machine designers! Java runs in a VM which is not bound by the restrictions of hardware threads. Reason only about the guarantees given to you by the memory model, and you're more likely to write a correct multithreaded program. – Dispel
@Moritz: In case it's not clear what I mean by "consistency" -- your reasoning about the "order" in which threads "physically execute" implies that events have a total order, even if it is maybe a weird and unexpected one. Suppose event A1 "happens before" B, and A2 "happens before" B also. We must not reason "the real order was either A1 A2 B or A2 A1 B, but we do not know which one ahead of time". In the C# memory model, it is possible for one thread to observe A1 A2 B and a different thread to observe A2 A1 B. There is no "global consistent order". Is that true in Java? – Dispel
While I feel private final should probably have been the default for fields and variables, with a keyword like var to make them mutable, using volatile when you don't need it is

  • much slower, often around 10x slower.
  • usually not the thread safety you actually need, and it can make such bugs harder to find by making them less likely to appear.
  • likely to be confusing: unlike final, which improves clarity by saying "this shouldn't be altered", an unneeded volatile leaves the reader trying to work out why the field was made volatile.

if the field should be changed (point to another object, update the primitive value), then the field should be volatile so that all other threads operate on the new value.

While this is fine for reads, consider this trivial case.

volatile int x;

x++;

This isn't thread-safe, as it's the same as:

int x2 = x;
x2 = x2 + 1; // multiple threads could be executing on the same value at this point.
x = x2;

What is worse is that using volatile would make this kind of bug harder to find.
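One common fix for the read-modify-write race above is an atomic class from java.util.concurrent.atomic, which performs the increment as a single atomic compare-and-swap; a minimal sketch (class name illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// AtomicInteger performs the read-modify-write as one atomic
// operation, so concurrent increments are never lost, unlike
// x++ on a volatile int.
class Counter {
    private final AtomicInteger x = new AtomicInteger();

    public int increment() {
        return x.incrementAndGet(); // atomic equivalent of ++x
    }

    public int get() {
        return x.get();
    }
}
```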

As yshavit points out, updating multiple fields is even harder to work around with volatile, e.g. HashMap.put(a, b) updates multiple references.
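A sketch of the multi-field case (names are illustrative): no combination of volatile fields can update both ends of the range atomically, but a single lock covering the whole invariant can.

```java
// Both fields must change together; synchronized makes the pair
// update atomic and visible, which volatile fields alone could
// not guarantee.
class Range {
    private int lower;
    private int upper;

    public synchronized void set(int lower, int upper) {
        if (lower > upper) throw new IllegalArgumentException("lower > upper");
        this.lower = lower;
        this.upper = upper;
    }

    public synchronized boolean contains(int value) {
        return lower <= value && value <= upper;
    }
}
```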

Merely a synchronization on the methods which access said field is insufficient because they might return a cached value.

synchronized gives you all the memory guarantees of volatile and more, which is why it's significantly slower.

NOTE: Just synchronized-ing every method isn't always enough either. StringBuffer has every method synchronized but is worse than useless in a multi-threaded context, as its use is likely to be error-prone.
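To illustrate with Vector, whose methods are likewise all synchronized: each call is atomic on its own, but a check-then-act sequence of calls is not (method names here are illustrative):

```java
import java.util.Vector;

class LastElement {
    // Every Vector method is synchronized, yet this compound action
    // can still fail: another thread may remove the last element
    // between the isEmpty() check and the get() call.
    static String lastUnsafe(Vector<String> v) {
        if (!v.isEmpty()) {
            return v.get(v.size() - 1); // window for a race here
        }
        return null;
    }

    // The fix is to make the whole check-then-act atomic by holding
    // the vector's own lock across both calls.
    static String lastSafe(Vector<String> v) {
        synchronized (v) {
            return v.isEmpty() ? null : v.get(v.size() - 1);
        }
    }
}
```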

It's too easy to assume that achieving thread safety is like sprinkling fairy dust: add some magic thread safety and your bugs go away. The problem is that thread safety is more like a bucket with many holes. Plug the biggest holes and the bugs can appear to go away, but unless you plug them all, you don't have thread safety; the remaining bugs are just harder to find.

In terms of synchronized vs volatile, the JLS states:

Other mechanisms, such as reads and writes of volatile variables and the use of classes in the java.util.concurrent package, provide alternative ways of synchronization.

https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html

Demandant answered 12/12, 2018 at 8:47
Can you please explain why, given that thread1 and thread2 hold a reference to the same object and access it via synchronized methods only, thread 2 is guaranteed to see the updates of thread 1? I refer to e.g. tutorials.jenkov.com/java-concurrency/volatile.html. This and other resources say (as far as I understand) that the fields can still be cached in different cache lines; that is, thread 2 just has to wait to enter getX(), but it will still return the cached version. Is there any guarantee that the thread's cache is updated after obtaining the monitor or the like? – Boggart
@Boggart Synchronization would be pointless without memory barriers to provide those guarantees. While the value may be cached within the synchronized block, it will return the latest value on read and ensure anything written is visible to all threads. – Demandant
Makes total sense to me, but do you by any chance know a place, say in the JLS, where this is stated? – Boggart
@Boggart I have added a reference to synchronization holding a monitor and a lock, and stating that volatile is an alternative. – Demandant
As far as I see, from JLS §17 it follows that if accessor methods are synchronized, this introduces a happens-before relation between the accesses, and thus the second operation will see the update. Right? – Boggart
@Boggart Correct. synchronized has a full fence on entry and exit. – Demandant
This is a good answer, but I would suggest a quick note that classes often use state from multiple fields, which volatile can't address. This is similar to your x++ case, but even less subtle, and even harder to work around (i.e. you don't get "oh, that's easy, just use an AtomicInteger"). – Standup
@Boggart "fields can still be cached in different cache lines" Caches are coherent on all common architectures, and probably on every CPU where a full-blown Java implementation is available. – Upheaval
@Upheaval How would that be achieved if you have threads on 2 CPUs? Would caching then be pointless? Anyway, if threads are synced (e.g. sync'd accessors or volatile data), then Java takes care of proper visibility of updates, as you would expect. – Boggart
@Boggart Do you assume that distinct caches keep their data private for as long as possible until a flush operation? That would be hugely inefficient. A lot of data would need to be pushed to other caches, which would stall the thread using the cache doing the pushing! It would probably make multithreaded programs (those that use at least one thread and use thread synchronisation) a lot less efficient than single-threaded programs (those known to have a single thread and thus don't use synchronisation). – Upheaval
@Boggart In any modern normal system, caches don't even have functions for synchronisation, as they are kept synchronised at all times. (You can flush a cache, but that's different and rarely used. Code that generates executable CPU code might need to flush the local code cache, but that's a different topic.) Taking care of stale data in the cache, without waiting, does not make caching pointless; it makes it useful. It eliminates the potential issue of a backlog of stale data that would pause the current thread when doing a mutex release or volatile write. – Upheaval
Also, in Java, non-volatile fields or global variables that are subject to a R/W data race (one thread writes, another reads, without the use of a mutex or anything else that provides ordering) are not guaranteed to have unsynchronized/stale-cache semantics! Such stale semantics would guarantee that an unsynchronized read can get a stale value or the "current" value (BTW, "current" isn't even a well-defined concept in MT), and that it can get stale values until it reads the latest, or a later, value, thus making progress in time, as if you were getting the newspapers of the previous days in order. – Upheaval
That's NOT guaranteed in Java, thus showing that the risk of a non-volatile read of even an atomic, integrity-guaranteed variable (f.ex. a reference, which can never have an invalid value) is not caused by the CPU cache. The Java semantics allow non-progress of reads, so while(1) { int r = g; print(r); } is allowed to print 12121... if the non-volatile global shared g had value 1 then 2! – Upheaval
That would be the result of a simple optimization where the compiler assumes g doesn't usually change (and you don't care about its last value), makes a local copy of g called g_local, then unrolls the loop by 2, thus changing the program to int g_local = g; while(1) { int r1 = g; print(r1); int r2 = g; print(r2); }, and then decides for whatever reason to only do the g_local/g substitution once, ending up with the bizarre but legal int g_local = g; while(1) { int r1 = g_local; print(r1); int r2 = g; print(r2); } (A real print function is likely to have synchronisation; we assume it doesn't.) – Upheaval
Making fields you don't need to change final is a good idea, irrespective of threading concerns. It makes instances of the class easier to reason about, because you can know what state it is in more easily.

In terms of making the other fields volatile:

Merely a synchronization on the methods which access said field is insufficient because they might return a cached value.

You would only see a cached value if you accessed the value outside a synchronized block.

All accesses would need to be correctly synchronized. The end of one synchronized block is guaranteed to happen before the start of another synchronized block (when synchronizing on the same monitor).

There are at least a couple of cases where you would still need to use synchronization:

  • You would want to use synchronization if you had to read and then update one or more fields atomically.
    • You may be able to avoid synchronization for certain single field updates, e.g. if you can use an Atomic* class instead of a "plain old field"; but even for a single field update, you could still require exclusive access (e.g. adding one element to a list whilst removing another).
  • Also, volatile/final may be insufficient for non-thread-safe values, like an ArrayList or an array.
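On that last bullet: making a reference volatile only makes the reference itself visible, not mutations of the object behind it. One option, sketched here with illustrative names, is a final reference to a thread-safe collection:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class Listing {
    // Unsafe variant: a volatile reference to a plain ArrayList would
    // publish the reference, but add(...) would still mutate internal
    // ArrayList state with no synchronization at all.

    // Safe alternative: a final reference to a thread-safe list.
    private final List<String> names = new CopyOnWriteArrayList<>();

    public void add(String name) {
        names.add(name); // internally safe: copy-on-write semantics
    }

    public boolean contains(String name) {
        return names.contains(name);
    }
}
```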
Zareba answered 12/12, 2018 at 8:30
If an object is shared between threads, you have two clear options:

1. Make that object read-only

So updates (and caching) have no impact: with no writes, there is nothing to go stale.
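Option 1 might look like this immutable class (names are illustrative): all fields are final, so once constructed it can be shared freely without synchronization, and "updates" produce new objects instead of mutating shared state.

```java
// All fields final and the class not subclassable: after the
// constructor finishes, every thread that receives a reference
// sees fully initialized state (the final-field guarantee).
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Updating" returns a new object; the original never changes.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}
```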

2. Synchronize on the object itself

Cache invalidation is hard. Very hard. So if you need to guarantee that no stale values are seen, you should guard the value with a lock.

Make the lock and the values private to the shared object, so that the locking is an implementation detail.

To avoid deadlocks, these operations should be as self-contained ("atomic") as possible; they should avoid interacting with any other lock.
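The private-lock pattern described here might be sketched like this (class, field, and method names are illustrative):

```java
class Account {
    // The lock object is private and final, so no outside code can
    // synchronize on it and introduce contention or deadlock.
    private final Object lock = new Object();
    private long balance; // guarded by lock

    public void deposit(long amount) {
        synchronized (lock) {
            balance += amount; // read-modify-write done atomically
        }
    }

    public long balance() {
        synchronized (lock) {
            return balance; // read under the same lock for visibility
        }
    }
}
```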

Teresita answered 12/12, 2018 at 13:20
