I know that the JVM memory model is made for the lowest common denominator of CPUs, so it has to assume the weakest possible memory model of any CPU on which the JVM can run (e.g. ARM).
That's not correct. The JMM resulted from a compromise among a variety of competing forces: the desire for a weaker memory model so that programs can go faster on hardware with weak memory models; the desire of compiler writers for certain optimizations to be allowed; and the desire for the results of parallel Java programs to be correct and predictable, and if possible(!) understandable to Java programmers. See Sarita Adve's CACM article for a general overview of memory model issues.
Considering that x64 has a fairly strong memory model, what synchronization practices can I ignore assuming I know my program will only run on [x64] CPUs?
None. The issue is that the memory model applies not only to the underlying hardware; it also applies to the JVM that's executing your program, and in practice mostly to the JVM's JIT compiler. The compiler might decide to apply certain optimizations that are allowed within the memory model, but if your program is making unwarranted assumptions about memory behavior based on the underlying hardware, your program will break.
You asked about x64 and atomic 64-bit writes. It may be that no word tearing will ever occur on an x64 machine. I doubt that any JIT compiler would tear a 64-bit value into 32-bit writes as an optimization, but you never know. However, it seems unlikely that you could use this feature to avoid synchronization or volatile fields in your program. Without these, writes to these variables might never become visible to other threads, or they could arbitrarily be re-ordered with respect to other writes, possibly leading to bugs in your program.
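To make the visibility hazard concrete, here is a minimal sketch (class and method names are my own, not from the answer) of the classic stop-flag pattern. With `volatile`, the writer's store is guaranteed to become visible to the spinning thread; with a plain `boolean`, the JIT is allowed to hoist the read out of the loop, and the worker might spin forever:

```java
public class StopFlag {
    // volatile guarantees the write in stop() becomes visible to the worker;
    // with a plain boolean, the loop below might legally never terminate.
    private volatile boolean running = true;

    public void stop() {
        running = false;
    }

    // Spins until stop() is called on another thread.
    public void runUntilStopped() {
        while (running) {
            // busy-wait; each iteration re-reads the volatile field
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(flag::runUntilStopped);
        worker.start();
        Thread.sleep(50);
        flag.stop();        // store is visible to the worker because of volatile
        worker.join(1000);  // worker terminates promptly
        System.out.println(worker.isAlive() ? "worker stuck" : "worker stopped");
    }
}
```

On any conforming JVM this prints `worker stopped`; delete `volatile` and the program is allowed (though not required) to hang.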
My advice is first to apply synchronization properly to get your program correct. You might be pleasantly surprised. The synchronization operations have been heavily optimized and can be very fast in the common case. If you find there are bottlenecks, consider using optimizations like lock splitting, the use of volatiles, or converting to non-blocking algorithms.
UPDATE
The OP has updated the question to be a bit more specific about using volatile instead of locks and synchronization.

It turns out that volatile not only has memory visibility semantics. It also makes long and double access atomic, which is not the case for non-volatile variables of those types. See the JLS, section 17.7. You should be able to rely on volatile to provide atomicity on any hardware, not just x64.
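Here is a small sketch (hypothetical names) of what JLS §17.7 guarantees: a reader of a volatile long can only ever observe a value that was actually written, never a torn mix of two 32-bit halves. The two values below differ in both halves, so any tearing would be visible:

```java
public class TearingCheck {
    // Two values whose upper and lower 32-bit halves both differ,
    // so a torn write (e.g. 0x00000000FFFFFFFF) would be detectable.
    static final long A = 0x0000000000000000L;
    static final long B = 0xFFFFFFFFFFFFFFFFL;

    // Drop `volatile` here and a JVM is permitted to tear this write.
    static volatile long shared = A;

    // Returns true if every value observed during the window was A or B.
    static boolean check(long durationMillis) throws InterruptedException {
        Thread writer = new Thread(() -> {
            boolean flip = false;
            while (!Thread.currentThread().isInterrupted()) {
                shared = (flip = !flip) ? B : A;
            }
        });
        writer.start();
        boolean ok = true;
        long deadline = System.currentTimeMillis() + durationMillis;
        while (System.currentTimeMillis() < deadline) {
            long v = shared;
            if (v != A && v != B) {
                ok = false;   // a torn value was observed
            }
        }
        writer.interrupt();
        writer.join();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(check(200) ? "no tearing observed" : "torn value seen");
    }
}
```

With the field declared volatile this prints `no tearing observed` on every conforming JVM; without volatile, a torn value is permitted (though on x64 you may never see one, which is exactly the kind of hardware-specific luck the answer warns against relying on).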
While I'm at it, for additional information about the Java Memory Model, see Aleksey Shipilev's JMM Pragmatics talk transcript. (Aleksey is also the JMH guy.) There's lots of detail in this talk, and some interesting exercises to test one's understanding. One overall takeaway of the talk is that it's often a mistake to rely on one's intuition about how the memory model works, e.g. in terms of cache lines or write buffers. The JMM is a formalism about memory operations and various constraints (synchronizes-with, happens-before, etc.) that determine the ordering of those operations. This can have quite counterintuitive results. It's unwise to try to outsmart the JMM by thinking about specific hardware properties. It'll come back to bite you.