Are there any architectures where a memory barrier is implemented with a cache flush? I have read that memory barriers affect only CPU reordering, but I have also seen statements like "a memory barrier ensures all the CPUs will see the value...", and to me that sounds like a cache flush/invalidation.
On pretty much all modern architectures, caches (like the L1 and L2 caches) are kept coherent by hardware. There is no need to flush any cache to make memory visible to other CPUs.
One could imagine hypothetically a system that was not cache coherent in hardware, but it wouldn't look anything like the current systems that run operating systems like Windows and Linux.
Memory barriers are needed on these architectures to do three things:
1. The CPU may pre-fetch a read that's invalidated by a write on another core. This must be prevented. (Though on x86, this is prevented in hardware. The pre-fetch is locked to the L1 cache line, so if another CPU invalidates the cache line, the pre-fetch is invalidated as well.)
The CPU may "post" writes and not put them in its L1 cache yet. These writes must be completed at least to L1 cache.
3. The CPU may re-order reads and writes on one side of the memory barrier with reads and writes on the other side. Depending on the type of memory barrier, some of these re-orderings must be prohibited. (For example, `read x; read y;` doesn't ensure the reads happen in that order, but `read x; memory_barrier(); read y;` typically does.)
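To make the third point concrete, here is a minimal C++11 sketch using `std::atomic_thread_fence`. The variable names `x` and `ready` are mine, chosen for illustration; the fences keep the data write ordered before the flag write, and the flag read ordered before the data read:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0;                       // plain (non-atomic) data
std::atomic<bool> ready{false};  // flag guarding x

void writer() {
    x = 42;                                               // 1: write the data
    std::atomic_thread_fence(std::memory_order_release);  // keeps 1 ordered before 2
    ready.store(true, std::memory_order_relaxed);         // 2: publish the flag
}

void reader() {
    while (!ready.load(std::memory_order_relaxed)) {}     // 3: wait for the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // keeps 3 ordered before 4
    assert(x == 42);                                      // 4: guaranteed to see the write to x
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```

Without the two fences (or equivalent release/acquire operations), the reader could observe `ready == true` yet still read a stale `x`.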
The exact impact of a memory barrier depends on the specific architecture.
CPUs employ performance optimizations that can result in out-of-order execution. The reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
http://en.wikipedia.org/wiki/Memory_barrier
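As an illustration of the "multiple barriers" point, barriers come in different strengths, from compiler-only to full hardware fences. A sketch assuming GCC/Clang on x86-64 (the function names are mine, not from the quoted text):

```cpp
// Compiler-only barrier: stops the compiler from reordering memory
// accesses across this point, but emits no CPU instruction at all.
inline void compiler_barrier() { asm volatile("" ::: "memory"); }

// Full hardware barrier on x86-64: MFENCE orders all earlier loads
// and stores before all later ones, draining the store buffer.
inline void full_barrier() { asm volatile("mfence" ::: "memory"); }
```

The portable C++ equivalent of the full barrier is `std::atomic_thread_fence(std::memory_order_seq_cst)`, which the compiler lowers to whatever the target architecture requires (MFENCE on x86, `dmb ish` on ARM, and so on).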
Current Intel architectures ensure automatic cache consistency across all CPUs, without requiring explicit memory barrier or cache flush instructions.
In symmetric multiprocessor (SMP) systems, each processor has a local cache. The memory system must guarantee cache coherence. False sharing occurs when threads on different processors modify variables that reside on the same cache line. This invalidates the cache line and forces an update, which hurts performance.
http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/
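A common way to avoid false sharing is to align each hot variable to its own cache line. A minimal sketch, assuming a 64-byte cache line (typical on current Intel CPUs); the struct and counter names are illustrative:

```cpp
#include <atomic>
#include <thread>

// Without alignas(64), `a` and `b` would likely share one cache line,
// and every increment on one core would invalidate that line in the
// other core's cache (false sharing), even though the threads never
// touch each other's data.
struct Counters {
    alignas(64) std::atomic<long> a{0};  // own 64-byte cache line
    alignas(64) std::atomic<long> b{0};  // own 64-byte cache line
};

int main() {
    Counters c;
    std::thread t1([&] { for (int i = 0; i < 1000000; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (int i = 0; i < 1000000; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
}
```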
As soon as one core writes to a memory location, the other cores know that their copies of the corresponding cache line are now stale and hence invalid.
Note, though, that buffering of `store`s and `load`s means even Intel's cache coherence protocol cannot magically sync the caches; the CPU does that when it chooses to or when it is instructed to. It is not an automatic process invoked with each `store` operation. Moreover, while a value resides in some CPU's store buffer, no other CPU (via its invalidation queue) is informed about the new value.
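The store-buffer effect this comment describes can be shown with the classic litmus test below, a C++11 sketch with relaxed atomics standing in for plain hardware loads and stores:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Classic "store buffer" litmus test. Both r1 and r2 may end up 0:
// each core's store can still sit in its private store buffer when
// the other core performs its load, so neither load sees the other
// store. Even x86, which is otherwise strongly ordered, permits this
// store->load reordering; a full barrier in BOTH threads (a seq_cst
// fence, i.e. MFENCE on x86) is needed to forbid the r1 == r2 == 0
// outcome.
std::atomic<int> X{0}, Y{0};
int r1, r2;

int main() {
    std::thread t1([] {
        X.store(1, std::memory_order_relaxed);
        // std::atomic_thread_fence(std::memory_order_seq_cst); // uncomment in both threads to forbid 0/0
        r1 = Y.load(std::memory_order_relaxed);
    });
    std::thread t2([] {
        Y.store(1, std::memory_order_relaxed);
        // std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = X.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    std::printf("r1=%d r2=%d\n", r1, r2); // r1=0 r2=0 is a legal outcome as written
}
```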