Are there any architectures where a memory barrier is implemented with a cache flush? I have read that memory barriers affect only CPU reordering, but I have also seen statements like "a memory barrier ensures all the CPUs will see the value...", and to me that sounds like a cache flush/invalidation.
On pretty much all modern architectures, caches (like the L1 and L2 caches) are kept coherent by hardware. There is no need to flush any cache to make memory visible to other CPUs.
One could imagine hypothetically a system that was not cache coherent in hardware, but it wouldn't look anything like the current systems that run operating systems like Windows and Linux.
Memory barriers are needed on these architectures to do three things:
1. The CPU may pre-fetch a read that's invalidated by a write on another core. This must be prevented. (Though on x86, this is prevented in hardware. The pre-fetch is locked to the L1 cache line, so if another CPU invalidates the cache line, the pre-fetch is invalidated as well.)
The CPU may "post" writes and not put them in its L1 cache yet. These writes must be completed at least to L1 cache.
3. The CPU may re-order reads and writes on one side of the memory barrier with reads and writes on the other side. Depending on the type of memory barrier, some of these re-orderings must be prohibited. (For example, `read x; read y;` doesn't ensure the reads happen in that order, but `read x; memory_barrier(); read y;` typically does.)
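To make the third point concrete, here is a minimal C++11 sketch using `std::atomic_thread_fence`. The variable names `x` and `ready` are mine, chosen for illustration; the fences keep the data write ordered before the flag write, and the flag read ordered before the data read:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int x = 0;                       // plain (non-atomic) data
std::atomic<bool> ready{false};  // flag guarding x

void writer() {
    x = 42;                                               // 1: write the data
    std::atomic_thread_fence(std::memory_order_release);  // keeps 1 ordered before 2
    ready.store(true, std::memory_order_relaxed);         // 2: publish the flag
}

void reader() {
    while (!ready.load(std::memory_order_relaxed)) {}     // 3: wait for the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // keeps 3 ordered before 4
    assert(x == 42);                                      // 4: guaranteed to see the write to x
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```

Without the two fences (or equivalent release/acquire operations), the reader could observe `ready == true` yet still read a stale `x`.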
The exact impact of a memory barrier depends on the specific architecture.
CPUs employ performance optimizations that can result in out-of-order execution. The reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
http://en.wikipedia.org/wiki/Memory_barrier
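As an illustration of the "multiple barriers" point, barriers come in different strengths, from compiler-only to full hardware fences. A sketch assuming GCC/Clang on x86-64 (the function names are mine, not from the quoted text):

```cpp
// Compiler-only barrier: stops the compiler from reordering memory
// accesses across this point, but emits no CPU instruction at all.
inline void compiler_barrier() { asm volatile("" ::: "memory"); }

// Full hardware barrier on x86-64: MFENCE orders all earlier loads
// and stores before all later ones, draining the store buffer.
inline void full_barrier() { asm volatile("mfence" ::: "memory"); }
```

The portable C++ equivalent of the full barrier is `std::atomic_thread_fence(std::memory_order_seq_cst)`, which the compiler lowers to whatever the target architecture requires (MFENCE on x86, `dmb ish` on ARM, and so on).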
Current Intel architectures ensure automatic cache consistency across all CPUs, without requiring explicit memory barrier or cache flush instructions.
In symmetric multiprocessor (SMP) systems, each processor has a local cache. The memory system must guarantee cache coherence. False sharing occurs when threads on different processors modify variables that reside on the same cache line. This invalidates the cache line and forces an update, which hurts performance.
http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/
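A common way to avoid false sharing is to align each hot variable to its own cache line. A minimal sketch, assuming a 64-byte cache line (typical on current Intel CPUs); the struct and counter names are illustrative:

```cpp
#include <atomic>
#include <thread>

// Without alignas(64), `a` and `b` would likely share one cache line,
// and every increment on one core would invalidate that line in the
// other core's cache (false sharing), even though the threads never
// touch each other's data.
struct Counters {
    alignas(64) std::atomic<long> a{0};  // own 64-byte cache line
    alignas(64) std::atomic<long> b{0};  // own 64-byte cache line
};

int main() {
    Counters c;
    std::thread t1([&] { for (int i = 0; i < 1000000; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (int i = 0; i < 1000000; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
}
```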
As soon as one core writes to a memory location, the other cores know that their copies of the corresponding cache line are now stale and hence invalid.
Note, though, that buffering of `store`s and `load`s means even Intel's cache coherence protocol cannot magically sync the caches; the CPU does that when it chooses to or when it is instructed to. It is not an automatic process invoked with each `store` operation. Moreover, while a value resides in some CPU's store buffer, no other CPU (via its invalidation queue) is informed about the new value.
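The store-buffer effect this comment describes can be shown with the classic litmus test below, a C++11 sketch with relaxed atomics standing in for plain hardware loads and stores:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Classic "store buffer" litmus test. Both r1 and r2 may end up 0:
// each core's store can still sit in its private store buffer when
// the other core performs its load, so neither load sees the other
// store. Even x86, which is otherwise strongly ordered, permits this
// store->load reordering; a full barrier in BOTH threads (a seq_cst
// fence, i.e. MFENCE on x86) is needed to forbid the r1 == r2 == 0
// outcome.
std::atomic<int> X{0}, Y{0};
int r1, r2;

int main() {
    std::thread t1([] {
        X.store(1, std::memory_order_relaxed);
        // std::atomic_thread_fence(std::memory_order_seq_cst); // uncomment in both threads to forbid 0/0
        r1 = Y.load(std::memory_order_relaxed);
    });
    std::thread t2([] {
        Y.store(1, std::memory_order_relaxed);
        // std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = X.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    std::printf("r1=%d r2=%d\n", r1, r2); // r1=0 r2=0 is a legal outcome as written
}
```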