What is a memory fence?

Asked 13/11, 2008 at 9:30 Answered 12/12, 2020 at 9:48

148

What is meant by using an explicit memory fence?

Codfish answered 13/11, 2008 at 9:30 Comment(0)

146

For performance gains, modern CPUs often execute instructions out of order to make maximum use of the available silicon (this includes memory reads and writes). Because the hardware enforces instruction integrity, you never notice this in a single thread of execution. However, with multiple threads, or in environments with volatile memory (memory-mapped I/O, for example), this can lead to unpredictable behavior.

A memory fence (or barrier) is a class of instructions that ensure memory reads and writes occur in the order you expect. For example, a 'full fence' means all read/writes before the fence are committed before those after the fence.
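To make the full-fence idea above concrete, here is a minimal C++11 sketch of one thread handing a value to another. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are made up for illustration, and `std::atomic_thread_fence` with `memory_order_seq_cst` stands in for a hardware full fence; what instruction it compiles to depends on the target CPU.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state, invented for this illustration.
int payload = 0;                   // ordinary, non-atomic data
std::atomic<bool> ready{false};    // flag used to publish the data

void producer() {
    payload = 42;                                         // (1) write the data
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence: (1) stays before (2)
    ready.store(true, std::memory_order_relaxed);         // (2) publish the flag
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {      // (3) wait for the flag
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence: (3) stays before (4)
    return payload;                                       // (4) read the data
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    reader.join();
    writer.join();
    return seen;
}
```

Without the two fences, the relaxed flag accesses alone would not order the accesses to `payload`, so the consumer could legally observe the flag set and still read stale data (and the program would have a data race).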

Note that memory fences are a hardware concept. In higher-level languages we are used to dealing with mutexes and semaphores; these may well be implemented using memory fences at the low level, so explicit use of memory barriers is not necessary. Using memory barriers requires a careful study of the hardware architecture and is more commonly found in device drivers than in application code.

The CPU reordering is different from compiler optimisations - although the artefacts can be similar. You need to take separate measures to stop the compiler reordering your instructions if that may cause undesirable behaviour (e.g. use of the volatile keyword in C).
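As a rough sketch of a compiler-only measure, the snippet below uses the GCC/Clang empty-asm idiom with a "memory" clobber, plus the portable C++11 `std::atomic_signal_fence`. The names `compiler_barrier`, `data_word`, `flag_word` and `publish` are invented for the example. Neither construct emits a CPU fence instruction; they only constrain the compiler, so on their own they are not enough for communication between threads.

```cpp
#include <atomic>

// GCC/Clang-specific idiom: an empty asm statement with a "memory" clobber.
// It generates no machine instruction, but the compiler may not move memory
// accesses across it or keep memory values cached in registers over it.
inline void compiler_barrier() {
    __asm__ __volatile__("" ::: "memory");
}

// Hypothetical data, invented for this illustration.
int data_word = 0;
int flag_word = 0;

int publish(int value) {
    data_word = value;       // store 1
    compiler_barrier();      // the compiler keeps store 1 ahead of store 2
    flag_word = 1;           // store 2

    // Portable C++11 alternative: restricts compiler reordering only,
    // without issuing any CPU memory-ordering instruction.
    std::atomic_signal_fence(std::memory_order_seq_cst);

    return data_word + flag_word;
}
```

The CPU is still free to reorder the two stores as seen from another core; preventing that needs a hardware fence or an atomic operation with suitable ordering.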

Custodial answered 13/11, 2008 at 10:2 Comment(11)

I do not think volatile is enough to stop the compiler reordering; AFAIK it only makes sure the compiler cannot cache the variable value. The Linux kernel uses a gcc extension (asm __volatile__("": : :"memory")) to create a full compiler optimization barrier. – Stephainestephan 13/11, 2008 at 10:36

true, volatile isn't thread aware but you can use it to stop the compiler applying certain optimisations - this is unrelated to fences ;) – Custodial 16/12, 2008 at 12:48

(.NET CLR) volatile reads are acquire fences, writes are release fences. Interlocked ops are full as is the MemoryBarrier method. – Edea 1/5, 2010 at 13:6

Interesting read about the volatile keyword in .net can be found here albahari.com/threading/part4.aspx#_NonBlockingSynch The site contains a lot of useful information on threading in c# – Sugden 1/7, 2010 at 9:36

developerWorks has a good article about the PowerPC memory storage model: ibm.com/developerworks/systems/articles/powerpc.html – Socher 2/10, 2011 at 21:20

@Custodial What did you mean by to make maximum use of the available silicon? – Traps 1/9, 2013 at 14:3

@Gwaredd: What do you mean by "the hardware enforces instructions integrity"? Can you please elaborate on this a bit? – Percussive 8/10, 2013 at 4:9

@Gwaredd: Also, does "execute instructions out of order" mean the same thing as "accessing memory out of order"? I think memory fences is related to the order of memory accesses (rather than the order of instructions), no? – Percussive 8/10, 2013 at 4:13

@Stephainestephan well, it depends on the language. In Java, volatile prevents the compiler from reordering statements. (Whereas in .NET it's a bit more complicated.) – Cleavage 20/6, 2017 at 7:19

@Gwaredd: "all read/writes before the fence are committed before those after the fence" - Could you be more specific about what commit means here? Does it mean that the reads/writes are actually written to memory and hence visible to other cores/processors in the system once the barrier instruction has completed? – Fairspoken 11/8, 2018 at 20:5

@Stephainestephan Volatile operations are morally I/O operations; compilers don't reorder two volatile writes any more than they reorder printf("hello ");printf("world"); to produce "worldhello ". – Selfseeking 4/6, 2019 at 1:59

Copying my answer to another question, What are some tricks that a processor does to optimize code?:

The most important one would be memory access reordering.

Absent memory fences or serializing instructions, the processor is free to reorder memory accesses. Some processor architectures have restrictions on how much they can reorder; Alpha is known for being the weakest (i.e., the one which can reorder the most).

A very good treatment of the subject can be found in the Linux kernel source documentation, at Documentation/memory-barriers.txt.

Most of the time, it's best to use locking primitives from your compiler or standard library; these are well tested, should have all the necessary memory barriers in place, and are probably quite optimized (optimizing locking primitives is tricky; even the experts can get them wrong sometimes).
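A small sketch of that advice, using `std::mutex` from the C++ standard library. The names `counter`, `counter_mutex`, `add_many` and `run_counter` are invented for the example. Locking acts as an acquire operation and unlocking as a release operation, so no explicit fence appears in the code.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state, invented for this illustration.
std::mutex counter_mutex;
long counter = 0;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        // Taking the lock is an acquire, releasing it is a release; the
        // library supplies whatever barriers the platform needs.
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;
    }
}

// Starts `threads` workers that each increment the counter `per_thread` times.
long run_counter(int threads, int per_thread) {
    counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(add_many, per_thread);
    }
    for (auto& worker : pool) {
        worker.join();
    }
    return counter;
}
```

Calling `run_counter(4, 10000)` returns 40000 every time; without the lock the increments would race and the result would be unpredictable.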

Stephainestephan answered 13/11, 2008 at 10:41 Comment(1)

How does it influence the reordering? When you say Alpha is known for being the weakest, why "weakest"? Isn't it better that it reorders more, so that execution ends up much faster? (I am not an Alpha user; I am asking about the effect of heavy reordering vs restricted reordering.) So what are the downsides of a lot of reordering (apart from the risk of undefined behaviour, though I would guess most modern CPUs have settled on well-defined reordering only, otherwise the design decision would not make sense)? – Strathspey 15/2, 2020 at 21:37

In my experience it refers to a memory barrier, which is an instruction (explicit or implicit) to synchronize memory access between multiple threads.

The problem arises from the combination of modern aggressive compilers (they have amazing freedom to reorder instructions, but usually know nothing of your threads) and modern multicore CPUs.

A good introduction to the problem is "The 'Double-Checked Locking is Broken' Declaration". For many, it was the wake-up call that there be dragons.

Implicit full memory barriers are usually included in platform thread synchronization routines, which cover the core of it. However, for lock-free programming and implementing custom, lightweight synchronization patterns, you often need just the barrier, or even a one-way barrier only.
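As an illustration of one-way barriers, here is a sketch of double-checked initialisation written with C++11 acquire/release atomics. `Config`, `instance`, `init_mutex` and `get_config` are hypothetical names, and this is only one of several ways to do it (a function-local static or `std::call_once` would normally be simpler).

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, invented for this illustration.
struct Config {
    int value;
};

std::atomic<Config*> instance{nullptr};
std::mutex init_mutex;

Config* get_config() {
    // Acquire load: a one-way barrier. Reads that follow it cannot be
    // moved ahead of it, so a non-null pointer implies an initialised object.
    Config* p = instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Re-check under the lock; the mutex already orders this access.
        p = instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config{42};   // intentionally never freed in this sketch
            // Release store: a one-way barrier in the other direction.
            // The writes that built the object cannot be moved after it.
            instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

The fast path costs one acquire load, which is cheaper than a full fence on weakly ordered hardware; the lock is only taken while the object does not exist yet.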

Pomfret answered 13/11, 2008 at 9:41 Comment(0)

Wikipedia knows all...

Memory barrier, also known as membar or memory fence, is a class of instructions which cause a central processing unit (CPU) to enforce an ordering constraint on memory operations issued before and after the barrier instruction.

CPUs employ performance optimizations that can result in out-of-order execution, including memory load and store operations. Memory operation reordering normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory model. Some architectures provide multiple barriers for enforcing different ordering constraints.

Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.

Excessive answered 13/11, 2008 at 10:4 Comment(0)

A memory fence (memory barrier) is a kind of lock-free mechanism for synchronising multiple threads. In a single-threaded environment reordering is safe.

The problem involves ordering, shared resources and caching. The processor or compiler is able to reorder program instructions (relative to the order the programmer wrote them) for optimisation. This creates side effects in a multithreaded environment. That is why memory barriers were introduced: to guarantee that the program works properly. A barrier is slower, but it fixes this type of issue.

[Java Happens-before]

[iOS Memory Barriers]

Venice answered 12/12, 2020 at 9:48 Comment(0)
