Do memory fences slow down all CPU cores?

Asked 13/9, 2014 at 9:43 Answered 13/9, 2014 at 13:1

I once read somewhere about memory fences (barriers). It said that a memory fence causes cache synchronisation between several CPU cores.

So my questions are:

1. How does the OS (or the CPU itself) know which cores need to be synchronised?
2. Does it synchronise the cache of all CPU cores?
3. If the answer to (2) is 'yes', and assuming that sync operations are not cheap, does using memory fences slow down cores that are not used by my application? For example, if I have a single-threaded app running on my 8-core CPU, will it slow down the other 7 cores, because some cache lines must be synced with all of them?
4. Are the questions above totally ignorant, and do fences work completely differently?

Elnora answered 13/9, 2014 at 9:43 Comment(2)

The operating system is not involved with this, it is a processor detail. The OS is just another chunk of software that needs to deal with the need for fences, necessarily so in its thread scheduler. – Faintheart 13/9, 2014 at 10:54

fences actually don't sync the cache, they sync the program flow. – Dimension 17/9, 2014 at 14:7

The OS does not need to know, and each CPU core does what it's told: each core with a memory fence has to do certain operations before or after, and that's all. A core isn't synchronizing "with" other cores, it's synchronizing memory accesses relative to itself.
A fence in one core does not mean other cores are synchronized with it, so typically you would have two (or more) fences: one in the writer and one in the reader. A fence executed on one core does not need to impact any other cores. Of course there is no guarantee about this in general, just a hope that sane architectures will not unduly serialize multi-core execution.

Syrinx answered 13/9, 2014 at 11:49 Comment(1)

I see, thank you. I was confused with that 'sync cache' thing. Thought memory fence means all cores are notified that certain cache line must be invalidated. – Elnora 13/9, 2014 at 12:54

Generally, memory fences are used for ordering local operations. Take for instance this pseudo-assembler code:

load A
load B

Many CPUs do not guarantee that B is indeed loaded after A; B may be in a cache line that was loaded into cache earlier due to some other memory load. If you introduce a fence,

load A
readFence
load B

you have the guarantee that B is loaded from memory after A is. If B were in cache but older than A, it would be reloaded.
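As a rough sketch of how the fenced load sequence might look in C++: this assumes `readFence` corresponds approximately to `std::atomic_thread_fence(std::memory_order_acquire)`, which is a mapping chosen for illustration and not something the pseudo-assembler specifies. The variables `A` and `B` and the function `load_a_then_b` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical shared variables standing in for A and B above.
std::atomic<int> A{0};
std::atomic<int> B{0};

struct Loaded {
    int a;
    int b;
};

Loaded load_a_then_b() {
    // load A (relaxed: no ordering is implied by the load itself)
    int a = A.load(std::memory_order_relaxed);
    // readFence (assumed mapping): the load of A above may not be
    // reordered with the loads and stores that follow the fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    // load B
    int b = B.load(std::memory_order_relaxed);
    return {a, b};
}
```

The fence only constrains the thread that executes it; nothing in this function signals or waits for another core.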

The situation with stores is the same the other way around. With

store A
store B

some CPUs may decide to write B to memory before they write A. Again, a fence between the two instructions may be needed to enforce ordering of the operations. Whether a memory fence is required always depends on the architecture.
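The store case can be sketched the same way in C++. This assumes a write fence corresponds approximately to `std::atomic_thread_fence(std::memory_order_release)`; the names `A`, `B` and `store_a_then_b` are hypothetical and only mirror the pseudo-assembler.

```cpp
#include <atomic>

// Hypothetical shared variables standing in for A and B above.
std::atomic<int> A{0};
std::atomic<int> B{0};

void store_a_then_b(int a, int b) {
    // store A (relaxed: no ordering is implied by the store itself)
    A.store(a, std::memory_order_relaxed);
    // write fence (assumed mapping): loads and stores before the fence
    // may not be reordered with the stores that follow it.
    std::atomic_thread_fence(std::memory_order_release);
    // store B
    B.store(b, std::memory_order_relaxed);
}
```

On architectures that already keep stores in program order, a compiler may emit no fence instruction here at all, which matches the point that the need for a fence depends on the architecture.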

Generally, you use memory fences in pairs:

If one thread wants to publish an object, it first constructs the object, then performs a write fence before writing the pointer to the object into a publicly known location.
The thread that wants to receive the object reads the pointer from the publicly known memory location, then executes a read fence to ensure that all further reads based on that pointer actually give the values the publishing thread intended.

If either fence is missing, the reader may read the value of one or more data members of the object before it was initialized. Madness ensues.
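The publish/receive pairing described above could look roughly like this in C++. It is a minimal sketch, assuming the write fence and read fence map to release and acquire fences from `<atomic>`; `Message`, `g_published`, `publish`, `receive` and `run_publish_demo` are all hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload type; any plain object would do.
struct Message {
    int id;
    int value;
};

// The "publicly known location" that holds the pointer.
std::atomic<Message*> g_published{nullptr};

void publish(Message* m, int id, int value) {
    // 1. Construct (initialize) the object.
    m->id = id;
    m->value = value;
    // 2. Write fence: the initialization above is ordered before
    //    the pointer store below.
    std::atomic_thread_fence(std::memory_order_release);
    // 3. Write the pointer to the publicly known location.
    g_published.store(m, std::memory_order_relaxed);
}

Message receive() {
    // 1. Read the pointer, waiting until something has been published.
    Message* m = nullptr;
    while ((m = g_published.load(std::memory_order_relaxed)) == nullptr) {
        std::this_thread::yield();
    }
    // 2. Read fence: reads through the pointer are ordered after
    //    the pointer load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    // 3. Reads based on the pointer now see the initialized members.
    return *m;
}

// Runs one writer and one reader thread and returns what the reader saw.
Message run_publish_demo() {
    g_published.store(nullptr);
    Message storage{};
    Message result{};
    std::thread reader([&] { result = receive(); });
    std::thread writer([&] { publish(&storage, 7, 42); });
    writer.join();
    reader.join();
    return result;
}
```

Each fence is executed by exactly one thread and orders only that thread's own accesses; the cross-thread guarantee comes from the two fences being paired around the pointer store and the pointer load.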

Quintilla answered 13/9, 2014 at 13:1 Comment(2)

You haven't addressed the question(s). Why is this the highest voted answer? – Poul 13/9, 2014 at 16:39

Well, I can't tell you that. But I did address the question, though indirectly: the misunderstanding of the OP seems to be that a memory fence is a global operation of some kind. That is why I detailed how memory fences actually work and why they are purely local operations. I also showed how a pair of these local operations can be used to achieve global data consistency. No direct answer, that's true, but I hope a useful one. – Quintilla 13/9, 2014 at 19:4

If you have, say, eight cores, and each core is doing different things, then these cores wouldn't be accessing the same memory, and wouldn't have the same memory in a cache line.

If core #1 uses a memory fence, but no other core accesses the memory that core #1 accesses, then the other cores won't be slowed down at all. However, if core #1 writes to location X, uses a memory fence, then core #2 tries to read the same location X, the memory fence will make sure that core #2 throws away the value of location X if it was in a cache, and reads the data back from RAM, getting the same data that core #1 has written. That takes time of course, but that's what the memory fence was there for.

(Instead of reading from RAM, if the cores share some cache, then the data will be read from cache.)

Dasha answered 13/9, 2014 at 12:16 Comment(0)
