How can I observe "LFENCE or SFENCE cannot pass earlier read/write"?

I'm working on functional safety, and I need to verify some x86 CPU instructions, such as LFENCE, SFENCE and MFENCE.

So far I can observe the effect of MFENCE using the example from Intel SDM section 8.2.3.4, "Loads may be reordered with earlier stores to different locations":

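// Thread 1 (e.g. core 0): X = 1; mfence; r1 = Y
asm volatile(
    "xor %0, %0\n\t"
    "movl $1, %1\n\t"
    "mfence\n\t"
    "movl %2, %0\n\t"
    : "=&r"(r1), "=m"(X)   // "&" (early clobber): %0 is zeroed before input %2 is read
    : "m"(Y)
    : "memory");

// Thread 2 (e.g. core 1): Y = 1; mfence; r2 = X
asm volatile(
    "xor %0, %0\n\t"
    "movl $1, %1\n\t"
    "mfence\n\t"
    "movl %2, %0\n\t"
    : "=&r"(r2), "=m"(Y)
    : "m"(X)
    : "memory");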

The code above only shows that MFENCE can prevent this memory reordering. (I detect it by comparing the values of r1 and r2 before and after removing the mfence on both processors.)
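For reference, a minimal self-contained harness for this StoreLoad litmus test might look like the sketch below, loosely modeled on the preshing.com experiment linked in the comments. It assumes x86-64, GCC or Clang, and linking with -pthread; `USE_MFENCE`, `FENCE`, `spin_barrier` and `count_storeload_reorders` are hypothetical names, and no random delays are added. With the fence in place the count should always be 0; building with -DUSE_MFENCE=0 may produce a nonzero count on a multi-core machine, but that isn't guaranteed on any given run.

```c
/* StoreLoad litmus test (SDM 8.2.3.4). x86-64, GCC/Clang, link with -pthread. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#ifndef USE_MFENCE
#define USE_MFENCE 1             /* build with -DUSE_MFENCE=0 to remove the fence */
#endif
#if USE_MFENCE
#define FENCE "mfence\n\t"
#else
#define FENCE ""
#endif

static int X, Y;                 /* shared locations, accessed by the asm blocks */
static int r1, r2;               /* per-iteration load results */
static long iterations, reorders;
static atomic_int arrived, phase;

/* Two-thread spin barrier: both threads leave it at nearly the same time,
   so their asm blocks overlap and reordering has a chance to show up. */
static void spin_barrier(void) {
    int my_phase = atomic_load(&phase);
    if (atomic_fetch_add(&arrived, 1) == 1) {      /* second thread to arrive */
        atomic_store(&arrived, 0);
        atomic_fetch_add(&phase, 1);               /* release the waiting thread */
    } else {
        for (unsigned spins = 0; atomic_load(&phase) == my_phase; spins++)
            if (spins > 10000) sched_yield();      /* back off if sharing a core */
    }
}

static void *thread1(void *arg) {
    (void)arg;
    for (long i = 0; i < iterations; i++) {
        X = 0;
        spin_barrier();
        __asm__ volatile("movl $1, %1\n\t" FENCE "movl %2, %0\n\t"
                         : "=r"(r1), "=m"(X) : "m"(Y) : "memory");
        spin_barrier();
        if (r1 == 0 && r2 == 0)  /* each load ran before the other thread's store was visible */
            reorders++;
    }
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    for (long i = 0; i < iterations; i++) {
        Y = 0;
        spin_barrier();
        __asm__ volatile("movl $1, %1\n\t" FENCE "movl %2, %0\n\t"
                         : "=r"(r2), "=m"(Y) : "m"(X) : "memory");
        spin_barrier();
    }
    return NULL;
}

/* Runs the test `iters` times; returns how often r1 == r2 == 0 was seen. */
long count_storeload_reorders(long iters) {
    pthread_t t1, t2;
    assert(iters > 0);
    iterations = iters;
    reorders = 0;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return reorders;
}
```

Pinning the two threads to different cores (for example with `pthread_setaffinity_np`) would make the no-fence case more likely to show reordering; it's left out to keep the sketch short.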

So I'm wondering how I can verify LFENCE and SFENCE in a similar way. I couldn't find a corresponding example in the SDM.

Wicker asked 21/6, 2019 at 14:31. Comments (3):
Can you clarify how the code you've shown verifies the documented behavior of mfence? You actually need to write many tests to check every property of all three fence instructions on Intel and AMD processors, which is going to take a lot of effort. – Spaceport
@HadiBrais: this code appears to reproduce the test from preshing.com/20120515/memory-reordering-caught-in-the-act, where StoreLoad reordering on normal WB memory is visible on x86. It's pretty clear that's all they're trying to test. – Carlin
Thanks, Peter, for the comments; the link explains exactly what Hadi asked. @HadiBrais If you want, you can clone my test code from github.com/ysun/acrn-unit-test.git (branch 'memory_ordering'). – Wicker

Related: Does the Intel Memory Model make SFENCE and LFENCE redundant?

sfence has no real effect unless you're using NT stores (see footnote 1). If you NT-store data and then store a pointer to that data (or a "ready" flag), a reader can see the old value of the data even if it sees the new pointer/flag value. sfence can be used to ensure that the two stores become globally observable in program order.
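A minimal sketch of that NT-store publish pattern, assuming x86-64, GCC or Clang, and -pthread (the names `data`, `ready`, `writer` and `publish_with_sfence` are hypothetical). The writer fills a buffer with MOVNTI (`_mm_stream_si32`), issues SFENCE, then sets the flag with a release store, which GCC and Clang compile to a plain `mov` on x86. Because of the SFENCE, a reader that sees the flag should never see stale data.

```c
/* x86-64, GCC/Clang, link with -pthread. */
#include <assert.h>
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI), _mm_sfence, _mm_pause */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BUF_LEN 1024
static int data[BUF_LEN];
static atomic_bool ready;                     /* the "buffer ready" flag */

static void *writer(void *arg) {
    (void)arg;
    for (int i = 0; i < BUF_LEN; i++)
        _mm_stream_si32(&data[i], i + 1);     /* NT stores: weakly ordered */
    _mm_sfence();                             /* order the NT stores before the flag store */
    atomic_store_explicit(&ready, true, memory_order_release);  /* plain mov on x86 */
    return NULL;
}

/* Publishes the buffer from another thread and returns how many elements
   the reader saw with a stale value after observing the flag (expected: 0). */
int publish_with_sfence(void) {
    pthread_t t;
    int stale = 0;
    atomic_store(&ready, false);
    for (int i = 0; i < BUF_LEN; i++)
        data[i] = 0;
    int rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        _mm_pause();
    for (int i = 0; i < BUF_LEN; i++)
        if (data[i] != i + 1)
            stale++;
    pthread_join(t, NULL);
    return stale;
}
```

Removing the `_mm_sfence()` makes stale reads permitted, though not guaranteed to show up in any particular run.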

lfence is useless for memory ordering unless you're doing NT loads from a WC memory region (like video RAM). You'll have a very hard time creating a case where commenting it out creates a detectable difference in memory ordering.

The main use for lfence is to serialize execution, not memory. See "Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths".
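As an illustration of lfence as an execution barrier rather than a memory barrier, here is the common pattern of fencing RDTSC so the timed code can't overlap the timestamp reads. It's a sketch assuming GCC or Clang on x86-64; `tsc_ticks` and `busy_work` are hypothetical names, and the result is in TSC reference ticks, not core clock cycles.

```c
/* GCC/Clang on x86-64. */
#include <assert.h>
#include <emmintrin.h>   /* _mm_lfence */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc */

/* Returns the TSC ticks spent in fn(), with LFENCE keeping the timed
   instructions from overlapping the two RDTSC reads. */
uint64_t tsc_ticks(void (*fn)(void)) {
    assert(fn != NULL);
    _mm_lfence();                 /* earlier instructions complete before RDTSC */
    uint64_t start = __rdtsc();
    _mm_lfence();                 /* RDTSC completes before fn() starts executing */
    fn();
    _mm_lfence();                 /* fn()'s instructions complete (locally) first */
    uint64_t end = __rdtsc();
    return end - start;
}

/* Hypothetical workload used for illustration. */
void busy_work(void) {
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}
```

Note that "complete locally" does not mean stores are globally visible: they can still be sitting in the store buffer, which is exactly why lfence isn't a memory barrier for normal stores.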


Since you asked about C, not just asm, there's a related answer about when you should use _mm_sfence() and the other intrinsics: When should I use _mm_sfence _mm_lfence and _mm_mfence. (Usually you only need asm("" ::: "memory") unless NT stores are in flight, because on x86, blocking compile-time reordering gives you acquire/release ordering without any runtime barrier instructions.)
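To illustrate the compile-time-only barrier, here's a sketch of a flag-based handoff that uses `asm("" ::: "memory")` and no fence instructions. It assumes x86-64 with GCC or Clang and relies on x86 hardware not reordering normal stores with other stores, or normal loads with other loads. Strictly speaking this is a data race in ISO C; portable code would use C11 `atomic_store_explicit`/`atomic_load_explicit` with release/acquire, which compile to the same plain `mov` instructions on x86. `barrier`, `producer` and `consume_with_compiler_barrier` are hypothetical names.

```c
/* x86-64 + GCC/Clang only: relies on x86's strong hardware ordering. */
#include <assert.h>
#include <pthread.h>

/* Compile-time barrier: emits no instruction, only blocks compiler reordering. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static int payload;            /* ordinary data */
static volatile int ready;     /* volatile so the spin loop re-reads it */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    barrier();                 /* compiler may not sink the payload store below the flag store */
    ready = 1;                 /* plain mov; x86 keeps stores in program order */
    return NULL;
}

/* Returns the payload value observed after seeing the flag (expected: 42). */
int consume_with_compiler_barrier(void) {
    pthread_t t;
    payload = 0;
    ready = 0;
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    while (!ready) { }         /* spin until the flag is seen */
    barrier();                 /* compiler may not hoist the payload load above the flag load */
    int v = payload;           /* x86 doesn't reorder loads with loads */
    pthread_join(t, NULL);
    return v;
}
```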


Footnote 1: That's true for normal WB (WriteBack) memory cacheability settings. In user-space under a normal OS, that's what you always have unless you did something very special.

For other memory types (MTRR or PAT settings): NT stores on uncacheable memory have no special effect and are still strongly ordered. NT stores on WC, WB, or WT memory (or normal stores to WC memory) are weakly ordered, so it's useful to use sfence before storing a buffer_ready flag for another thread.

SSE4.1 movntdqa loads from WB memory are not weakly ordered: unlike NT stores, movntdqa doesn't override the memory type's ordering semantics. On current CPUs, nothing special happens at all on WB memory; they're just a less efficient movdqa load. Only use them on WC memory.
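To check that MOVNTDQA on ordinary WB memory returns the same data as a plain aligned load, a sketch like the following compares the two. It assumes GCC or Clang on x86-64 (for `__attribute__((target("sse4.1")))` and `__builtin_cpu_supports`); `nt_load` and `ntload_matches_plain_load` are hypothetical names. It can only confirm that the loaded values agree; it says nothing about ordering or performance.

```c
/* GCC/Clang on x86-64. */
#include <assert.h>
#include <smmintrin.h>   /* _mm_stream_load_si128 (SSE4.1 MOVNTDQA) */
#include <stdint.h>
#include <string.h>

/* Only this function is compiled for SSE4.1, so the file builds without -msse4.1. */
__attribute__((target("sse4.1")))
static __m128i nt_load(__m128i *p) {
    return _mm_stream_load_si128(p);
}

/* Returns 1 if MOVNTDQA from ordinary (WB) memory yields the same bytes as a
   normal aligned load, 0 if not, -1 if the CPU lacks SSE4.1. */
int ntload_matches_plain_load(void) {
    if (!__builtin_cpu_supports("sse4.1"))
        return -1;
    _Alignas(16) int src[4] = {1, 2, 3, 4};
    int out_nt[4], out_plain[4];
    assert(((uintptr_t)src & 15) == 0);        /* MOVNTDQA needs 16-byte alignment */
    _mm_storeu_si128((__m128i *)out_nt, nt_load((__m128i *)src));
    _mm_storeu_si128((__m128i *)out_plain, _mm_load_si128((const __m128i *)src));
    return memcmp(out_nt, out_plain, sizeof out_nt) == 0;
}
```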

Carlin answered 21/6, 2019 at 16:58. Comments (6):
Hi Peter, thanks for your answer. I have some questions. For SFENCE, I created a test case that tries to produce memory reordering: example code for SFENCE. – Wicker
@PeterCordes As the code shows: (a) I fill an array using MOVNTI instructions. (b) Right after all the MOVNTIs, I test the value of array[MAX] on another CPU core (an AP). (c) My understanding from your answer is that the MOVNTIs above might cause memory reordering; in other words, the value of array[MAX] might not be flushed to memory before it is read in step (b). But it doesn't work as expected. Could you please correct me if my understanding is wrong somewhere? – Wicker
@YiSun: Yes, storing to an array with movnti and then writing a "buffer ready" flag with a normal store creates the possibility of another core seeing the flag store but stale data from the array. I'd suggest making the array not a multiple of the cache-line size, or not aligned, so the last store doesn't complete a 64-byte chunk and trigger an immediate flush. Having the writer go into a pause loop right after writing, instead of doing more memory accesses, might give the best chance of the WC buffer not being flushed quickly. – Carlin
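Following that suggestion, an experiment might look like the sketch below: the writer NT-stores 60 bytes (so no 64-byte line is completed), publishes a flag with a regular release store and no SFENCE, then sits in a pause loop. It assumes x86-64, GCC or Clang, and -pthread; `COUNT`, `arr`, `flag`, `nt_writer` and `count_stale_reads` are hypothetical names. A nonzero count demonstrates the reordering, but a zero count proves nothing, since the hardware is allowed, not required, to reorder.

```c
/* x86-64, GCC/Clang, link with -pthread. An experiment, not a guaranteed reproducer. */
#include <assert.h>
#include <emmintrin.h>   /* _mm_stream_si32, _mm_pause */
#include <pthread.h>
#include <stdatomic.h>

#define COUNT 15                          /* 15 ints = 60 bytes: never fills a 64-byte line */
static _Alignas(64) int arr[COUNT];
static atomic_int flag;                   /* "buffer ready" flag, holds the generation number */
static atomic_int stop;

static void *nt_writer(void *arg) {
    int gen = *(const int *)arg;
    for (int i = 0; i < COUNT; i++)
        _mm_stream_si32(&arr[i], gen);    /* NT stores with NO sfence afterwards */
    atomic_store_explicit(&flag, gen, memory_order_release);  /* regular store */
    while (!atomic_load_explicit(&stop, memory_order_relaxed))
        _mm_pause();                      /* pause loop instead of further stores */
    return NULL;
}

/* Runs `trials` rounds; returns how many rounds the reader saw the new flag
   value but at least one stale array element. */
long count_stale_reads(int trials) {
    long stale_rounds = 0;
    for (int gen = 1; gen <= trials; gen++) {
        pthread_t t;
        atomic_store(&flag, 0);
        atomic_store(&stop, 0);
        int rc = pthread_create(&t, NULL, nt_writer, &gen);
        assert(rc == 0);
        while (atomic_load_explicit(&flag, memory_order_acquire) != gen)
            _mm_pause();
        for (int i = 0; i < COUNT; i++)
            if (arr[i] != gen) { stale_rounds++; break; }
        atomic_store(&stop, 1);
        pthread_join(t, NULL);
    }
    return stale_rounds;
}
```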
Would you elaborate on the 'buffer ready' flag? How can I implement that? I couldn't find anything by searching. Many thanks! – Wicker
@YiSun: That's a description, not a name you can google. For example, declare std::atomic<bool> ready_flag = false; and write it with ready_flag.store(true, std::memory_order_release) after doing some NT stores. – Carlin
(mo_release doesn't use SFENCE, because std::atomic assumes there aren't any NT stores in flight. mo_release is how you get a regular store that's ordered at compile time, without the compiler inserting mfence or xchg.) – Carlin
