While doing some research on lock-free/wait-free algorithms, I stumbled upon the false sharing problem. Digging a bit further led me to the source code of Folly (Facebook's C++ library), more specifically to this header file and the definition of the FOLLY_ALIGN_TO_AVOID_FALSE_SHARING macro (currently at line 130). What surprised me most at first glance was the value: 128 instead of 64...
```cpp
/// An attribute that will cause a variable or field to be aligned so that
/// it doesn't have false sharing with anything at a smaller memory address.
#define FOLLY_ALIGN_TO_AVOID_FALSE_SHARING __attribute__((__aligned__(128)))
```
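To make the effect of the macro concrete, here is a minimal sketch of how such an attribute might be applied to fields. It assumes GCC or Clang (because of the `__attribute__` syntax) and uses a local copy of the macro instead of including Folly's header; the `QueueIndices` type and its `head`/`tail` fields are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Local copy of the macro quoted above, so the example builds without Folly
// (GCC/Clang attribute syntax).
#define FOLLY_ALIGN_TO_AVOID_FALSE_SHARING __attribute__((__aligned__(128)))

// Hypothetical type: two indices written by different threads.
struct QueueIndices {
  std::atomic<std::uint64_t> head FOLLY_ALIGN_TO_AVOID_FALSE_SHARING;
  std::atomic<std::uint64_t> tail FOLLY_ALIGN_TO_AVOID_FALSE_SHARING;
};

// Each annotated field starts on its own 128-byte boundary, so it shares
// nothing with the bytes at lower addresses...
static_assert(alignof(QueueIndices) == 128, "struct inherits 128-byte alignment");
static_assert(offsetof(QueueIndices, tail) == 128, "tail starts 128 bytes after head");
// ...and sizeof is rounded up to a multiple of the alignment, so the next
// element of an array also starts on a fresh 128-byte boundary.
static_assert(sizeof(QueueIndices) == 256, "two 128-byte blocks");
```

Note that aligning a field only separates it from what comes before it; it is the rounding of `sizeof` that protects whatever follows the last field.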
AFAIK, cache lines on modern CPUs are 64 bytes long, and in fact every resource I have found on the matter so far, including this article from Intel, talks about 64-byte alignment and padding to work around false sharing.
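For comparison, the conventional 64-byte approach those resources describe looks roughly like the sketch below. The 64-byte line size is an assumption hard-coded as `kCacheLine` (C++17 also offers `std::hardware_destructive_interference_size` in `<new>`, but not every standard library ships it); `PaddedSlot`, `same_cache_line` and `check_slots` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, the figure most resources use.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread slot, aligned and padded to one full line so a
// write to one slot never touches the line holding another slot.
struct alignas(kCacheLine) PaddedSlot {
  std::uint64_t value = 0;
};
static_assert(sizeof(PaddedSlot) == kCacheLine, "each slot fills exactly one line");

// Two addresses are on the same cache line when they fall into the same
// kCacheLine-aligned block.
inline bool same_cache_line(const void* a, const void* b) {
  const auto ia = reinterpret_cast<std::uintptr_t>(a);
  const auto ib = reinterpret_cast<std::uintptr_t>(b);
  return ia / kCacheLine == ib / kCacheLine;
}

// Usage: adjacent slots in an array always land on different lines, while
// the first and last byte of one slot share a line.
inline void check_slots() {
  PaddedSlot slots[4];
  for (int i = 0; i + 1 < 4; ++i)
    assert(!same_cache_line(&slots[i], &slots[i + 1]));
  assert(same_cache_line(&slots[0].value,
                         reinterpret_cast<char*>(&slots[0]) + (kCacheLine - 1)));
}
```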
Still, the folks at Facebook align and pad their class members to 128 bytes when needed. Then I found the beginning of an explanation just above the definition of FOLLY_ALIGN_TO_AVOID_FALSE_SHARING:
```cpp
enum {
  /// Memory locations on the same cache line are subject to false
  /// sharing, which is very bad for performance. Microbenchmarks
  /// indicate that pairs of cache lines also see interference under
  /// heavy use of atomic operations (observed for atomic increment on
  /// Sandy Bridge). See FOLLY_ALIGN_TO_AVOID_FALSE_SHARING
  kFalseSharingRange = 128
};
```
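The "pairs of cache lines" idea can be made concrete with some address arithmetic. The sketch below assumes, for illustration only, that a "pair" means the 128-byte-aligned chunk formed by lines 2k and 2k+1; `line_index`, `pair_index` and `may_interfere` are hypothetical helpers, and `kFalseSharingRange` simply mirrors the value from the enum above.

```cpp
#include <cstddef>
#include <cstdint>

// Assumptions: 64-byte lines, and interference extending to the aligned
// 128-byte pair that contains a line.
constexpr std::size_t kLine = 64;
constexpr std::size_t kFalseSharingRange = 128;  // same value as Folly's enum

constexpr std::uintptr_t line_index(std::uintptr_t addr) { return addr / kLine; }
constexpr std::uintptr_t pair_index(std::uintptr_t addr) { return addr / kFalseSharingRange; }

// Two addresses may interfere if they sit in the same 128-byte pair, even
// when they are on different 64-byte lines.
constexpr bool may_interfere(std::uintptr_t a, std::uintptr_t b) {
  return pair_index(a) == pair_index(b);
}

// With 64-byte padding, slot i lives at offset 64*i from an aligned base:
// slots 0 and 1 are on different lines but in the same pair...
static_assert(line_index(0) != line_index(64), "different lines");
static_assert(may_interfere(0, 64), "same 128-byte pair");
// ...while slots 1 and 2 straddle a pair boundary.
static_assert(!may_interfere(64, 128), "different pairs");
// With 128-byte padding, neighbouring slots never share a pair.
static_assert(!may_interfere(0, 128), "different pairs");
```

Under this assumption, 64-byte padding separates every neighbour by line but leaves half of the neighbouring pairs in the same 128-byte chunk, which is the gap a 128-byte alignment would close.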
While this gives me a bit more detail, I still feel I need more insight. I'm curious about how contiguous cache lines could interfere with each other under heavy use of atomic operations, whether through their synchronization or through RMW operations on them. Can someone please enlighten me on how this can even happen?
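One way to look for the interference described in the Folly comment is a microbenchmark in which two threads each hammer their own atomic counter with `fetch_add`, with the counters placed a chosen number of bytes apart. This is a hypothetical sketch (`Arena` and `hammer` are made-up names); it does not pin threads to cores, and any timing difference between `distance = 64` and `distance = 128` will depend on the CPU, so no particular result is implied.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <new>
#include <thread>

// Hypothetical 256-byte buffer aligned to 128 bytes, so offset 0 starts a
// 128-byte chunk and offset 128 starts the next one.
struct alignas(128) Arena {
  unsigned char bytes[256];
};

// Two threads each increment their own counter `iterations` times; the
// counters sit `distance` bytes apart. Returns wall-clock time in ns.
std::int64_t hammer(std::size_t distance, std::uint64_t iterations) {
  using Counter = std::atomic<std::uint64_t>;
  assert(distance >= sizeof(Counter));
  assert(distance % alignof(Counter) == 0);
  assert(distance + sizeof(Counter) <= sizeof(Arena));

  Arena arena{};
  // Construct both counters in place at offsets 0 and `distance`.
  Counter* a = new (arena.bytes) Counter(0);
  Counter* b = new (arena.bytes + distance) Counter(0);

  auto work = [iterations](Counter* c) {
    for (std::uint64_t i = 0; i < iterations; ++i)
      c->fetch_add(1, std::memory_order_relaxed);
  };

  const auto start = std::chrono::steady_clock::now();
  std::thread t1(work, a);
  std::thread t2(work, b);
  t1.join();
  t2.join();
  const auto stop = std::chrono::steady_clock::now();

  // Every increment must have landed, whatever the timing.
  assert(a->load() == iterations && b->load() == iterations);
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling `hammer(64, n)` and `hammer(128, n)` with a large `n` and comparing the returned times, ideally over several runs, is the kind of experiment the comment's "microbenchmarks" wording suggests.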