Forcing padding where it's not needed would be bad design. Users can always pad if they have nothing useful to put in the rest of the cache line.
You probably want the mutex in the same cache line as the data it's protecting if it's usually lightly contended: there's only one cache line to bounce around, instead of a second cache miss when accessing the shared data after acquiring the lock. This is probably common with fine-grained locking, where many objects each have their own std::mutex, and it makes keeping the mutex small more beneficial.
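As a rough sketch of that fine-grained layout, here is a hypothetical Counter type (the names Counter and same_cache_line are illustrative, not from the answer) that keeps its lock right next to the data it guards. It assumes 64-byte cache lines and uses alignas(64) only so the small struct doesn't straddle a line boundary.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical fine-grained-locked object: each instance carries its own
// std::mutex directly in front of the data it protects. alignas(64) keeps the
// struct from straddling two lines (assuming 64-byte cache lines).
struct alignas(64) Counter {
    std::mutex m;    // the lock...
    long value = 0;  // ...and the data it guards, adjacent in memory

    void increment() {
        std::lock_guard<std::mutex> lock(m);
        // If sizeof(std::mutex) + sizeof(long) <= 64 (e.g. a 40-byte mutex
        // with libstdc++ on x86-64 Linux), this access usually hits the line
        // that acquiring the lock just brought in.
        ++value;
    }
};

// True if two addresses fall in the same 64-byte line (assumed line size).
inline bool same_cache_line(const void* a, const void* b) {
    constexpr std::uintptr_t kLine = 64;
    return reinterpret_cast<std::uintptr_t>(a) / kLine ==
           reinterpret_cast<std::uintptr_t>(b) / kLine;
}
```

Whether the lock and the data really share a line depends on the library's sizeof(std::mutex), which is exactly why keeping the mutex small helps.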
(Heavy contention could create false sharing between readers trying to acquire the lock and the lock owner writing to the shared data after gaining ownership of the lock. If those readers flip the cache line to "shared", or invalidate it, before the lock owner has a chance to write, that would indeed slow things down.)
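For contrast, a heavily contended lock might instead be kept on a different line from its data. The sketch below (hypothetical SplitCounter, again assuming 64-byte lines) shows that alternative layout; it is one side of the trade-off described above, not a general recommendation, since a lightly contended lock now costs two cache misses instead of one.

```cpp
#include <mutex>

// Hypothetical layout for a *heavily* contended lock: the lock and the guarded
// data each start on their own 64-byte line (assumed line size), so threads
// trying to acquire the lock don't pull the data's line away from the owner.
struct SplitCounter {
    alignas(64) std::mutex m;
    alignas(64) long value = 0;

    void increment() {
        std::lock_guard<std::mutex> lock(m);
        ++value;  // touches a second line, separate from the lock's line
    }
};
```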
Or the space in the rest of the line could be some very-rarely-used thing that needs to exist somewhere in the program, but is perhaps only used for error handling, so its performance doesn't matter. If it couldn't share a line with a mutex, it would have to take up space somewhere else (maybe in some page of "cold" data, so this isn't a great example).
It's probably unlikely that you'd want to malloc or new a mutex itself, although one could be part of a class you dynamically allocate. Allocator overhead is a real thing, e.g. 16 bytes of bookkeeping space before the allocation. (Large allocations with glibc's malloc/new are often page-aligned + 16 bytes, making them misaligned with respect to all wider boundaries.) Dynamic-allocator bookkeeping is a very good thing for a mutex to share a cache line with: it's probably not read or written by anything while the mutex is in use.
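To make the allocation point concrete, this sketch compares a heap-allocated object holding a plain std::mutex with one whose type is over-aligned to 64 bytes, as if std::mutex itself were alignas(64). The names Plain, Padded and line_offset are hypothetical. Since C++17, new for an over-aligned type calls the aligned operator new, so the allocator generally has to find or carve out a 64-byte-aligned address rather than just handing out the bytes after its own header.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical object that happens to contain a lock. It only needs its
// members' alignment, so new can place it wherever the allocator likes,
// e.g. right after the allocator's bookkeeping header.
struct Plain {
    std::mutex m;
    int data = 0;
};

// The same object, over-aligned as if std::mutex were alignas(64).
// In C++17, `new Padded` uses the aligned operator new, which must return
// a 64-byte-aligned address.
struct alignas(64) Padded {
    std::mutex m;
    int data = 0;
};

// Offset of an address within its (assumed 64-byte) cache line.
inline std::uintptr_t line_offset(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64;
}
```

On glibc, line_offset of a Plain allocation is often 16 or some other nonzero value, while a Padded allocation always reports 0; the bytes skipped to get there are wasted.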
Non-lock-free std::atomic objects typically use an array of locks (maybe just simple spinlocks, but they could be std::mutex). If the latter, you don't expect two adjacent mutexes to be used simultaneously, so it's good to pack them all together.
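A minimal sketch of such a lock table follows. It is in the spirit of what non-lock-free atomics do, not any particular library's code: g_lock_table, lock_for and LockedAtomic are hypothetical names, and the table size of 64 and the address hash are arbitrary choices. The mutexes are packed back to back with no padding.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical lock table: a fixed array of densely packed mutexes, on the
// assumption that two neighbouring entries are rarely in use at the same time.
inline std::array<std::mutex, 64> g_lock_table;  // size is an arbitrary choice

// Pick a lock by hashing the object's address. The low bits are dropped
// because alignment makes them mostly zero.
inline std::mutex& lock_for(const void* addr) {
    auto a = reinterpret_cast<std::uintptr_t>(addr);
    std::size_t idx = static_cast<std::size_t>((a >> 4) % g_lock_table.size());
    return g_lock_table[idx];
}

// Minimal load/store wrapper for a type too big to be lock-free (hypothetical).
template <typename T>
class LockedAtomic {
public:
    explicit LockedAtomic(T v = T{}) : value_(v) {}

    T load() const {
        std::lock_guard<std::mutex> guard(lock_for(this));
        return value_;
    }

    void store(T v) {
        std::lock_guard<std::mutex> guard(lock_for(this));
        value_ = v;
    }

private:
    T value_;
};
```

Padding each table entry to its own line would multiply the table's footprint for little benefit, which is the answer's point.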
Also, increasing its size would be a very clunky way to try to ensure no false sharing. An implementation that wanted to make sure a std::mutex had a cache line to itself would want to declare it with alignas(64) to make sure its alignof() was 64. That would force padding to make sizeof(mutex) a multiple of alignof(mutex) (in this case, equal to it).
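Here is a small sketch of that alignment-driven approach, applied from the outside rather than inside the library. LineMutex and g_locks are hypothetical names, and 64-byte lines are assumed. The static_asserts show the compiler padding sizeof up to a multiple of alignof.

```cpp
#include <cstdint>
#include <mutex>

// A mutex given a 64-byte line to itself by over-aligning the type
// (assuming 64-byte cache lines).
struct alignas(64) LineMutex {
    std::mutex m;
};

// sizeof is always a multiple of alignof, so the compiler pads the struct
// up to the next multiple of 64 bytes.
static_assert(alignof(LineMutex) == 64, "over-aligned to a full line");
static_assert(sizeof(LineMutex) % alignof(LineMutex) == 0,
              "padded to a multiple of alignof");

// In an array, each element therefore starts on its own line.
inline LineMutex g_locks[4];
```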
But note that std::hardware_destructive_interference_size should be 128 on some modern x86-64 CPUs, if you're going to fix a size for it, because of adjacent-line hardware prefetch in Intel's L2 caches. That's a weaker destructive effect than same-cache-line false sharing, and 128 bytes is too much space to waste on every mutex.
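If you do want that isolation in your own code, one way to write it is sketched below. It uses the standard library's value when the __cpp_lib_hardware_interference_size feature-test macro says it's available, and otherwise falls back to 128 as an assumption based on the answer's figure. IsolatedMutex, kDestructive and kWastedBytes are hypothetical names.

```cpp
#include <cstddef>
#include <mutex>
#include <new>

// Use the library's value when it provides one; otherwise assume 128,
// the figure suggested above for x86-64 with adjacent-line prefetch.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kDestructive = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kDestructive = 128;  // assumption
#endif

// A mutex isolated from its neighbours by the destructive-interference size.
struct alignas(kDestructive) IsolatedMutex {
    std::mutex m;
};

// Padding spent per lock: the cost of doing this for every mutex.
inline constexpr std::size_t kWastedBytes = sizeof(IsolatedMutex) - sizeof(std::mutex);
```

Some libraries report 64 here even on x86-64, so the fallback and the library value may differ; either way, kWastedBytes makes the per-lock cost visible.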
std::mutex is little more than a glorified structure, and the default minimum alignment of a structure depends on its most strictly aligned field, not on the size of the structure as a whole. So it's reasonable to assume that sizeof(std::mutex) has almost nothing to do with minimal alignment at all, and is even less indicative of optimal alignment. Instead, if you want 64-byte alignment, you want 64-byte alignment regardless of structure size (e.g. using something like alignas(64)); sizeof() is mostly irrelevant, and std::alignment_of<T> (or alignof) should be used instead. – Malacology
alignof(mutex) = sizeof(mutex) = std::hardware_destructive_interference_size, or something similar, to make sure a mutex has a cache line to itself. (Note that hardware_destructive_interference_size should be 128 on some modern x86-64, if you're going to fix a size for it, because of adjacent-line HW prefetch.) – Universe
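The sizeof-versus-alignment distinction from the first comment can be checked directly. In this sketch, Big is a hypothetical structure that is over 100 bytes in size but whose alignment is only that of its int member. The last two checks show that std::alignment_of<T>::value is the type-trait spelling of alignof(T), and that alignment never exceeds size.

```cpp
#include <mutex>
#include <type_traits>

// Hypothetical structure: large in size, but aligned only as strictly as its
// most-aligned member (int). Size and alignment are independent properties.
struct Big {
    char buf[100];
    int x;
};

static_assert(alignof(Big) == alignof(int), "alignment comes from the most-aligned member");
static_assert(sizeof(Big) >= 100 + sizeof(int), "size covers all members");

// std::alignment_of<T>::value answers the same question as alignof(T).
static_assert(std::alignment_of<std::mutex>::value == alignof(std::mutex), "same query");

// sizeof is a nonzero multiple of alignof, so alignment never exceeds size.
static_assert(alignof(std::mutex) <= sizeof(std::mutex), "alignment <= size");
```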