Consider the following attempt at a lockless hashtable for multithreaded search algorithms (inspired by this paper)
struct Data
{
uint64_t key;
uint64_t value;
};
struct HashEntry
{
uint64_t key_xor_value;
uint64_t value;
};
void insert_data(Data const& e, HashEntry* h, std::size_t tableOffset)
{
h[tableOffset].key_xor_value = e.key ^ e.value;
h[tableOffset].value = e.value;
}
bool data_is_present(Data const& e, HashEntry const* h, std::size_t tableOffset)
{
auto const tmp_key_xor_value = h[tableOFfset].key_xor_value;
auto const tmp_value = h[tableOffset].value;
return e.key == (tmp_key_xor_value ^ tmp_value);
}
The idea is that a HashEntry
struct stores the XOR-ed combination of the two 64-bit words of a Data
struct. If two threads have interleaved reads/writes to the two 64-bit words of a HashEntry
struct, the idea is that this can be detected by the reading thread by XOR-ing again and comparing against the original key
. So one might have a loss of efficiency by corrupted hash entries, but still have guaranteed correctness in case the decoded retrieved key matches the original.
The paper mentions that it is based on the following assumption:
For the remainder of this discussion, assume that 64 bit memory read/write operations are atomic, that is the entire 64 bit value is read/written in one cycle.
My questions are: is the above code without explicit use of std::atomic<uint64_t>
guaranteed to be thread-safe in C++11? Or can the individual 64-bit words be corrupted by simultaneous reads/writes? Even on 64-bit platforms? And how is this different from the old C++98 Standard?
Quotes from the Standard would be much appreciated.
UPDATE: based on this amazing paper by Hans Boehm on "benign" data races, a simple way to get bitten is for the compiler to cancel both XORs from insert_data()
and data_is_present()
to alway return true
, e.g. if it finds a local code fragment like
insert_data(e, h, t);
if (data_is_present(e, h, t)) // optimized to true as if in single-threaded code
read_and_process(e, h, t); // data race if other thread has written
HashEntry
could be on separate cache lines, so another legal thing is that one is fresh and the other is stale, forever, anddata_is_present
always returns false. Obviously anything's legal because a data race is UB, I'm just giving examples of things that could actually happen with only a plausible change to the hardware guarantees you're used to, or only a plausible amount of optimization by the compiler. The standard does not assume x64 ;-) – Plebeiandata_is_present()
always returningfalse
is inefficient but not incorrect. Returningtrue
when it is actuallyfalse
would lead to a real bug. If one would run this code on a x64 platform with a compiler knowing that target, how could that plausibly happen? I'm just asking because people using this type of code haven't really reported such behavior (of course, they might not have looked too hard). – Fresnovolatile
(if you usevolatile
and MSVC). – Plebeian