How can I make code that concurrently reads and modifies an array well-defined without introducing locking?
Asked Answered
J

1

7

I'm writing a program that computes an endgame tablebase for a chess variant. The algorithm for filling the tablebase works like this:

  1. Start with a huge array of unsigned char, each member representing one position (we always assume that it's white's turn). An array member is even if the position is lost, odd if it is won, 0xff if it invalid, 0xfe if it is a draw.
  2. Iterate over the array, marking each illegal position with 0xff, each position where white is mated with 0x00 and all other positions with 0x0fe.
  3. Iterate over the array, considering only positions marked 0xfe. Check if there is a move that leads to a position whose array member is even, if there is one, write one plus the number of that position into the member corresponding to the curren position. If all moves lead to positions indicated by odd numbers (i.e. this is a loosing position), mark this position as one plus the highest of these numbers to indicate how long the strongest defence takes.
  4. Repeat step three until the array doesn't change anymore.

For speed, I'd like to parallelize step three. Careful reading yields that in each iteration, we only ever write one value (the number of the iteration) into the array. The following strategy obtains:

  • Split the array into n parts, let one thread work on each part. Let the current iteration be i.
  • If a thread has to read a member from the array and it is equal to i, treat it as if it was set to 0xfe because that means that the member was just written concurrently by another thread.

Now obviously there is a race condition in this program but it doesn't matter as we always get the right result if there aren't any pink elephants (which can't exist if a char is written atomically). However, since on paper there is a race condition, the C compiler may declare my program to be undefined and format my hard disk.

What can I do to parallelize this algorithm without violating any constraints of the C memory model and without causing massive slowdown (e.g. by adding locks)?

Simplified problem description

Here is a simplified algorithm that demonstrates the same concept but is stripped of all the unimportant stuff:

  1. Start out with an array unsigned char a[n]. Each array member is either 0 or 1.
  2. For each array member that is set to 0: If some other array members are equal to 0 or 2, set this array member to 2. The array members checked depend on the index of the array member we want to update. There is no simple relationship between the index and the array members we need to check, it's essentially random.

Since we only ever change a 0 to a 2, it doesn't matter in which order we process the array entries, even though there is technically a race condition if we do so in parallel (since we concurrently read/write the same object). How can I tell the compiler that it shouldn't care about the race condition without sacrificing performance?

Juvenile answered 25/7, 2016 at 15:27 Comment(7)
This is an interesting question. How did the array get populated to begin with? If you hand responsibility for a chunk of the array to a thread, is there anything else touching the array at the same time?Sawmill
@Sawmill The array is initially populated in step two of the unsimplified algorithm. Each chunk of the array is only written to by the thread responsible for that chunk but will be randomly read by all other threads.Juvenile
I would duplicate your table n times, and merge the results into a single table later.Ofris
@Ofris That can't be done due to the huge size of the table (several gigabytes).Juvenile
You could probably decompose the table so that only the parts that require changing are abstracted into a smaller table, and the rest referenced from a read-only common table. But, see my comment to Chris's answer.Ofris
Now that I think about it, though, you only need two copies. The threads fill the new copy, and read from the old.Ofris
@Ofris Storing the array even twice is not an option. It is simply too large. It took me a lot of time to make sure that it fits into my computer's RAM even once.Juvenile
C
5

This is what the _Atomic type qualifier is for in C11. You would declare your array as

_Atomic unsigned char a[n];

which means that each element of the array can be read or written atomically.

Prior to C11, there's no standard way to do this, but generally, depending on the implementation, certain datatypes will be atomic for reads and writes. To know which those are, you'll have to look at the documentation for the implementation you are using.


Note that the default memory ordering for C11 _Atomic accesses is memory_order_seq_cst (sequential consistency), and if you don't need that, you can use atomic_load_explicit and atomic_store_explicit actions with a weaker memory ordering (ie memory_order_relaxed in your example)

Carrie answered 25/7, 2016 at 16:58 Comment(5)
That's a possibility, but then the compiler emits a memory barrier, which is something I don't need (I don't care for updated data).Juvenile
Perhaps atomic_load_explicit() with memory_order_relaxed?Ofris
@Ofris The load doesn't have a barrier, only the store has. Though, if I relax the memory order this might work.Juvenile
@Ofris memory_order_relaxed on both load and store does the trick. Would you mind posting that as an answer?Juvenile
@FUZxxl: Chris has already updated the answer. I don't have a problem with you accepting Chris's answer.Ofris

© 2022 - 2024 — McMap. All rights reserved.