fetch_add with acq_rel memory order

Consider an

std::atomic<int> x(0);

Let's suppose I have a function doing the following:

int x_old = x.fetch_add(1, std::memory_order_acq_rel);

Based on the descriptions of the memory orderings:

memory_order_relaxed Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation (see Relaxed ordering below)

memory_order_consume A load operation with this memory order performs a consume operation on the affected memory location: no reads or writes in the current thread dependent on the value currently loaded can be reordered before this load. Writes to data-dependent variables in other threads that release the same atomic variable are visible in the current thread. On most platforms, this affects compiler optimizations only (see Release-Consume ordering below)

memory_order_acquire A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below)

memory_order_release A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below).

memory_order_acq_rel A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.

memory_order_seq_cst Any operation with this memory order is both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order (see Sequentially-consistent ordering below)

Is it possible for 2 distinct threads to receive the same x_old value of 0? Or are they guaranteed to execute in a manner that x_old is 0 for only one of them, and 1 for the other?

If it is true that x_old could be 0 for both of them, does changing the memory ordering to std::memory_order_seq_cst guarantee uniqueness of x_old?
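
For concreteness, here is a minimal sketch of the scenario I mean (the worker function and the printout are purely illustrative):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x(0);

void worker(int *out) {
    // both threads execute this read-modify-write exactly once
    *out = x.fetch_add(1, std::memory_order_acq_rel);
}

int main() {
    int old_a = -1, old_b = -1;
    std::thread a(worker, &old_a);
    std::thread b(worker, &old_b);
    a.join();
    b.join();
    // the question: can this ever print "old_a = 0, old_b = 0"?
    std::printf("old_a = %d, old_b = %d\n", old_a, old_b);
}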

Privett answered 17/11, 2016 at 7:35 Comment(9)
Memory ordering is not relevant. Two threads cannot possibly get the same value (assuming the expression is reached only once in each thread).Reckless
Why should memory ordering not matter? Couldn't both threads get the same x_old under acq_rel because there is no read-read ordering guaranteed for a read-modify-write operation?Privett
Ordering is simply not relevant here. The first two values returned from fetch_add will always be: 0 and 1, but there is no guarantee which thread gets which value. This is true regardless of which memory_order you choose.Reckless
In that case, what exactly is the difference between a seq_cst and acq_rel memory_ordering on a fetch_add operation?Privett
Operation with seq_cst also prevents reordering of other acquire and release operations across it.Reckless
@Reckless "Operation with seq_cst also prevents reordering of other acquire and release operations" why especially these?Angarsk
@AdityaSihag "what exactly is the difference between a seq_cst" Comments are not for completely new Q.Angarsk
@Reckless "Operation with seq_cst also prevents reordering of other acquire and release operations across it" You can't reorder w/ acq-rel operations, in general.Angarsk
"memory_order_seq_cst Any operation with this memory order is both an acquire operation and a release operation" It isn't precisely true. Any operation that wouldn't accept either mo_acquire or mo_release (because it either doesn't observe a value or doesn't modify a value) is not magically made an acquire or a release operation by virtue of being memory_order_seq_cst.Angarsk

Is it possible for 2 distinct threads to receive the same x_old value of 0?

It is not possible, because the operation is atomic: it either happens in full or does not happen at all.

Ordering is concerned with preceding/following loads/stores and since you do not have any, ordering is irrelevant here. In other words, x.fetch_add(1, std::memory_order_relaxed); has the same effect here.

On current x86 it is the same lock xadd instruction regardless of memory_order; the lock prefix provides both atomicity and ordering. For memory_order_relaxed the ordering part of lock is simply unnecessary.
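
As an illustrative sketch (payload here is just a made-up example variable, not anything from the question): ordering starts to matter only once the fetch_add is used to publish or observe other data. With a bare counter there is no such data, so relaxed ordering is enough, and the returned values are unique either way.

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;          // plain, non-atomic data
std::atomic<int> x(0);

void producer() {
    payload = 42;                               // ordinary write
    x.fetch_add(1, std::memory_order_release);  // release: the write above cannot sink below this
}

void consumer() {
    // acquire pairs with the release part of the RMW above
    if (x.load(std::memory_order_acquire) == 1) {
        assert(payload == 42);  // guaranteed to see the write once the increment is visible
    }
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
}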

Silence answered 17/11, 2016 at 11:7 Comment(6)
so x.exchange(1, std::memory_order_relaxed) is also atomic and ordering is irrelevant here?Kessia
@DerekZhang That's what the answer says, you got it right.Silence
Thanks! I have another question. Thread A runs x.store(1, std::memory_order_release) first, then thread B runs x.load(std::memory_order_acquire). x in thread B is not guaranteed to read the 1 stored by A. If I use memory_order_seq_cst, is it guaranteed to read 1?Kessia
@DerekZhang It is best to post another question as a question, rather than a comment.Silence
#52927415 Thank you!Kessia
"For memory_order_relaxed the ordering part of lock is unnecessary." The ordering is necessary for the lock operation to avoid deadlocks. You can't get anything cheaper.Angarsk

Any operation performed on memory is carried out inside the processor. Even for an atomic operation, the processor reads the value, modifies it and writes the new value back. If the operation fails (depending on the implementation it may block instead of failing), it is retried. If it succeeds, then for the operation to be correct the stored value must be the immediately preceding value, modified as requested, and that immediately preceding value is what is returned to the caller. There is no reason for the processor to read memory again and return a value from some arbitrary point in time; if the returned value were not the immediately preceding one, the operation would be incorrect.

You can test it using something like this:

long repeats = 1000000000;
atomic_long x = 0;    /* shared counter: must be an atomic type for atomic_fetch_add */
atomic_long sum = 0;  /* accumulates the values returned by fetch_add in all threads */
void *test_func(void *arg) {
    long local_sum = 0;
    for (long i = 0; i < repeats; ++i) {
        local_sum += atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
    }
    atomic_fetch_add(&sum, local_sum);
    return NULL;
}

If the result is the same as that of a sequential execution, then everything works as expected.

    long correct_res = 0;
    /* use a long counter: repeats * no_threads can exceed INT_MAX */
    for (long i = 0; i < repeats * no_threads; ++i) {
        correct_res = correct_res + i;
    }

And for the complete code:

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

long repeats = 1000000000;
atomic_long x = 0;    /* shared counter: must be an atomic type for atomic_fetch_add */
atomic_long sum = 0;  /* accumulates the values returned by fetch_add in both threads */

void *test_func(void *arg) {
    long local_sum = 0;
    for (long i = 0; i < repeats; ++i) {
        /* relaxed ordering: still atomic, every call returns a distinct old value */
        local_sum += atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
    }
    atomic_fetch_add(&sum, local_sum);
    return NULL;
}

int main() {
    /* the sum a sequential execution would produce: 0 + 1 + ... + (2*repeats - 1) */
    long correct_res = 0;
    for (long i = 0; i < repeats * 2; ++i) {
        correct_res = correct_res + i;
    }
    pthread_t pthread[2];
    pthread_create(&pthread[0], NULL, test_func, NULL);
    pthread_create(&pthread[1], NULL, test_func, NULL);

    pthread_join(pthread[0], NULL);
    pthread_join(pthread[1], NULL);
    printf("correct res : %ld\n res : %ld\n", correct_res, atomic_load(&sum));
    if (correct_res == atomic_load(&sum))
        printf("Success.\n");
    else
        printf("Failure.\n");
    return 0;
}
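
For reference: with stdatomic.h and pthreads this should build with something like gcc -std=c11 -O2 -pthread (assuming a GCC-style toolchain; other compilers need their equivalent flags). With 10^9 iterations per thread it runs for a while, so you may want to lower repeats when trying it out.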
Berner answered 4/6, 2020 at 15:12 Comment(0)
