Is this C++ AtomicInt implementation correct?
Asked Answered
T

1

9

Premise: I'm working with an ARM embedded (almost bare-metal) environment where I don't even have C++11 (with std::atomic<int>) available, so please avoid answers like "just use standard C++ std::atomic<int>": I can't.

Is this ARM implementation of AtomicInt correct? (assume ARM architecture is ARMv7-A)

Do you see some synchronization issue? Is it volatile required/useful?

// File: atomic_int.h

#ifndef ATOMIC_INT_H_
#define ATOMIC_INT_H_

#include <stdint.h>

class AtomicInt
{
public:
    AtomicInt(int32_t init = 0) : atom(init) { }
    ~AtomicInt() {}

    int32_t add(int32_t value); // Implement 'add' method in platform-specific file

    int32_t sub(int32_t value) { return add(-value); }
    int32_t inc(void)          { return add(1);      }
    int32_t dec(void)          { return add(-1);     }

private:
    volatile int32_t atom;
};

#endif
// File: arm/atomic_int.cpp

#include "atomic_int.h"

int32_t AtomicInt::add(int32_t value)
{
    int32_t res, prev, tmp;

    asm volatile(

    "try:    ldrex   %1, [%3]\n"     // prev = atom;
    "        add     %0, %1, %4\n"   // res = prev + value;
    "        strex   %2, %0, [%3]\n" // tmp = outcome(atom = res); // may fail
    "        teq     %2, #0\n"       // if (tmp)
    "        bne     try"            //     goto try; /* add failed: someone else modified atom -> retry */

    : "=&r" (res), "=&r" (prev), "=&r" (tmp), "+mo" (atom)  // output (atom is both in-out)
    : "r" (value)                                           // input
    : "cc");                                                // clobbers (condition code register [CPSR] changed)

    return prev; // safe return (local variable cannot be changed by other execution contexts)
}

Also, I'm trying to achieve some code reuse, that is why I isolated just one basic function to implement in platform-specific code (add() method inside arm/atomic_int.cpp).

Is atomic_int.h really portable as it is across different platforms/architectures/compilers? Is this approach feasible? (With feasible I mean feasible for every platform to guarantee atomicity by implementing just the add() method).

here is the corresponding ARM GCC 8.3.1 implementation of the same function. Apparently, the only real difference is the presence of dmb before and after. Are they really needed in my case? Why? Do you have an example where my AtomicInt (without dmb) fails?

UPDATE: fixed implementation, removed get() method to solve atomicity and alignment issues. Now the add() behaves like a standard fetchAndAdd().

Trover answered 11/10, 2019 at 10:20 Comment(17)
volatile keyword in C++ means don't optimize via variable. So get() method benefits from it. Though, in general, volatile is about to deprycates in C++. If your system cannot built-in syncronize 32-bit data, then you have little choice but to use mutexes - spinlock in the very least.Flosi
What version of the arm architecture are you using? armv-7?Dextro
This doesn't address the question, but names that contain two consecutive underscores (__ATOMIC_INT_H_) and names that begin with an underscore followed by a capital letter are reserved for use by the implementation. Don't use them in your code.Carchemish
The member name atomic is probably best not used to avoid confusion with std::atomic, although it begs the question why you wouldn't just use that in any case.Footstool
Added ARM architecture, renamed __ATOMIC_INT_H_ identifier.Trover
Renamed atomic with atom. @clifford if you're asking me why I'm not using std::atomic, it is because the cross-compiler I'm using does not support C++11.Trover
@Trover : Fair enough. Is platform portability really an issue given the in-line assembler - you'd have to rewrite that in any case for each platform. Your compiler probably has __attribute__ or #pragma directives to force a specific alignment, or the compiler documentation will tell you what it guarantees or not otherwise. But yes, the implementation will be platform, architecture and/or compiler specific.Footstool
@Footstool I'm working on a sort of framework and I wanted to achieve complete portability for all the rest of the software that will use framework classes (like AtomicInt). At the same time, I want the framework itself to be "easily" ported, i.e. minimizing and isolating the framework code that is platform/architecture/compiler specific, so that I will have to rewrite only specific portions of the framework when porting. So my question here is dual: did I isolate the necessary/sufficient portion of AtomicInt? Did I implemented that portion correctly for ARMv7-A?Trover
The code in @Maxim Egorushkin's answer has a memory barrier before returning. Seems you don't need one before the ldrex, but the ARM documentation I found isn't too clear.Presentative
@Presentative I have your same doubts, documentation it's not very clear to me. I'm not sure about dmb, and I'm also not sure if clrex is needed to reset exclusive monitors during context switches (or is it done automatically) on Cortex-A.Trover
The get function also needs to be in inline asm or you can't guarantee atomic read. It's not enough that the variable is 32 bits - a C compiler is likely to use multiple instructions to for example temporarily store the value in a register.Abessive
@Lundin: you are right, in fact I removed the get() method. Please focus on implementation v2 without get() method.Trover
Seems to me that you need to add a memory barrier as volatile doesn't guarantee that. And (as far as I know) asm volatile("...") is a compiler specific extension so it can be compiler specific.Missend
@TarickWelling: for asm volatile("...") it is ok for me to be compiler specific, I only want atomic_int.h to be portable (question edited: I further clarified my portability issue). For memory barrier, I would like to understand why and in which cases it is needed.Trover
If the inline asm has all the needed specification, and everything is access with inline asm, then inherently no variable needs to be volatile: volatile applies to access in C/C++, not in code linked (external assembly code) or fused (inline assembly) into C/C++.Kernel
@Kernel good point, but I'm not sure if all implementations will use inline assembly so I would rather keep the volatile (if it's not harmful) in order to be more generic.Trover
I did further research on memory barriers: they are only needed in case of multi-core architectures (or in general when the accessed memory is shared with other devices) and only if the CPU performs out-of-order execution. In my case, since I'm assuming ARMv7-A (that has different implementations matching this description), I should add memory barriers in order to be compatible with all ARMv7-A cores (even if the core I'm currently working on is single-issue/in-order and would not need them).Trover
U
2

If you use gcc may be you can use Legacy __sync Built-in Functions for Atomic Memory Access:

void add(int volatile& a, int value) {
    __sync_fetch_and_add(&a, value);
}

Generates:

add(int volatile&, int):
.L2:
        ldxr    w2, [x0]
        add     w2, w2, w1
        stlxr   w3, w2, [x0]
        cbnz    w3, .L2
        dmb     ish
        ret
Udele answered 11/10, 2019 at 10:53 Comment(4)
Unfortunately I'm not using gcc, and in any case I don't want to bind the implementation to any specific compiler. Thank you anyway for your hint, at least it tells me that my ARM add() part should be correct. What's the difference between ldxr and ldrex?Trover
This is ARM8 (e.g.64bit) rather than one of the 32-bit versions.Presentative
I managed to get the corresponding code by specifying the target architecture: link. It seems GCC actually puts dmb before and after the ldrex/strex loop.Trover
I think this is the good approach but to make it compiler independent just go to godbolt.org/z/WB8rxw type in the function you want using gcc builtins and copy the corresponding assembly output. Be sure to match the -march parameter with the specific version of ARM.Teasel

© 2022 - 2024 — McMap. All rights reserved.