Is this C++ AtomicInt implementation correct?

Asked 11/10, 2019 at 10:20 Answered 11/10, 2019 at 10:53

Premise: I'm working with an ARM embedded (almost bare-metal) environment where I don't even have C++11 (with std::atomic<int>) available, so please avoid answers like "just use standard C++ std::atomic<int>": I can't.

Is this ARM implementation of AtomicInt correct? (assume ARM architecture is ARMv7-A)

Do you see some synchronization issue? Is it volatile required/useful?

// File: atomic_int.h

#ifndef ATOMIC_INT_H_
#define ATOMIC_INT_H_

#include <stdint.h>

class AtomicInt
{
public:
    AtomicInt(int32_t init = 0) : atom(init) { }
    ~AtomicInt() {}

    int32_t add(int32_t value); // Implement 'add' method in platform-specific file

    int32_t sub(int32_t value) { return add(-value); }
    int32_t inc(void)          { return add(1);      }
    int32_t dec(void)          { return add(-1);     }

private:
    volatile int32_t atom;
};

#endif

// File: arm/atomic_int.cpp

#include "atomic_int.h"

int32_t AtomicInt::add(int32_t value)
{
    int32_t res, prev, tmp;

    asm volatile(

    "try:    ldrex   %1, [%3]\n"     // prev = atom;
    "        add     %0, %1, %4\n"   // res = prev + value;
    "        strex   %2, %0, [%3]\n" // tmp = outcome(atom = res); // may fail
    "        teq     %2, #0\n"       // if (tmp)
    "        bne     try"            //     goto try; /* add failed: someone else modified atom -> retry */

    : "=&r" (res), "=&r" (prev), "=&r" (tmp), "+mo" (atom)  // output (atom is both in-out)
    : "r" (value)                                           // input
    : "cc");                                                // clobbers (condition code register [CPSR] changed)

    return prev; // safe return (local variable cannot be changed by other execution contexts)
}

Also, I'm trying to achieve some code reuse, that is why I isolated just one basic function to implement in platform-specific code (add() method inside arm/atomic_int.cpp).

Is atomic_int.h really portable as it is across different platforms/architectures/compilers? Is this approach feasible? (With feasible I mean feasible for every platform to guarantee atomicity by implementing just the add() method).

here is the corresponding ARM GCC 8.3.1 implementation of the same function. Apparently, the only real difference is the presence of dmb before and after. Are they really needed in my case? Why? Do you have an example where my AtomicInt (without dmb) fails?

UPDATE: fixed implementation, removed get() method to solve atomicity and alignment issues. Now the add() behaves like a standard fetchAndAdd().

Trover answered 11/10, 2019 at 10:20 Comment(17)

volatile keyword in C++ means don't optimize via variable. So get() method benefits from it. Though, in general, volatile is about to deprycates in C++. If your system cannot built-in syncronize 32-bit data, then you have little choice but to use mutexes - spinlock in the very least. – Flosi 11/10, 2019 at 10:55

What version of the arm architecture are you using? armv-7? – Dextro 11/10, 2019 at 12:20

This doesn't address the question, but names that contain two consecutive underscores (__ATOMIC_INT_H_) and names that begin with an underscore followed by a capital letter are reserved for use by the implementation. Don't use them in your code. – Carchemish 11/10, 2019 at 13:26

The member name atomic is probably best not used to avoid confusion with std::atomic, although it begs the question why you wouldn't just use that in any case. – Footstool 11/10, 2019 at 14:25

Added ARM architecture, renamed __ATOMIC_INT_H_ identifier. – Trover 11/10, 2019 at 14:28

Renamed atomic with atom. @clifford if you're asking me why I'm not using std::atomic, it is because the cross-compiler I'm using does not support C++11. – Trover 11/10, 2019 at 14:41

@Trover : Fair enough. Is platform portability really an issue given the in-line assembler - you'd have to rewrite that in any case for each platform. Your compiler probably has __attribute__ or #pragma directives to force a specific alignment, or the compiler documentation will tell you what it guarantees or not otherwise. But yes, the implementation will be platform, architecture and/or compiler specific. – Footstool 11/10, 2019 at 14:47

@Footstool I'm working on a sort of framework and I wanted to achieve complete portability for all the rest of the software that will use framework classes (like AtomicInt). At the same time, I want the framework itself to be "easily" ported, i.e. minimizing and isolating the framework code that is platform/architecture/compiler specific, so that I will have to rewrite only specific portions of the framework when porting. So my question here is dual: did I isolate the necessary/sufficient portion of AtomicInt? Did I implemented that portion correctly for ARMv7-A? – Trover 11/10, 2019 at 15:2

The code in @Maxim Egorushkin's answer has a memory barrier before returning. Seems you don't need one before the ldrex, but the ARM documentation I found isn't too clear. – Presentative 13/10, 2019 at 0:14

@Presentative I have your same doubts, documentation it's not very clear to me. I'm not sure about dmb, and I'm also not sure if clrex is needed to reset exclusive monitors during context switches (or is it done automatically) on Cortex-A. – Trover 14/10, 2019 at 7:57

The get function also needs to be in inline asm or you can't guarantee atomic read. It's not enough that the variable is 32 bits - a C compiler is likely to use multiple instructions to for example temporarily store the value in a register. – Abessive 14/10, 2019 at 9:13

@Lundin: you are right, in fact I removed the get() method. Please focus on implementation v2 without get() method. – Trover 14/10, 2019 at 9:18

Seems to me that you need to add a memory barrier as volatile doesn't guarantee that. And (as far as I know) asm volatile("...") is a compiler specific extension so it can be compiler specific. – Missend 14/10, 2019 at 15:15

@TarickWelling: for asm volatile("...") it is ok for me to be compiler specific, I only want atomic_int.h to be portable (question edited: I further clarified my portability issue). For memory barrier, I would like to understand why and in which cases it is needed. – Trover 14/10, 2019 at 15:29

If the inline asm has all the needed specification, and everything is access with inline asm, then inherently no variable needs to be volatile: volatile applies to access in C/C++, not in code linked (external assembly code) or fused (inline assembly) into C/C++. – Kernel 18/10, 2019 at 0:17

@Kernel good point, but I'm not sure if all implementations will use inline assembly so I would rather keep the volatile (if it's not harmful) in order to be more generic. – Trover 21/10, 2019 at 8:21

I did further research on memory barriers: they are only needed in case of multi-core architectures (or in general when the accessed memory is shared with other devices) and only if the CPU performs out-of-order execution. In my case, since I'm assuming ARMv7-A (that has different implementations matching this description), I should add memory barriers in order to be compatible with all ARMv7-A cores (even if the core I'm currently working on is single-issue/in-order and would not need them). – Trover 21/10, 2019 at 8:28

If you use gcc may be you can use Legacy __sync Built-in Functions for Atomic Memory Access:

void add(int volatile& a, int value) {
    __sync_fetch_and_add(&a, value);
}

Generates:

add(int volatile&, int):
.L2:
        ldxr    w2, [x0]
        add     w2, w2, w1
        stlxr   w3, w2, [x0]
        cbnz    w3, .L2
        dmb     ish
        ret

Udele answered 11/10, 2019 at 10:53 Comment(4)

Unfortunately I'm not using gcc, and in any case I don't want to bind the implementation to any specific compiler. Thank you anyway for your hint, at least it tells me that my ARM add() part should be correct. What's the difference between ldxr and ldrex? – Trover 11/10, 2019 at 11:21

This is ARM8 (e.g.64bit) rather than one of the 32-bit versions. – Presentative 13/10, 2019 at 11:39

I managed to get the corresponding code by specifying the target architecture: link. It seems GCC actually puts dmb before and after the ldrex/strex loop. – Trover 14/10, 2019 at 15:46

I think this is the good approach but to make it compiler independent just go to godbolt.org/z/WB8rxw type in the function you want using gcc builtins and copy the corresponding assembly output. Be sure to match the -march parameter with the specific version of ARM. – Teasel 16/10, 2019 at 11:11

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags