What, if any, are the alignment requirements for the atomic intrinsic functions?

Atomic operations for the Delphi mobile targets are built on top of the AtomicXXX family of intrinsic functions. The documentation says:

Because the Delphi mobile compilers do not support a built-in assembler, the System unit provides four atomic intrinsic functions that provide a way to atomically exchange, compare and exchange, increment, and decrement memory values.

These four functions are:

  • AtomicExchange
  • AtomicCmpExchange
  • AtomicIncrement
  • AtomicDecrement

Other RTL functions that provide atomic operations, e.g. the static class methods of the TInterlocked class, are built on top of these four intrinsics.
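As a rough point of reference, the four intrinsics correspond closely to C11 `<stdatomic.h>` operations. The sketch below is in C rather than Delphi. The `intrin_*` and `require_aligned` names are hypothetical. The argument order and return conventions reflect my reading of the Delphi documentation and should be checked against your RTL version: increment and decrement return the new value, while exchange and compare-exchange return the previous value. The alignment assertion encodes the ARMv7 rule that LDREX/STREX fault on addresses that are not word-aligned.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 analogs of AtomicIncrement, AtomicDecrement,
   AtomicExchange and AtomicCmpExchange for a 32-bit Integer target. */

/* ARMv7 LDREX/STREX fault on addresses that are not word-aligned. */
static void require_aligned(_Atomic int32_t *target)
{
    assert((uintptr_t)target % sizeof(int32_t) == 0);
}

/* Like AtomicIncrement(Target, Increment): returns the NEW value. */
static int32_t intrin_increment(_Atomic int32_t *target, int32_t increment)
{
    require_aligned(target);
    return atomic_fetch_add(target, increment) + increment;
}

/* Like AtomicDecrement(Target, Decrement): returns the NEW value. */
static int32_t intrin_decrement(_Atomic int32_t *target, int32_t decrement)
{
    require_aligned(target);
    return atomic_fetch_sub(target, decrement) - decrement;
}

/* Like AtomicExchange(Target, Value): returns the OLD value. */
static int32_t intrin_exchange(_Atomic int32_t *target, int32_t value)
{
    require_aligned(target);
    return atomic_exchange(target, value);
}

/* Like AtomicCmpExchange(Target, NewValue, Comparand): stores NewValue only
   if Target equals Comparand, and returns the value Target held before. */
static int32_t intrin_cmp_exchange(_Atomic int32_t *target, int32_t new_value,
                                   int32_t comparand)
{
    require_aligned(target);
    int32_t expected = comparand;
    /* On failure, `expected` is overwritten with the current value. */
    atomic_compare_exchange_strong(target, &expected, new_value);
    return expected;
}
```

For example, `intrin_cmp_exchange(&a, 42, 17)` plays the role of `AtomicCmpExchange(a, 42, 17)`: nothing is stored unless `a` already holds 17.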

For the mobile compilers that target ARMv7, are there any alignment requirements for these four atomic intrinsics? If so, what are they?

The documentation does not list any such requirements. However, the documentation has been known to be inaccurate, and I am not confident that the absence of any stated requirements is definitive proof that none exist.

As a mild aside, the XE8 documentation for intrinsic functions states that these atomic intrinsics are not supported by the desktop compilers. That is not correct – these intrinsics are supported by the desktop compilers.

Federicofedirko answered 24/8, 2015 at 2:56 Comment(0)

XE8 compiles

var 
  a: integer;

AtomicIncrement(a);

to

3e: 2201        movs    r2, #1
40: 900c        str r0, [sp, #48]   ; 0x30
42: 910b        str r1, [sp, #44]   ; 0x2c
44: 920a        str r2, [sp, #40]   ; 0x28
46: 980b        ldr r0, [sp, #44]   ; 0x2c
48: e850 1f00   ldrex   r1, [r0]
4c: 9a0a        ldr r2, [sp, #40]   ; 0x28
4e: 4411        add r1, r2
50: e840 1300   strex   r3, r1, [r0]
54: 2b00        cmp r3, #0
56: d1f6        bne.n   46 <_NativeMain+0x46>

So the atomicity is implemented using an ldrex/strex retry loop.
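The ldrex ... strex ... cmp/bne sequence works as follows. It loads with an exclusive reservation, computes the new value, and attempts the store. If the store-exclusive reports failure (r3 != 0), it starts over. A rough C11 analog is sketched here, assuming `atomic_compare_exchange_weak` stands in for the ldrex/strex pair. On ARMv7, compilers typically lower the weak form to such a pair, and its permitted spurious failure corresponds to strex failing. `increment_via_retry` is a hypothetical name. LL/SC is not identical to compare-and-swap: strex fails whenever the reservation is lost, even if the value is unchanged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analog of the XE8 AtomicIncrement loop shown above. */
static int32_t increment_via_retry(_Atomic int32_t *target, int32_t increment)
{
    /* ARMv7 exclusives need a word-aligned address. */
    assert((uintptr_t)target % sizeof(int32_t) == 0);

    int32_t old_value = atomic_load(target);        /* ~ ldrex r1, [r0] */
    int32_t new_value;
    do {
        new_value = old_value + increment;          /* ~ add r1, r2 */
        /* ~ strex r3, r1, [r0]; cmp r3, #0; bne: on failure old_value is
           refreshed with the current contents and the loop runs again. */
    } while (!atomic_compare_exchange_weak(target, &old_value, new_value));
    return new_value;
}
```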

If I'm interpreting the information at community.arm.com correctly, the required alignment is DWORD (4-byte) alignment for 4-byte exclusive operations (ldrex/strex) and QWORD (8-byte) alignment for 8-byte ones (ldrexd/strexd).
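The rule amounts to a natural-alignment check: the address of the target must be a multiple of the operand size. The helper below is a hypothetical C sketch, not part of any RTL. In Delphi, the usual way to break the rule is a field inside a `packed record`, which can end up at an offset that is not a multiple of its size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `addr` is a multiple of `size`. `size` must be a power of two:
   4 for ldrex/strex, 8 for ldrexd/strexd. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For example, with an 8-byte-aligned buffer `buf`, `&buf[4]` is acceptable for a 4-byte exclusive access but not for an 8-byte one, and `&buf[1]` is acceptable for neither.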

The other atomic functions are implemented in a similar way, so the same requirements should apply:

AtomicDecrement(a);

68: 980f        ldr r0, [sp, #60]   ; 0x3c
6a: e850 1f00   ldrex   r1, [r0]
6e: 9a0e        ldr r2, [sp, #56]   ; 0x38
70: 1a89        subs    r1, r1, r2
72: e840 1300   strex   r3, r1, [r0]
76: 2b00        cmp r3, #0
78: d1f6        bne.n   68 <_NativeMain+0x68>

AtomicExchange(a,b);

82: 990f        ldr r1, [sp, #60]   ; 0x3c
84: 6008        str r0, [r1, #0]
86: 4873        ldr r0, [pc, #460]  ; (254 <_NativeMain+0x254>)
88: 9a10        ldr r2, [sp, #64]   ; 0x40
8a: 5880        ldr r0, [r0, r2]
8c: 6800        ldr r0, [r0, #0]
8e: f3bf 8f5b   dmb ish
92: 900d        str r0, [sp, #52]   ; 0x34
94: 980f        ldr r0, [sp, #60]   ; 0x3c
96: e850 1f00   ldrex   r1, [r0]
9a: 9b0d        ldr r3, [sp, #52]   ; 0x34
9c: e840 3200   strex   r2, r3, [r0]
a0: 2a00        cmp r2, #0
a2: 910c        str r1, [sp, #48]   ; 0x30
a4: d1f6        bne.n   94 <_NativeMain+0x94>

AtomicCmpExchange(a, 42, 17);

ae: 990f        ldr r1, [sp, #60]   ; 0x3c
b0: 6008        str r0, [r1, #0]
b2: f3bf 8f5b   dmb ish
b6: 202a        movs    r0, #42 ; 0x2a
b8: 2211        movs    r2, #17
ba: 900b        str r0, [sp, #44]   ; 0x2c
bc: 920a        str r2, [sp, #40]   ; 0x28
be: 980f        ldr r0, [sp, #60]   ; 0x3c
c0: e850 1f00   ldrex   r1, [r0]
c4: 9a0a        ldr r2, [sp, #40]   ; 0x28
c6: 4291        cmp r1, r2
c8: d105        bne.n   d6 <_NativeMain+0xd6>
ca: 990b        ldr r1, [sp, #44]   ; 0x2c
cc: 9a0f        ldr r2, [sp, #60]   ; 0x3c
ce: e842 1000   strex   r0, r1, [r2]
d2: 2800        cmp r0, #0
d4: d1f3        bne.n   be <_NativeMain+0xbe>
Satanic answered 24/8, 2015 at 16:38 Comment(1)
Alignment requirements for LDREX and STREX stem from the use of exclusive monitors, not from the generic alignment requirements for LDR(D)/STR(D). – Justiciary

Atomicity is usually implemented using LDREX and STREX (the Load Exclusive and Store Exclusive instructions). These instructions use a concept called exclusive monitors. Check out http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s02s01.html and look for 'Exclusives Reservation Granule'.

So your alignment requirements are implementation-specific and will be determined by the exclusive monitor mechanism implemented on your hardware. I would suggest looking at the exclusive monitor section of your CPU/SoC documentation.

For example, when internal monitors are used, they are usually implemented at the cache level (typically L2), with one monitor per cache line.

  • Thus your atomic data should be contained in a single cache line; the alignment requirement follows from this.
  • If multiple atomics occupy the same cache line, then while one of them is in the exclusive state, all the others in that line will be in a false exclusive state. This makes locking inefficient; aligning each atomic to its own cache line avoids the problem. Note: multiple atomics in the same cache line will still work, but inefficiently.
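Giving each atomic its own cache line, as the second bullet suggests, can be sketched in C11 with `alignas`. `CACHE_LINE` is an assumed value. Line sizes differ between ARMv7 cores (commonly 32 or 64 bytes), and the exclusive reservation granule need not equal the line size, so take the real figure from your CPU documentation. This only affects performance. For correctness, natural alignment is enough.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: adjust to the target core */

/* Alignment forces the struct to be padded out to a whole line, so in an
   array each counter sits on its own (assumed) line. */
struct padded_counter {
    alignas(CACHE_LINE) _Atomic int32_t value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE, "one counter per line");
static_assert(alignof(struct padded_counter) == CACHE_LINE, "line-aligned");

static struct padded_counter counters[4];

/* Atomically increments counter i and returns its new value. */
static int32_t bump(size_t i)
{
    return atomic_fetch_add(&counters[i].value, 1) + 1;
}
```

The cost is memory: each 4-byte counter occupies a full line.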
Justiciary answered 25/8, 2015 at 7:46 Comment(3)
The conclusion in the last paragraph is wrong. Suppose that the cache line size is, say, 32 bytes. Then you don't need to align to 32-byte boundaries. Aligning to the data size suffices, so long as the data is no bigger than 32 bytes. As an example, 4-byte data can be 4-byte aligned and will never straddle cache lines. – Hypoplasia
@DavidHeffernan The alignment problem I am referring to does not arise from one atomic data unit straddling multiple cache lines. The problem arises when multiple atomic variables lie in the same cache line. When one atomic unit is moved to the exclusive state, unrelated atomic units in the same cache line will wrongly become exclusive (or be unfairly locked up). So for efficiency reasons it is better to have atomics separated by cache lines. (Functionally there would be no difference with 4-byte alignment.) – Justiciary
That's the false sharing issue, which is somewhat tangential to the issue here, I think. – Hypoplasia
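The first comment's argument, that natural alignment alone keeps an object inside one cache line, can be checked by brute force. This is an illustrative C sketch with a hypothetical `straddles_line` helper. It assumes the data size and line size are powers of two, with the data no larger than the line, which is the condition the comment states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the object occupying [addr, addr + size) touches more than one line. */
static bool straddles_line(uintptr_t addr, uintptr_t size, uintptr_t line)
{
    assert(size > 0 && line > 0);
    return addr / line != (addr + size - 1) / line;
}

/* Checks every naturally aligned address in [0, limit) for one size/line pair. */
static bool aligned_never_straddles(uintptr_t size, uintptr_t line, uintptr_t limit)
{
    for (uintptr_t addr = 0; addr < limit; addr += size) {
        if (straddles_line(addr, size, line))
            return false;
    }
    return true;
}
```

A misaligned 4-byte value at offset 30 in a 32-byte line, by contrast, spans bytes 30 to 33 and crosses into the next line.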
