ARMv8.3 meaning of rcpc

1

7

With ARMv8.3 a new instruction has been introduced: LDAPR.

When an STLR is followed by an LDAR to a different address, the two can't be reordered; hence this is called RCsc (release consistent, sequentially consistent).

When an STLR is followed by an LDAPR to a different address, the two can be reordered. This is called RCpc (release consistent, processor consistent).

My issue is with the PC part.

PC is a relaxation of TSO: TSO is multi-copy atomic, whereas PC is non-multi-copy atomic.

The ARMv8 memory model was strengthened to be multi-copy atomic because no vendor ever built a non-multi-copy-atomic microarchitecture, and allowing for one made the memory model more complicated.

So I'm running into a contradiction.

The key question is: is every store (including relaxed) multi-copy atomic?

If so, then the PC part of RCpc doesn't make sense to me, since PC is non-multi-copy atomic. Could it be a legacy name from the time when ARM was non-multi-copy atomic on paper?

There are multiple definitions of PC, so perhaps that is the cause.

Bully answered 6/8, 2021 at 5:44 Comment(0)
5

In practice, STLR / LDAPR gives C++ std::memory_order_acq_rel semantics (release stores and acquire loads), as opposed to SC (seq_cst).

So "processor consistent" presumably describes the fact that the current core sees its own operations in program order, and serves as a way to note that the ordering is not sequentially consistent, because they don't use that term for it. It doesn't mean that other parts of the memory model rules are removed.
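To make the difference concrete, here is a minimal sketch of the store-buffering litmus test in C++. The helper name `count_both_zero` and the trial loop are my own illustration, not anything from the ARM documentation. With seq_cst on both operations, the C++ memory model forbids both threads reading 0; with a release store followed by an acquire load of a different variable, that outcome is allowed. Which instructions are actually emitted (STLR, LDAR, LDAPR) depends on the compiler and target flags, so treat the comments about instruction selection as an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering (SB) litmus test `trials`
// times and counts how often BOTH threads read 0, i.e. how often each load
// was satisfied before the other thread's store became visible.
int count_both_zero(std::memory_order store_order,
                    std::memory_order load_order,
                    int trials) {
    assert(trials >= 0);
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, store_order);  // assumed: STLR for release or seq_cst
            r1 = y.load(load_order);  // assumed: LDAR for seq_cst, possibly LDAPR for acquire
        });
        std::thread t2([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero(std::memory_order_seq_cst, std::memory_order_seq_cst, n)` must return 0 on a conforming implementation. With `std::memory_order_release` and `std::memory_order_acquire` the count may be nonzero, although a short run with thread start-up overhead will often not show it.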

AFAIK, yes: ARMv8 is multi-copy atomic, so every plain store (str, stp, etc.) is multi-copy atomic. That is, it becomes visible to all other cores at the same time via coherent cache, so all threads can agree on the order of two stores done by two independent writers (the IRIW litmus test). This is unlike POWER, where some threads can see stores early from other SMT threads on the same physical core.
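For reference, the IRIW (independent reads of independent writes) test can be sketched in C++ as follows. The names `IriwResult`, `run_iriw_once` and `readers_disagree` are hypothetical, chosen only for this illustration. Two writers each store to their own variable, and two readers load both variables in opposite orders. If the readers disagree about which store happened first, the stores were not multi-copy atomic. The C++ model itself only forbids that outcome when everything is seq_cst; with relaxed stores and acquire loads it is allowed on paper, and it is the hardware's multi-copy atomicity that rules it out in practice.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by the two reader threads in one IRIW trial.
struct IriwResult {
    int r1, r2;  // reader 1: x first, then y
    int r3, r4;  // reader 2: y first, then x
};

// Hypothetical helper: runs one IRIW trial with the given memory orders.
IriwResult run_iriw_once(std::memory_order store_order,
                         std::memory_order load_order) {
    std::atomic<int> x{0}, y{0};
    IriwResult r{-1, -1, -1, -1};
    std::thread w1([&] { x.store(1, store_order); });
    std::thread w2([&] { y.store(1, store_order); });
    std::thread rd1([&] {
        r.r1 = x.load(load_order);
        r.r2 = y.load(load_order);
    });
    std::thread rd2([&] {
        r.r3 = y.load(load_order);
        r.r4 = x.load(load_order);
    });
    w1.join();
    w2.join();
    rd1.join();
    rd2.join();
    return r;
}

// True when reader 1 saw x's store but not y's, while reader 2 saw y's store
// but not x's: the two readers disagree on the order of the two stores.
bool readers_disagree(const IriwResult& r) {
    return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}
```

A run with `std::memory_order_relaxed` stores and `std::memory_order_acquire` loads is the interesting case on real hardware: the loads in each reader stay ordered, so any disagreement would come from the stores themselves becoming visible to different cores at different times.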

I don't think LDAPR relaxed that guarantee.

(ARMv7 did not have this property on paper, but all real-world implementations did. So ARM was able to strengthen their guarantees without actually changing how anything worked in any real ARM microarchitecture, beyond adding support for the new instructions in ARMv8's 32-bit mode. "Shared Memory Consistency Models: A Tutorial" from 1995, linked in comments, uses the term RCpc to describe a category of memory models that does include some readers being able to see some stores before other readers, allowing IRIW. So it seems either ARMv8 is using a different meaning, or other requirements still come into play to forbid IRIW reordering.)


Big caveat: I'm not a terminology expert on this, and I've never heard of "processor consistent" before so I'm just guessing from context what they mean by it, with an interpretation that would be consistent with all known facts. Please correct me if this is incompatible with an accepted definition of the term.

Ferrara answered 8/8, 2021 at 9:11 Comment(7)
Hi Peter. Thanks for the confirmation that all stores on ARM are multi-copy atomic. I'll give you the award for that in 21h unless a better answer comes. I was digging through some papers for the exact definition of PC, but I also found the definition for RCpc. hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf (page 12 has a nice overview). I'll do some reading this afternoon to improve my understanding of RCpc. - Bully
If you go to page 12 you will see that with RCpc a read can see writes of other CPUs early; in other words: non-multi-copy atomic. This contradicts the multi-copy atomic behavior of ARMv8. - Bully
@pveentjer: My understanding of multi-copy atomic is that the key property is that a store becomes visible to all other cores at the same time. To violate it, you'd need some cores to see it early but some not, like is possible on some real POWER CPUs, producing IRIW reordering. (See my answer on "Will two atomic writes to different locations in different threads always be seen in the same order by other threads?".) The ARMv7 memory model allowed that, but as you say no real HW did it. - Ferrara
@pveentjer: In the IRIW litmus test, the stores can be relaxed because that's the only thing each writer thread is doing, so there's nothing to order it with respect to. Load ordering in the readers is needed to make sure you're observing the ordering of changes to the global cache state, not just local load reordering. But only LoadLoad ordering, not wrt. any store. (To recover sequential consistency on PowerPC, including blocking IRIW reordering, you do need more barriers because it's not multi-copy atomic.) - Ferrara
@pveentjer: Again, I'm not a terminology expert, and haven't checked a precise definition of what "multi-copy atomic" formally means, but in other contexts where it's come up, it was the difference between IRIW being allowed or not, and that's all. Not ordering of two operations by the same core wrt. each other. And what you're raising as an objection doesn't match what I've read in other contexts where that formal term has come up. - Ferrara
This paper contains a definition of single-copy atomic, multi-copy atomic and non-multi-copy atomic. It is in line with what you are describing above. cs.tufts.edu/~nr/cs257/archive/arvind/wmm.pdf - Bully
If you go to page 12 of the WRL-95-7.pdf paper you will see that RCpc has a mark at 'read others write early' and 'read own write early'. This is the same as non-multi-copy atomic (so non-atomic). Do you agree? - Bully
