Swapping Variables (C++, processor level)


Asked 8/12, 2011 at 12:53 Answered 9/12, 2011 at 19:10

click here to access the chatroom for this question.

I would like to swap two variables, and I would like to do it through the pipeline, using a read-after-write (RAW) hazard to my advantage.

Pipeline:

OPER       | Fetch      | Decode     | Execute    | WriteBk
STORE X, Y | ---------- | ---------- | ---------- | ----------
STORE Y, X | STORE X, Y | ---------- | ---------- | ----------
---------- | STORE Y, X | STORE X, Y | ---------- | ----------
---------- | ---------- | STORE Y, X | STORE X, Y | ----------
---------- | ---------- | ---------- | STORE Y, X | STORE X, Y
---------- | ---------- | ---------- | ---------- | STORE Y, X

How do I go about telling the compiler to do that (and exactly that) without automatic locks and warning flags? Can you suggest any literature or keywords?

specs:

-> target: modern architectures that support multi-stage (more than 4 stages) pipelining
-> this is not related to any particular 'problem'. It is just for the sake of science.

current hurdles:

If you know how to ignore data hazards, please share.

Breviary answered 8/12, 2011 at 12:53 Comment(24)

Why is there no tag for hazards? Because nobody's made one yet... – Muss 8/12, 2011 at 13:0

Pipeline hazards are dealt with by the CPU, not the compiler... – Allo 8/12, 2011 at 13:6

The C++ abstract machine does not have an instruction pipeline. What is the actual problem that you are trying to solve? – Fleece 8/12, 2011 at 13:32

@OliCharlesworth, MIPS meant "Microprocessor without Interlocked Pipeline Stages" in order to indicate that hazards had to be handled by the programmer or the compiler, but that approach exposes too much of the micro-architecture to be viable. Different models have different pipelines, and without going to the 40+ stages of P4, most have more than 10 stages. – Pastorship 8/12, 2011 at 13:37

@OliCharlesworth - then how do you talk to the cpu?/tell the cpu to not be stupid? – Breviary 8/12, 2011 at 13:43

@Fleece not any particular problem - I just want to try to implement this particular approach for the sake of science. – Breviary 8/12, 2011 at 13:45

@Breviary - In that case you would be better off writing the whole thing in the assembly language for the architecture that you are interested in. C++ provides almost no access to the underlying hardware (although compilers often extend the language to give you slightly more control). – Fleece 8/12, 2011 at 13:49

@Mankarse: Even if you write this in assembler, the CPU will spot the RAW hazard. – Allo 8/12, 2011 at 13:58

@OliCharlesworth: That depends on the CPU (which is not specified in the question). – Fleece 8/12, 2011 at 13:59

+1 for an interesting question for which I have no idea of the answer. – Devoice 8/12, 2011 at 14:1

@Mankarse: Fair point. Ok, I'll rephrase as "most modern desktop/server CPUs will spot the RAW hazard". – Allo 8/12, 2011 at 14:1

@Fleece That's the answer I was afraid of. Most non-primitive architectures have the capability to do just that, if I understand this correctly. However, the portability goes to shreds if I start writing hardware-specific code. It'll suffice for a proof of concept, but it's not as elegant as I'd hoped. Thanks! – Breviary 8/12, 2011 at 14:2

@Fleece ah well yeah about the hazard spotting that's the entire point – Breviary 8/12, 2011 at 14:3

@CLASSIFIED: If you're after portability, then I'd suggest writing T tmp = x; x = y; y = tmp; (or simply std::swap(x,y)). I would hope that the compiler would always do the most optimal thing given the limitations of the architecture. – Allo 8/12, 2011 at 14:6

@OliCharlesworth yeah no. that absolutely defeats the point. you'll still have at least 33% more instructions and dependency gaps which need to be filled. – Breviary 8/12, 2011 at 14:10

@CLASSIFIED: My point is, if it's possible to achieve this trick on a given CPU, then the compiler author probably already knows about it. – Allo 8/12, 2011 at 14:19

@OliCharlesworth I see. But then the compiler would have to recognize T tmp = x; x = y; y = tmp; as a swap, and it's written in the standard that std::swap is to be executed linearly, which takes approx. 280% longer on an individual basis. So there's not really a way to implement this in the existing language, and I don't expect it to be there even if the authors did know about it. I absolutely see what you're saying, though. – Breviary 8/12, 2011 at 14:33

@CLASSIFIED: Where in the standard does it say that std::swap is to be executed linearly, and why doesn't the "as-if" rule apply? As far as I can tell, after std::swap(a, b);, a and b have exchanged values, and if there are no side effects in evaluating a and b the compiler is free to do as it likes behind the scenes. – Aetna 8/12, 2011 at 15:53

@DavidThornley cplusplus.com/reference/algorithm/swap my particular compiler does not support it. i mean if it did, i wouldn't be asking, would i? – Breviary 8/12, 2011 at 16:36

@CLASSIFIED: The cplusplus.com reference seems to be meant to apply to larger data structures, where copy constructors and assignment operators can take significant time, and where the compiler is unlikely to come up with the appropriate shortcut. It doesn't mean it has to work that way. The Standard specifies that the values will be swapped, and doesn't say how. Exactly how std::swap works is completely up to the implementation, so the question isn't about C++ so much as an implementation of C++. – Aetna 8/12, 2011 at 17:14
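The two portable forms mentioned in the comments above can be put side by side in a minimal sketch. The helper names (`swap_with_temp`, `swap_with_std`, `check_swaps_agree`) are hypothetical and only serve the illustration. The sketch shows that both forms leave the same observable state, which is all the language requires; it says nothing about which instructions a particular compiler emits for either one.

```cpp
#include <cassert>
#include <utility>  // std::swap

// Hypothetical helper: swap through an explicit temporary,
// i.e. the "T tmp = x; x = y; y = tmp;" form from the comments.
template <typename T>
void swap_with_temp(T& x, T& y) {
    T tmp = x;
    x = y;
    y = tmp;
}

// Hypothetical helper: defer to the standard library.
template <typename T>
void swap_with_std(T& x, T& y) {
    using std::swap;  // allow overloads found by argument-dependent lookup
    swap(x, y);
}

// Hypothetical self-check: both forms must produce the same final values.
inline void check_swaps_agree(int a, int b) {
    int x1 = a, y1 = b;
    int x2 = a, y2 = b;
    swap_with_temp(x1, y1);
    swap_with_std(x2, y2);
    assert(x1 == b && y1 == a);
    assert(x2 == b && y2 == a);
}
```

Calling `check_swaps_agree(1, 2)` from any test driver exercises both helpers. Whether the generated machine code differs between them depends on the compiler and its optimization settings, so inspecting the disassembly is the only way to know for a given toolchain.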

@DavidThornley okay, if you say so. that doesn't really help though. – Breviary 8/12, 2011 at 17:21

Ok folks, you all have enough rep so it's time to move this to chat. Thanks. – Lingulate 8/12, 2011 at 18:55

@Lingulate - I lack the ability to do so. i saw the option before, but it's not here anymore. you're a mod, right? can you do it? – Breviary 8/12, 2011 at 19:25

Try this: chat.stackoverflow.com/rooms/5696/… – Lingulate 8/12, 2011 at 19:34

I suggest that you read the first parts of Intel's optimization manual. Then you will realize that a modern, out-of-order, speculative CPU does not even respect your assembly language. Manipulating the pipeline to your advantage? Based on this document, I'd say: forget it.

Disassembly answered 9/12, 2011 at 14:45 Comment(1)

You're right, it might not be the most efficient way to get it done at all. However, the CPU still respects instructions that translate into individual micro-ops. I'm still gonna try to get it in there. – Breviary 9/12, 2011 at 18:44

This would depend on which CPU you're targeting, and which compiler. You don't specify either.

In general, a CPU will go to great lengths to pretend that everything executes in order, even when in reality it is superscalar behind the scenes. Code that tries to take advantage of hazards doesn't break; instead, it executes more slowly, because the CPU waits for the hazard to clear before continuing. Otherwise, almost all code would fail on future generations of the CPU as superscalar behavior increases.
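To illustrate the in-order illusion at the language level, here is a minimal C++ sketch. It assumes that `STORE X, Y` in the question means "copy Y into X", and the helper names `naive_two_stores` and `check_naive_two_stores` are hypothetical. The two assignments are written back to back with no temporary, as in the question's pipeline diagram. Because the second assignment reads the value written by the first, the result is two copies of the original `y`, not a swap.

```cpp
#include <cassert>
#include <utility>  // std::pair

// Hypothetical helper mirroring "STORE X, Y; STORE Y, X" as two
// back-to-back assignments (assumption: STORE A, B copies B into A).
// Returns the final (x, y) pair.
inline std::pair<int, int> naive_two_stores(int x, int y) {
    x = y;  // first store: x now holds the original y
    y = x;  // second store reads the updated x, so y keeps its original value
    return {x, y};
}

// Hypothetical self-check: the read-after-write dependency is honoured,
// so both values end up equal to the original y.
inline void check_naive_two_stores() {
    const std::pair<int, int> result = naive_two_stores(1, 2);
    assert(result.first == 2 && result.second == 2);
}
```

The C++ abstract machine guarantees this outcome regardless of how the hardware schedules the underlying instructions, which is why a swap that relies on the second read seeing the old value cannot be expressed in portable C++.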

In general, unless you're on a very specialized architecture and you have complete assembly-level control of execution, you will not be able to go anywhere with this idea.

Earthenware answered 9/12, 2011 at 19:10 Comment(0)
