Swapping Variables (C++, processor level)

I would like to swap two variables, and I would like to do it through the pipeline, using a Read After Write (RAW) hazard to my advantage.

Pipeline:

OPER        Fetch       Decode      Execute     WriteBk
STORE X, Y  ----------  ----------  ----------  ----------
STORE Y, X  STORE X, Y  ----------  ----------  ----------
----------  STORE Y, X  STORE X, Y  ----------  ----------
----------  ----------  STORE Y, X  STORE X, Y  ----------
----------  ----------  ----------  STORE Y, X  STORE X, Y
----------  ----------  ----------  ----------  STORE Y, X

How do I go about telling the compiler to do that (and exactly that) without automatic locks and warning flags? Can you suggest any literature or keywords?

Specs:

  • Target: modern architectures that support multi-stage (more than 4 stages) pipelining.

  • This is not related to any particular problem; it is just for the sake of science.

Current hurdles:

  • If you know how to ignore data hazards, please share.
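The pipeline diagram above can be played out in a toy cycle-by-cycle model. This is a hypothetical sketch, not any real CPU or ISA: it assumes `STORE D, S` means `D <- S`, that the source is read in Decode and the destination is written in WriteBk, and that there are no interlocks and no forwarding (the behavior the question hopes for). `Store` and `run_without_interlocks` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy model (not any real CPU): an in-order pipeline with the
// stages Fetch, Decode, Execute, WriteBk and NO interlocks or forwarding.
// Assumption: "STORE D, S" means D <- S. The source is read in Decode and
// the destination is written in WriteBk.
struct Store {
    char dst;
    char src;
};

constexpr int kDecode = 1;   // stage index, counted in cycles after Fetch
constexpr int kWriteBk = 3;

std::map<char, int> run_without_interlocks(const std::vector<Store>& prog,
                                           std::map<char, int> regs) {
    std::vector<int> latched(prog.size());  // operand captured in Decode
    const int last_cycle = static_cast<int>(prog.size()) - 1 + kWriteBk;
    for (int cycle = 0; cycle <= last_cycle; ++cycle) {
        // Instruction i is fetched in cycle i, so it sits in stage (cycle - i).
        // Within one cycle, reads (Decode) happen before writes (WriteBk).
        for (std::size_t i = 0; i < prog.size(); ++i) {
            if (cycle - static_cast<int>(i) == kDecode) {
                assert(regs.count(prog[i].src) == 1);  // source must exist
                latched[i] = regs[prog[i].src];
            }
        }
        for (std::size_t i = 0; i < prog.size(); ++i) {
            if (cycle - static_cast<int>(i) == kWriteBk) {
                regs[prog[i].dst] = latched[i];
            }
        }
    }
    return regs;
}
```

With `X = 1` and `Y = 2`, running `{{'X', 'Y'}, {'Y', 'X'}}` leaves `X = 2` and `Y = 1`: the second store latches `X` in Decode one cycle before the first store writes `X` back, so the two stores swap the values.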
Breviary answered 8/12, 2011 at 12:53 Comment(24)
Why is there no tag for hazards? Because nobody's made one yet... - Muss
Pipeline hazards are dealt with by the CPU, not the compiler... - Allo
The C++ abstract machine does not have an instruction pipeline. What is the actual problem that you are trying to solve? - Fleece
@OliCharlesworth, MIPS meant "Microprocessor without Interlocked Pipeline Stages" to indicate that hazards had to be handled by the programmer or the compiler, but that approach exposes too much of the micro-architecture to be viable. Different models have different pipelines, and even without going to the 40+ stages of the P4, most have more than 10 stages. - Pastorship
@OliCharlesworth Then how do you talk to the CPU, or tell the CPU not to be stupid? - Breviary
@Fleece Not any particular problem; I just want to try to implement this particular approach for the sake of science. - Breviary
@Breviary In that case you would be better off writing the whole thing in the assembly language for the architecture that you are interested in. C++ provides almost no access to the underlying hardware (although compilers often extend the language to give you slightly more control). - Fleece
@Mankarse: Even if you write this in assembler, the CPU will spot the RAW hazard. - Allo
@OliCharlesworth: That depends on the CPU (which is not specified in the question). - Fleece
+1 for an interesting question for which I have no idea of the answer. - Devoice
@Mankarse: Fair point. OK, I'll rephrase as "most modern desktop/server CPUs will spot the RAW hazard". - Allo
@Fleece That's the answer I was afraid of. If I understand this correctly, most non-primitive architectures have the capability to do just that; however, portability goes to shreds if I start writing hardware-specific code. It'll suffice for a proof of concept, but it's not as elegant as I'd hoped. Thanks! - Breviary
@Fleece Ah, well, yes, about the hazard spotting: that's the entire point. - Breviary
@CLASSIFIED: If you're after portability, then I'd suggest writing T tmp = x; x = y; y = tmp; (or simply std::swap(x,y)). I would hope that the compiler would always do the optimal thing given the limitations of the architecture. - Allo
@OliCharlesworth No, that absolutely defeats the point. You'll still have at least 33% more instructions, plus dependency gaps that need to be filled. - Breviary
@CLASSIFIED: My point is, if it's possible to achieve this trick on a given CPU, then the compiler author probably already knows about it. - Allo
@OliCharlesworth I see. But then the compiler would have to recognize T tmp = x; x = y; y = tmp; as a swap, and it's written in the standard that std::swap is to be executed linearly, which takes approximately 280% longer on an individual basis. So there's not really a way to implement this in the existing language, and I don't expect it to be there even if the authors did know about it. I absolutely see what you're saying, though. - Breviary
@CLASSIFIED: Where in the standard does it say that std::swap is to be executed linearly, and why doesn't the "as-if" rule apply? As far as I can tell, after std::swap(a, b);, a and b have exchanged values, and if there are no side effects in evaluating a and b, the compiler is free to do as it likes behind the scenes. - Aetna
@DavidThornley cplusplus.com/reference/algorithm/swap. My particular compiler does not support it. I mean, if it did, I wouldn't be asking, would I? - Breviary
@CLASSIFIED: The cplusplus.com reference seems to be meant to apply to larger data structures, where copy constructors and assignment operators can take significant time, and where the compiler is unlikely to come up with the appropriate shortcut. It doesn't mean it has to work that way. The Standard specifies that the values will be swapped, and doesn't say how. Exactly how std::swap works is completely up to the implementation, so the question isn't about C++ so much as about an implementation of C++. - Aetna
@DavidThornley Okay, if you say so. That doesn't really help, though. - Breviary
OK folks, you all have enough rep, so it's time to move this to chat. Thanks. - Lingulate
@Lingulate I lack the ability to do so. I saw the option before, but it's not here anymore. You're a mod, right? Can you do it? - Breviary
Try this: chat.stackoverflow.com/rooms/5696/… - Lingulate
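For comparison, here are the portable spellings suggested in the comments, as a minimal sketch. `swap_with_temp` is a hypothetical helper (a move-based version of `T tmp = x; x = y; y = tmp;`), and `vector_swap_exchanges_buffers` is a made-up check. Which machine instructions either swap becomes is left to the compiler under the as-if rule; the vector case illustrates the point about larger data structures, where `std::swap` calls the container's own `swap` and exchanges the internal buffers instead of copying elements.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: the three-step swap from the comments, using moves.
// For plain ints this does exactly what T tmp = x; x = y; y = tmp; does.
template <typename T>
void swap_with_temp(T& x, T& y) {
    T tmp = std::move(x);
    x = std::move(y);
    y = std::move(tmp);
}

// For std::vector, std::swap forwards to vector::swap, which exchanges the
// internal buffers: no element is copied, so the data pointers trade places.
bool vector_swap_exchanges_buffers() {
    std::vector<int> a(1000, 1);
    std::vector<int> b(10, 2);
    const int* a_data = a.data();
    const int* b_data = b.data();
    std::swap(a, b);
    return a.data() == b_data && b.data() == a_data &&
           a.size() == 10 && b.size() == 1000;
}
```

Both forms only promise that the values end up exchanged; neither gives the programmer any say over how the pipeline handles the underlying loads and stores.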

I suggest that you read the first parts of Intel's optimization manual. Then you will realize that a modern, out-of-order, speculative CPU does not even respect the order of your assembly instructions. Manipulating the pipeline to your advantage? Based on this document, I'd say: forget it.

Disassembly answered 9/12, 2011 at 14:45 Comment(1)
You're right, it might not be the most efficient way to get it done at all; however, it still respects instructions that translate into individual micro-ops. I'm still going to try to get it in there. - Breviary

This would depend on which CPU you're targeting, and which compiler. You don't specify either.

In general, a CPU will go to great lengths to pretend that everything is executed in order, even when in reality it's superscalar behind the scenes. Code that tries to take advantage of hazards doesn't break; instead, it executes more slowly, because the CPU waits for the hazard to clear before continuing. If it didn't, almost all code would fail on future generations of the CPU as superscalar behavior increases.
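The effect of that waiting can be sketched with a hypothetical toy model (not a real CPU; `Store` and `run_with_interlock` are illustrative names). It assumes `STORE D, S` means `D <- S`, the source is read in Decode, the result is written in WriteBk two cycles later, there is no forwarding, and an interlock stalls Decode until every earlier instruction that writes the source register has finished WriteBk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy model (not any real CPU): a 4-stage in-order pipeline
// (Fetch, Decode, Execute, WriteBk) with a read-after-write interlock and
// no forwarding. Assumption: "STORE D, S" means D <- S.
struct Store {
    char dst;
    char src;
};

std::map<char, int> run_with_interlock(const std::vector<Store>& prog,
                                       std::map<char, int> regs) {
    const std::size_t n = prog.size();
    std::vector<int> decode(n), writeback(n);
    for (std::size_t i = 0; i < n; ++i) {
        int d = static_cast<int>(i) + 1;  // earliest Decode: the cycle after Fetch
        if (i > 0) d = std::max(d, decode[i - 1] + 1);  // in order: stalls propagate
        for (std::size_t j = 0; j < i; ++j) {
            if (prog[j].dst == prog[i].src) {
                d = std::max(d, writeback[j] + 1);  // RAW: wait for the writer
            }
        }
        decode[i] = d;
        writeback[i] = d + 2;  // Execute, then WriteBk
    }
    // Replay the schedule: read in Decode, write in WriteBk, reads first.
    std::vector<int> latched(n);
    const int last_cycle = n == 0 ? 0 : writeback[n - 1];
    for (int cycle = 0; cycle <= last_cycle; ++cycle) {
        for (std::size_t i = 0; i < n; ++i) {
            if (decode[i] == cycle) {
                assert(regs.count(prog[i].src) == 1);  // source must exist
                latched[i] = regs[prog[i].src];
            }
        }
        for (std::size_t i = 0; i < n; ++i) {
            if (writeback[i] == cycle) regs[prog[i].dst] = latched[i];
        }
    }
    return regs;
}
```

With `X = 1`, `Y = 2` and the program `{{'X', 'Y'}, {'Y', 'X'}}`, the second store stalls for two cycles and both variables end up holding `2`: exactly the result of executing the two stores one after the other, not a swap.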

In general, unless you're on a very specialized architecture and you have complete assembly-level control of execution, you will not get anywhere with this idea.

Earthenware answered 9/12, 2011 at 19:10 Comment(0)
