What's the advantage of compiler instruction scheduling compared to dynamic scheduling? [closed]
Nowadays, superscalar RISC CPUs usually support out-of-order execution, with branch prediction and speculative execution; they schedule work dynamically.

What's the advantage of compiler instruction scheduling, compared to an out-of-order CPU's dynamic scheduling? Does compile-time static scheduling matter at all for an out-of-order CPU, or only for simple in-order CPUs?

It seems that most current software instruction-scheduling work focuses on VLIW or simple in-order CPUs. The GCC wiki's scheduling page also shows little interest in updating GCC's scheduling algorithms.

Breastwork answered 21/2, 2014 at 7:44 Comment(1)
This question appears to be off-topic because it is about software and hardware design. Maybe cs would be the right place? – Gabfest
Advantage of static (compiler) scheduling:

  • No time bound, therefore it can use very sophisticated algorithms;
  • No bound on the instruction window, which allows, for example, moving an instruction across an entire loop or function call.

Advantage of dynamic (processor scheduling):

  • It can account for the actual runtime environment (cache state, an arithmetic unit kept busy by another hyperthread);
  • It does not force the code to be recompiled for each architecture upgrade.

That's all I can think of for now.

Orianna answered 21/2, 2014 at 7:52 Comment(1)
Somewhat related to "No time bound": compile-time scheduling is done once, while dynamic scheduling work (even if cached in something like a trace cache) tends to be repeated. Static scheduling is also connected to other optimizations which can reduce work. Dynamic scheduling also costs power and area; if the cost of energy use in the compiling system is lower than in the executing system, compile-time optimization could be preferred even if the total energy used is greater. Dynamic scheduling also fits well with speculation, which is more expensive to do statically in a RISC. – Pantia
First, I should note that some current architectures first compile and then reschedule, because "high-level" assembly instructions are decoded into smaller RISC-like micro-operations. At least this is true for the x86/x64 architectures.

Then we can imagine the execution cycle as: compile -> optimize/reschedule -> decrease scale (decode into micro-ops) -> optimize/reschedule again.

That sort of answers the question: the compiler has a much wider view of the application, so it mainly optimizes at the macro level (blocks of program instructions), while the processor mainly optimizes at the micro level (blocks of RISC-like micro-ops).

Effete answered 21/2, 2014 at 7:53 Comment(2)
AFAIK, only x86 needs to decode native ISA instructions into potentially multiple different uops. RISC CPUs design their instruction set so that instructions don't need to be dynamically split up. (IDK if they usually avoid even having microcode at all to handle context-switch instructions or other rare system-management stuff that's allowed to be slow). Anyway, in a PowerPC or something, I think the machine instructions translate directly to what the out-of-order machinery tracks internally. – Hittel
Anyway, an OOO CPU's out-of-order window (the ROB size) is maybe within an order of magnitude of 100 instructions or uops, depending on transistor and power budgets. Compilers can and do schedule and generate RISC instructions directly, for RISC ISAs, so there's nothing stopping the compiler from being able to schedule for a specific pipeline. This idea that the compiler can only schedule at a high level is nonsense. Saying that they don't schedule individual instructions doesn't answer the question. It's asking why not. – Hittel

© 2022 - 2024 — McMap. All rights reserved.