What is a microcoded instruction?
I have seen a lot of literature referencing microcoded instructions.

What are these, and why are they used?

Familiar answered 1/11, 2016 at 18:48 Comment(1)
The one-sentence version is that a microcoded CPU works kind of like an opcode-based interpreter for a high-level language: instructions in the exposed machine language are decomposed into smaller steps that are easier to execute. Wikipedia has a decent, if jargon-ful, longer explanation, but a real explanation requires an entire computer architecture textbook.Scantling
A CPU reads machine code and decodes it into internal control signals that send the right data to the right execution units.

Most instructions map to one internal operation, and can be decoded directly. (e.g. on x86, add eax, edx just sends eax and edx to the integer ALU for an ADD operation, and puts the result in eax.)

Some other single instructions do much more work. e.g. x86's rep movs implements memcpy(edi, esi, ecx), and requires the CPU to loop.

When the instruction decoders see an instruction like that, instead of producing internal control signals directly, they read micro-code out of the microcode ROM.

A micro-coded instruction is one that decodes to many internal operations.


Modern x86 CPUs always decode x86 instructions to internal micro-operations. In this terminology, it still doesn't count as "micro-coded" even when add [mem], eax decodes to a load from [mem], an ALU ADD operation, and a store back into [mem]. Another example is xchg eax, edx, which decodes to 3 uops on Intel Haswell. And interestingly, not exactly the same kind of uops you'd get from using 3 MOV instructions to do the exchange with a scratch register, because they aren't zero-latency.

On Intel / AMD CPUs, "micro-coded" means the decoders turn on the micro-code sequencer to feed uops from the ROM into the pipeline, instead of producing multiple uops directly.

(You could call any multi-uop x86 instruction "microcoded" if you were thinking in pure RISC terms, but it's useful to use the term "microcoded" to make a different distinction, IMO. This meaning is I think widespread in x86 optimization circles, like Intel's optimization manual. Other people may use different meanings for terminology, especially if talking about other architectures or about computer architecture in general when comparing x86 to a RISC.)

In current Intel CPUs, the limit on what the decoders can produce directly, without going to micro-code ROM, is 4 uops (fused-domain). AMD similarly has FastPath (aka DirectPath) single or double instructions (1 or 2 "macro-ops", AMD's equivalent of uops), and beyond that it's VectorPath aka Microcode, as explained in David Kanter's in-depth look at AMD Bulldozer, specifically talking about its decoders.

Another example is x86's integer DIV instruction, which is micro-coded even on modern Intel CPUs like Haswell. But not AMD; AMD just has one or 2 uops activate everything inside the integer divider unit. It's not fundamental to DIV, just an implementation choice. See my answer on C++ code for testing the Collatz conjecture faster than hand-written assembly - why? for the numbers.

FP division is also slow, but is decoded to a single uop so it doesn't bottleneck the front-end. If FP division is rare and not part of a latency bottleneck, it can be as cheap as multiplication. (But if execution does have to wait for its result, or bottlenecks on its throughput, it's much slower.) More in this answer.

Integer division and other micro-coded instructions can give the CPU a hard time, and create effects that make code alignment matter where it otherwise wouldn't.


To learn more about x86 CPU internals, see the tag wiki, and especially Agner Fog's microarch guide.

Also, David Kanter's deep dives into x86 microarchitectures are useful for understanding the pipeline that uops go through: Core 2 and Sandy Bridge are the major ones, and the AMD K8 and Bulldozer articles are interesting for comparison.

RISC vs. CISC Still Matters (Feb 2000) by Paul DeMone looks at how PPro breaks down instructions into uops, vs. RISCs where most instructions are already simple enough to go through the pipeline in one step, with only rare ones like ARM's push/pop of multiple registers needing to send multiple operations down the pipeline (aka microcoded in RISC terms).

And for good measure, Modern Microprocessors: A 90-Minute Guide! is always worth recommending for the basics of pipelining and OoO exec.


Other uses of the term in very different contexts than modern x86

In some older / simpler CPUs, every instruction was effectively micro-coded. For example, the 6502 executed 6502 instructions by running a sequence of internal instructions from a PLA decode ROM. This works well for a non-pipelined CPU, where the order of using the different parts of the CPU can vary from instruction to instruction.


Historically, there was a different technical meaning for "microcode", meaning something like the internal control signals decoded from the instruction word. Especially in a CPU like MIPS where the instruction word mapped directly to those control signals, without complicated decoding. (I may have this partly wrong; I read something like this (other than in the deleted answer on this question) but couldn't find it again later.)

This meaning may still actually get used in some circles, like when designing a simple pipelined CPU, like a hobby MIPS.

Directly answered 2/11, 2016 at 0:36 Comment(6)
Though you went into more detail than I thought was necessary for the OP, I find it interesting that I said the same thing and went from +2 votes to -1 while you are a +1 saying the same thing. However, you may be confounding the OP by using assembly language instructions, such as xchg and div. This is the source of his confusion which I was trying to alleviate in simpler terms for someone who is just learning the basics.Alti
@Rob: your answer claims that some bits of the instructions being decoded is the microcode. It doesn't say anything about some instructions triggering a stream of internal instructions from a micro-code ROM, vs. others that directly affect control signals (although in modern x86 CPUs, even a single-uop instruction still goes through the massively complex out-of-order machinery). Some designs for some ISAs (like MIPS I think) don't use any micro-code at all, and instruction bits can be decoded directly to control signals.Directly
Because I didn't want to get further into the pipe than I thought he would understand. I even mentioned that it would get more complicated than that.Alti
@Rob: I get that trying to simplify is a good idea, I just think you didn't succeed and unfortunately ended up saying something that isn't correct.Directly
@PeterCordes to clarify something, if I were to do an add on memory, it actually reads, adds, and stores, just like you would do in ARM. Is this because ARM is a RISC machine?Familiar
@MarkYisri: Are you asking why ARM doesn't have a memory-destination ADD instruction? Yes, that's because it's a load/store architecture, where normal instructions can't have memory operands. This is highly related to or part of being a RISC architecture. Just to be clear, x86 decodes this to 3 or 4 internal uops, but it's still not "microcoded". The decoders have that uop pattern built-in, and don't have to redirect the CPU to the microcode ROM.Directly
