When an interrupt occurs, what happens to instructions in the pipeline?
Assume a 5-stage pipeline architecture (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back). There are 4 instructions that have to be executed.

(These sample instructions are not accurate, but I believe the point will be understood.)

In the fifth clock cycle, these instructions will be in the pipeline as shown below.

Add a, b, c      [IF ID EX MEM WB]
Add a, b, d      [IF ID EX MEM]
Add a, b, e      [IF ID EX]
Add a, b, f      [IF ID]
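
The occupancy shown above can be sketched in a few lines of Python (an illustrative model of an ideal 5-stage pipeline, not any real machine; the function name and structure are my own):

```python
# Which stages each instruction has entered by a given clock cycle,
# assuming one instruction issues per cycle with no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stages_completed(issue_cycle, current_cycle):
    """Stages an instruction issued at `issue_cycle` has entered by `current_cycle`."""
    n = current_cycle - issue_cycle + 1          # stages entered so far
    return STAGES[:max(0, min(n, len(STAGES)))]

for i, name in enumerate(["Add a,b,c", "Add a,b,d", "Add a,b,e", "Add a,b,f"]):
    print(f"{name:12} {stages_completed(i + 1, 5)}")
```

At cycle 5 the first instruction has reached WB while the fourth is only in ID, matching the diagram.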

Now if a hardware interrupt occurs, what happens to these instructions? Will the interrupt be handled only after all the instructions in the pipeline are executed? Will software interrupts and exceptions be handled in a different way?

Alger answered 17/1, 2012 at 21:44 Comment(2)
The pipelines get flushed in much the same way as they would for e.g. a mispredicted branch - exact details depend on what CPU you are talking about.Esbenshade
I think it is a pity that the question has been voted -1. It's actually a fairly fundamental question in computer (micro)architecture, one that is often misunderstood - as is shown by the first answer being confused.Memnon

First, terminology:

Usually, at Intel at least, an interrupt is something that comes from the outside world. Usually it is not synchronized with instructions executing on the processor, i.e. it is an asynchronous external interrupt.

In Intel terminology an exception is something caused by instructions executing on the processor. E.g. a page fault, or an undefined instruction trap.

---+ Interrupts flush all instructions in flight

On every machine that I am familiar with - e.g. all Intel processors since the P5 (I worked on the P6), AMD x86s, ARM, MIPS - when the interrupt signal is received the instructions in the pipeline are nearly always flushed, thrown away.

The only reason I say "nearly always" is that on some of these machines you are not always at a place where you are allowed to receive an interrupt. So, you proceed to the next place where an interrupt is allowed - any instruction boundary, typically - and THEN throw away all of the instructions in the pipeline.

For that matter, interrupts may be blocked. So you proceed until interrupts are unblocked, and THEN you throw them away.

Now, these machines aren't exactly simple 5 stage pipelines. Nevertheless, this observation - that most machines throw away all instructions in the pipeline, in pipestages before the pipestage where the interrupt logic lives - remains almost universally true.

In simple machines the interrupt logic typically lives in the last stage of the pipeline, WB, corresponding roughly to the commit pipestage of advanced machines. Sometimes it is moved up to a pipestage just before, e.g. MEM in your example. So, on such machines, all instructions in IF ID EX, and usually MEM, are thrown away.
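
The behavior just described can be sketched as follows (a toy model, assuming the interrupt is recognized at WB; stage names and the dict representation are mine, not any real microarchitecture):

```python
# On an interrupt recognized at the WB stage, every instruction in an
# earlier pipestage is squashed; only the instruction at/after the
# interrupt point survives.
PIPE_ORDER = ["IF", "ID", "EX", "MEM", "WB"]

def flush_on_interrupt(pipeline, interrupt_stage="WB"):
    """pipeline: dict stage -> instruction (or None). Returns (kept, squashed)."""
    cut = PIPE_ORDER.index(interrupt_stage)
    kept = [pipeline[s] for s in PIPE_ORDER[cut:] if pipeline.get(s)]
    squashed = [pipeline[s] for s in PIPE_ORDER[:cut] if pipeline.get(s)]
    return kept, squashed

pipe = {"IF": "i4", "ID": "i3", "EX": "i2", "MEM": "i1", "WB": "i0"}
kept, squashed = flush_on_interrupt(pipe)
# only i0 (in WB) completes; i1..i4 are thrown away and refetched later
```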

---++ Why I care: Avoiding Wasted Work

This topic is near and dear to my heart because I have proposed NOT doing this. E.g. in customer visits while we were planning to build the P6, I asked customers which they preferred - lower latency interrupts, flushing instructions that are in flight, or (slightly) higher throughput, allowing at least some of the instructions in flight to complete, at the cost of slightly longer latency.

However, although some customers preferred the latter, we chose to do the traditional thing, flushing immediately. Apart from the lower latency, the main reason is complexity:

E.g. if you take an interrupt, but if one of the instructions already in flight also takes an exception, after you have resteered IF (instruction fetch) but before any instruction in the interrupt has committed, which takes priority? A: it depends. And that sort of thing is a pain to deal with.

---+++ Folklore: Mainframe OS Interrupt Batching

This is rather like the way that some IBM mainframe OSes are reported to have operated:

  • with all interrupts blocked in normal operation except for the timer interrupt;
  • in the timer interrupt, you unblock interrupts, and handle them all;
  • and then return to normal operation with interrupts blocked.

Conceivably they might only use such an "interrupt batching" mode when heavily loaded; if lightly loaded, they might not block interrupts.
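
The batching policy above might be sketched like this (hypothetical pseudologic in Python, not actual mainframe code; the handler-table shape is my own invention):

```python
# Device interrupts stay masked during normal operation; the periodic
# timer tick "unblocks" them, drains every pending interrupt in one
# batch, then returns with interrupts masked again.
def timer_tick(pending, handlers):
    """Handle all pending interrupts in one batch; return the handled list."""
    handled = []
    while pending:                      # interrupts effectively unblocked here
        irq = pending.pop(0)
        handlers[irq]()                 # run that device's handler
        handled.append(irq)
    return handled                      # interrupts re-blocked on return

log = []
handlers = {"disk": lambda: log.append("disk"), "net": lambda: log.append("net")}
timer_tick(["disk", "net", "disk"], handlers)
# log is now ["disk", "net", "disk"]: three interrupts serviced in one batch
```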

---+++ Deferred Machine Check Exceptions

The idea of deferring interrupts to give instructions already in the pipeline a chance to execute is also similar to what I call the Deferred Machine Check Exception - a concept that I included in the original Intel P6 family Machine Check Architecture, circa 1991-1996, but which appears not to have been released.

Here's the rub: machine check errors like (un)correctable ECC errors can occur AFTER an instruction has retired (i.e. after supposedly younger instructions have committed state, e.g. written registers), or BEFORE the instruction has retired.

The classic example of AFTER errors is an uncorrectable ECC triggered by a store that is placed into a write buffer at graduation. Pretty much all modern machines do this - certainly all machines with TSO - which pretty much means that there is always the possibility of an imprecise machine check error that could have been precise if you cared enough not to buffer stores.

The classic example of BEFORE errors is ... well, every instruction, on any machine with a pipeline. But more interestingly, errors on wrong-path instructions, in the shadow of a branch misprediction.

When a load instruction gets an uncorrectable ECC error, you have two choices:

(1) you could pull the chain immediately, killing not just instructions YOUNGER than the load instruction but also any OLDER instructions

(2) or you could write some sort of status code into the logic that controls speculation, and take the exception at retirement. This is pretty much what you have to do for a page fault, and it makes such errors precise, helping debugging.

(3) But what if the load instruction that got the uncorrectable ECC error was a wrong path instruction, and never retires because an older inflight branch mispredicted and went another way?

Well, you could write the status to try to make it precise. You should have counters of precise errors and imprecise errors. You could otherwise ignore an error on such a wrong-path instruction - after all, if it is a hard error, it will either be touched again, or it might not be. E.g. it is possible that the error would be architecturally silent - e.g. a bad cache line might be overwritten by a good cache line for the same address.

And, if you really wanted, you could set a bit so that if an older branch mispredicts, then you take the machine check exception at that point in time.

Such an error would not occur at a program counter associated with the instruction that caused the error, but might still have otherwise precise state.

I call (2) deferring a machine check exception; (3) is just how you might handle the deferral.
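
Options (2) and (3) might be sketched like this (an illustrative model; the `RobEntry` class, field names, and counters are my own, not the P6 implementation):

```python
# Record the ECC error in the instruction's ROB entry and raise it only
# if that instruction retires; a squashed wrong-path instruction takes
# its deferred error with it (optionally counted, never raised).
class RobEntry:
    def __init__(self, name):
        self.name, self.deferred_error, self.squashed = name, None, False

def retire(entry, counters):
    if entry.squashed:                   # wrong path: error never raised
        if entry.deferred_error:
            counters["ignored"] += 1
        return None
    if entry.deferred_error:             # precise machine check at retirement
        counters["precise"] += 1
        return entry.deferred_error
    return None

counters = {"precise": 0, "ignored": 0}
load = RobEntry("load")
load.deferred_error = "uncorrectable ECC"
result = retire(load, counters)          # raised precisely at retirement

wrong_path = RobEntry("wrong-path load")
wrong_path.deferred_error = "uncorrectable ECC"
wrong_path.squashed = True
retire(wrong_path, counters)             # silently dropped, counted as ignored
```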

IIRC, all Intel P6 machine check exceptions were imprecise.

---++ On the gripping hand: even faster

So, we have discussed

0) taking the interrupt immediately, or, if interrupts are blocked, executing instructions and microinstructions until an interrupt unblocked point is reached. And then flushing all instructions in flight.

1) trying to execute instructions in the pipeline, so as to avoid wasted work.

But there is a third possibility:

-1) if you have microarchitecture state checkpoints, take the interrupt immediately, never waiting for an interrupt unblocked point. Which you can only do if you have a checkpoint of all relevant state at the most recent "safe to take an interrupt" point.

This is even faster than 0), which is why I labelled it -1). But it requires checkpoints, which many but not all aggressive CPUs use - e.g. Intel P6 did not use checkpoints. And such post-retirement checkpoints get funky in the presence of shared memory - after all, you can do memory operations like loads and stores while interrupts are blocked. And you can even communicate between CPUs. Even hardware transactional memory usually doesn't do that.

---+ Exceptions mark the instructions affected

Conversely, exceptions, things like page faults, mark the instruction affected.

When that instruction is about to commit, at that point all later instructions after the exception are flushed, and instruction fetch is redirected.

Conceivably, instruction fetch could be resteered earlier, the way branch mispredictions are already handled on most processors, at the point at which we know that the exception is going to occur. I don't know anyone who does this. On current workloads, exceptions are not that important.
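
The flush-at-commit behavior for exceptions can be sketched as follows (a simplified model; the program-order list representation is mine):

```python
# "Exceptions mark the instruction affected": the faulting instruction
# is tagged in program order; when it reaches commit, it and everything
# younger are flushed, and fetch is redirected to the handler.
def commit(window, faulting_index):
    """window: instructions in program order. Returns (committed, flushed)."""
    committed = window[:faulting_index]   # older instructions commit normally
    flushed = window[faulting_index:]     # faulting insn + younger are flushed
    return committed, flushed

committed, flushed = commit(["i0", "i1", "pgfault", "i3", "i4"], 2)
# i0 and i1 commit; the page-faulting instruction and everything younger
# are flushed, so the fault is delivered precisely between i1 and pgfault
```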

---+ "Software Interrupts"

"Software interrupts" are a misnamed instruction usually associated with system calls.

Conceivably, such an instruction could be handled without interrupting the pipeline, predicted like a branch.

However, all of the machines I am familiar with serialize in some way. In my parlance, they do not rename the privilege level.

---+ "Precise Interrupts", EMON, PEBS

Another poster mentioned precise interrupts.

This is a historical term. On most modern machines interrupts are defined to be precise. Older machines with imprecise interrupts have not been very successful in the marketplace.

However, there is an alternate meaning, one I was involved in introducing: when I got Intel to add the capability to produce an interrupt on performance counter overflow - first using external hardware, and then inside the CPU - it was, in the first few generations, completely imprecise.

E.g. you might set the counter to count the number of instructions retired. The retirement logic (RL) would see the instructions retire, and signal the performance event monitoring circuitry (EMON). It might take two or three clock cycles to send this signal from RL to EMON. EMON would increment the counter, and then see that there was an overflow. The overflow would trigger an interrupt request to the APIC (Advanced Programmable Interrupt Controller). The APIC might take a few cycles to figure out what was happening, and then signal the retirement logic.

I.e. the EMON interrupt would be signalled imprecisely. Not at the time of the event, but some time thereafter.

Why this imprecision? Well, in 1992-6, performance measurement hardware was not a high priority. We were leveraging existing interrupt hardware. Beggars can't be choosers.

But furthermore, some performance events are intrinsically imprecise. E.g. when do you signal an interrupt for a cache miss on a speculative instruction that never retires? (I have a scheme I called Deferred EMON events, but this is still considered too expensive.) For that matter, what about cache misses on store instructions, where the store is placed into a store buffer, and the instruction has already retired?

I.e. sometimes performance events occur after the instruction they are associated with has committed (retired). Sometimes before. And often not exactly at the instruction they are associated with.

But in all of the implementations so far, as far as I know, these performance events are treated like interrupts: existing instructions in the pipe are flushed.

Now, you can make a performance event precise by treating it like a trap. E.g. if it is an event like instructions retired, you can have the retirement logic trap immediately, instead of taking that circuitous loop I described above. If it occurs earlier in the pipeline, you can have the fact that it occurred marked in the instruction fault status in the ROB (Re-Order Buffer). Something like this is what Intel has done with PEBS (Precise Event Based Sampling). http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf.

However, note that not all events can be sampled using PEBS. For example, PEBS in the example above can count loads that took a cache hit or miss, but not stores (since stores occur later).

So this is like exceptions: the event is delivered only when the instruction retires. Because in a sense the event has not completely occurred - it is a load instruction, that takes a cache miss, and then retires. And instructions after the marked PEBS instruction are flushed from the pipeline.


Memnon answered 28/4, 2012 at 20:36 Comment(23)
How hard would it have been to have asynchronous interrupts specify that instructions should stop entering the pipeline, but those in the pipeline should run to completion? One might need to have two IRQ lines (one of which would request a pipeline flush) but conceptually it seems like it should be straightforward.Thorax
Nothing is hard to build. Verifying, to make sure that you haven't broken something, some implicit assumption, is what takes time. Because the cost of verification is high, and the cost of getting something wrong can be very high (recalls, possibly lawsuits), companies (not just hardware companies, but all companies), tend to be pretty conservative. Don't innovate, unless the need is very clearly demonstrated. IMHO too conservative, but I understand the risk aversion. // Did I mention that rarely occurring bugs in something like interrupts are very much disliked?Memnon
That makes a lot of sense. I guess the only processors whose pipelining behavior I've examined in any detail are the ones which are simple enough that I can understand them, and could imagine how they could be verified. On many more sophisticated processors, all the pipeline interactions seem sufficiently complex I'm amazed they work at all, and I can see that adding another variable to the mix might complicate things. Still, I've done programming on one chip which had both delayed and non-delayed branch instructions, and I liked the delayed branches. Maybe they added some...Thorax
...complexity, but in a tight loop cutting two cycles from the cost of a branch can be a pretty major performance boost. I'd guess that delayed branches/calls and delayed interrupts would probably share a lot of implementation details.Thorax
Well, yes and no. Delayed branches (usually, on all of the public machines that I am aware of having implemented them) have a deterministic delay. Basically, you create the moral equivalent of a shift register of PCs, and insert the delayed branch at the end of the shift register, with the intervening ones sequential. (Actually, delayed branch architects get quite picky about exactly what they do - some might take exception to what I said about a shift register - but from 10,000 feet, that's what they are doing.)Memnon
Delayed interrupts don't go into the single PC shift register, or else they would get wiped out by a branch mispredict. So, instead, you create a siding, a place where the pending delayed interrupt sits, waiting for a good time to interrupt. Perhaps you insert a placeholder in the normal PC queue, and wait for that to come down the pipe - and if you are really aggressive, you start fetching and executing the code at the interrupt point, just late in the pipe. But if the placeholder gets wiped out, you take the siding. Perhaps you have a timer, so the interrupt cannot be delayed too long.Memnon
So, from my point of view, delayed interrupts are really much more like a thread switch than delayed branches. Or perhaps not even a thread switch: possibly a thread spawn, but you might delay really banging on the interrupt thread until the interrupted thread winds to a slow down.Memnon
But... if you are really aggressive, you don't even necessarily need to stop the interrupted thread. // That's easy for external interrupts. But for internal interruptions, e.g. exceptions like page faults... well, you might stop it in an architectural sense. Stop retiring/graduating instructions. But so long as you stop the page faulting instruction from returning, you can actually have another thread go off and satisfy the page fault, while the original thread continues doing speculative work after, not dependent on, the page fault.Memnon
That last idea - speculating past page faults, I/O, etc. - arose because I challenged prefetch evangelists to show me an example where a prefetcher could do anything an out-of-order machine could not do. The best response was from some guys at IBM Zurich who said that database prefetchers prefetched disk blocks, I/O and page faults. So of course I figured out how to speculate past those. And evaluated the performance thereof in a class project.Memnon
So, I think delayed interrupts, or even speculating past interrupts, are different enough from delayed branches as to warrant a different, better, mechanism. How can I say this delicately: although I now work for a company with delayed branches, I have said publicly that I think that delayed branches are a net bad idea (semipublic.comp-arch.net/wiki/…). Actually, more like they might have been good at one point in time, but at other points in time they are bad.Memnon
Some interesting stuff on that page. Care to chat?Thorax
Email. Working. Deliverable soon. Chat at other times, less busy.Memnon
Click the "move to chat" when you're interested. Chat doesn't have to be real-time--you can reply at leisure.Thorax
Yes. But it is another queue to monitor. If there were a select for all of the interesting information streams... email, newsgroups, stackoverflow, ...Memnon
Including an at-sign and a person's name in a chat thingie will make it appear in the "Inbox" at the top of every StackExchange page.Thorax
Sure. But I will notice my StackExchange Inbox only when I go to Stack Exchange. which as you can see, is only once every few days, with occasional lapses of months. If you are happy with a conversation that occurs at that rate, sure. No guarantee I will ever get back to you and stackexchange, however. If you want a conversation that occurs at a slightly faster rate, use email, or USEnet newsgroup comp.arch. A few IRC channels, but those not much. // The Internet is full of these semi-private messaging systems. Not quite private gardens, but almost. I am sick of them.Memnon
If stackexchange was gatewayed to email, maybe. But then I hate spam. If RSS had lived up to its promise, and not supplanted all of the better syndication tools that reduce information overload. // But, anyway, happy to chat. It's fairly easy to find me on email, if you know how to google. On USEnet newsgroup comp.arch is for exactly this sort of thing. // By the way, I love stackexchange. I often get my questions answered here, and I try to pay it forward by answering other people's questions. But it is not a conversation place for me.Memnon
Paul Clayton is trying to get a comp.arch stackexchange started. That may be relevant. // But, overall, I want an info conversation muc/demux. Something that lets me post to stackexchange, my comp-arch.net wikior blog, USEnet newsgroup comp.arch, etc. Post once. Crosspost and/or link as appropriate. Heck, given that I might even start using twitter again.Memnon
If instructions A and B execute out of order, in B -> A order, and an interrupt arrives when B has retired, will other CPUs see order B -> A or A -> B?Loveliesbleeding
What about a reordered StoreLoad that is interrupted by an interrupt? Will the reordering be reverted to program order?Loveliesbleeding
@Chinaxing: I am having a bit of trouble parsing the question in your comment, but I think it boils down to "if instructions are executed out of order and an interrupt occurs…". On processors with precise interrupts, it is as if the out-of-order execution did not occur, except for performance artifacts or MMIO side effects. On processors with imprecise interrupts, issues such as you mention can occur.Memnon
@KrazyGlew any intel or ARM's paper detail this ?Loveliesbleeding
@chinaxing: I am not aware of any Intel paper about this interrupt tradeoff. IIRC Academic: Pleszkun?Memnon

For precise interrupts, instructions in flight before the IF stage jumps to the ISR retire normally. When the ISR returns, execution resumes starting with the next instruction after the last retired instruction of the original process. In other words, precise interrupts always happen in between instructions.

Processing for synchronous interrupts is a bit different. Taking x86 as an example, synchronous exceptions come in three flavors: traps, faults and aborts.

A trap, like INT3, causes the core to push the address of the instruction after the trap on the stack, such that when the ISR returns, the core does not pointlessly reexecute the same trapping instruction.

A fault, like a page fault, causes the core to push the address of the faulting instruction on the stack, such that when the ISR returns, the core will reexecute the faulting instruction, presumably now in circumstances that avoid the same fault again.

An abort, like a double fault, is a fatal unrecoverable problem in which the processor cannot resume execution where it left off.

The content of the interrupt stack frame pushed by the core before entering the ISR differs depending on which case you're talking about.
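
The trap/fault distinction in return addresses can be sketched like this (an illustrative model of the x86 convention described above; the function name and parameters are mine):

```python
# Traps push the address of the *next* instruction (resume after it);
# faults push the address of the *faulting* instruction (re-execute it);
# aborts have no reliable return address at all.
def pushed_return_address(kind, insn_addr, insn_len):
    if kind == "trap":                   # e.g. INT3
        return insn_addr + insn_len
    if kind == "fault":                  # e.g. page fault
        return insn_addr
    raise RuntimeError("abort: cannot resume")   # e.g. double fault

trap_ra = pushed_return_address("trap", 0x1000, 1)    # 0x1001: skip the INT3
fault_ra = pushed_return_address("fault", 0x2000, 3)  # 0x2000: retry the load
```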

Voltmeter answered 18/1, 2012 at 17:43 Comment(0)
