Interrupting instruction in the middle of execution

The CPU can do either one: it gets to decide where the interrupt is handled relative to the original instruction stream.

Instructions that have been issued, but not yet dispatched to an execution unit, are cancelled in current implementations from AMD and Intel. See "When an interrupt occurs, what happens to instructions in the pipeline?"

With out-of-order execution, typically dozens of instructions are in flight, and more than one can literally be in the middle of executing in an ALU at once.

But it's an interesting question whether low-latency instructions like add or imul that have started executing but not yet retired are allowed to complete and update the architectural state that the interrupt handler sees.

If not, it's probably because of the difficulty of building logic to detect how many more contiguous instructions will be ready to retire "soon", beyond the current retirement state. Interrupts are rare (one per thousands of instructions at worst, or one per millions of instructions with low I/O load), so the benefit of squeezing a bit more throughput out of the code surrounding an interrupt is low. And any potential cost in interrupt latency would be a downside.

Some instructions, especially micro-coded ones, have mechanisms for being interrupted without having to restart from scratch. For example:

rep movsb can leave RSI, RDI, and RCX updated to part-way through a copy (so it will finish the copy on restart). The other REP-string instructions can similarly be interrupted. Only a single count of the operation is atomic with respect to interrupts.

Even when single-stepping in a debugger (by setting TF), the CPU breaks after each count, so from an interrupt PoV it really is repeating a separate movsb instruction RCX times.
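To make the restart behaviour concrete, here is a rough C model of it. This is only a sketch of the architecturally visible behaviour, not how any real microcode works: `RepMovsbState`, `rep_movsb_model`, and the `bytes_before_interrupt` budget are names made up for illustration, and it assumes DF=0 (pointers increment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the registers rep movsb uses (DF=0 assumed). */
typedef struct {
    const uint8_t *rsi;  /* source pointer */
    uint8_t *rdi;        /* destination pointer */
    size_t rcx;          /* remaining count */
} RepMovsbState;

/*
 * Run the copy until RCX reaches zero or an interrupt arrives.
 * bytes_before_interrupt is an illustrative stand-in for "an interrupt
 * becomes pending after this many counts".
 * Returns true if the copy finished, false if it was interrupted.
 * Either way, rsi/rdi/rcx describe exactly the work still left to do,
 * so calling again with the same state resumes instead of restarting.
 */
bool rep_movsb_model(RepMovsbState *st, size_t bytes_before_interrupt)
{
    assert(st != NULL);
    while (st->rcx != 0) {
        if (bytes_before_interrupt == 0)
            return false;         /* interrupt taken between two counts */
        *st->rdi++ = *st->rsi++;  /* one count: atomic wrt. interrupts */
        st->rcx--;
        bytes_before_interrupt--;
    }
    return true;
}
```

Calling `rep_movsb_model` again with the same state after an "interrupt" picks up where it left off, which is what returning to the same instruction address does on real hardware. Real implementations may move bigger chunks internally; the model only captures what an interrupt handler could observe.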

AVX2 gathers like vpgatherdd take a mask vector as an input that selects which elements to gather vs. ignore. The instruction clears each mask element after successfully gathering the corresponding element. On an exception (e.g. page fault), the faulting element is the right-most element with its mask still set (gather order is not guaranteed, but fault order is; see Intel's manual entry).

This makes it possible for a gather to succeed without needing all the relevant pages to be mapped at the same time. Evicting an already-gathered element while paging in another can't lead to an infinite loop, even in a memory-pressure corner case. Forward progress is guaranteed.

On an async interrupt, the hardware could similarly leave the gather partially done, using the mask to record progress. IDK if any hardware actually does that, but the ISA design leaves that option open.

Anyway, this is why you need to keep creating a fresh all-ones mask inside the loop for every gather: a gather that completes leaves its mask zeroed.
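The mask-as-progress idea above can be sketched as a scalar C model. Everything here is an assumption for illustration: `gather_model`, the toy `PAGE_ELEMS` page size, and the `page_mapped` array (standing in for the page tables) are invented names, and the model processes lanes strictly from lane 0 upward, which is only one of the orders real hardware is allowed to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 8       /* vpgatherdd with YMM registers: 8 dword elements */
#define PAGE_ELEMS 4  /* toy "page" size in elements, purely illustrative */

/*
 * One execution attempt of a modelled gather.
 * mask[lane] == true means "still needs gathering".
 * page_mapped[p] is a hypothetical stand-in for the page tables.
 * Returns the faulting lane, or -1 if the gather completed.
 * On a fault, lanes already gathered keep their cleared mask bits,
 * so re-running with the same dst/mask resumes instead of starting over.
 */
int gather_model(int32_t dst[LANES], const int32_t *base,
                 const int32_t idx[LANES], bool mask[LANES],
                 const bool *page_mapped)
{
    for (int lane = 0; lane < LANES; lane++) {
        if (!mask[lane])
            continue;                     /* already done, or not requested */
        assert(idx[lane] >= 0);
        if (!page_mapped[idx[lane] / PAGE_ELEMS])
            return lane;                  /* "page fault" on this element */
        dst[lane] = base[idx[lane]];
        mask[lane] = false;               /* record progress in the mask */
    }
    return -1;                            /* done: mask is now all-zero */
}
```

In this model, a "fault handler" that maps one page while evicting another still lets the next attempt get at least one element further, and a completed gather leaves `mask` all false, so a loop that reuses it without resetting it would gather nothing on the next iteration.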

AVX512 gathers and scatters have the same mechanism, but with a mask register instead of a vector register. http://felixcloutier.com/x86/VPSCATTERDD:VPSCATTERDQ:VPSCATTERQD:VPSCATTERQQ.html

Very slow instructions without a mechanism for being interrupted and restarting include wbinvd (write back all caches to main memory and invalidate them). Intel's manual mentions that wbinvd does delay interrupts:

"As a consequence, the use of the WBINVD instruction can have an impact on logical processor interrupt/event response time."

This is probably why it's a privileged instruction. There's lots of stuff that user-space can do to make the system slow (e.g. use up lots of memory bandwidth), but it can't increase interrupt latency too dramatically. (Stores that have retired from the ROB but not yet committed to L1d can increase interrupt latency because they have to happen and can't be aborted. But creating a pathological case of lots of scattered cache-miss stores in flight is harder, and the store buffer size is small.)

"Interrupting an assembly instruction while it is operating" is very similar to this answer, and mentions that this applies to most (all?) ISAs.
"Do x86 instructions require their own encoding as well as all of their arguments to be present in memory at the same time?"
"Taking a semaphore must be atomic. Is Pintos's sema_down safe?" - single-core atomicity wrt. interrupts (and thus context switches) for uniprocessor systems.
