The CPU has the option of deciding to do either one, i.e. deciding when the interrupt was handled relative to the original instruction stream.
Instructions that have been issued, but not yet dispatched to an execution unit, are cancelled in current implementations from AMD and Intel. When an interrupt occurs, what happens to instructions in the pipeline?
With out-of-order execution, typically dozens of instructions are in flight, and more than one can literally be in the middle of executing in an ALU at once.
But it's an interesting question whether or not low-latency instructions like add
or imul
that have started executing but not yet retired will be allowed to complete and update the architectural state that the interrupt handler sees or not.
If not, it's probably because of the difficulty of building the logic for detecting how many more contiguous instructions will be ready to retire "soon", beyond the current retirement state. Interrupts are rare (one per thousands of instructions at worst, or one per millions of instructions with low I/O load), so the benefit of squeezing a bit more throughput of surrounding code around interrupt handling is low. And any potential cost in interrupt latency would be a downside.
Some instructions, especially micro-coded ones, have mechanisms for being interrupted without having to restart from scratch. For example
rep movsb
can leave RSI, RDI, and RCX updated to part-way through a copy (so it will finish the copy on restart). The other REP-string instructions can similarly be interrupted. Only a single count of the operation is atomic with respect to interrupts.
Even when single-stepping in a debugger (by setting TF), the CPU breaks after each count, so from an interrupt PoV it really is repeating a separate movsb
instruction RCX times.
AVX2 gathers like vpgatherdd
have an input mask vector that shows which elements to gather vs. ignore. It clears mask elements after successfully gathering the corresponding index. On an exception (e.g. page fault), the faulting element is the right-most element with its mask still set (gather order is not guaranteed, but fault order is, see Intel's manual entry).
This makes it possible for a gather to succeed without needing all the relevant pages to be mapped at the same time. Evicting an already-gathered element while paging in another can't lead to an infinite loop, even in a memory-pressure corner case. Forward progress is guaranteed.
On an async interrupt, the hardware could similarly leave the gather partially done, using the mask to record progress. IDK if any hardware actually does that, but the ISA design leaves that option open.
Anyway, this is why you need to keep creating a fresh all-ones mask inside the loop for every gather.
AVX512 gathers and scatters have the same mechanism but with the a mask register instead of a vector register. http://felixcloutier.com/x86/VPSCATTERDD:VPSCATTERDQ:VPSCATTERQD:VPSCATTERQQ.html
Very slow instructions without a mechanism for being interrupted and restarting include wbinvd
. (Sync all caches to main memory and invalidate them). Intel's manual mentions that wbinvd
does delay interrupts.
As a consequence, the use of the WBINVD instruction can have an impact on logical processor interrupt/event response time.
This is probably why it's a privileged instruction. There's lots of stuff that user-space can do to make the system slow (e.g. use up lots of memory bandwidth), but it can't increase interrupt latency too dramatically. (Stores that have retired from the ROB but not yet committed to L1d can increase interrupt latency because they have to happen and can't be aborted. But creating a pathological case of lots of scattered cache-miss stores in flight is harder, and the store buffer size is small.)
Related: