Interrupting an assembly instruction while it is operating

When an interrupt reaches the CPU, it is handled by saving the current address (the program counter) before jumping into the handler, if it is acknowledged. Otherwise it is ignored.

I wonder whether a single assembly instruction can be interrupted while it is executing.

For example,

mvi a, 03h ; put the value 3 into the accumulator (8080 assembly)

Can this one-line instruction be interrupted? Or, if not, is it atomic?

Is there always a guarantee that a "one line assembly instruction" is atomic?

What if there is no "lock" keyword, as in 8080 assembly; how is atomicity provided then?

For example, what if a 64-bit sum is wanted, but there is no way to compute it with a "one line instruction", and an interrupt arrives while the sum is being computed? How can that be prevented at the assembly level?

The concept is starting to become clearer to me.

Crewel answered 15/3, 2019 at 10:24 Comment(2)
The chip designer ensured it is atomic, it has to be. An interrupt handler must never corrupt the processor state so that a multi-instruction operation misbehaves. Not that hard to do on 8080 by simply saving and restoring the registers. The interrupt logic itself already preserves the IP register, RET restores it. Almost every interrupt handler starts with PUSH PSW to preserve the flags and accumulator registers.Wellrounded
I doubt that this is done for the 8080. However, theoretically it is possible that an already running instruction is interrupted by an interrupt. I have been working on different RISC processors (for FPGAs). In one design instructions can even be interrupted in a way that the register being written to has an inconsistent value if this happens. In that design the return address would be the address of the instruction that had been interrupted so the complete instruction would be repeated in this case. So at least there exist designs that allow interrupting instructions.No

Update: a few ISAs, notably m68k do save partial progress for complex instructions, and restore that microarchitectural state afterwards, so a memory-indirect load for example might not be atomic wrt. interrupts on the same core. But I think that's rare and the rest of this answer is true for ISAs that are still in widespread use: I think they all work like x86 in this respect, which I use as the primary example. (Most other ISAs don't have long-running instructions like x86 rep movsb.)


Yes, all "normal" ISAs, including 8080 and x86, guarantee that instructions are atomic with respect to interrupts on the same core. Either an instruction has fully executed and all its architectural effects are visible (to the interrupt handler), or none of them are. Any deviations from this rule are generally carefully documented.


For example, Intel's x86 manual vol.3 (~1000 page PDF) does make a point of specifically saying this:

6.6 PROGRAM OR TASK RESTART
To allow the restarting of program or task following the handling of an exception or an interrupt, all exceptions (except aborts) are guaranteed to report exceptions on an instruction boundary. All interrupts are guaranteed to be taken on an instruction boundary.

An old paragraph in Intel's vol.1 manual talks about single-core systems using cmpxchg without a lock prefix to read-modify-write atomically (with respect to other software, not hardware DMA access).

The CMPXCHG instruction is commonly used for testing and modifying semaphores. It checks to see if a semaphore is free. If the semaphore is free, it is marked allocated; otherwise it gets the ID of the current owner. This is all done in one uninterruptible operation [because it's a single instruction]. In a single-processor system, the CMPXCHG instruction eliminates the need to switch to protection level 0 (to disable interrupts) before executing multiple instructions to test and modify a semaphore.

For multiple processor systems, CMPXCHG can be combined with the LOCK prefix to perform the compare and exchange operation atomically. (See “Locked Atomic Operations” in Chapter 8, “Multiple-Processor Management,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information on atomic operations.)
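A minimal sketch of that single-core use-case (not code from the manual; sem_owner and the owner ID passed in EBX are made-up names for this illustration) might look like:

    try_acquire:                      ; hypothetical single-core lock acquire, no LOCK prefix
        xor     eax, eax              ; EAX = 0, the "free" value we expect to find
        cmpxchg [sem_owner], ebx      ; if [sem_owner] == EAX: store our ID (EBX) and set ZF
        jnz     already_owned         ; ZF clear: it was taken; EAX now holds the owner's ID
        ; ... semaphore acquired; this worked without LOCK because a single
        ; instruction can't be split by an interrupt on the same core ...
    already_owned:
        ; ... EAX = ID of the current owner ...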

(For more about the lock prefix and how it's implemented vs. non-locked add [mem], 1, see Is incrementing an int effectively atomic in specific cases?)

As Intel points out in that first paragraph, one way to achieve multi-instruction atomicity is to disable interrupts, then re-enable them when you're done. This is better than using a mutex to protect a larger integer, especially for data shared between the main program and an interrupt handler: if an interrupt fires while the main program holds the lock, the handler can't just wait for the lock to be released, because the main program can't run again until the handler returns.

Disabling interrupts is usually pretty cheap on simple in-order pipelines, and especially on microcontrollers. (Sometimes you need to save the previous interrupt state instead of unconditionally enabling interrupts afterwards, e.g. in a function that might be called with interrupts already disabled.)

Anyway, disabling interrupts is how you could atomically do something with a 64-bit integer on 8080.
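A minimal sketch of that (assuming two 8-byte little-endian operands at made-up labels VAL1 and VAL2) could look like the following; the whole carry chain runs with interrupts disabled:

            DI              ; no ISR can observe or clobber a half-updated VAL1
            LXI  H, VAL1    ; HL -> destination operand (8 bytes, low byte first)
            LXI  D, VAL2    ; DE -> source operand
            MVI  B, 8       ; byte counter
            XRA  A          ; clear the carry flag before the first ADC
    SUMLP:  LDAX D          ; A = next source byte
            ADC  M          ; A += destination byte + carry
            MOV  M, A       ; store the result byte back into VAL1
            INX  H          ; advance both pointers (INX doesn't touch flags)
            INX  D
            DCR  B          ; DCR doesn't affect carry, so the carry chain survives
            JNZ  SUMLP
            EI              ; safe to take interrupts again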


A few long-running instructions are interruptible, according to rules documented for that instruction.

e.g. x86's rep-string instructions, like rep movsb (a single-instruction memcpy of arbitrary size), are architecturally equivalent to repeating the base instruction (movsb) RCX times, decrementing RCX each time and incrementing or decrementing the pointer inputs (RSI and RDI). An interrupt arriving during a copy can set RCX to starting_value - bytes_copied and (if RCX is then non-zero) leave RIP pointing at the instruction, so on resuming after the interrupt the rep movsb runs again and does the rest of the copy.
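So a whole copy can be one instruction; a rough sketch (dst_buf, src_buf, and LEN are placeholder names here):

        cld                      ; make sure copies go forward (DF = 0)
        lea   rdi, [dst_buf]     ; destination pointer
        lea   rsi, [src_buf]     ; source pointer
        mov   rcx, LEN           ; number of bytes to copy
        rep movsb                ; if an interrupt arrives mid-copy, RCX/RSI/RDI record the
                                 ; partial progress and RIP still points at this instruction,
                                 ; so the copy simply resumes after the handler returns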

Other x86 examples include SIMD gather loads (AVX2/AVX512) and scatter stores (AVX512). E.g. vpgatherdd ymm0, [rdi + ymm1*4], ymm2 does up to 8 32-bit loads, according to which elements of ymm2 are set. And the results are merged into ymm0.

In the normal case (no interrupts, no page faults or other synchronous exceptions during the gather), you get the data in the destination register, and the mask register ends up zeroed. If an interrupt or fault does arrive partway through, elements that have already been loaded have their mask bits cleared, so the mask register gives the CPU somewhere to store progress.

Gather and scatter are slow, and might need to trigger multiple page faults, so for synchronous exceptions this guarantees forward progress even under pathological conditions where handling a page fault unmaps all other pages. But more relevantly, it means avoiding redoing TLB misses if a middle element page faults, and not discarding work if an async interrupt arrives.


Some other long-running instructions (like wbinvd which flushes all data caches across all cores) are not architecturally interruptible, or even microarchitecturally abortable (to discard partial work and go handle an interrupt). It's privileged so user-space can't execute it as a denial-of-service attack causing high interrupt latency.


A related example of documented funny behaviour is what happens when x86 popad goes off the top of the stack (past the segment limit). This is an exception (not an external interrupt), documented earlier in the vol.3 manual, in section 6.5 EXCEPTION CLASSIFICATIONS (i.e. fault / trap / abort; see the PDF for more details).

NOTE
One exception subset normally reported as a fault is not restartable. Such exceptions result in loss of some processor state. For example, executing a POPAD instruction where the stack frame crosses over the end of the stack segment causes a fault to be reported. In this situation, the exception handler sees that the instruction pointer (CS:EIP) has been restored as if the POPAD instruction had not been executed. However, internal processor state (the general-purpose registers) will have been modified. Such cases are considered programming errors. An application causing this class of exceptions should be terminated by the operating system.

Note that this only applies if popad itself causes an exception, not for any other reason. An external interrupt can't split popad the way it can split rep movsb or vpgatherdd.

(I guess that for the purposes of faulting, popad effectively works iteratively, popping one register at a time and logically modifying RSP/ESP/SP as well as the target register, instead of checking the whole region it's going to load against the segment limit before starting; presumably that would have required an extra add.)


Out-of-order CPUs roll back to the retirement state on interrupts.

CPUs like modern x86, with out-of-order execution and complex instructions split into multiple uops, still ensure this is the case. When an interrupt arrives, the CPU has to pick a point between two instructions it's in the middle of running as the location where the interrupt architecturally happens. It has to discard any work already done on decoding or executing later instructions; assuming the interrupt returns, they'll be re-fetched and start executing over again.

See When an interrupt occurs, what happens to instructions in the pipeline?

As Andy Glew says, current CPUs don't rename the privilege level, so what logically happens (interrupt/exception handler executes after earlier instructions finish) matches what actually happens.

Fun fact, though: x86 interrupts aren't fully serializing, at least not guaranteed on paper. (In x86 terminology, instructions like cpuid and iret are defined as serializing: they drain the OoO back-end and the store buffer, and anything else that might possibly matter. That's a very strong barrier, and lots of other things aren't serializing, e.g. mfence.)

In practice (because CPUs don't in practice rename the privilege level), there won't be any old user-space instructions/uops in the out-of-order back-end still in flight when an interrupt handler runs.

Async (external) interrupts may also drain the store buffer, depending on how we interpret the wording of Intel's SDM vol.3 11.10: "the contents of the store buffer are always drained to memory in the following situations:" ... "When an exception or interrupt is generated". Clearly that applies to exceptions (where the CPU core itself generates the interrupt), and might also mean before servicing an interrupt.

(Store data from retired store instructions is not speculative; it definitely will happen, and the CPU has already dropped the state it would need to roll back to before those store instructions. So a large store buffer full of scattered cache-miss stores can hurt interrupt latency: either from waiting for it to drain before any interrupt-handler instructions can run at all, or at least before any in/out or locked instruction in the ISR can execute, if it turns out that the store buffer isn't drained up front.)

Related: Sandpile (https://www.sandpile.org/x86/coherent.htm) has a table of things that are serializing. Interrupts and exceptions aren't. But again, this doesn't mean they don't drain the store buffer. This would be testable with an experiment: look for StoreLoad reordering between a store in user-space and a load (of a different shared variable) in an ISR, as observed by another core.

Part of this section doesn't really belong in this answer and should be moved somewhere else. It's here because discussion in comments on What happens to expected memory semantics (such as read after write) when a thread is scheduled on a different CPU core? cited this as a source for the probably wrong claim that interrupts don't drain the store buffer, which I wrote after misinterpreting "not serializing".

Diaphone answered 15/3, 2019 at 16:11 Comment(4)
Sandpile doesn't list hardware interrupts as serializing probably because they are not instructions. I think that list is a list of serializing instructions, not serializing events. But the "doc?" field says "no" for interrupts and exceptions, which I'm not sure what it means.Wallas
The Intel manual V2 mentions that the INT instructions basically have the same serialization properties as LFENCE. The AMD manual doesn't say this though (AFAICT). Also, both the Intel and AMD manuals mention that "exceptions and interrupts" drain the store buffer and the WC buffers. This suggests that the term "interrupts" in this context refers to hardware interrupts and the term "exceptions" refers to program-error exceptions and machine-check exceptions (see Section 6.4 of Volume 3). It seems to me that "exceptions and interrupts" are fully serializing.Wallas
I don't want to read the whole 2008 paper at this time, can you point out where exactly does it say that interrupts on x86 are serializing? And hopefully the terms "interrupts" and "serializing" are well-defined in the paper, so we don't have to guess. And also hopefully they give an Intel reference (the authors are not from Intel). They have used the Simics simulator, which is an academic simulator, which means that their results do not necessarily show how real processors work.Wallas
@HadiBrais: That paper is a red herring; they're talking about serializing OoO exec only, not memory. I was looking at section 3.2 where they talk about CPUs not renaming CS, thus syscall is serializing. And by implication, so are interrupts (at least when taken from user-space), although they don't even mention that. I'm going to remove that section from this answer; after a second look it's too distantly related. (BTW, I updated the link to a better-formatted version of it. ftp.cs.wisc.edu/sohi/papers/2008/hpca2008-serial.pdf.)Diaphone

I'm not sure the 8080 was designed to be used in multi-CPU systems with shared RAM, though that doesn't necessarily mean such systems were impossible or never built. The 8086 lock prefix is for such systems, ensuring that just one CPU has exclusive access to memory while executing a read-modify-write (RMW) sequence: memory read, value modification, memory write. The lock prefix isn't there to guard an instruction or a few instructions from being preempted by an interrupt handler.

You can be sure that individual instructions don't somehow get interrupted in mid-flight. Either they are allowed to run to completion, or any of their side effects are reverted and they are restarted later. That's the common implementation on most CPUs; without it, it would be hard to write well-behaved code in the presence of interrupts.

Indeed, you cannot perform a 64-bit addition with a single 8080 instruction, so that operation can be preempted by an ISR.

If you don't want that preemption at all, you can guard your 64-bit add with interrupt disable and enable instructions (DI and EI).

If you want to let the ISR preempt the 64-bit add, but without disturbing the registers the add uses, the ISR must save and restore those registers, e.g. using the PUSH and POP instructions.
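A minimal ISR skeleton along those lines (the label name is just illustrative) might be:

    ISR:    PUSH PSW    ; save A and the flags (including any carry the main code relies on)
            PUSH B      ; save the other register pairs the interrupted code may be using
            PUSH D
            PUSH H
            ; ... service the device here ...
            POP  H      ; restore everything in reverse order
            POP  D
            POP  B
            POP  PSW
            EI          ; the 8080 disabled interrupts when it accepted this one
            RET         ; return to the interrupted code, e.g. the middle of a 64-bit add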

Find an 8080 manual for a detailed description of interrupt handling (e.g. here).

Phenacite answered 15/3, 2019 at 11:5 Comment(4)
On 8086, lock (and xchg with memory) exist for atomicity with respect to other non-CPU devices in the system, e.g. DMA reads. And for use on memory-mapped I/O, I think, where perhaps it was important that the CPU keep the #LOCK signal asserted while doing the read + write. The earliest SMP x86 systems were 386-based, I think. (And the earliest with something like the modern memory model was the 486; I think I've read the 386 didn't have some of the current guarantees.)Diaphone
@PeterCordes You may be right w.r.t. other memory-accessing devices. I focused on just CPUs.Phenacite
Well that's what it's mostly used for on modern x86, but you literally say "the 8086 lock prefix", not "x86 lock prefix". That use-case doesn't exist in 8086. (And it's interesting that it existed before SMP systems.)Diaphone
@PeterCordes Ah, yes, x86 would fit better than 8086.Phenacite
