I don't see any reason why being inside the guest would be special wrt. what happens when an external interrupt arrives. (Assuming we're talking about the guest code running natively on the pipeline, not the root emulating and/or deciding to delay re-entry to the guest in anticipation of another interrupt. See comments.)
CPU designers aren't going to effectively block interrupts for multiple instructions; that would hurt interrupt latency. Since there are no special "sync points" that are the only places an interrupt can be delivered, the pipeline needs to be able to handle interrupts between arbitrary instructions. Out-of-order exec always potentially has a lot going on, so you can't count on waiting for any specific state before handling an interrupt; that could take too long. If the gap between one pair of instructions is ok to deliver an interrupt, why not any other?
CPUs don't rename the privilege level, so yes they'd roll back to the retirement state, discard all in-flight instructions in the back-end, and then figure out what to do based on the current state. See also "When an interrupt occurs, what happens to instructions in the pipeline?"
This completely untested guess is based on my understanding of CPU architecture. If there was a measurable effect on interrupt latency, that might be a real thing.
In practice, regardless of VT-x, some instruction boundaries will probably be impossible to interrupt unless single-stepping.
Retirement bandwidth is 3 per clock (Nehalem), 4 per clock per logical thread (Haswell), or even higher in Skylake. Retirement from the out-of-order core is usually bursty because it happens in-order (to support precise exceptions). This is why we have a ROB separate from the Reservation Station.
It's very common for one instruction to block retirement of later independent instructions for a while, and then for a burst of instructions to retire along with it, e.g. a cache-miss load, or the end of a long dep chain right before some independent instructions.
So for some functions or blocks of code, it's likely that every single time they run, an `xor`-zeroing instruction (for example) always retires in the same cycle as the instruction before it. That means the CPU is never in a state where the `xor`-zeroing instruction is the oldest non-retired instruction, and thus the gap between it and the insn before can never be the place where an interrupt appears.
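To make the argument concrete, here is a toy model of in-order retirement. It is only a sketch under simplifying assumptions: each instruction is one retirement slot, an instruction can retire in the same cycle it finishes executing, and the completion times are made up. The names `retire_cycles` and `interruptible_boundaries` are hypothetical, not from any real tool. A boundary counts as "interruptible" only if there is some cycle in which the later instruction is the oldest non-retired one.

```python
def retire_cycles(complete, width=4):
    """Return the cycle in which each instruction retires.

    complete[i] is the (made-up) cycle in which instruction i finishes
    executing. Retirement is in program order, at most `width` per cycle.
    """
    retired = []
    cycle = 0          # cycle currently being filled with retirements
    used = 0           # retirement slots already used in that cycle
    for done in complete:
        # Can't retire before finishing, nor before older instructions.
        earliest = max(done, cycle)
        if earliest == cycle and used == width:
            earliest += 1          # this cycle's retirement slots are full
        if earliest != cycle:
            cycle, used = earliest, 0
        used += 1
        retired.append(cycle)
    return retired


def interruptible_boundaries(complete, width=4):
    """Indices i where instruction i retires in a later cycle than i-1,
    so the gap before instruction i can be observed at retirement."""
    r = retire_cycles(complete, width)
    return [i for i in range(1, len(r)) if r[i] != r[i - 1]]


# Instruction 1 is a cache-miss load; the rest are cheap and independent.
completion = [1, 200, 2, 2, 3, 4]
print(retire_cycles(completion))             # [1, 200, 200, 200, 200, 201]
print(interruptible_boundaries(completion))  # [1, 5]
```

In this model the cheap instructions 2 to 4 always retire in the same burst as the load, so the boundaries in front of them are never exposed; only the retirement width pushes instruction 5 into the next cycle.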
If you have 2 interrupts closely following each other, e.g. one arriving a cycle after the CPU returns to user-space from handling an earlier one, front-end effects at 64-byte I-cache boundaries could disturb the usual pattern of cheap independent instructions like `nop` or `xor`-zeroing always retiring in the same cycle as an earlier higher-latency instruction. But there might still be boundaries that can't be disturbed, where fetch, 5-wide decode, and 4-wide issue/rename reliably get the instructions into the pipe together, with no opportunity for the slow one to finish before the fast one after it is ready to retire.
As I said, this isn't specific to VT-x.
`rep` string instructions. – Salinas