Understanding arm64's kvm_vcpu_run_vhe function

In arch/arm64/kvm/hyp/vhe/switch.c, we have this code, which switches from the host (running at EL2, the hypervisor exception level, on VHE systems) to the guest on arm64:

/* Switch to the guest for VHE systems running in EL2 */
static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
{
    struct kvm_cpu_context *host_ctxt;
    struct kvm_cpu_context *guest_ctxt;
    u64 exit_code;

    host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
    host_ctxt->__hyp_running_vcpu = vcpu;
    guest_ctxt = &vcpu->arch.ctxt;

    sysreg_save_host_state_vhe(host_ctxt);

    /*
     * ARM erratum 1165522 requires us to configure both stage 1 and
     * stage 2 translation for the guest context before we clear
     * HCR_EL2.TGE.
     *
     * We have already configured the guest's stage 1 translation in
     * kvm_vcpu_load_sysregs_vhe above.  We must now call
     * __load_guest_stage2 before __activate_traps, because
     * __load_guest_stage2 configures stage 2 translation, and
     * __activate_traps clears HCR_EL2.TGE (among other things).
     */
    __load_guest_stage2(vcpu->arch.hw_mmu);
    __activate_traps(vcpu);

    __kvm_adjust_pc(vcpu);

    sysreg_restore_guest_state_vhe(guest_ctxt);
    __debug_switch_to_guest(vcpu);

    do {
        /* Jump in the fire! */
        exit_code = __guest_enter(vcpu);

        /* And we're baaack! */
    } while (fixup_guest_exit(vcpu, &exit_code));

    sysreg_save_guest_state_vhe(guest_ctxt);

    __deactivate_traps(vcpu);

    sysreg_restore_host_state_vhe(host_ctxt);

    if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
        __fpsimd_save_fpexc32(vcpu);

    __debug_switch_to_host(vcpu);

    return exit_code;
}
NOKPROBE_SYMBOL(__kvm_vcpu_run_vhe);

I think the important line here is the call to __guest_enter, which is defined in arch/arm64/kvm/hyp/entry.S.

I'm trying to understand what __kvm_vcpu_run_vhe does exactly. Does it block the caller? What does __guest_enter do? I've looked at the code and it saves and then restores some registers, but I couldn't identify where it runs the VM code.

Valois asked 17 July 2021 at 2:55. Comments (4):
I think the transfer to guest code must be the eret. - Managerial
@NateEldredge is correct. Just like setjmp, the routine may return an error code. Otherwise, it blocks the caller. Not all code has to have return semantics. Co-routines, state machines, etc. do not need to save state on a stack so that things are 'nested'. As with the Linux scheduler context switch, a TrustZone world switch and setjmp, this code needs to be careful about registers and may not have traditional return semantics. Note: ARM code often uses banked registers for context switches; pay attention to modes. - Samhita
@NateEldredge what happens after eret? It returns to __kvm_vcpu_run_vhe, but then what happens? Does the code after exit_code = __guest_enter(vcpu); run on the guest? Where is the part where the CPU starts running guest code? - Valois
No, eret doesn't return to __kvm_vcpu_run_vhe. It branches to the address held in ELR, which presumably is set somewhere else to point to the guest code. So it really is the eret itself that triggers the start of guest code. How we get back is a separate question: presumably the only way to get out of guest code is via an exception, so that exception handler must arrange to branch back into __kvm_vcpu_run_vhe, perhaps at the point right after the call to __guest_enter, as if it had returned normally. - Managerial

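Yes, it blocks the caller: the host thread that calls it does not continue until the guest exits, which happens on the next exception that the hypervisor has to handle (an interrupt or a trap).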

__guest_enter is passed a pointer to a kernel structure describing the guest’s CPU state. The assembly routine transfers that state to the appropriate (host) CPU registers and executes an instruction which transfers control to the guest.

The instruction is named eret, which is short for ‘exception return’. This is because from the perspective of the CPU, the guest had triggered an exception and the kernel, after handling it, is returning to the guest. But from the perspective of the kernel, it’s the guest which is invoked by, and will later return to, the kernel! Either way you look at it, this is the point where guest code starts executing.
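To make the "a call that only returns once the other side gives control back" idea concrete, here is a small user-space analogy, not kernel code. It assumes a POSIX system that provides <ucontext.h> (for example glibc on Linux), and every toy_* name is made up for illustration. swapcontext plays the role of eret on the way in and of the exception on the way out.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical user-space analogy; none of these names exist in KVM. */
enum { TOY_EXIT_TRAP = 1 };

static ucontext_t toy_host_ctx;          /* plays the role of host_ctxt  */
static ucontext_t toy_guest_ctx;         /* plays the role of guest_ctxt */
static char toy_guest_stack[64 * 1024];  /* private stack for the "guest" */
static int toy_exit_code;
int toy_guest_steps;                     /* how often the "guest" has run */

/* "Guest code": runs until it needs the host, then switches back. */
static void toy_guest_main(void)
{
    for (;;) {
        toy_guest_steps++;
        toy_exit_code = TOY_EXIT_TRAP;
        /* Analogue of an exception: save guest state, resume the host. */
        swapcontext(&toy_guest_ctx, &toy_host_ctx);
    }
}

/* Prepare the guest context once (roughly: decide where "eret" will land). */
void toy_guest_init(void)
{
    int rc = getcontext(&toy_guest_ctx);
    assert(rc == 0);
    toy_guest_ctx.uc_stack.ss_sp = toy_guest_stack;
    toy_guest_ctx.uc_stack.ss_size = sizeof(toy_guest_stack);
    toy_guest_ctx.uc_link = &toy_host_ctx;
    makecontext(&toy_guest_ctx, toy_guest_main, 0);
}

/*
 * Analogue of __guest_enter: to its caller it looks like a normal call,
 * but control only comes back when the guest switches out again.
 */
int toy_guest_enter(void)
{
    swapcontext(&toy_host_ctx, &toy_guest_ctx);
    return toy_exit_code;
}
```

After toy_guest_init(), each call to toy_guest_enter() resumes the "guest" exactly where it left off and returns only when the guest hands control back, which is the same shape the caller of __guest_enter sees.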

At some point during the execution of the guest, an exception is going to be taken to the hypervisor, requiring its intervention (for example a trap when the guest accesses emulated hardware, or an interrupt from host hardware requiring attention). Other code in the kernel ensures that when that happens, the kernel continues execution at either __guest_exit_panic or __guest_exit, depending on which exception was raised. The former triggers a kernel panic, while the latter stores the exit reason and the guest state back in the kernel structure and returns execution to the caller of __guest_enter. The caller can then decide how to handle the guest from then on. As the do/while loop around __guest_enter shows, some exits are fixed up immediately by fixup_guest_exit and the guest is re-entered without __kvm_vcpu_run_vhe ever returning; others are returned to its caller, and eventually to userspace.
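The split between exits that are handled on the spot and exits that are passed up can be sketched as a tiny simulation. This is only a model of the loop's shape: the sim_* names, the exit codes and the scripted "guest" are all invented for illustration and are not the kernel's ARM_EXCEPTION_* values or data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit codes for illustration only. */
enum sim_exit { SIM_EXIT_FIXABLE, SIM_EXIT_IRQ, SIM_EXIT_MMIO };

struct sim_vcpu {
    const enum sim_exit *script; /* exits the fake guest produces, in order */
    size_t len;                  /* number of entries in script */
    size_t pos;                  /* next scripted exit to deliver */
    unsigned entries;            /* how many times the guest was entered */
};

/* Stand-in for __guest_enter(): "runs the guest" until its next exit. */
static enum sim_exit sim_guest_enter(struct sim_vcpu *vcpu)
{
    assert(vcpu->pos < vcpu->len);
    vcpu->entries++;
    return vcpu->script[vcpu->pos++];
}

/* Stand-in for fixup_guest_exit(): true means "handled, re-enter the guest". */
static bool sim_fixup_guest_exit(struct sim_vcpu *vcpu, enum sim_exit *exit_code)
{
    (void)vcpu;
    return *exit_code == SIM_EXIT_FIXABLE;
}

/* Same shape as the do/while loop in __kvm_vcpu_run_vhe. */
enum sim_exit sim_vcpu_run(struct sim_vcpu *vcpu)
{
    enum sim_exit exit_code;

    do {
        exit_code = sim_guest_enter(vcpu);
    } while (sim_fixup_guest_exit(vcpu, &exit_code));

    return exit_code;
}
```

With a script of two fixable exits followed by an MMIO exit, a single sim_vcpu_run() call enters the guest three times and the caller only ever sees the MMIO exit; the two fixable ones never leave the loop.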

Meingoldas answered 20 July 2021 at 14:33
