Under which conditions does the KVM_RUN ioctl return?

At https://github.com/qemu/qemu/blob/stable-4.2/cpus.c#L1290 lies a very important piece of QEMU. I guess it's the event loop for a CPU running under KVM.

Here is the code:

static void *qemu_kvm_cpu_thread_fn(void *arg)
{
    CPUState *cpu = arg;
    int r;

    rcu_register_thread();

    qemu_mutex_lock_iothread();
    qemu_thread_get_self(cpu->thread);
    cpu->thread_id = qemu_get_thread_id();
    cpu->can_do_io = 1;
    current_cpu = cpu;

    r = kvm_init_vcpu(cpu);
    if (r < 0) {
        error_report("kvm_init_vcpu failed: %s", strerror(-r));
        exit(1);
    }

    kvm_init_cpu_signals(cpu);

    /* signal CPU creation */
    cpu->created = true;
    qemu_cond_signal(&qemu_cpu_cond);
    qemu_guest_random_seed_thread_part2(cpu->random_seed);

    do {
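        /* kvm_cpu_exec() enters the guest via the KVM_RUN ioctl and returns
         * here when the vCPU has to stop running guest code for now
         * (e.g. a debug exception, a halt, or a stop request). */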
        if (cpu_can_run(cpu)) {
            r = kvm_cpu_exec(cpu);
            if (r == EXCP_DEBUG) {
                cpu_handle_guest_debug(cpu);
            }
        }
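        /* Drain work queued on this vCPU and sleep while it has nothing
         * to do (stopped or halted). */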
        qemu_wait_io_event(cpu);
    } while (!cpu->unplug || cpu_can_run(cpu));

    qemu_kvm_destroy_vcpu(cpu);
    cpu->created = false;
    qemu_cond_signal(&qemu_cpu_cond);
    qemu_mutex_unlock_iothread();
    rcu_unregister_thread();
    return NULL;
}

I'm interested in the do loop. It calls kvm_cpu_exec repeatedly; that function is defined here: https://github.com/qemu/qemu/blob/stable-4.2/accel/kvm/kvm-all.c#L2285

At one point kvm_cpu_exec calls run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);, which issues the KVM_RUN ioctl documented here: https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt

4.10 KVM_RUN

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Errors:
EINTR: an unmasked signal is pending

This ioctl is used to run a guest virtual cpu. While there are no
explicit parameters, there is an implicit parameter block that can be
obtained by mmap()ing the vcpu fd at offset 0, with the size given by
KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct
kvm_run' (see below).
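
For context, the implicit parameter block is obtained by mmap()ing the vcpu fd, roughly like this (a minimal sketch, not QEMU's code; I believe QEMU does the equivalent in kvm_init_vcpu in accel/kvm/kvm-all.c, and error handling is omitted here):

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Map the 'struct kvm_run' block shared with the kernel for one vCPU.
 * kvm_fd is the /dev/kvm fd, vcpu_fd comes from KVM_CREATE_VCPU;
 * the function name is made up for this sketch. */
struct kvm_run *map_run_block(int kvm_fd, int vcpu_fd)
{
    /* The region size is a property of the KVM device, not of the vCPU. */
    long mmap_size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);

    /* Offset 0 of the vcpu fd is the kvm_run parameter block that
     * KVM_RUN reads inputs from and writes exit information to. */
    return mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, vcpu_fd, 0);
}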

I am struggling to understand whether this ioctl blocks execution. In which cases does it return?

I would like a little context on what is happening. Given the line qemu_wait_io_event(cpu), it looks like the ioctl returns at least every time an event has to be read from or written to the CPU, but I'm not sure; I'm confused.

Asked by Hoang on 8/12/2020 at 7:24

The KVM API design requires each virtual CPU in the VM to have an associated userspace thread in the program that is controlling that VM, such as QEMU (this program is often called a "Virtual Machine Monitor" or VMM, and it doesn't have to be QEMU; other examples are kvmtool and Firecracker).

The thread behaves like a normal userspace thread within QEMU up to the point where it makes the KVM_RUN ioctl. At that point the kernel uses that thread to execute guest code on the vCPU associated with the thread. This continues until some condition is encountered which means that guest execution can't proceed any further. (One common condition is "the guest made a memory access to a device that is being emulated by QEMU".) At that point, the kernel stops running guest code on this thread, and instead causes it to return from the KVM_RUN ioctl. The code within QEMU then looks at the return code and so on to find out why it got control back, deals with whatever the situation was, and loops back around to call KVM_RUN again to ask the kernel to continue to run guest code.
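
To make that loop concrete, here is a minimal sketch of the pattern (this is not QEMU's actual kvm_cpu_exec; vcpu_fd and run, the mmap'ed struct kvm_run, are assumed to already exist, and real device emulation is reduced to comments):

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <errno.h>

/* Sketch of a VMM vCPU loop: enter the guest with KVM_RUN, and when the
 * ioctl returns, look at run->exit_reason to see why we got control back. */
void run_vcpu(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        /* Blocks in the kernel executing guest code until the guest does
         * something userspace must handle, or an unmasked signal arrives. */
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0) {
            if (errno == EINTR) {
                continue;           /* signal pending: just re-enter */
            }
            return;                 /* real error */
        }

        switch (run->exit_reason) {
        case KVM_EXIT_IO:
            /* x86 port I/O: the data lives in the run block at
             * run->io.data_offset; emulate the device access here. */
            break;
        case KVM_EXIT_MMIO:
            /* Access to "device" memory: run->mmio describes it. */
            break;
        case KVM_EXIT_HLT:
            return;                 /* guest halted */
        default:
            return;                 /* unhandled exit */
        }
        /* ...then loop around and run guest code again. */
    }
}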

Typically when running a VM, you'll see that almost all the time the thread is inside the KVM_RUN ioctl, running real guest code. Occasionally execution returns to userspace, QEMU spends as little time as possible doing whatever it needs to do, and then it loops around and runs guest code again. One way of improving the efficiency of a VM is to keep the number of these "VM exits" as low as possible (e.g. by careful choice of what kind of network or block device the guest is given).

Answered by Averil on 8/12/2020 at 16:12. Comments (4):
Very good explanation. I was looking into how QEMU simulates peripherals, and now I'm wondering how KVM knows to return from the KVM_RUN call. My guess is that QEMU can tell KVM that when certain addresses are accessed, the kernel should return so QEMU can handle the access. Am I right, or at least close? Let's say the guest kernel accesses a peripheral; then the QEMU thread is unblocked and QEMU can respond to that peripheral access. But how? – Hoang
Maybe if the guest kernel tries to read from an address (where a PCI device should be located) into a register, then KVM could return so QEMU can write to that register? It would be nice if you could point me to at least a place to read about this. I'm writing some notes on GitHub about QEMU internals that could help others :) – Hoang
I've taken a closer look at qemu_wait_io_event, which eventually calls process_queued_cpu_work, which runs the functions on cpu->queued_work_first. It looks like it only handles queued work items, so I doubt it's related to I/O device communication, because that I/O traffic would have to be queued constantly. The only relevant calls that queue work on the CPU that I found were in kvm-all, such as kvm_cpu_synchronize_post_reset. – Hoang
KVM returns from the KVM_RUN call for a peripheral access when the access is to some address where the VMM didn't map memory and where there's no built-in kernel device. The guest access causes a stage-2 fault, in Arm terms; I dunno the x86 terminology. In any case the hypervisor gets control and can emulate the load/store however it likes, which for KVM usually means "let userspace deal with it". – Averil
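
To make that concrete, here's a rough sketch (not QEMU's code; device_read()/device_write() are made-up stand-ins for the VMM's device-model dispatch) of what userspace can do with a KVM_EXIT_MMIO before re-entering KVM_RUN:

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>

/* Made-up hooks standing in for the VMM's device-model dispatch. */
uint64_t device_read(uint64_t addr, unsigned len);
void device_write(uint64_t addr, const void *data, unsigned len);

/* Handle one KVM_EXIT_MMIO: for a guest store, the written bytes are in
 * run->mmio.data; for a guest load, userspace fills run->mmio.data and KVM
 * completes the load when the vCPU is resumed with the next KVM_RUN. */
void handle_mmio(struct kvm_run *run)
{
    if (run->mmio.is_write) {
        device_write(run->mmio.phys_addr, run->mmio.data, run->mmio.len);
    } else {
        uint64_t val = device_read(run->mmio.phys_addr, run->mmio.len);
        memcpy(run->mmio.data, &val, run->mmio.len);   /* little-endian sketch */
    }
}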
