KVM Run Process: The KVM Core Flow

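The article "KVM Run Process: The Qemu Core Flow" described how Qemu starts KVM execution by issuing KVM_RUN through the API that KVM exposes, at which point execution enters kernel space. This article walks through the core kernel-side call flow that KVM uses to run a VM, based on Linux kernel 3.15.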

KVM Core Flow

Preparing for KVM RUN

When Qemu issues the KVM_RUN command with kvm_vcpu_ioctl(env, KVM_RUN, 0);, the ioctl traps into the kernel and arrives at kvm_vcpu_ioctl():

kvm_vcpu_ioctl()     file: virt/kvm/kvm_main.c, line: 1958
    --->kvm_arch_vcpu_ioctl_run()    file: arch/x86/kvm, line: 6305
        --->__vcpu_run()  file: arch/x86/kvm/x86.c, line: 6156

Like Qemu, __vcpu_run() contains a while(){} main loop:

static int __vcpu_run(struct kvm_vcpu *vcpu)
{
    ......
    r = 1;
    while (r > 0) {
        if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE && !vcpu->arch.apf.halted)
            r = vcpu_enter_guest(vcpu);
        else {
            ......
        }
        if (r <= 0)    /* leave the loop as soon as r is zero or negative */
            break;
        ......
    }
    return r;
}

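Once KVM enters the main loop through __vcpu_run(), it calls vcpu_enter_guest(), which, as the name suggests, is the entry point into guest mode.
While r is greater than 0, the KVM kernel code keeps calling vcpu_enter_guest() and re-entering guest mode.
When r is less than or equal to 0, the loop exits and control unwinds step by step to the original entry point kvm_vcpu_ioctl(), and from there back to the Qemu process in user space. The previous article covers the details; the relevant code fragment is repeated here: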

int kvm_cpu_exec(CPUArchState *env)
{

    do {
        run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
        switch (run->exit_reason) {    /* Qemu dispatches on the exit reason; most cases are I/O related */
        case KVM_EXIT_IO:
            kvm_handle_io();
            ......
        case KVM_EXIT_MMIO:
            cpu_physical_memory_rw();
            ......
        case KVM_EXIT_IRQ_WINDOW_OPEN:
            ret = EXCP_INTERRUPT;
            ......
        case KVM_EXIT_SHUTDOWN:
            ret = EXCP_INTERRUPT;
            ......
        case KVM_EXIT_UNKNOWN:
            ret = -1;
            ......
        case KVM_EXIT_INTERNAL_ERROR:
            ret = kvm_handle_internal_error(env, run);
            ......
        default:
            ret = kvm_arch_handle_exit(env, run);
            ......
        }
    } while (ret == 0);
    env->exit_request = 0;
    return ret;
}

Qemu handles each exit according to its reason, mostly I/O-related work, and once that is done it calls kvm_vcpu_ioctl(env, KVM_RUN, 0) again to run KVM once more.
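
The interplay of the two loops can be easier to see in a small user-space model. The sketch below is not kernel or Qemu code: every name prefixed with model_ or MODEL_ is hypothetical, and the "guest" is just a scripted list of exits. It only illustrates the convention described above, where the inner loop keeps re-entering the guest while r > 0 and the outer loop keeps issuing the run request while ret == 0.

```c
#include <stddef.h>

/* Hypothetical exit kinds for this model; not the kernel's exit reason values. */
enum model_exit { MODEL_EXIT_CPUID, MODEL_EXIT_IO, MODEL_EXIT_SHUTDOWN };

struct model_vcpu {
    const enum model_exit *script;  /* exits the scripted "guest" will produce */
    size_t len;
    size_t pos;
    int kernel_handled;             /* exits resolved inside the inner loop */
    int user_handled;               /* exits bounced to the outer loop */
    enum model_exit exit_reason;
};

/* Stand-in for vcpu_enter_guest(): > 0 means handled in place, 0 means user space is needed. */
static int model_enter_guest(struct model_vcpu *v)
{
    if (v->pos >= v->len) {
        v->exit_reason = MODEL_EXIT_SHUTDOWN;
        return 0;
    }
    v->exit_reason = v->script[v->pos++];
    if (v->exit_reason == MODEL_EXIT_CPUID) {
        v->kernel_handled++;
        return 1;
    }
    return 0;
}

/* Stand-in for __vcpu_run(): re-enter the guest while r > 0. */
static int model_vcpu_run(struct model_vcpu *v)
{
    int r = 1;

    while (r > 0)
        r = model_enter_guest(v);
    return r;
}

/* Stand-in for kvm_cpu_exec(): issue the run request again while ret == 0. */
static int model_cpu_exec(struct model_vcpu *v)
{
    int ret;

    do {
        int run_ret = model_vcpu_run(v);

        if (run_ret < 0)
            return run_ret;
        switch (v->exit_reason) {
        case MODEL_EXIT_IO:
            v->user_handled++;      /* emulate the device, then run again */
            ret = 0;
            break;
        default:
            ret = 1;                /* anything else ends the outer loop */
            break;
        }
    } while (ret == 0);
    return ret;
}
```

With a script of two CPUID exits, one I/O exit, one more CPUID exit and a shutdown, the model resolves three exits in the inner loop and one in the outer loop before returning.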
Back in kernel space, execution reaches static int vcpu_enter_guest(struct kvm_vcpu *vcpu), which performs several important preparation steps:

static int vcpu_enter_guest(struct kvm_vcpu *vcpu)  file: arch/x86/kvm/x86.c, line: 5944
{

    ......
    kvm_check_request();    /* check for pending requests on this vcpu, including ones that force an exit */
    ......
    kvm_mmu_reload(vcpu);    /* load the guest MMU context in preparation for memory virtualization */
    ......
    preempt_disable();    /* disable kernel preemption */
    ......
    kvm_x86_ops->run(vcpu);    /* architecture-specific run operation */
    ......    /* reaching this point means guest mode has exited */
    kvm_x86_ops->handle_external_intr(vcpu);    /* the host handles the external interrupt */
    ......
    preempt_enable();    /* re-enable kernel preemption */
    ......
    r = kvm_x86_ops->handle_exit(vcpu);    /* handle the exit according to its specific reason */
    return r;
    ......
}
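
The request check at the top of vcpu_enter_guest() follows a test-and-clear pattern on a per-vcpu bitmask: a request is posted by setting a bit, and checking it consumes the bit. Below is a minimal user-space sketch of that pattern. The struct, the function names and the request numbers are hypothetical, and the model is not atomic, whereas kernel code would use atomic bit operations.

```c
#include <stdbool.h>

/* Hypothetical request numbers, for this model only. */
enum { MODEL_REQ_MMU_RELOAD = 0, MODEL_REQ_EVENT = 1 };

struct req_vcpu {
    unsigned long requests;     /* one bit per pending request */
};

/* Post a request: set its bit. */
static void model_make_request(int req, struct req_vcpu *v)
{
    v->requests |= 1UL << req;
}

/* Check a request: report whether it was pending and clear it if so. */
static bool model_check_request(int req, struct req_vcpu *v)
{
    unsigned long mask = 1UL << req;

    if (v->requests & mask) {
        v->requests &= ~mask;
        return true;
    }
    return false;
}
```

Because the check clears the bit, each posted request is acted on once: a second check of the same request returns false until it is posted again.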

Entering the Guest

kvm_x86_ops is the table of x86 architecture-specific operations; the VMX implementation shown here, vmx_x86_ops, is defined at file: arch/x86/kvm/vmx.c, line: 8693

static struct kvm_x86_ops vmx_x86_ops = {
    ......
    .run = vmx_vcpu_run,
    .handle_exit = vmx_handle_exit,
    ......
};
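
The ops table is an ordinary C function-pointer struct: generic code calls through a pointer to the table and never names the backend directly. A minimal sketch of the idea follows. The struct, the toy backend and the exit value 10 are all made up for illustration and are not the kernel's definitions.

```c
/* Hypothetical vcpu state for this model. */
struct ops_vcpu {
    int entries;        /* how many times the backend "ran the guest" */
    int last_exit;      /* exit code recorded by the backend */
};

/* Hypothetical ops table with the two hooks discussed in the text. */
struct model_x86_ops {
    void (*run)(struct ops_vcpu *vcpu);
    int (*handle_exit)(struct ops_vcpu *vcpu);
};

/* Toy backend: pretend the guest ran and exited with code 10. */
static void toy_vmx_run(struct ops_vcpu *vcpu)
{
    vcpu->entries++;
    vcpu->last_exit = 10;
}

/* Toy backend: code 10 is "handled in place", anything else is not. */
static int toy_vmx_handle_exit(struct ops_vcpu *vcpu)
{
    return vcpu->last_exit == 10 ? 1 : 0;
}

static const struct model_x86_ops toy_vmx_ops = {
    .run = toy_vmx_run,
    .handle_exit = toy_vmx_handle_exit,
};

/* Generic code only sees this pointer, never the backend's function names. */
static const struct model_x86_ops *model_ops = &toy_vmx_ops;

/* Generic caller: run the guest once, then let the backend handle the exit. */
static int model_enter_once(struct ops_vcpu *vcpu)
{
    model_ops->run(vcpu);
    return model_ops->handle_exit(vcpu);
}
```

Pointing model_ops at a different table would switch backends without touching model_enter_once(), which is the design benefit the indirection buys.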

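The core inline-assembly section of vmx_vcpu_run() switches the CPU from VMX root mode to VMX non-root mode. It performs the following steps:

  1. Store host registers: mainly saves the host state context into the VMCS structure associated with the VM;
  2. Load guest registers: loads the guest state;
  3. Enter guest mode: executes ASM_VMX_VMLAUNCH on the first entry for a VMCS (ASM_VMX_VMRESUME on later entries) to switch into the VM; from here on execution is in another world, the Guest OS;
  4. Save guest registers, load host registers: when a VM exit occurs, saves the guest state and loads the host state.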

Once step 4 completes, the CPU has returned from VMX non-root mode to VMX root mode and the host resumes execution.
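
The four steps can be mimicked in plain C to make the bookkeeping concrete. The sketch below is only a model under stated assumptions: the "CPU" is a struct with two registers, the guest is an ordinary function, and the choice between launch and resume is reduced to a flag and two counters. All names are hypothetical; the real switch is done in inline assembly with VMX instructions.

```c
/* Hypothetical register file: just two registers for the model. */
struct model_regs {
    unsigned long rbx;
    unsigned long rbp;
};

struct model_vmx {
    struct model_regs host;     /* host context saved across the switch */
    struct model_regs guest;    /* guest context kept between entries */
    int launched;               /* 0 until the first entry has happened */
    int vmlaunch_count;
    int vmresume_count;
};

/* Toy guest: changes a register so the saved guest state is observable. */
static void toy_guest_code(struct model_regs *cpu)
{
    cpu->rbx += 1;
}

static void model_world_switch(struct model_vmx *vmx, struct model_regs *cpu)
{
    vmx->host = *cpu;           /* 1. store host registers */
    *cpu = vmx->guest;          /* 2. load guest registers */

    if (vmx->launched)          /* 3. enter guest mode: launch first, resume later */
        vmx->vmresume_count++;
    else
        vmx->vmlaunch_count++;
    toy_guest_code(cpu);        /* the guest runs until a "VM exit" */

    vmx->guest = *cpu;          /* 4. save guest registers ... */
    *cpu = vmx->host;           /*    ... and load host registers */
    vmx->launched = 1;
}
```

After two switches the host registers are unchanged, the guest's rbx has advanced twice, and the counters show one launch followed by one resume.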

Handling Guest Exits

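A guest exit is never the end of the story. Every exit has a reason, and to keep the guest running smoothly afterwards KVM must handle the exit according to that reason. The key function here is vmx_handle_exit():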

static int vmx_handle_exit(struct kvm_vcpu *vcpu)    file: arch/x86/kvm/vmx.c, line: 6877
{

    ......
    if (exit_reason < kvm_vmx_max_exit_handlers
        && kvm_vmx_exit_handlers[exit_reason])
        return kvm_vmx_exit_handlers[exit_reason](vcpu);    /* call the handler registered for this reason */
    else {
        vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
        vcpu->run->hw.hardware_exit_reason = exit_reason;
    }
    return 0;    /* the exit reason has no handler predefined by KVM, so return 0 */
}

The handlers for the many exit reasons are as follows:

static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_EXCEPTION_NMI] = handle_exception,    /* exceptions */
[EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt,    /* external interrupts */
[EXIT_REASON_TRIPLE_FAULT] = handle_triple_fault,
[EXIT_REASON_NMI_WINDOW] = handle_nmi_window,
[EXIT_REASON_IO_INSTRUCTION] = handle_io,    /* I/O instructions */
[EXIT_REASON_CR_ACCESS] = handle_cr,
[EXIT_REASON_DR_ACCESS] = handle_dr,
[EXIT_REASON_CPUID] = handle_cpuid,
[EXIT_REASON_MSR_READ] = handle_rdmsr,
[EXIT_REASON_MSR_WRITE] = handle_wrmsr,
[EXIT_REASON_PENDING_INTERRUPT] = handle_interrupt_window,
[EXIT_REASON_HLT] = handle_halt,
[EXIT_REASON_INVD] = handle_invd,
[EXIT_REASON_INVLPG] = handle_invlpg,
[EXIT_REASON_RDPMC] = handle_rdpmc,
[EXIT_REASON_VMCALL] = handle_vmcall,    /* VM-related instructions (this entry through VMON) */
[EXIT_REASON_VMCLEAR] = handle_vmclear,
[EXIT_REASON_VMLAUNCH] = handle_vmlaunch,
[EXIT_REASON_VMPTRLD] = handle_vmptrld,
[EXIT_REASON_VMPTRST] = handle_vmptrst,
[EXIT_REASON_VMREAD] = handle_vmread,
[EXIT_REASON_VMRESUME] = handle_vmresume,
[EXIT_REASON_VMWRITE] = handle_vmwrite,
[EXIT_REASON_VMOFF] = handle_vmoff,
[EXIT_REASON_VMON] = handle_vmon,
[EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_APIC_WRITE] = handle_apic_write,
[EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced,
[EXIT_REASON_WBINVD] = handle_wbinvd,
[EXIT_REASON_XSETBV] = handle_xsetbv,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,    /* task switch */
[EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
[EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,    /* EPT page fault (EPT violation) */
[EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
[EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
[EXIT_REASON_MWAIT_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_MONITOR_INSTRUCTION] = handle_invalid_op,
[EXIT_REASON_INVEPT] = handle_invept,
};
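
The dispatch in vmx_handle_exit() relies on two properties of such a table: designated initializers leave unlisted slots NULL, and the index must be range-checked before use. The sketch below reproduces that pattern in a self-contained form. The exit numbers, struct and handler names are hypothetical stand-ins, not the kernel's.

```c
#include <stddef.h>

/* Hypothetical exit numbers and "unknown" marker for this model. */
enum { TOY_EXIT_IO = 2, TOY_EXIT_CPUID = 5, TOY_EXIT_UNKNOWN = -1 };

struct exit_vcpu {
    int user_exit_reason;   /* what user space would be told */
    int hw_exit_reason;     /* raw reason recorded for unknown exits */
};

/* Toy handler: I/O needs user space, so report it and return 0. */
static int toy_handle_io(struct exit_vcpu *vcpu)
{
    vcpu->user_exit_reason = TOY_EXIT_IO;
    return 0;
}

/* Toy handler: CPUID is handled in place, so return 1. */
static int toy_handle_cpuid(struct exit_vcpu *vcpu)
{
    (void)vcpu;
    return 1;
}

/* Sparse table: slots 0, 1, 3 and 4 are implicitly NULL. */
static int (*const toy_exit_handlers[])(struct exit_vcpu *vcpu) = {
    [TOY_EXIT_IO]    = toy_handle_io,
    [TOY_EXIT_CPUID] = toy_handle_cpuid,
};

static const size_t toy_max_exit_handlers =
    sizeof(toy_exit_handlers) / sizeof(toy_exit_handlers[0]);

static int toy_handle_exit(struct exit_vcpu *vcpu, size_t exit_reason)
{
    /* Range check first, then NULL check, then call the handler. */
    if (exit_reason < toy_max_exit_handlers && toy_exit_handlers[exit_reason])
        return toy_exit_handlers[exit_reason](vcpu);

    vcpu->user_exit_reason = TOY_EXIT_UNKNOWN;
    vcpu->hw_exit_reason = (int)exit_reason;
    return 0;
}
```

Both failure paths, an index past the end of the table and an in-range index with no handler, end in the same place: the reason is recorded as unknown and 0 is returned so that user space gets to decide.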

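When a handler resolves the exit entirely inside KVM it returns a value greater than 0; when the exit needs help from user space it returns 0; and when handling fails it returns a value less than 0. Control then returns to the main loop in __vcpu_run():
vcpu_enter_guest() > 0: the loop continues and prepares to enter guest mode again;
vcpu_enter_guest() <= 0: the loop exits and control returns to user space, where Qemu handles the exit according to its reason.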

Conclusion

This concludes the analysis of the core call flow in the KVM kernel code. As the flow above shows, the KVM kernel code has three main jobs:

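  1. Preparing for guest entry;
  2. Entering the guest;
  3. Handling each guest exit according to its reason: KVM handles the exit itself when it can, and otherwise returns to the Qemu process in user space to handle it.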

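In short, KVM and Qemu work together to keep the guest running correctly: by handling the various exits, they let the guest run without being aware that its environment is virtual.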

Attached figure: