I am writing a small OS-agnostic hypervisor as a teaching tool for my students. The hypervisor is loaded at boot by code I embed in a custom MBR on the boot device. It switches to 32-bit protected mode and then to IA-32e mode (64-bit mode, paged, with linear addresses identity-mapped to physical addresses). It then sets up the 64-bit exception handling mechanism, and my tests of that mechanism succeed (CPL and DPL are both 0, so no stack switch is expected). For example, divide-by-zero and page-fault exceptions are handled as expected.
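To make the host IDT setup concrete, here is a minimal sketch of a 64-bit interrupt gate with DPL 0 and IST index 0 (so delivery stays on the current stack). The names `struct idt_gate64` and `make_interrupt_gate` are hypothetical and chosen for illustration; this is not the hypervisor's actual source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte interrupt/trap gate descriptor used by the IDT in IA-32e mode. */
struct idt_gate64 {
    uint16_t offset_low;   /* handler address bits 15:0 */
    uint16_t selector;     /* code segment selector of the handler */
    uint8_t  ist;          /* bits 2:0 = IST index; 0 = no IST stack switch */
    uint8_t  type_attr;    /* P (bit 7), DPL (bits 6:5), type (bits 3:0) */
    uint16_t offset_mid;   /* handler address bits 31:16 */
    uint32_t offset_high;  /* handler address bits 63:32 */
    uint32_t reserved;     /* must be zero */
};

static_assert(sizeof(struct idt_gate64) == 16,
              "a 64-bit IDT gate descriptor is 16 bytes");

#define GATE_PRESENT        0x80u  /* P = 1 */
#define GATE_TYPE_INTERRUPT 0x0Eu  /* 64-bit interrupt gate */

/* Hypothetical helper: build a present, DPL 0 interrupt gate. */
static struct idt_gate64 make_interrupt_gate(uint64_t handler,
                                             uint16_t cs_selector,
                                             uint8_t ist_index)
{
    struct idt_gate64 g;
    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = cs_selector;
    g.ist         = (uint8_t)(ist_index & 0x7u);
    g.type_attr   = (uint8_t)(GATE_PRESENT | GATE_TYPE_INTERRUPT); /* DPL 0 */
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFFu);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```

With `ist_index` set to 0 and no privilege change, the processor pushes the exception frame onto whatever stack RSP currently points to.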
Next, an IA-32e mode guest is launched. The guest has its own page tables (these are not identity-mapped). The guest handles exceptions and interrupts by itself (i.e., it has a different IDT than the host, and the exception bitmap control is set to 0). All of this is working: external interrupts, exceptions, memory accesses, and access to I/O devices all work well in the guest. The guest exits to the host under various conditions and is resumed correctly.
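As a sketch of what an exception bitmap of 0 means for the guest, the function below models the documented decision of whether a guest exception causes a VM exit, including the page-fault special case that also consults the page-fault error-code mask and match fields. The function name and parameters are hypothetical, and it assumes the mask and match fields are also 0, which the description above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTOR_PAGE_FAULT 14u

/*
 * Hypothetical model: does an exception with this vector cause a VM exit?
 * For most vectors the exception bitmap bit decides directly. For page
 * faults, the bit is followed when (pfec & mask) == match and its meaning
 * is reversed otherwise.
 */
static bool exception_causes_vm_exit(uint32_t exception_bitmap,
                                     unsigned vector,
                                     uint32_t pfec,
                                     uint32_t pfec_mask,
                                     uint32_t pfec_match)
{
    if (vector >= 32u)
        return false;  /* not an exception vector; outside this model */

    bool bit = ((exception_bitmap >> vector) & 1u) != 0;

    if (vector == VECTOR_PAGE_FAULT) {
        bool equal = (pfec & pfec_mask) == pfec_match;
        return equal ? bit : !bit;
    }
    return bit;
}
```

Under that assumption (bitmap, mask, and match all 0), every call returns false, which matches the guest handling all of its own exceptions.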
The issue occurs when I try to catch programming mistakes in the VM exit handler (host code). For example, divide-by-zero, invalid-opcode, and page-fault exceptions all result in the CPU locking up. The host has essentially the same IDT settings as before the guest was launched, but clearly something is going wrong. Any thoughts on what in particular I should be looking at to solve this?
For the host, I am setting the TR selector, IDTR base, and TR base to the same values they have before the VM launch is executed. Because the host is running with a CPL of 0 and the handler code's CS has a DPL of 0, I am not expecting a stack switch. Therefore, I am not specifying any stacks in the 64-bit TSS (all entries in the host's TSS are 0 except for the I/O bitmap offset).
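The TSS described above can be sketched as follows. This is an illustrative layout of the 64-bit TSS with every stack slot left at 0; the names `struct tss64`, `tss64_init_no_stacks`, and `tss64_has_any_stack` are hypothetical, the `packed` attribute assumes GCC or Clang, and the I/O bitmap offset of 104 (past the end of the TSS, meaning no bitmap) is an assumption because the actual value is not given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit TSS layout; packed because rsp[] starts at byte offset 4. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];       /* RSP0..RSP2: used on a change to CPL 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1..IST7: used when a gate's IST index != 0 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;   /* offset of the I/O permission bitmap */
} __attribute__((packed));

/* After a VM exit the processor sets the TR limit to 0x67 (104 - 1). */
#define TR_LIMIT_AFTER_VM_EXIT 0x67u

static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
static_assert(sizeof(struct tss64) - 1 == TR_LIMIT_AFTER_VM_EXIT,
              "TSS must fit within the TR limit loaded on VM exit");

/* Zero every field, then point iomap_base past the end (assumed value). */
static void tss64_init_no_stacks(struct tss64 *t)
{
    memset(t, 0, sizeof(*t));
    t->iomap_base = (uint16_t)sizeof(*t);
}

/* True if any RSPn or ISTn slot holds a non-zero stack pointer. */
static bool tss64_has_any_stack(const struct tss64 *t)
{
    for (size_t i = 0; i < 3; i++)
        if (t->rsp[i] != 0)
            return true;
    for (size_t i = 0; i < 7; i++)
        if (t->ist[i] != 0)
            return true;
    return false;
}
```

With every gate using IST index 0 and no privilege change, none of these slots are consulted, so exception delivery in the exit handler uses the RSP that is current at that point, which after a VM exit is the value loaded from the host RSP field of the VMCS.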