This question is addressed to developers who specialize in hardware-assisted virtualization.
I am working on a simple (proprietary) hypervisor that uses the VMX virtualization extensions. I have an emulator of the local APIC interrupt controller (it works through MMIO interception), and it works well. I started improving the performance of a 32-bit Windows XP guest by applying the FlexPriority extensions and noticed some strange behavior.
When the APIC-access page is mapped to the guest through EPT for read access only, everything is OK. TPR shadowing improves overall performance by up to 3 times. Reads of TPR (at offset 0x80) are performed by hardware without exiting; all other accesses are virtualized by instruction emulation during APIC-access exits.
But when the APIC-access page is mapped for both reads and writes, TPR shadowing results in a BSOD in the guest: IRQL_NOT_LESS_OR_EQUAL on access to 0x00000016. In this case all accesses to TPR are virtualized by hardware.
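For reference, the hardware-virtualized write path can be modeled roughly as below. This is a sketch based on the TPR-virtualization rule in the Intel SDM (without virtual-interrupt delivery, a guest write to VTPR causes a "TPR below threshold" exit only if VTPR[7:4] is less than bits 3:0 of the TPR-threshold VMCS field). The function names are hypothetical, and `choose_tpr_threshold` is only one possible policy that ignores in-service interrupts; it is not taken from my hypervisor.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of the check the CPU performs after a guest write to VTPR when
 * "use TPR shadow" is 1 and virtual-interrupt delivery is 0:
 * a VM exit occurs if VTPR[7:4] < TPR threshold[3:0]. */
static bool tpr_below_threshold_exit(uint8_t vtpr, uint32_t tpr_threshold)
{
    return (uint32_t)(vtpr >> 4) < (tpr_threshold & 0xFu);
}

/* Hypothetical policy for picking the TPR threshold before VM entry.
 * highest_pending is the highest vector pending in the emulated IRR,
 * or -1 if nothing is pending. In-service interrupts are ignored here. */
static uint32_t choose_tpr_threshold(int highest_pending, uint8_t vtpr)
{
    if (highest_pending < 0)
        return 0;                       /* nothing pending: never exit */

    uint32_t pending_class = ((uint32_t)highest_pending >> 4) & 0xFu;
    uint32_t tpr_class = (uint32_t)(vtpr >> 4);

    /* Already deliverable: no need to watch TPR.
     * Otherwise exit as soon as the guest lowers TPR below the pending class. */
    return (pending_class > tpr_class) ? 0 : pending_class;
}
```

With a threshold of 0 the hypervisor never sees TPR writes at all, so a pending interrupt masked by TPR is only reconsidered on some unrelated exit.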
It looks as though the TPR value that the CPU automatically stores to the virtual-APIC page at offset 0x80 is in some inconsistent state and cannot be trusted when computing the PPR value and deciding which interrupt vector to service (inject) next. However, I printed the hardware-set TPR values while the Windows guest was booting and compared them with those from the software-virtualized APIC (without FlexPriority), and they were similar.
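To make the PPR step concrete, here is a minimal sketch of how an emulator can derive PPR from the shadowed TPR, following the local APIC rule in the Intel SDM. The helper names are hypothetical, the virtual-APIC page is assumed to be visible to the host as an array of 32-bit words, and `isrv` is assumed to be the highest in-service vector tracked by the emulator (0 if none).

```c
#include <stdint.h>
#include <stdbool.h>

/* Offset of TPR within the 4 KiB virtual-APIC page (same as the xAPIC layout). */
#define APIC_TPR_OFFSET 0x80u

/* Read VTPR through a volatile pointer so that the compiler re-reads memory
 * the CPU may have updated while the guest was running. */
static inline uint8_t read_vtpr(const volatile uint32_t *vapic_page)
{
    return (uint8_t)(vapic_page[APIC_TPR_OFFSET / 4] & 0xFFu);
}

/* PPR[7:4] = max(TPR[7:4], ISRV[7:4]).
 * PPR[3:0] = TPR[3:0] if the TPR class wins (or ties), otherwise 0. */
static inline uint8_t compute_ppr(uint8_t tpr, uint8_t isrv)
{
    uint8_t tpr_class = tpr & 0xF0u;
    uint8_t isr_class = isrv & 0xF0u;
    return (tpr_class >= isr_class) ? tpr : isr_class;
}

/* A fixed interrupt is deliverable only if its priority class exceeds PPR's. */
static inline bool vector_deliverable(uint8_t vector, uint8_t ppr)
{
    return (vector & 0xF0u) > (ppr & 0xF0u);
}
```

The important point for this question is that `read_vtpr` must observe the value at the moment the injection decision is made, not a copy cached from an earlier exit.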
My configuration is as follows:
- the "virtualize APIC accesses" secondary processor-based control is ON;
- the "use TPR shadow" primary processor-based control is ON;
- the hypervisor runs as a guest in QEMU/KVM in nested mode;
- the APIC-access page is mapped to the guest through EPT at APIC_DEFAULT_BASE (0xFEE00000);
- the virtual-APIC page is mapped in the host through the CPU page tables, so that the local APIC emulator can read the TPR value.
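The two controls in the list above can be sketched as follows. The bit positions and the allowed-0/allowed-1 adjustment come from the Intel SDM; the macro and function names are hypothetical, and the capability value is assumed to have been read from IA32_VMX_PROCBASED_CTLS (0x482) or IA32_VMX_PROCBASED_CTLS2 (0x48B). Checking the result matters in nested mode, where the outer hypervisor may not expose every control.

```c
#include <stdint.h>
#include <stdbool.h>

/* Primary processor-based VM-execution controls. */
#define CPU_BASED_TPR_SHADOW          (1u << 21)  /* "use TPR shadow" */
#define CPU_BASED_ACTIVATE_SECONDARY  (1u << 31)  /* enable secondary controls */

/* Secondary processor-based VM-execution controls. */
#define SECONDARY_VIRT_APIC_ACCESSES  (1u << 0)   /* "virtualize APIC accesses" */

/* Apply a VMX capability MSR to the desired control bits:
 * the low 32 bits are the bits that must be 1,
 * the high 32 bits are the bits that are allowed to be 1. */
static uint32_t adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);
    return (desired | must_be_one) & may_be_one;
}

/* True if every requested bit survived the adjustment. */
static bool controls_granted(uint32_t adjusted, uint32_t wanted)
{
    return (adjusted & wanted) == wanted;
}
```

If `controls_granted` returns false for either control, the feature is not available on this (virtual) CPU and the hypervisor has to fall back to MMIO interception.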
I have a couple of hypotheses about the cause of the observed behavior:
- the problem may be some kind of caching issue, in which case the virtual-APIC page should possibly be mapped into the host as non-cacheable;
- the problem may be related to nested virtualization.
But it seems that I am missing something, or that I misunderstand the mechanics behind the virtual-APIC page and the APIC-access page. Could anybody suggest why TPR shadowing can result in BSODs when the same code works well with full emulation through MMIO interception (exiting on reads and writes to the APIC-access page)?