#1341 VMX: Triple fault in guest leads to unusable CPU

fixed_in_SVN
closed
None
1
2014-07-29
2013-10-14
reet
No

Hi,

We are using Bochs to speed up our hypervisor development process and it is working very well. Thanks for the great software!

Recently we found a problem with the handling of triple faults occurring in VMX non-root mode.

On "real" hardware (Intel Core i7 3520M) the occurrence of a triple fault in VMX non-root mode has no effect on the CPU in root mode and it is still possible to launch another virtual machine (i.e. load and launch another VMCS to perform error handling).

In Bochs, however, a triple fault in a guest seems to have side effects not only on the state of other guests but also on the host state itself.

Attached you will find the Bochs debug output of a problematic run. I triggered a triple fault in the guest by using the ud2 instruction very early in the boot code. This results in a trap into the hypervisor code.

The hypervisor logs the error and uses ud2 to panic and halt the CPU. But after a triple fault in the guest, this mechanism no longer works and results in a triple fault in root mode. Running another guest is also no longer possible; the code behaves erratically.

I'm currently using revision r11876 of Bochs.

Thanks!

Kind regards

1 Attachments

Discussion

  • Stanislav Shwartsman

    I don't understand the description. The normal behavior would be that the hypervisor sets a VMEXIT control and exits on a triple fault. After the VMEXIT it can kill the faulted guest and continue. If the VMEXIT was not registered, the triple fault will cause the SHUTDOWN state even in host mode. Once in the shutdown state you are dead; the only things that can take you out of it are an NMI and (if configured) the VMX preemption timer. After that VMEXIT the hypervisor could again kill the guest and continue living. If none of these events arrives, the CPU will remain stuck in SHUTDOWN mode under VMX.

    What behavior do you expect?

    Stanislav

     
  • reet

    reet - 2013-10-15

    On 10/14/2013 07:53 PM, Stanislav Shwartsman wrote:

    > I don't understand the description. The normal behavior would be that the hypervisor sets a VMEXIT control and exits on a triple fault.

    There is no execution control for triple faults: a triple fault always
    causes a VM exit (see Intel SDM Vol. 3C, section 33.2 and Table C-1 in
    appendix C).
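A minimal sketch of how a VMM might recognize this exit, assuming the exit-reason value has already been read from the VMCS. Basic exit reason 2 (triple fault) and the bit layout follow Appendix C of the Intel SDM; the helper names are hypothetical and not taken from any hypervisor in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few basic exit reasons from Intel SDM Vol. 3, Appendix C. */
enum basic_exit_reason {
    EXIT_REASON_EXCEPTION_OR_NMI   = 0,
    EXIT_REASON_EXTERNAL_INTERRUPT = 1,
    EXIT_REASON_TRIPLE_FAULT       = 2,
};

/* Bits 15:0 of the exit-reason field hold the basic exit reason. */
static inline uint16_t basic_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xFFFFu);
}

/* Bit 31 is set when the exit reports a failed VM entry. */
static inline bool is_entry_failure(uint32_t exit_reason)
{
    return ((exit_reason >> 31) & 1u) != 0;
}

/* Hypothetical helper: true if the guest exited because of a triple fault. */
static inline bool guest_triple_faulted(uint32_t exit_reason)
{
    return !is_entry_failure(exit_reason) &&
           basic_reason(exit_reason) == EXIT_REASON_TRIPLE_FAULT;
}
```

With such a check the VMM can route the exit to its error handling without relying on any execution control being set.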

    > After the VMEXIT it can kill the faulted guest and continue.

    No, it can't, at least not with Bochs. This is the bug.

    > If the VMEXIT was not registered, the triple fault will cause the SHUTDOWN state even in host mode.

    No, a triple fault in a guest must not affect VMX root mode or other guests.

     
  • Stanislav Shwartsman

    What do you mean, it can't?

    Does the VMEXIT happen (yes, you are right, it should always happen)?
    If it happens, is the host able to run?
    What does the host do after the VMEXIT?

    Stanislav

     
  • reet

    reet - 2013-10-15

    On 10/15/2013 01:50 PM, Stanislav Shwartsman wrote:

    > What do you mean, it can't?

    In our system a trap handler guest is launched on the same CPU to
    process the triple fault of the previous component. The handler
    basically dumps the state and halts the VCPU in this case.

    The mechanism does not work with Bochs because the handler guest is
    unable to run correctly after a triple fault in another guest (i.e. the
    handler guest provokes a page fault) whereas it works as expected on
    real hardware.

    > Does the VMEXIT happen (yes, you are right, it should always happen)?

    It happens. But somehow the state of the CPU/VCPU is messed up
    afterwards. Other guests on the same CPU behave erratically and it even
    has side effects on the host.

    > If it happens, is the host able to run?

    It is, up to a certain point. But the interrupt handling code, for
    example, no longer works (see the bochsout.txt attached to the first
    message: the ud2 leads to a triple fault on the host instead of dumping
    the host state and then halting the CPU in the ISR).

     
  • Stanislav Shwartsman

    Do you have a disk image I could try, with an explanation of what is going on and what the expected behavior is?

    I am still not able to understand from your description what is going on, and I need to see it myself in my Bochs.

    Stanislav

     
  • Stanislav Shwartsman

    BTW, which Bochs version do you use?

     
  • Stanislav Shwartsman

    I now see an issue in your Bochs log, and it leads me to the conclusion that the bug is in your code and not in Bochs.
    See these lines:
    00326975249e[CPU0 ] VMFAIL: VMRESUME with non-launched VMCS!
    00326975249e[CPU1 ] VMFAIL: VMRESUME with non-launched VMCS!
    00326975249e[CPU2 ] VMFAIL: VMRESUME with non-launched VMCS!
    00326975249e[CPU3 ] VMFAIL: VMRESUME with non-launched VMCS!
    00326975287e[CPU1 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
    00326975287e[CPU1 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
    00326975287e[CPU1 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0

    First, I would expect you to follow the good practice of checking whether VMLAUNCH/VMRESUME succeeded. In your case VMRESUME failed (VMFAIL: VMRESUME with non-launched VMCS) and you stay in host mode. The ud2a was later executed in the host, but the host does not even have an IDTR set up, so it crashes ... Did you launch the new VMM before, or are you doing VMRESUME directly?
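For illustration, a minimal sketch of such a success check. Per the Intel SDM conventions, a failed VMX instruction sets CF (VMfailInvalid, no current VMCS) or ZF (VMfailValid, with an error number stored in the VM-instruction error field of the current VMCS); error number 5 corresponds to the "VMRESUME with non-launched VMCS" message in the log. The function and type names are hypothetical, and the sketch assumes the RFLAGS value was captured right after the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_CF (1ull << 0)   /* carry flag */
#define RFLAGS_ZF (1ull << 6)   /* zero flag  */

enum vmx_result { VMX_SUCCEED, VMX_FAIL_INVALID, VMX_FAIL_VALID };

/* VM-instruction error numbers relevant to VM entry (Intel SDM Vol. 3). */
enum {
    VMERR_VMLAUNCH_NONCLEAR_VMCS    = 4,
    VMERR_VMRESUME_NONLAUNCHED_VMCS = 5,
};

/* Classify the outcome of a VMX instruction from the resulting RFLAGS. */
static enum vmx_result vmx_classify(uint64_t rflags)
{
    if (rflags & RFLAGS_CF)
        return VMX_FAIL_INVALID;  /* no valid current VMCS */
    if (rflags & RFLAGS_ZF)
        return VMX_FAIL_VALID;    /* error number is in the VMCS */
    return VMX_SUCCEED;
}

/* Hypothetical helper: did VMRESUME fail only because the VMCS was
 * never launched, so that a VMLAUNCH is the appropriate next step? */
static bool needs_vmlaunch(enum vmx_result res, uint32_t vm_instr_error)
{
    return res == VMX_FAIL_VALID &&
           vm_instr_error == VMERR_VMRESUME_NONLAUNCHED_VMCS;
}
```

Note that on a real CPU a successful VMLAUNCH or VMRESUME does not return to the next instruction, so only the failure paths are ever observed by the code that follows it.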

    Stanislav

     
  • reet

    reet - 2013-10-15

    On 10/15/2013 10:17 PM, Stanislav Shwartsman wrote:

    > I now see an issue in your Bochs log, and it leads me to the conclusion that the bug is in your code and not in Bochs.
    > See these lines:
    > 00326975249e[CPU0 ] VMFAIL: VMRESUME with non-launched VMCS!
    > 00326975249e[CPU1 ] VMFAIL: VMRESUME with non-launched VMCS!
    > 00326975249e[CPU2 ] VMFAIL: VMRESUME with non-launched VMCS!
    > 00326975249e[CPU3 ] VMFAIL: VMRESUME with non-launched VMCS!
    > 00326975287e[CPU1 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
    > 00326975287e[CPU1 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
    > 00326975287e[CPU1 ] interrupt(): vector must be within IDT table limits, IDT.limit = 0x0
    >
    > First, I would expect you to follow the good practice of checking whether VMLAUNCH/VMRESUME succeeded. In your case VMRESUME failed (VMFAIL: VMRESUME with non-launched VMCS) and you stay in host mode. The ud2a was later executed in the host, but the host does not even have an IDTR set up, so it crashes ... Did you launch the new VMM before, or are you doing VMRESUME directly?

    We follow good VMM practice by hiding the distinction between vmresume
    and vmlaunch. The VMM only cares if both fail; there is no need to
    handle them differently. The same method is implemented, for example,
    in the NOVA hypervisor [1], which is considered very well designed.

    The IDT messages stem from the software interrupt in non-root mode
    that I used to provoke the triple fault in the guest.

    After these lines, the VMM launches the handler guest, so another line of:

    VMFAIL: VMRESUME with non-launched VMCS!

    appears in the log. The handler guest then fails on Bochs but works on real hardware.

    - reet

    [1] - https://github.com/IntelLabs/NOVA/blob/master/src/ec.cpp, line 199
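The resume-then-launch approach described in this reply can be modeled in plain C. This is only a simulation under stated assumptions: the struct and function names are hypothetical, the VMCS launch state is reduced to two values, and a successful entry simply returns true instead of transferring control to a guest. It is not the NOVA code and not the reporter's VMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the VMCS launch state. */
enum launch_state { VMCS_CLEAR, VMCS_LAUNCHED };

struct model_vmcs {
    enum launch_state state;
    unsigned last_error;   /* models the VM-instruction error field */
};

/* VMRESUME requires a launched VMCS; otherwise it fails with error 5. */
static bool model_vmresume(struct model_vmcs *v)
{
    assert(v != NULL);
    if (v->state != VMCS_LAUNCHED) {
        v->last_error = 5;   /* VMRESUME with non-launched VMCS */
        return false;
    }
    return true;
}

/* VMLAUNCH requires a clear VMCS; otherwise it fails with error 4. */
static bool model_vmlaunch(struct model_vmcs *v)
{
    assert(v != NULL);
    if (v->state != VMCS_CLEAR) {
        v->last_error = 4;   /* VMLAUNCH with non-clear VMCS */
        return false;
    }
    v->state = VMCS_LAUNCHED;
    return true;
}

/* Try VMRESUME first and fall back to VMLAUNCH; report failure only
 * if both fail. */
static bool enter_guest(struct model_vmcs *v)
{
    return model_vmresume(v) || model_vmlaunch(v);
}
```

In this model the first entry on a fresh VMCS records error 5 before the launch succeeds, which matches the single "VMRESUME with non-launched VMCS" line that the log shows per first entry. Later entries on the same VMCS resume directly.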

     
  • Adrian-Ken Rueegsegger

    This issue has been fixed in r11908.

     
  • Stanislav Shwartsman

    • status: open --> closed
    • assigned_to: Stanislav Shwartsman
    • Group: can't_reproduce --> fixed_in_SVN
     
