
#1428 AMD SVM Hyper-V fails (bug)

fixed_in_SVN
closed
None
1
2022-08-13
2021-03-11
Christopher
No

Hi,

I'm trying to run Hyper-V AMD in Bochs; unfortunately, it seems there may be bugs in the SVM emulation that prevent this from working.

The first thing I encountered was in WriteCR8 in crregs.cc: I think V_INTR_TPR needs to be set to "value & 0xf" rather than "(value & 0xf) << 4". I'm not 100% sure about this, but this first patch does allow the OS to boot further.
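For comparison, here is a minimal sketch of the two encodings in question. It assumes the AMD APM description of the VMCB V_TPR byte (bits 3:0 hold the 4-bit virtual TPR, bits 7:4 should be zero) and the xAPIC TPR layout (priority class in bits 7:4). The struct and function names are hypothetical and are not Bochs code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the VMCB control byte at offset 0x60 (V_TPR).
struct VmcbVTprSketch {
  uint8_t v_tpr = 0;
};

// Guest MOV CR8: the 4-bit priority class goes into V_TPR[3:0]
// (assumption based on the APM field description; bits 7:4 stay zero).
void write_cr8_sketch(VmcbVTprSketch &ctrl, uint64_t value) {
  ctrl.v_tpr = static_cast<uint8_t>(value & 0xf);
}

// The memory-mapped xAPIC TPR keeps the same class in bits 7:4,
// which is where a "(value & 0xf) << 4" shift would belong.
uint32_t cr8_to_xapic_tpr(uint64_t value) {
  return static_cast<uint32_t>((value & 0xf) << 4);
}
```

Under these assumptions, CR8 = 0xd is stored as V_TPR = 0x0d, while the equivalent xAPIC TPR value is 0xd0.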

Afterwards, however, it seems that the guest gets interrupt vector 0xd1 and takes a VMEXIT to Hyper-V. During this, an RDTSC somehow happens in Bochs and corrupts the guest registers (e.g. RAX). I haven't yet checked what causes the RDTSC to occur; I can't see the guest actually issuing it.

Currently this is attempted using the zambezi CPU model, as the ryzen CPU model fails to boot into Windows 10 even without Hyper-V enabled.

Discussion

(Page 2 of 2)
  • Stanislav Shwartsman

    Were all these changes necessary?
    Could you, for example, leave out the first one and see whether it matters?
    I will try to look at the other changes you made.

     
    • Christopher

      Christopher - 2021-04-21

      Yeah, all the changes were necessary, for these reasons:

      • For the paging.cc nested_walk change: without it, as mentioned, EXITINFO1 was something like 0000000200000004. When this went to Hyper-V, it must have injected an exception or similar, because code execution jumps to a BSOD in the guest. With the change, EXITINFO1 becomes 0000000100000004 and the guest continues. This code is hit when the guest attempts to read addresses such as 0xfee00320 and 0xfee00340.

      • For the paging.cc *nx_fault change: this is required because the guest executes an "out dx, al" instruction (DX = 0xb2, AL = 0xe1). This causes the *nx_fault line to be hit, which then causes a VMEXIT to Hyper-V to handle it, and Hyper-V dies.

      • The guest.efer check change is required because the guest EFER somehow gets changed at some point, which causes subsequent VMRUNs to fail and leads to the hypervisor crashing.

       

      Last edit: Christopher 2021-04-21
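The two EXITINFO1 values above can be decoded with a small helper. This sketch assumes the AMD APM layout for #VMEXIT(NPF):
- Bits 0-4 follow the page-fault error code (P, R/W, U/S, RSV, I/D).
- Bit 32 is set for a fault on the final guest physical address.
- Bit 33 is set for a fault while walking the guest page tables.

The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of EXITINFO1 for a nested page fault.
struct NpfInfoSketch {
  bool present, write, user, rsvd, fetch; // low bits, page-fault error code style
  bool final_gpa;                         // bit 32: final guest physical address
  bool guest_table_walk;                  // bit 33: access to guest page tables
};

NpfInfoSketch decode_npf_exitinfo1(uint64_t info) {
  NpfInfoSketch d;
  d.present = (info >> 0) & 1;
  d.write = (info >> 1) & 1;
  d.user = (info >> 2) & 1;
  d.rsvd = (info >> 3) & 1;
  d.fetch = (info >> 4) & 1;
  d.final_gpa = (info >> 32) & 1;
  d.guest_table_walk = (info >> 33) & 1;
  return d;
}
```

Under that layout, 0000000200000004 reports a user-mode access during a guest page-table walk, while 0000000100000004 reports the same access on a final guest physical address. The latter is what a plain read of an APIC register such as 0xfee00320 would be.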
  • Stanislav Shwartsman

    Change that paging.cc line ~1384 from nested_walk(paddress, rw, 0); to nested_walk(paddress, rw, 1);

    I found a bug here. The fix is:

    void BX_CPU_C::nested_page_fault(unsigned fault, bx_phy_address guest_paddr, unsigned rw, unsigned is_page_walk)
    {
      unsigned isWrite = rw & 1;
    
      Bit64u error_code = fault | (1 << 2) | (isWrite << 1);
      if (rw == BX_EXECUTE)
        error_code |= ERROR_CODE_ACCESS; // I/D = 1
    
      if (is_page_walk)
        error_code |= BX_CONST64(1) << 33; // was 32
      else
        error_code |= BX_CONST64(1) << 32; // was 33
    
      Svm_Vmexit(SVM_VMEXIT_NPF, error_code, guest_paddr);
    }
    

    Let me know if it helps.

     
  • Stanislav Shwartsman

    For the NX fault problem I have a guess to verify.
    What is the value of EFER.NXE for both guest and host there?

    A page is considered executable by the guest only if it is marked executable at both the guest and nested page table levels. If the EFER.NXE bit is cleared for the guest, all guest pages are executable at the guest level. Similarly, if the EFER.NXE bit is cleared for the host, all nested page table mappings are executable at the underlying nested level.

    This is definitely not what Bochs does. I am still struggling to understand its exact meaning, but it would be nice to confirm whether this is the issue.
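The rule quoted above amounts to an AND of two per-level checks, each gated by the EFER.NXE of its own level. The sketch below is illustrative only and uses hypothetical names. It deliberately ignores the reserved-bit case: with NXE clear, a set NX bit is a reserved-bit fault rather than being silently ignored.

```cpp
#include <cassert>

// One translation level: with EFER.NXE clear for that level, every page is
// executable there. nx_bit_set is only meaningful when efer_nxe is set.
bool level_allows_execute(bool efer_nxe, bool nx_bit_set) {
  return !efer_nxe || !nx_bit_set;
}

// A guest page is executable only if both the guest level and the nested
// (host) level allow execution.
bool guest_page_executable(bool guest_nxe, bool guest_nx,
                           bool host_nxe, bool nested_nx) {
  return level_allows_execute(guest_nxe, guest_nx) &&
         level_allows_execute(host_nxe, nested_nx);
}
```

With guest NXE = 0 and host NXE = 1, an NX bit in the nested tables still blocks execution. With host NXE = 0, the nested level never blocks it.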

     
    • Christopher

      Christopher - 2021-04-23

      Hey, so when the NX fault occurs, the EFER.NXE values for guest and host are:
      GUEST_NXE:0, HOST_NXE:1

      Also, yes, your fix for the nested_page_fault issue above seems good.

       
  • Stanislav Shwartsman

    I am still wondering what is going on with nx_fault=1.
    It seems there is indeed a page walk that hit this NX fault.
    It is a true NX fault, i.e. the page was translated for execute access and an NX bit was set along the way. And it was not a reserved-bit fault caused by GUEST::EFER.NXE=0, so it happened on the host side.
    Can you catch that walk and show what is going on there?
    You could enable DEBUG=all right before that point and get a lot of paging debug info printed into the log file. I want to understand what is missing, what kind of NX is present in the page tables, and why it should not be taken into account.
    Maybe, while translating an execute access for the guest, the EPT walk for stuffed loads should not be treated as EXECUTE? I don't know; the AMD manual is very unpleasant to use :(

     
  • Christopher

    Christopher - 2021-04-26

    I'll look into it more tomorrow too, but for now I can provide a dump of the debug prints right before it happens (with some extras in there):

    07119718387i[CPU0 ] GUEST_NXE:1, HOST_NXE:1, IN_SVM:1
    07119718387d[CPU0 ] Nested walk for guest paddr 0x000004605000
    07119718387i[CPU0 ] GUEST_NXE:1, HOST_NXE:1, IN_SVM:1
    07119718387d[CPU0 ] Nested walk for guest paddr 0x000004606070
    07119718387i[CPU0 ] GUEST_NXE:1, HOST_NXE:1, IN_SVM:1
    07119718387d[CPU0 ] Nested walk for guest paddr 0x00000010e04c
    07119718393i[CPU0 ] Enter to System Management Mode
    07119718393i[CPU0 ] enter_system_management_mode: temporary disable VMX while in SMM mode
    07119718393d[CPU0 ] real mode activated
    07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1
    07119718393d[CPU0 ] Nested walk for guest paddr 0x0000000a8000
    07119718393d[CPU0 ] PAE PTE: non-executable page fault occured
    07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1
    ^ Error here

     
  • Stanislav Shwartsman

    07119718393i[CPU0 ] Enter to System Management Mode
    07119718393i[CPU0 ] enter_system_management_mode: temporary disable VMX while in SMM mode
    07119718393d[CPU0 ] real mode activated
    07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1
    07119718393d[CPU0 ] Nested walk for guest paddr 0x0000000a8000
    07119718393d[CPU0 ] PAE PTE: non-executable page fault occured
    07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1

    Wow, so it enters SMM under the VMM? A really pathological case.
    So the guest is actually in real mode with paging OFF, but its memory accesses are translated through the nested walk. I'm surprised it got to this stage at all :)
    It would be nice to know whether the access to 0x0000000a8000 is a read/write or an execute. You might print the rw and is_page_walk variables together with

    BX_DEBUG(("Nested walk for guest paddr 0x" FMT_PHY_ADDRX, guest_paddr));
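One way to read the 0xa8000 address is through SMM entry. On SMM entry the processor starts fetching at CS.base + EIP, with CS.base = SMBASE and EIP = 0x8000, and the default SMBASE after reset is 0x30000. The arithmetic is sketched below. That SMBASE had been relocated to 0xA0000 here is an assumption, since the log does not show it.

```cpp
#include <cassert>
#include <cstdint>

// Default SMBASE after reset.
constexpr uint32_t kDefaultSmbase = 0x30000;

// On SMM entry CS.base = SMBASE and EIP = 0x8000 with paging off, so the
// first instruction fetch goes to this (guest) physical address.
uint32_t smm_entry_fetch_address(uint32_t smbase) {
  return smbase + 0x8000;
}
```

If that assumption holds, the access to 0xa8000 would be the first instruction fetch of the SMI handler, i.e. an execute access.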

     
  • Stanislav Shwartsman

    Actually, Intel turns off VMX when it enters SMM:

    // Enter SMM
    // save the following internal to the processor:
    // * CR4.VMXE
    // * an indication of whether the logical processor was in VMX operation (root or non-root)
    // IF the logical processor is in VMX operation
    // THEN
    // leave VMX operation;
    // save VMX-critical state defined below;
    // preserve current VMCS pointer as noted below;
    // FI;
    // CR4.VMXE = 0;

    BX_CPU_THIS_PTR cr4.set_VMXE(0);
    BX_CPU_THIS_PTR in_smm_vmx = BX_CPU_THIS_PTR in_vmx;
    BX_CPU_THIS_PTR in_smm_vmx_guest = BX_CPU_THIS_PTR in_vmx_guest;
    BX_CPU_THIS_PTR in_vmx = 0;
    BX_CPU_THIS_PTR in_vmx_guest = 0;

    BX_INFO(("enter_system_management_mode: temporary disable VMX while in SMM mode"));

    // perform ordinary SMI delivery:
    // * save processor state in SMRAM;
    // * set processor state to standard SMM values

    AMD must have something similar for SMM, I guess...

     
  • Stanislav Shwartsman

    It would be nice to know where the SMI came from here...
    AMD's documentation explains in section 15.22.2, "Response to SMI", how an SMI should be treated depending on whether it is external or internal. I believe an SMI sent through the xAPIC would be considered internal, while one received from a device would be external. But I actually don't care.
    What may be important is how to intercept it.
    If the SMI intercept is set, we should VMEXIT(SMI); otherwise, take it normally.
    It seems "take it normally" is broken here, so let's hope it should be intercepted.
    In event.cc:

    if (is_unmasked_event_pending(BX_EVENT_SMI) && SVM_GIF)
    {
    #if BX_SUPPORT_SVM
      if (BX_CPU_THIS_PTR in_svm_guest) {
        if (SVM_INTERCEPT(SVM_INTERCEPT0_SMI)) Svm_Vmexit(SVM_VMEXIT_SMI);
      }
    #endif
      clear_event(BX_EVENT_SMI); // clear SMI pending flag
      enter_system_management_mode(); // would disable NMI when SMM was accepted
    }
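The decision made by the snippet above for a pending, unmasked SMI (with GIF set) can be isolated as a tiny function. This is an illustrative sketch with hypothetical names, not Bochs code.

```cpp
#include <cassert>

// Hypothetical outcome of a pending, unmasked SMI while GIF is set.
enum class SmiAction { VmexitSmi, EnterSmm };

// An SVM guest with the SMI intercept set exits to the host hypervisor;
// in every other case the SMI is taken normally (SMM entry).
SmiAction dispatch_smi_sketch(bool in_svm_guest, bool smi_intercept) {
  if (in_svm_guest && smi_intercept) return SmiAction::VmexitSmi;
  return SmiAction::EnterSmm;
}
```

Only the first combination reaches Svm_Vmexit(SVM_VMEXIT_SMI). When the intercept bit is clear, the SMI is taken inside the guest context, which is the path that misbehaves here.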

     
  • Christopher

    Christopher - 2021-04-27

    I checked with your changes in event.cc: the intercept is not set, so it doesn't VMEXIT and goes down the "take it normally" route.

    Also:
    - The SMI comes from apic_bus_deliver_smi, called from generate_smi in iodev/acpi.cc with the value 0xf1
    - When doing the problematic page walk, status is: GUEST_NXE:0, HOST_NXE:1, IN_SVM:1, IN_SMM:1, RW:2, is_page_walk:0
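A small lookup helps interpret RW:2. It assumes the Bochs access-type constants are BX_READ = 0, BX_WRITE = 1, BX_EXECUTE = 2 and BX_RW = 3 (worth checking against cpu.h for the revision in use). Under that assumption, RW:2 is an instruction fetch. is_page_walk:0 means the access is to the final address rather than a guest page-table read.

```cpp
#include <cassert>
#include <cstring>

// Assumed values of the Bochs access-type constants (verify in cpu.h).
enum AccessTypeSketch { kRead = 0, kWrite = 1, kExecute = 2, kReadWrite = 3 };

// Human-readable name for the rw value printed in the debug log.
const char *access_type_name(unsigned rw) {
  switch (rw) {
    case kRead: return "read";
    case kWrite: return "write";
    case kExecute: return "execute";
    case kReadWrite: return "read-modify-write";
    default: return "unknown";
  }
}
```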

     
  • Stanislav Shwartsman

    Let's try this hack: ignore the EXECUTE attribute for accesses in SMM under SVM.
    The attached patch does it:

    Index: paging.cc
    ===================================================================
    --- paging.cc   (revision 14232)
    +++ paging.cc   (working copy)
    @@ -1381,6 +1381,9 @@
     #endif
     #if BX_SUPPORT_SVM
       if (BX_CPU_THIS_PTR in_svm_guest && SVM_NESTED_PAGING_ENABLED) {
    +    // hack: ignore isExecute attribute in SMM mode under SVM virtualization
    +    if (BX_CPU_THIS_PTR in_smm && rw == BX_EXECUTE) rw = BX_READ;
    +
         paddress = nested_walk(paddress, rw, 0);
       }
     #endif
    

    I guess it would help. If it does, I will push it into the repository.
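The effect of the hack on the access type passed to nested_walk can be shown in isolation. The sketch assumes BX_READ = 0 and BX_EXECUTE = 2, and that the surrounding conditions (in_svm_guest and nested paging enabled) already hold. The function name is hypothetical.

```cpp
#include <cassert>

// Assumed Bochs access-type values (BX_READ = 0, BX_WRITE = 1, BX_EXECUTE = 2).
constexpr unsigned kRead = 0, kExecute = 2;

// Access type handed to the nested walk: in SMM, an instruction fetch is
// downgraded to a read, so a nested-level NX bit no longer faults it.
// Outside SMM, or for non-execute accesses, rw is passed through unchanged.
unsigned nested_walk_rw_sketch(unsigned rw, bool in_smm) {
  if (in_smm && rw == kExecute) return kRead;
  return rw;
}
```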

     
  • Christopher

    Christopher - 2021-04-27

    Yeah, the hack works: Hyper-V AMD boots fine with it, together with the guest EFER.SVME change I mentioned earlier.

     
  • Stanislav Shwartsman

    About the EFER.SVME change: in enter_system_management_mode(), instead of clearing EFER, let's keep the SVME bit if it is set:

    #if BX_CPU_LEVEL >= 5
      if (BX_CPU_THIS_PTR efer.get_SVME())
        BX_CPU_THIS_PTR efer.set32(BX_EFER_SVME_MASK);
      else
        BX_CPU_THIS_PTR efer.set32(0);
    #endif
    

    Real software is not supposed to touch EFER.SVME while SVM is active, so I believe the only place that was clearing it was, again, SMM entry.
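Bit-wise, the change amounts to masking EFER down to SVME on SMM entry. The sketch below assumes the AMD APM bit positions (LME = bit 8, NXE = bit 11, SVME = bit 12). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kEferLme = 1ULL << 8;   // EFER.LME
constexpr uint64_t kEferNxe = 1ULL << 11;  // EFER.NXE
constexpr uint64_t kEferSvme = 1ULL << 12; // EFER.SVME

// EFER value loaded on SMM entry: every bit is cleared except SVME,
// which is preserved if it was set.
uint64_t efer_on_smm_entry_sketch(uint64_t efer) {
  return efer & kEferSvme;
}
```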

     
  • Christopher

    Christopher - 2021-04-27

    Yeah, that works.

     
  • Stanislav Shwartsman

    So no workarounds are required anymore, and it works?
    Thank you for such a detailed report and for the help in debugging!

     
  • Christopher

    Christopher - 2021-04-27

    Yep, with these changes AMD Hyper-V boots and works!

    Thanks for your assistance with finding the bugs and the patches.

     
  • Christopher

    Christopher - 2021-05-24

    Hey man,

    New issue to fix; this one is why I hate C, lol.

    In SVM.cc you have SvmInterceptMSR; there's a bunch of if and else if statements like:

    if (msr <= 0x1fff) msr_map_offset = 0;
    else if (msr >= 0xc0000000 && msr <= 0xc0001fff) msr_map_offset = 2048;
    else if (msr >= 0xc0010000 && msr <= 0xc0011fff) msr_map_offset = 4096;

    Theres a problem here, specifically the double conditions inside the brackets like

    "msr >= 0xc0000000 && msr <= 0xc0001fff"

    It's not being evaluated properly because you need inner brackets around these separate comparisons so that the && works correctly.

    As it currently stands, values like 0x40000070 will match the first else if statement, even though they shouldn't!

    To fix this, you need to add the inner brackets on the else if statements like so:

    "(msr >= 0xc0000000) && (msr <= 0xc0001fff)"

    Adding the brackets to each else if statement will fix this issue.

     
  • Christopher

    Christopher - 2021-05-24

    Actually no, I'm so confused.

    When guest does wrmsr to 0x40000071, MSR is 0x40000071.

    However somehow when I check "if(msr == 0x40000071)" the if fails?

    And I know MSR is 0x40000071 from BX_INFO printing it, yet somehow it also passes the checks "else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))" and enters this if?

    When it shouldn't? I'm so confused about what's going on here at the moment, haha.
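For reference: in C and C++ the relational operators (>=, <=) bind more tightly than &&. The extra brackets therefore do not change how the condition is evaluated, and 0x40000071 falls outside the 0xc0000000 range either way. A quick check (function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Range check as written in SvmInterceptMSR.
bool in_c000_range_plain(uint32_t msr) {
  return msr >= 0xc0000000 && msr <= 0xc0001fff;
}

// The same check with explicit inner brackets.
bool in_c000_range_bracketed(uint32_t msr) {
  return (msr >= 0xc0000000) && (msr <= 0xc0001fff);
}
```

So if 0x40000071 appears to take that branch, the cause is more likely elsewhere than in operator precedence.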

     
  • Christopher

    Christopher - 2021-05-25

    Ok I'm kinda getting closer.

    What's happening is that first the KernelGSBase MSR (0xC0000102) is accessed; it is in the MSR bitmap and NOT intercepted, so the access happens in the guest without a VMEXIT.

    Now when the MSR 0x40000071 follows, this somehow leads to a memory access exception (still need to trace where exactly).

    But if I intercept all MSRs, meaning the KernelGSBase access leads to a VMEXIT instead, then it works fine: the following access to MSR 0x40000071 VMEXITs and no longer causes a memory access exception.

    I guess something about handling the KernelGSBase MSR in an SVM guest when it's not intercepted is messing something up. I will continue to look into it, but I'm logging this here for now.
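To check whether a given MSR is intercepted, the MSR permission map (MSRPM) can be indexed directly. This sketch assumes the AMD APM layout:
- Three 2 KB blocks at byte offsets 0, 2048 and 4096 (matching msr_map_offset in SvmInterceptMSR).
- Two bits per MSR: the lower bit intercepts RDMSR, the higher bit intercepts WRMSR.

The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Location of an MSR's read-intercept bit; the write bit is read_bit + 1.
struct MsrpmBitSketch {
  uint32_t byte_offset;
  uint32_t read_bit;
};

// Returns false for MSRs not covered by any of the three ranges.
bool msrpm_locate(uint32_t msr, MsrpmBitSketch *out) {
  uint32_t base, first;
  if (msr <= 0x1fff) { base = 0; first = 0; }
  else if (msr >= 0xc0000000 && msr <= 0xc0001fff) { base = 2048; first = 0xc0000000; }
  else if (msr >= 0xc0010000 && msr <= 0xc0011fff) { base = 4096; first = 0xc0010000; }
  else return false;
  uint32_t bit_index = (msr - first) * 2; // two bits per MSR
  out->byte_offset = base + bit_index / 8;
  out->read_bit = bit_index % 8;
  return true;
}
```

Under these assumptions, KernelGSBase (0xC0000102) maps to byte 2112, with bit 4 controlling RDMSR and bit 5 controlling WRMSR. 0x40000071 is not covered by any range.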

     
  • Stanislav Shwartsman

    It seems all the required parts were merged, and the issue can be closed.

     
  • Stanislav Shwartsman

    • status: open --> closed
    • assigned_to: Stanislav Shwartsman
    • Group: can't_reproduce --> fixed_in_SVN
     