Bochs x86 PC emulator / Bugs / #1428 AMD SVM Hyper-V fails (bug)

Stanislav Shwartsman - 2021-04-21

Were all these changes necessary?
Could you, for example, ignore the first one and see if it matters?
I will try to look at the other changes you made.

- Christopher - 2021-04-21
  
  Yeah all the changes were necessary, reasons being:
  
  For the paging.cc nested_walk change: without this change, as mentioned, exitinfo1 was 0000000200000004, and when this went to Hyper-V it must have injected an exception or something, because code execution jumps to a BSOD in the guest. With this change it doesn't; exitinfo1 becomes 0000000100000004 and the guest continues. This code is hit when the guest attempts to read addresses like 0xfee00320 and 0xfee00340.
  
  For the paging.cc *nx_fault change: this is required because the guest executes an "out dx, al" instruction (dx = 0xb2, al = 0xe1), which hits the *nx_fault line, which in turn causes a vmexit to Hyper-V to handle it, and Hyper-V dies.
  
  The guest.efer check change is required because somehow this gets changed at some point and causes future VMRUNs to fail, which leads to the hypervisor crashing.
  
  Last edit: Christopher 2021-04-21
  

Change that paging.cc line ~1384 from nested_walk(paddress, rw, 0); to nested_walk(paddress, rw, 1);

Found a bug here. Fix is:

void BX_CPU_C::nested_page_fault(unsigned fault, bx_phy_address guest_paddr, unsigned rw, unsigned is_page_walk)
{
  unsigned isWrite = rw & 1;

  Bit64u error_code = fault | (1 << 2) | (isWrite << 1);
  if (rw == BX_EXECUTE)
    error_code |= ERROR_CODE_ACCESS; // I/D = 1

  if (is_page_walk)
    error_code |= BX_CONST64(1) << 33; // was 32
  else
    error_code |= BX_CONST64(1) << 32; // was 33

  Svm_Vmexit(SVM_VMEXIT_NPF, error_code, guest_paddr);
}

Let me know if it helps.
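To see how the bit swap lines up with the two exitinfo1 values reported earlier, here is a minimal standalone sketch of the same bit layout. The names npf_error_code, kNpfUser and so on are hypothetical and not Bochs identifiers, and the I/D bit set for execute accesses is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; only the bit positions come from the fix above.
constexpr std::uint64_t kNpfWrite     = 1ull << 1;   // access was a write
constexpr std::uint64_t kNpfUser      = 1ull << 2;   // always set by the fix above
constexpr std::uint64_t kNpfFinalAddr = 1ull << 32;  // fault on the final guest physical address
constexpr std::uint64_t kNpfPageWalk  = 1ull << 33;  // fault while walking guest page tables

// Mirrors the corrected nested_page_fault() error code, minus the I/D bit.
constexpr std::uint64_t npf_error_code(unsigned fault, bool is_write, bool is_page_walk) {
  return fault | kNpfUser | (is_write ? kNpfWrite : 0)
               | (is_page_walk ? kNpfPageWalk : kNpfFinalAddr);
}

// The two values seen in the thread: a read of a final address vs. a page-walk read.
inline void run_examples() {
  assert(npf_error_code(0, false, false) == 0x0000000100000004ull);
  assert(npf_error_code(0, false, true)  == 0x0000000200000004ull);
}
```

With the pre-fix code the two upper bits were swapped, which is why a non-page-walk access produced 0000000200000004.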

Stanislav Shwartsman - 2021-04-22

For the NX fault problem I have a guess to verify.
What is the value of EFER.NXE for both guest and host there?

A page is considered executable by the guest only if it is marked executable at both the guest and
nested page table levels. If the EFER.NXE bit is cleared for the guest, all guest pages are executable at the guest level. Similarly, if the EFER.NXE bit is cleared for the host, all nested page table mappings are executable at the underlying nested level.

This is definitely not what Bochs does. I am still struggling to understand the exact meaning of it. But it would be nice to confirm if this is the issue.
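Read literally, the quoted rule can be sketched as below. This is only my reading of those two sentences, with hypothetical function names; it does not model reserved-bit checks or anything else a real page walk does.

```cpp
#include <cassert>

// One translation level: if EFER.NXE is clear at that level, the NX bit is
// not honoured and every page counts as executable there.
constexpr bool level_allows_execute(bool nxe, bool nx_bit) {
  return !nxe || !nx_bit;
}

// A page is executable for the guest only if both the guest level and the
// nested (host) level allow execution.
constexpr bool page_executable(bool guest_nxe, bool guest_nx,
                               bool host_nxe, bool nested_nx) {
  return level_allows_execute(guest_nxe, guest_nx) &&
         level_allows_execute(host_nxe, nested_nx);
}

inline void run_examples() {
  // Guest NXE clear: the guest-level NX bit is ignored.
  assert(page_executable(false, true, false, false));
  // Host NXE set and nested NX set: not executable, whatever the guest says.
  assert(!page_executable(false, false, true, true));
}
```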

- Christopher - 2021-04-23
  
  Hey, so when the NX fault occurs, the EFER.NXE status for guest and host are:
  GUEST_NXE:0, HOST_NXE:1
  
  Also yes your fix for the nested_page_fault issue above seems good
  

Stanislav Shwartsman - 2021-04-26

I am still wondering what is going on with nx_fault=1.
It seems there is indeed a page walk which hit this NX fault.
A true NX fault, i.e. the page was translated for execute access and the NX bit was set on the way. And it was not a reserved bit because of GUEST::EFER.NXE=0, so it happened on the host.
Can you catch that walk and show what is going on there?
You may enable DEBUG=all right before that or so and have a lot of debug info about paging printed into the log file. I want to understand what is missing, what kind of NX is present in the page tables and why it should not be taken into account.
Maybe, while translating an execute access for the guest, the EPT walk for stuffed loads should not be treated as EXECUTE? I don't know, the AMD manual is very unpleasant to use :(


Christopher - 2021-04-26

I'll look into it more tomorrow too, but for now I can provide a dump of the debug prints right before it happens (with some extras in there):

07119718387i[CPU0 ] GUEST_NXE:1, HOST_NXE:1, IN_SVM:1
07119718387d[CPU0 ] Nested walk for guest paddr 0x000004605000
07119718387i[CPU0 ] GUEST_NXE:1, HOST_NXE:1, IN_SVM:1
07119718387d[CPU0 ] Nested walk for guest paddr 0x000004606070
07119718387i[CPU0 ] GUEST_NXE:1, HOST_NXE:1, IN_SVM:1
07119718387d[CPU0 ] Nested walk for guest paddr 0x00000010e04c
07119718393i[CPU0 ] Enter to System Management Mode
07119718393i[CPU0 ] enter_system_management_mode: temporary disable VMX while in SMM mode
07119718393d[CPU0 ] real mode activated
07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1
07119718393d[CPU0 ] Nested walk for guest paddr 0x0000000a8000
07119718393d[CPU0 ] PAE PTE: non-executable page fault occured
07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1
^ Error here


Stanislav Shwartsman - 2021-04-26

07119718393i[CPU0 ] Enter to System Management Mode
07119718393i[CPU0 ] enter_system_management_mode: temporary disable VMX while in SMM mode
07119718393d[CPU0 ] real mode activated
07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1
07119718393d[CPU0 ] Nested walk for guest paddr 0x0000000a8000
07119718393d[CPU0 ] PAE PTE: non-executable page fault occured
07119718393i[CPU0 ] GUEST_NXE:0, HOST_NXE:1, IN_SVM:1

Wow, so it enters SMM mode under the VMM? A really pathological case.
So the guest is actually in real mode with PAGING OFF, but its memory accesses are translated through the nested walk. I wonder how it got to this stage at all :)
It would be nice to know if the access to 0x0000000a8000 is read/write or execute. You might print the rw and is_page_walk variables together with

BX_DEBUG(("Nested walk for guest paddr 0x" FMT_PHY_ADDRX, guest_paddr));


Stanislav Shwartsman - 2021-04-26

Actually Intel turns off VMX when it enters SMM:

// Enter SMM
// save the following internal to the processor:
// * CR4.VMXE
// * an indication of whether the logical processor was in VMX operation (root or non-root)
// IF the logical processor is in VMX operation
// THEN
// leave VMX operation;
// save VMX-critical state defined below;
// preserve current VMCS pointer as noted below;
// FI;
// CR4.VMXE = 0;

BX_CPU_THIS_PTR cr4.set_VMXE(0);
BX_CPU_THIS_PTR in_smm_vmx = BX_CPU_THIS_PTR in_vmx;
BX_CPU_THIS_PTR in_smm_vmx_guest = BX_CPU_THIS_PTR in_vmx_guest;
BX_CPU_THIS_PTR in_vmx = 0;
BX_CPU_THIS_PTR in_vmx_guest = 0;

BX_INFO(("enter_system_management_mode: temporary disable VMX while in SMM mode"));

// perform ordinary SMI delivery:
// * save processor state in SMRAM;
// * set processor state to standard SMM values

AMD must have something similar for SMM, I guess ...


Stanislav Shwartsman - 2021-04-26

It would be nice to know where the SMI came from here ...
AMD's doc explains in 15.22.2 Response to SMI how an SMI should be treated depending on whether it is external or internal. I believe an SMI sent through the xAPIC would be considered internal, while one received from a device would be external. But I actually don't care.
What might be important is how to intercept it.
If the SMI intercept is set -> we should VMEXIT(SMI), otherwise take it normally.
Seems like "take it normally" is broken here, so let's hope it should be intercepted.
In event.cc:

if (is_unmasked_event_pending(BX_EVENT_SMI) && SVM_GIF)
{
#if BX_SUPPORT_SVM
  if (BX_CPU_THIS_PTR in_svm_guest) {
    if (SVM_INTERCEPT(SVM_INTERCEPT0_SMI)) Svm_Vmexit(SVM_VMEXIT_SMI);
  }
#endif
  clear_event(BX_EVENT_SMI); // clear SMI pending flag
  enter_system_management_mode(); // would disable NMI when SMM was accepted
}

Christopher - 2021-04-27

I checked with your changes in event.cc. The intercept isn't set, so it doesn't vmexit and goes the "take it normally" route.

Also:
- The SMI comes via apic_bus_deliver_smi called via iodev/acpi.cc in generate_smi with the value 0xf1
- When doing the problematic page walk, status is: GUEST_NXE:0, HOST_NXE:1, IN_SVM:1, IN_SMM:1, RW:2, is_page_walk:0


Let's try this hack, ignore the EXECUTE attribute for accesses in SMM under SVM.
Attached patch does it:

Index: paging.cc
===================================================================
--- paging.cc   (revision 14232)
+++ paging.cc   (working copy)
@@ -1381,6 +1381,9 @@
 #endif
 #if BX_SUPPORT_SVM
   if (BX_CPU_THIS_PTR in_svm_guest && SVM_NESTED_PAGING_ENABLED) {
+    // hack: ignore isExecute attribute in SMM mode under SVM virtualization
+    if (BX_CPU_THIS_PTR in_smm && rw == BX_EXECUTE) rw = BX_READ;
+
     paddress = nested_walk(paddress, rw, 0);
   }
 #endif

I guess it would help. If it does, I will push it into the repository.
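As a standalone illustration of what the hack does to the access type, here is a small sketch. The enum values are an assumption chosen to match the RW:2 seen in the log for the execute access, and nested_walk_access is a hypothetical helper, not a Bochs function.

```cpp
#include <cassert>

// Assumed to mirror Bochs' BX_READ / BX_WRITE / BX_EXECUTE (RW:2 in the log).
enum AccessType : unsigned { kRead = 0, kWrite = 1, kExecute = 2 };

// The hack: inside SMM, an execute access is handed to the nested walk as a
// read, so NX bits in the nested page tables are not applied to it.
constexpr unsigned nested_walk_access(bool in_smm, unsigned rw) {
  return (in_smm && rw == kExecute) ? static_cast<unsigned>(kRead) : rw;
}

inline void run_examples() {
  assert(nested_walk_access(true, kExecute) == kRead);      // downgraded in SMM
  assert(nested_walk_access(false, kExecute) == kExecute);  // untouched outside SMM
  assert(nested_walk_access(true, kWrite) == kWrite);       // writes are never changed
}
```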

Christopher - 2021-04-27

Yeah the hack works, Hyper-V AMD boots fine with that alongside the Guest EFER.SVME change I mentioned earlier too.


Stanislav Shwartsman - 2021-04-27

About the EFER.SVME change -> in enter_system_management_mode() instead of clearing EFER let's keep SVME bit if set:

#if BX_CPU_LEVEL >= 5
  if (BX_CPU_THIS_PTR efer.get_SVME())
    BX_CPU_THIS_PTR efer.set32(BX_EFER_SVME_MASK);
  else
    BX_CPU_THIS_PTR efer.set32(0);
#endif

Real software is not supposed to touch EFER.SVME while SVM is active, so I believe the only place that was clearing it was SMM entry again.
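The effect of that change on the EFER value can be sketched in isolation like this. efer_after_smm_entry and kEferSvme are hypothetical names; the only hardware fact relied on is that EFER.SVME is bit 12.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kEferSvme = 1u << 12;  // EFER.SVME

// On SMM entry every EFER bit is cleared except SVME, which keeps its value.
constexpr std::uint32_t efer_after_smm_entry(std::uint32_t efer) {
  return efer & kEferSvme;
}

inline void run_examples() {
  // SCE | LME | LMA | NXE | SVME -> only SVME survives.
  assert(efer_after_smm_entry(0x1D01u) == 0x1000u);
  // Same value without SVME -> EFER is fully cleared, as before the change.
  assert(efer_after_smm_entry(0x0D01u) == 0u);
}
```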

Christopher - 2021-04-27

Yeah that works


Stanislav Shwartsman - 2021-04-27

So no workarounds are required anymore, and it works?
Thank you for such a detailed report and for the help in debugging!


Christopher - 2021-04-27

Yep, with these changes AMD Hyper-V boots and works!

Thanks for your assistance with finding the bugs and patches


Christopher - 2021-05-24

Hey man,

New issue to fix, this one is why I hate C lol.

In SVM.cc you have SvmInterceptMSR, and there's a bunch of if and else if statements like:

if (msr <= 0x1fff) msr_map_offset = 0;
else if (msr >= 0xc0000000 && msr <= 0xc0001fff) msr_map_offset = 2048;
else if (msr >= 0xc0010000 && msr <= 0xc0011fff) msr_map_offset = 4096;

There's a problem here, specifically the double conditions inside the brackets like

"msr >= 0xc0000000 && msr <= 0xc0001fff"

It's not being calculated properly because you need inner brackets around these separate statements so the && works accurately.

As it currently stands, values like 0x40000070 will match the first else if statement, even though it shouldn't!

To fix this, you need to add the inner brackets on the else if statements like so:

"(msr >= 0xc0000000) && (msr <= 0xc0001fff)"

Doing the brackets for each else if statement will fix this issue.


Christopher - 2021-05-24

Actually no, I'm so confused.

When guest does wrmsr to 0x40000071, MSR is 0x40000071.

However somehow when I check "if(msr == 0x40000071)" the if fails?

And I know MSR is 0x40000071 from BX_INFO printing it, yet somehow it also passes the checks "else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))" and enters this if?

When it shouldn't? So confused whats going on here at the moment haha.
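For what it's worth, >= and <= bind tighter than && in C and C++, so the bracketed and unbracketed forms of those range checks evaluate identically. A minimal standalone sketch of the three ranges follows; msr_map_offset and kNotInMap are hypothetical names, and what the real code does for MSRs outside the ranges isn't shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kNotInMap = -1;  // hypothetical marker for "no range matched"

// Same three ranges and byte offsets as the snippet above, without inner brackets.
constexpr int msr_map_offset(std::uint32_t msr) {
  return (msr <= 0x1fff)                            ? 0
       : (msr >= 0xc0000000 && msr <= 0xc0001fff)   ? 2048
       : (msr >= 0xc0010000 && msr <= 0xc0011fff)   ? 4096
       : kNotInMap;
}

// Both spellings of the condition give the same answer for 0x40000070.
static_assert((0x40000070u >= 0xc0000000u && 0x40000070u <= 0xc0001fffu) ==
              ((0x40000070u >= 0xc0000000u) && (0x40000070u <= 0xc0001fffu)),
              "inner brackets do not change the result");

inline void run_examples() {
  assert(msr_map_offset(0x40000071u) == kNotInMap);  // matches none of the ranges
  assert(msr_map_offset(0xc0000102u) == 2048);       // KernelGSBase, second range
}
```

So if 0x40000071 really appears to enter the second branch, the cause is more likely somewhere other than operator precedence.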


Christopher - 2021-05-25

Ok I'm kinda getting closer.

What's happening is first the MSR KernelGSBase (0xC0000102) is being accessed and this is in the MSR bitmap and NOT being intercepted, so that occurs in the guest without VMEXIT.

Now when the MSR 0x40000071 follows, this somehow leads to a memory access exception (still need to trace where exactly).

But if I intercept all MSRs, meaning the KernelGSBase MSR leads to a VMEXIT instead, then it works fine and the following 0x40000071 MSR VMEXITs and doesn't cause any more memory access exception.

I guess something in the handling of the KernelGSBase MSR in an SVM guest when it's not intercepted is messing something up. I will continue to look into it, but I'm just logging this here.


Stanislav Shwartsman - 2022-08-13

Seems like all required parts were merged and the issue can be closed.


Stanislav Shwartsman - 2022-08-13

status: open --> closed

assigned_to: Stanislav Shwartsman

Group: can't_reproduce --> fixed_in_SVN