Thread: [kvm-devel] WARN_ON in kvm_queue_exception_e triggers

From: Jan K. <jan...@si...> - 2008-04-28 17:35:32

Hi,

sorry, the test environment is not really reproducible (stock kvm-66,
yet unpublished NMI support by Sheng Yang and me, special guest), but
I'm just fishing for some ideas on what may cause the flood of the
following warning in my kernel log:

------------[ cut here ]------------
WARNING: at /data/kvm-66/kernel/x86.c:180
kvm_queue_exception_e+0x30/0x54 [kvm]()
Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG
nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss
snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp
ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter
ip6_tables cpufreq_conservative x_tables cpufreq_userspace
cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm
l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia
firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery
bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp
rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib
yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801
rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev
sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher
usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot
dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal
processor
Pid: 4718, comm: qemu-system-x86 Tainted: P        N
2.6.25-rc5-git2-109.8-default #1

Call Trace:
 [<ffffffff8020d826>] dump_trace+0xc4/0x576
 [<ffffffff8020dd18>] show_trace+0x40/0x57
 [<ffffffff8044e341>] _etext+0x72/0x7b
 [<ffffffff80238137>] warn_on_slowpath+0x58/0x80
 [<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54
 [<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a
 [<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b
 [<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c
 [<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686
 [<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca
 [<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78
 [<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261
 [<ffffffff802ad3be>] sys_ioctl+0x55/0x77
 [<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f
 [<00007faed2969267>]

---[ end trace 5d286714f3c5c50f ]---

I suspect the cause is the way our guest raises a triple fault in
order to initiate a restart. At least it tells us via the virtual console
that it wants to restart, and those messages start around the same time.
So, while waiting for my colleagues to dig out the precise triple-fault
code pattern (for a cleaner test case), maybe someone could comment on
potential reasons for this warning, or even ways to resolve it.

Thanks!
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

Re: [kvm-devel] WARN_ON in kvm_queue_exception_e triggers

From: Joerg R. <joe...@am...> - 2008-04-28 20:43:17

Attachments: test-fix.patch

On Mon, Apr 28, 2008 at 07:35:10PM +0200, Jan Kiszka wrote:
> Hi,
> 
> sorry, the test environment is not really reproducible (stock kvm-66,
> yet unpublished NMI support by Sheng Yang and me, special guest), but
> I'm just fishing for some ideas on what may cause the flood of the
> following warning in my kernel log:
> 
> ------------[ cut here ]------------
> WARNING: at /data/kvm-66/kernel/x86.c:180
> kvm_queue_exception_e+0x30/0x54 [kvm]()
> Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG
> nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss
> snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp
> ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
> ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter
> ip6_tables cpufreq_conservative x_tables cpufreq_userspace
> cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm
> l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia
> firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery
> bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp
> rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib
> yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801
> rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev
> sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher
> usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot
> dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal
> processor
> Pid: 4718, comm: qemu-system-x86 Tainted: P        N
> 2.6.25-rc5-git2-109.8-default #1
> 
> Call Trace:
>  [<ffffffff8020d826>] dump_trace+0xc4/0x576
>  [<ffffffff8020dd18>] show_trace+0x40/0x57
>  [<ffffffff8044e341>] _etext+0x72/0x7b
>  [<ffffffff80238137>] warn_on_slowpath+0x58/0x80
>  [<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54
>  [<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a
>  [<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b
>  [<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c
>  [<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686
>  [<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca
>  [<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78
>  [<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261
>  [<ffffffff802ad3be>] sys_ioctl+0x55/0x77
>  [<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f
>  [<00007faed2969267>]
> 
> ---[ end trace 5d286714f3c5c50f ]---

Hmm, seems we have to check for DF and triple faults in the
kvm_queue_exception functions too. Does the attached patch fix the
problem (patch is against kvm-66)?
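
For readers following along, the kind of check meant here can be sketched
roughly as below. This is only an illustration of the exception-combination
rule as described in the Intel SDM (benign / contributory / page fault
classes), not the contents of test-fix.patch. The names merge_exceptions,
is_contributory and the MERGE_* codes are made up for this sketch and are
not KVM identifiers.

```c
#include <stdbool.h>

/* x86 exception vectors used below (architectural numbering). */
enum {
    VEC_DE = 0,   /* divide error */
    VEC_DF = 8,   /* double fault */
    VEC_TS = 10,  /* invalid TSS */
    VEC_NP = 11,  /* segment not present */
    VEC_SS = 12,  /* stack-segment fault */
    VEC_GP = 13,  /* general protection fault */
    VEC_PF = 14   /* page fault */
};

/* Hypothetical result codes for this sketch only. */
enum merge_result {
    MERGE_DELIVER_NEW,   /* handle serially: deliver the new exception */
    MERGE_DOUBLE_FAULT,  /* replace both exceptions with #DF */
    MERGE_TRIPLE_FAULT   /* shutdown, which the VMM may turn into a reset */
};

/* Contributory class: #DE, #TS, #NP, #SS, #GP. */
static bool is_contributory(unsigned vec)
{
    return vec == VEC_DE || vec == VEC_TS || vec == VEC_NP ||
           vec == VEC_SS || vec == VEC_GP;
}

/*
 * Decide what happens when 'incoming' is raised while 'pending' is still
 * being delivered. Combinations the SDM does not call out are left as
 * serial delivery here, which is a simplification of this sketch.
 */
enum merge_result merge_exceptions(unsigned pending, unsigned incoming)
{
    bool incoming_faulty = is_contributory(incoming) || incoming == VEC_PF;

    if (pending == VEC_DF && incoming_faulty)
        return MERGE_TRIPLE_FAULT;
    if (is_contributory(pending) && is_contributory(incoming))
        return MERGE_DOUBLE_FAULT;
    if (pending == VEC_PF && incoming_faulty)
        return MERGE_DOUBLE_FAULT;
    return MERGE_DELIVER_NEW;
}
```

With such a helper, a queueing function would look at an already pending
exception before queueing the new one, instead of warning that one is
already pending.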

Joerg

-- 
           |           AMD Saxony Limited Liability Company & Co. KG
 Operating |         Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System    |                  Register Court Dresden: HRA 4896
 Research  |              General Partner authorized to represent:
 Center    |             AMD Saxony LLC (Wilmington, Delaware, US)
           | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy

Re: [kvm-devel] WARN_ON in kvm_queue_exception_e triggers

From: Jan K. <jan...@si...> - 2008-04-29 08:38:35

Joerg Roedel wrote:
> On Mon, Apr 28, 2008 at 07:35:10PM +0200, Jan Kiszka wrote:
>> Hi,
>>
>> sorry, the test environment is not really reproducible (stock kvm-66,
>> yet unpublished NMI support by Sheng Yang and me, special guest), but
>> I'm just fishing for some ideas on what may cause the flood of the
>> following warning in my kernel log:
>>
>> ------------[ cut here ]------------
>> WARNING: at /data/kvm-66/kernel/x86.c:180
>> kvm_queue_exception_e+0x30/0x54 [kvm]()
>> Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG
>> nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss
>> snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp
>> ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
>> ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter
>> ip6_tables cpufreq_conservative x_tables cpufreq_userspace
>> cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm
>> l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia
>> firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery
>> bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp
>> rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib
>> yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801
>> rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev
>> sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher
>> usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot
>> dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal
>> processor
>> Pid: 4718, comm: qemu-system-x86 Tainted: P        N
>> 2.6.25-rc5-git2-109.8-default #1
>>
>> Call Trace:
>>  [<ffffffff8020d826>] dump_trace+0xc4/0x576
>>  [<ffffffff8020dd18>] show_trace+0x40/0x57
>>  [<ffffffff8044e341>] _etext+0x72/0x7b
>>  [<ffffffff80238137>] warn_on_slowpath+0x58/0x80
>>  [<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54
>>  [<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a
>>  [<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b
>>  [<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c
>>  [<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686
>>  [<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca
>>  [<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78
>>  [<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261
>>  [<ffffffff802ad3be>] sys_ioctl+0x55/0x77
>>  [<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f
>>  [<00007faed2969267>]
>>
>> ---[ end trace 5d286714f3c5c50f ]---
> 
> Hmm, seems we have to check for DF and triple faults in the
> kvm_queue_exception functions too. Does the attached patch fix the
> problem (patch is against kvm-66)?

Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But
then it stumbles and falls, probably over some inconsistent system state:

exception 13 (43)
rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 000000000000fff0 rflags 00033002
cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt 0/ffff
idt 0/ffff
cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Looks like trying to execute the first instruction after reset is
already unsuccessful. As the tr selector is non-zero here, I already
tried a kvm_arch_reset_cpu-hack along the line that sets
KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?
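
As a reading aid for the dump above, here is a small sketch that decodes the
rflags value and computes the linear address of the next instruction fetch
from the cs base and rip. The bit positions are the architectural EFLAGS
ones; the helper names decode_rflags and linear_ip are made up for this
example. Exception 13 is the general protection fault (#GP).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)
#define EFLAGS_RF         (1u << 16)  /* resume flag */
#define EFLAGS_VM         (1u << 17)  /* virtual-8086 mode */

struct rflags_view {
    unsigned iopl;  /* I/O privilege level, 0..3 */
    bool rf;
    bool vm;
};

/* Pick the fields of interest out of a raw rflags value. */
struct rflags_view decode_rflags(uint32_t rflags)
{
    struct rflags_view v;

    v.iopl = (rflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
    v.rf = (rflags & EFLAGS_RF) != 0;
    v.vm = (rflags & EFLAGS_VM) != 0;
    return v;
}

/* Linear address of the next instruction: segment base plus offset. */
uint64_t linear_ip(uint64_t cs_base, uint64_t rip)
{
    return cs_base + rip;
}
```

For the values in the dump, decode_rflags(0x00033002) gives VM=1, RF=1 and
IOPL=3, and linear_ip(0x000f0000, 0xfff0) gives 0xffff0. The VM bit together
with dpl 3 on all data segments would fit a guest that is being run in
virtual-8086 mode to emulate real mode, but that reading is an assumption
and is not confirmed anywhere in this thread.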

Note that this does not happen when I raise a reset via the monitor.

BTW, kvm_show_code() does not seem to provide correct information,
even when I add it right before the first kvm_run().

Jan


(*) There is just a bit of noise left behind in the syslog:

kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
kvm: inject_page_fault: double fault
kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
handle_exception: unexpected, vectoring info 0x80000b08 intr info 0x80000b0d
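
The two hex words in the last line can be picked apart with the field
layout that the Intel SDM gives for the VMX interruption-information
fields (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11,
valid in bit 31). A minimal sketch, with a made-up struct and function
name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a VMX interruption-information word (sketch only). */
struct intr_info {
    unsigned vector;      /* bits 7:0 */
    unsigned type;        /* bits 10:8, 3 = hardware exception */
    bool has_error_code;  /* bit 11 */
    bool valid;           /* bit 31 */
};

struct intr_info decode_intr_info(uint32_t raw)
{
    struct intr_info d;

    d.vector = raw & 0xffu;
    d.type = (raw >> 8) & 0x7u;
    d.has_error_code = ((raw >> 11) & 1u) != 0;
    d.valid = ((raw >> 31) & 1u) != 0;
    return d;
}
```

Decoded this way, vectoring info 0x80000b08 is a valid hardware exception
with vector 8 (#DF) and an error code, and intr info 0x80000b0d is a valid
hardware exception with vector 13 (#GP) and an error code. In other words,
a #GP was raised while a #DF was being delivered.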


Re: [kvm-devel] WARN_ON in kvm_queue_exception_e triggers

From: Joerg R. <joe...@am...> - 2008-04-29 10:05:03

On Tue, Apr 29, 2008 at 10:38:41AM +0200, Jan Kiszka wrote:
> Joerg Roedel wrote:
> > Hmm, seems we have to check for DF and triple faults in the
> > kvm_queue_exception functions too. Does the attached patch fix the
> > problem (patch is against kvm-66)?
> 
> Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But
> then it stumbles and falls, probably over some inconsistent system state:
> 
> exception 13 (43)
> rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
> rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
> r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
> r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
> rip 000000000000fff0 rflags 00033002
> cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
> ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
> gdt 0/ffff
> idt 0/ffff
> cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
> code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 
> Looks like trying to execute the first instruction after reset is
> already unsuccessful. As the tr selector is non-zero here, I already
> tried a kvm_arch_reset_cpu-hack along the line that sets
> KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?

It's unclear to me what triggers the taskswitch. What guest operating
system are you running and what is the qemu/kvm command line to start
the guest?

> Note that this does not happen when I raise a reset via the monitor.
> 
> BTW, kvm_show_code() does not seem to provide correct information,
> even when I add it right before the first kvm_run().

When the guest state is messed up the information may be incorrect.

> (*) There is just a bit of noise left behind in the syslog:
> 
> kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9

Reason 0x9 is the taskswitch intercept.

> kvm: inject_page_fault: double fault

This is expected from the patch I sent you.

Joerg


Re: [kvm-devel] WARN_ON in kvm_queue_exception_e triggers

From: Jan K. <jan...@si...> - 2008-04-29 10:34:40

Joerg Roedel wrote:
> On Tue, Apr 29, 2008 at 10:38:41AM +0200, Jan Kiszka wrote:
>> Joerg Roedel wrote:
>>> Hmm, seems we have to check for DF and triple faults in the
>>> kvm_queue_exception functions too. Does the attached patch fix the
>>> problem (patch is against kvm-66)?
>> Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But
>> then it stumbles and falls, probably over some inconsistent system state:
>>
>> exception 13 (43)
>> rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
>> rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
>> r8  0000000000000000 r9  0000000000000000 r10 0000000000000000 r11 0000000000000000
>> r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
>> rip 000000000000fff0 rflags 00033002
>> cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
>> ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
>> gdt 0/ffff
>> idt 0/ffff
>> cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
>> code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> Looks like trying to execute the first instruction after reset is
>> already unsuccessful. As the tr selector is non-zero here, I already
>> tried a kvm_arch_reset_cpu-hack along the line that sets
>> KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?
> 
> It's unclear to me what triggers the taskswitch. What guest operating

It is the guest, looking for a soft-restart (after it detected some
other error - now our main problem).

> system are you running and what is the qemu/kvm command line to start
> the guest?

Well, the guest is a proprietary OS of our customer, running in 16-bit
protected mode with a lot of segment shuffling. Due to this and also
some special hardware emulations, the current test case is not portable.
So I'm looking for input on where to dig and what to try.

Note that I ran the very same test with -no-kvm, and there we do not get
the post-reset GPF (provided that some reset-on-triple-fault patch is
applied to avoid the abort(), e.g. [1]).

> 
>> Note that this does not happen when I raise a reset via the monitor.
>>
>> BTW, kvm_show_code() does not seem to provide correct information,
>> even when I add it right before the first kvm_run().
> 
> When the guest state is messed up the information may be incorrect.

I don't expect the guest state to be messed up right before the very
first guest code execution (that's where kvm_show_code() also reported
zeros)... :->

> 
>> (*) There is just a bit of noise left behind in the syslog:
>>
>> kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
> 
> Reason 0x9 is the taskswitch intercept.
> 
>> kvm: inject_page_fault: double fault
> 
> This is expected from the patch I sent you.

For sure. I would just suggest rethinking whether a final version should
still issue such warnings. We basically had the same discussion on
qemu-devel around the reset-on-triple-fault patch (which is
unfortunately still not finalized :-/).

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/24475
