|
From: Jan K. <jan...@si...> - 2008-04-28 17:35:32
|
Hi,

sorry, the test environment is not really reproducible (stock kvm-66,
yet unpublished NMI support by Sheng Yang and me, special guest), but
I'm just fishing for some ideas on what may cause the flood of the
following warning in my kernel log:

------------[ cut here ]------------
WARNING: at /data/kvm-66/kernel/x86.c:180
kvm_queue_exception_e+0x30/0x54 [kvm]()
Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG
nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss
snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp
ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter
ip6_tables cpufreq_conservative x_tables cpufreq_userspace
cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm
l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia
firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery
bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp
rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib
yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801
rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev
sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher
usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot
dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal
processor
Pid: 4718, comm: qemu-system-x86 Tainted: P N
2.6.25-rc5-git2-109.8-default #1

Call Trace:
[<ffffffff8020d826>] dump_trace+0xc4/0x576
[<ffffffff8020dd18>] show_trace+0x40/0x57
[<ffffffff8044e341>] _etext+0x72/0x7b
[<ffffffff80238137>] warn_on_slowpath+0x58/0x80
[<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54
[<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a
[<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b
[<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c
[<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686
[<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca
[<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78
[<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261
[<ffffffff802ad3be>] sys_ioctl+0x55/0x77
[<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f
[<00007faed2969267>]

---[ end trace 5d286714f3c5c50f ]---

I'm suspecting that it is the way our guest raises a triple fault in
order to initiate a restart. At least it tells us via virtual console
that it wants to restart, and those messages start around the same time.

So, while waiting for my colleagues to dig out the precise triple-fault
code pattern (for a cleaner test case), maybe someone could comment on
potential reasons for this warning - or even ways to resolve them.

Thanks!
Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
|
From: Joerg R. <joe...@am...> - 2008-04-28 20:43:17
Attachments:
test-fix.patch
|
On Mon, Apr 28, 2008 at 07:35:10PM +0200, Jan Kiszka wrote:
> Hi,
>
> sorry, the test environment is not really reproducible (stock kvm-66,
> yet unpublished NMI support by Sheng Yang and me, special guest), but
> I'm just fishing for some ideas on what may cause the flood of the
> following warning in my kernel log:
>
> ------------[ cut here ]------------
> WARNING: at /data/kvm-66/kernel/x86.c:180
> kvm_queue_exception_e+0x30/0x54 [kvm]()
> Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG
> nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss
> snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp
> ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
> ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter
> ip6_tables cpufreq_conservative x_tables cpufreq_userspace
> cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm
> l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia
> firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery
> bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp
> rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib
> yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801
> rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev
> sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher
> usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot
> dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal
> processor
> Pid: 4718, comm: qemu-system-x86 Tainted: P N
> 2.6.25-rc5-git2-109.8-default #1
>
> Call Trace:
> [<ffffffff8020d826>] dump_trace+0xc4/0x576
> [<ffffffff8020dd18>] show_trace+0x40/0x57
> [<ffffffff8044e341>] _etext+0x72/0x7b
> [<ffffffff80238137>] warn_on_slowpath+0x58/0x80
> [<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54
> [<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a
> [<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b
> [<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c
> [<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686
> [<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca
> [<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78
> [<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261
> [<ffffffff802ad3be>] sys_ioctl+0x55/0x77
> [<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f
> [<00007faed2969267>]
>
> ---[ end trace 5d286714f3c5c50f ]---
Hmm, it seems we have to check for DF and triple faults in the
kvm_queue_exception functions too. Does the attached patch fix the
problem? (The patch is against kvm-66.)
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
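
As a rough illustration of the kind of check Joerg describes (this is not the attached patch, whose contents are not shown in the thread), the sketch below escalates a second exception to a double fault, and a fault on top of a pending double fault to a triple fault. The classification into benign, contributory and page-fault exceptions follows the x86 architecture rules; the struct, enum and function names are hypothetical and do not come from kvm-66. As a simplification, any exception arriving while #DF is pending is treated as a triple fault.

```c
#include <stdbool.h>

/* Architectural x86 exception vectors. */
enum {
    DE_VECTOR = 0,  DF_VECTOR = 8,  TS_VECTOR = 10, NP_VECTOR = 11,
    SS_VECTOR = 12, GP_VECTOR = 13, PF_VECTOR = 14
};

enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };
enum queue_result { QUEUED_AS_IS, QUEUED_DOUBLE_FAULT, TRIPLE_FAULT };

/* Hypothetical stand-in for the per-vCPU pending-exception slot. */
struct pending_exception {
    bool pending;
    unsigned int nr;
};

/* Sort a vector into the three classes used by the double-fault rules. */
static enum exc_class classify(unsigned int nr)
{
    switch (nr) {
    case PF_VECTOR:
        return EXC_PAGE_FAULT;
    case DE_VECTOR:
    case TS_VECTOR:
    case NP_VECTOR:
    case SS_VECTOR:
    case GP_VECTOR:
        return EXC_CONTRIBUTORY;
    default:
        return EXC_BENIGN;
    }
}

/* Queue exception nr, merging it with an already pending one if needed. */
static enum queue_result queue_exception(struct pending_exception *q,
                                         unsigned int nr)
{
    enum exc_class first, second;

    if (!q->pending) {
        /* Nothing pending: just record the exception. */
        q->pending = true;
        q->nr = nr;
        return QUEUED_AS_IS;
    }
    if (q->nr == DF_VECTOR) {
        /* Fault while #DF is pending: shutdown (simplified, see above). */
        q->pending = false;
        return TRIPLE_FAULT;
    }
    first = classify(q->nr);
    second = classify(nr);
    if ((first == EXC_CONTRIBUTORY && second == EXC_CONTRIBUTORY) ||
        (first == EXC_PAGE_FAULT && second != EXC_BENIGN)) {
        /* The pair escalates to a double fault. */
        q->nr = DF_VECTOR;
        return QUEUED_DOUBLE_FAULT;
    }
    /* Otherwise the new exception simply replaces the old one. */
    q->nr = nr;
    return QUEUED_AS_IS;
}
```

Queuing #GP three times in a row on an empty slot returns QUEUED_AS_IS, then QUEUED_DOUBLE_FAULT, then TRIPLE_FAULT; in a real hypervisor the last case would be the point at which a reset request is raised instead of a warning.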
|
|
From: Jan K. <jan...@si...> - 2008-04-29 08:38:35
|
Joerg Roedel wrote:
> On Mon, Apr 28, 2008 at 07:35:10PM +0200, Jan Kiszka wrote:
>> Hi,
>>
>> sorry, the test environment is not really reproducible (stock kvm-66,
>> yet unpublished NMI support by Sheng Yang and me, special guest), but
>> I'm just fishing for some ideas on what may cause the flood of the
>> following warning in my kernel log:
>>
>> ------------[ cut here ]------------
>> WARNING: at /data/kvm-66/kernel/x86.c:180
>> kvm_queue_exception_e+0x30/0x54 [kvm]()
>> Modules linked in: ipt_MASQUERADE kvm_intel kvm bridge tun ip6t_LOG
>> nf_conntrack_ipv6 xt_pkttype ipt_LOG xt_limit snd_pcm_oss snd_mixer_oss
>> snd_seq snd_seq_device nls_utf8 cifs af_packet ip6t_REJECT xt_tcpudp
>> ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
>> ip6table_mangle nf_conntrack_ipv4 nf_conntrack ip_tables ip6table_filter
>> ip6_tables cpufreq_conservative x_tables cpufreq_userspace
>> cpufreq_powersave acpi_cpufreq ipv6 microcode fuse ohci_hcd loop rfcomm
>> l2cap wlan_scan_sta ath_rate_sample ath_pci snd_hda_intel wlan pcmcia
>> firmware_class hci_usb snd_pcm snd_timer ath_hal(P) sdhci battery
>> bluetooth button ohci1394 mmc_core rtc_cmos parport_pc intel_agp
>> rtc_core dock ac snd_page_alloc iTCO_wdt ieee1394 sky2 rtc_lib
>> yenta_socket parport snd_hwdep snd iTCO_vendor_support i2c_i801
>> rsrc_nonstatic pcmcia_core sg i2c_core soundcore serio_raw joydev
>> sha256_generic aes_x86_64 aes_generic cbc dm_crypt crypto_blkcipher
>> usbhid hid ff_memless sd_mod ehci_hcd uhci_hcd usbcore dm_snapshot
>> dm_mod edd ext3 mbcache jbd fan ata_piix ahci libata scsi_mod thermal
>> processor
>> Pid: 4718, comm: qemu-system-x86 Tainted: P N
>> 2.6.25-rc5-git2-109.8-default #1
>>
>> Call Trace:
>> [<ffffffff8020d826>] dump_trace+0xc4/0x576
>> [<ffffffff8020dd18>] show_trace+0x40/0x57
>> [<ffffffff8044e341>] _etext+0x72/0x7b
>> [<ffffffff80238137>] warn_on_slowpath+0x58/0x80
>> [<ffffffff886e2e05>] :kvm:kvm_queue_exception_e+0x30/0x54
>> [<ffffffff886e3678>] :kvm:kvm_task_switch+0xca/0x20a
>> [<ffffffff8870d096>] :kvm_intel:handle_task_switch+0x19/0x1b
>> [<ffffffff8870cb1b>] :kvm_intel:kvm_handle_exit+0x7f/0x9c
>> [<ffffffff886e51e2>] :kvm:kvm_arch_vcpu_ioctl_run+0x49b/0x686
>> [<ffffffff886e08c9>] :kvm:kvm_vcpu_ioctl+0xf7/0x3ca
>> [<ffffffff802ad0ba>] vfs_ioctl+0x2a/0x78
>> [<ffffffff802ad34f>] do_vfs_ioctl+0x247/0x261
>> [<ffffffff802ad3be>] sys_ioctl+0x55/0x77
>> [<ffffffff8020c18a>] system_call_after_swapgs+0x8a/0x8f
>> [<00007faed2969267>]
>>
>> ---[ end trace 5d286714f3c5c50f ]---
>
> Hmm, seems we have to check for DF and triple faults in the
> kvm_queue_exception functions too. Does the attached patch fix the
> problem (patch is against kvm-66).

Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But
then it stumbles and falls, probably over some inconsistent system state:

exception 13 (43)
rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000
r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
rip 000000000000fff0 rflags 00033002
cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
gdt 0/ffff
idt 0/ffff
cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Looks like trying to execute the first instruction after reset is
already unsuccessful. As the tr selector is non-zero here, I already
tried a kvm_arch_reset_cpu-hack along the line that sets
KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?

Note that this does not happen when I raise a reset via the monitor.

BTW, kvm_show_code() does not seem to provide correct information,
even when I add it right before the first kvm_run().

Jan

(*) There is just a bit of noise left behind in the syslog:

kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
kvm: inject_page_fault: double fault
kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
handle_exception: unexpected, vectoring info 0x80000b08 intr info 0x80000b0d

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
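
For readers trying to interpret the register dump in this message, the following sketch decodes two of its values: rflags 00033002 and the cs base/rip pair 000f0000/fff0. It is only a reading aid under the architectural x86 flag layout (IOPL in bits 13:12, RF in bit 16, VM in bit 17); the helper names are hypothetical, and the thread does not establish whether either value is related to the failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in EFLAGS/RFLAGS. */
#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  0x3u
#define EFLAGS_RF         (1u << 16)
#define EFLAGS_VM         (1u << 17)

/* Hypothetical container for the few flag fields of interest here. */
struct flags_view {
    unsigned int iopl; /* I/O privilege level, 0..3 */
    bool rf;           /* resume flag */
    bool vm;           /* virtual-8086 mode */
};

/* Extract IOPL, RF and VM from a raw flags value. */
static struct flags_view decode_rflags(uint64_t rflags)
{
    struct flags_view v;

    v.iopl = (unsigned int)((rflags >> EFLAGS_IOPL_SHIFT) & EFLAGS_IOPL_MASK);
    v.rf = (rflags & EFLAGS_RF) != 0;
    v.vm = (rflags & EFLAGS_VM) != 0;
    return v;
}

/* Linear address of the next instruction: segment base plus offset. */
static uint64_t linear_address(uint64_t seg_base, uint64_t rip)
{
    return seg_base + rip;
}
```

With the dump's values, decode_rflags(0x33002) reports VM set, RF set and IOPL 3, which is consistent with the dpl 3 segment descriptors shown, and linear_address(0xf0000, 0xfff0) gives 0xffff0.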
|
From: Joerg R. <joe...@am...> - 2008-04-29 10:05:03
|
On Tue, Apr 29, 2008 at 10:38:41AM +0200, Jan Kiszka wrote:
> Joerg Roedel wrote:
> > Hmm, seems we have to check for DF and triple faults in the
> > kvm_queue_exception functions too. Does the attached patch fix the
> > problem (patch is against kvm-66).
>
> Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But
> then is stumbles and falls probably over some inconsistent system state:
>
> exception 13 (43)
> rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
> rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
> r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000
> r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
> rip 000000000000fff0 rflags 00033002
> cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
> tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
> ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
> gdt 0/ffff
> idt 0/ffff
> cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
> code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Looks like trying to execute the first instruction after reset is
> already unsuccessful. As the tr selector is non-zero here, I already
> tried a kvm_arch_reset_cpu-hack along the line that sets
> KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?
It's weird to me what triggers the taskswitch. What guest operating
system are you running and what is the qemu/kvm command line to start
the guest?
> Note that this does not happen when I raise a reset via the monitor.
>
> BTW, kvm_show_code() does not seem to provide correct informations,
> even when I add it right before the first kvm_run().
When the guest state is messed up the information may be incorrect.
> (*) There is just a bit noise left behind in the syslog:
>
> kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
Reason 0x9 is the taskswitch intercept.
> kvm: inject_page_fault: double fault
This is expected from the patch I sent you.
Joerg
--
| AMD Saxony Limited Liability Company & Co. KG
Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
System | Register Court Dresden: HRA 4896
Research | General Partner authorized to represent:
Center | AMD Saxony LLC (Wilmington, Delaware, US)
| General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
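
The syslog line "vectoring info 0x80000b08 intr info 0x80000b0d" reported earlier in the thread can be decoded by hand. The sketch below assumes the Intel VMX layout shared by the IDT-vectoring and VM-exit interruption information fields (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31); the struct and function names are hypothetical and are not KVM's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a VMX interruption-information field. */
struct intr_info {
    bool valid;          /* bit 31 */
    bool has_error_code; /* bit 11 */
    unsigned int type;   /* bits 10:8, 3 = hardware exception */
    unsigned int vector; /* bits 7:0 */
};

/* Split a raw 32-bit field into its components. */
static struct intr_info decode_intr_info(uint32_t raw)
{
    struct intr_info info;

    info.valid = ((raw >> 31) & 1u) != 0;
    info.has_error_code = ((raw >> 11) & 1u) != 0;
    info.type = (raw >> 8) & 0x7u;
    info.vector = raw & 0xffu;
    return info;
}
```

Under that layout, 0x80000b08 decodes to a valid hardware exception with vector 8 (#DF) and an error code, and 0x80000b0d to a valid hardware exception with vector 13 (#GP) and an error code. In other words, the log line appears to describe a general protection fault raised while a double fault was being delivered.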
|
|
From: Jan K. <jan...@si...> - 2008-04-29 10:34:40
|
Joerg Roedel wrote:
> On Tue, Apr 29, 2008 at 10:38:41AM +0200, Jan Kiszka wrote:
>> Joerg Roedel wrote:
>>> Hmm, seems we have to check for DF and triple faults in the
>>> kvm_queue_exception functions too. Does the attached patch fix the
>>> problem (patch is against kvm-66).
>> Thanks, it indeed fixes the warnings (*) and makes KVM issue a reset. But
>> then is stumbles and falls probably over some inconsistent system state:
>>
>> exception 13 (43)
>> rax 0000000000000000 rbx 0000000000000000 rcx 0000000000000000 rdx 0000000000000633
>> rsi 0000000000000000 rdi 0000000000000000 rsp 0000000000000000 rbp 0000000000000000
>> r8 0000000000000000 r9 0000000000000000 r10 0000000000000000 r11 0000000000000000
>> r12 0000000000000000 r13 0000000000000000 r14 0000000000000000 r15 0000000000000000
>> rip 000000000000fff0 rflags 00033002
>> cs f000 (000f0000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> ds 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> es 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> ss 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> fs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> gs 0000 (00000000/0000ffff p 1 dpl 3 db 0 s 1 type 3 l 0 g 0 avl 0)
>> tr 0178 (fffbd000/00002088 p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0)
>> ldt 0000 (00000000/0000ffff p 1 dpl 0 db 0 s 0 type 2 l 0 g 0 avl 0)
>> gdt 0/ffff
>> idt 0/ffff
>> cr0 60000010 cr2 0 cr3 0 cr4 0 cr8 0 efer 0
>> code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 --> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>
>> Looks like trying to execute the first instruction after reset is
>> already unsuccessful. As the tr selector is non-zero here, I already
>> tried a kvm_arch_reset_cpu-hack along the line that sets
>> KVM_REQ_TRIPLE_FAULT, but without success. Any idea what to check?
>
> Its weird to me what triggers the taskswitch. What guest operating

It is the guest, looking for a soft-restart (after it detected some
other error - now our main problem).

> system are you running and what is the qemu/kvm command line to start
> the guest?

Well, the guest is a proprietary OS of our customer, running in 16-bit
protected mode with a lot of segment shuffling. Due to this and also
some special hardware emulations, the current test case is not portable.
So I'm looking for input on where to dig and what to try.

Note that I ran the very same test with -no-kvm, and here we do not get
those post-reset GPF (provided that some reset-on-triple-fault patch is
applied to avoid the abort(), e.g. [1]).

>
>> Note that this does not happen when I raise a reset via the monitor.
>>
>> BTW, kvm_show_code() does not seem to provide correct informations,
>> even when I add it right before the first kvm_run().
>
> When the guest state is messed up the information may be incorrect.

I don't expect the guest state to be messed up right before the very
first guest code execution (that's where kvm_show_code() also reported
zeros)... :->

>
>> (*) There is just a bit noise left behind in the syslog:
>>
>> kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9
>
> Reason 0x9 is the taskswitch intercept.
>
>> kvm: inject_page_fault: double fault
>
> This is expected from the patch I sent you.

For sure. I would just suggest to rethink if a final version should
still issue such warnings. We basically had the same discussion on
qemu-devel around the reset-on-triple-fault patch (which is
unfortunately still not finalized :-/).

Jan

[1] http://permalink.gmane.org/gmane.comp.emulators.qemu/24475

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux