
#297 guest: device offline, then kernel panic

Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-01-16
Created: 2008-08-08
Private: No

host: kvm71, 64-bit 2.6.18-92.1.6.el5, 16 GB RAM, 2x X5450 (8 cores)
guest: 64-bit 2.6.18-92.1.6.el5, 3.5 GB RAM, 2 CPUs, 5 HDDs on raw partitions (!).

In the guest, I quite often get messages like:
kernel: sd 0:0:0:0: ABORT operation started.
kernel: sd 0:0:0:0: ABORT operation timed-out.
[many times like that]
[there were more messages saying that the device is offline, but I lost them; I will update if it happens again]
Then the filesystem gets remounted read-only, and then the kernel panics with this message (only part of it, that's what I got on the screen):
FS: 0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000400000013 CR3: 0000000000201000 CR4: 00000000000006e0
Process sshd (pid: 23911, threadinfo ffff81006f53a000, task ffff8100dc2ca0c0)
Stack:  ffffffff800075dc ffff8100dc1ba960 ffff8100dc1ba688 ffff810096b52300
 ffff8100dd15acc0 ffff8100dc1ba758 ffff8100dc1ba758 ffff810003f2a680
 ffffffff8000d11c 0000000000000008 0000000000000008 ffff8100dd15acc0
Call Trace:
[<ffffffff800075dc>] kmem_cache_free+0x13c/0x1dd
[<ffffffff8000d11c>] dput+0xf6/0x114
[<ffffffff800125f3>] __fput+0x16c/0x198
[<ffffffff8001a6a7>] remove_vma+0x3d/0x64
[<ffffffff80039c60>] exit_mmap+0xcf/0xf3
[<ffffffff8003bd73>] mmput+0x30/0x83
[<ffffffff800151b6>] do_exit+0x28b/0x8d0
[<ffffffff80048a1c>] cpuset_exit+0x0/0x6c
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: f0 ff 0f 0f 88 6c 01 00 00 c3 f0 81 2f 00 00 00 01 74 05 e8
RIP [<ffffffff80064a2d>] _spin_lock+0x0/0xa
RSP <ffff81006f53be10>
CR2: 0000000400000013
<0>Kernel panic - not syncing: Fatal exception
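Each Call Trace line above follows the kernel's `[<address>] symbol+offset/size` convention: offset and size are hex, the offset is the distance from the start of the symbol, and the size is the symbol's length as known to kallsyms. When comparing several crashes, as in the follow-up comments, it can help to pull those entries out mechanically. A minimal Python sketch, assuming the trace has been saved as plain text (the helpers `parse_call_trace` and `symbol_start` are hypothetical names, not part of any existing tool):

```python
import re

# One oops "Call Trace" entry: [<address>] symbol+offset/size (all hex).
# Markers such as "<IRQ>" or "<EOI>" before an entry are simply not matched.
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_call_trace(text):
    """Return (address, symbol, offset, size) tuples in the order printed."""
    return [
        (
            int(m.group("addr"), 16),
            m.group("sym"),
            int(m.group("off"), 16),
            int(m.group("size"), 16),
        )
        for m in TRACE_RE.finditer(text)
    ]


def symbol_start(address, offset):
    """Start address of the function: the printed address minus its offset."""
    return address - offset


if __name__ == "__main__":
    trace = """\
 [<ffffffff800075dc>] kmem_cache_free+0x13c/0x1dd
 [<ffffffff8000d11c>] dput+0xf6/0x114
 [<ffffffff800125f3>] __fput+0x16c/0x198
"""
    for addr, sym, off, size in parse_call_trace(trace):
        print(f"{sym}: {off:#x} of {size:#x} bytes in, "
              f"function starts at {symbol_start(addr, off):#x}")
```

Because the regex also matches the `RIP [<...>] _spin_lock+0x0/0xa` line, feeding it the whole oops puts the faulting instruction last in the list.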

Even though the kernel panicked, the kvm process was still taking 100% CPU. gdb shows the following info; no idea whether it's helpful in any way.

Thread 4 (Thread 1938626880 (LWP 17006)):
#0  0x000000368bec6fa7 in ioctl () from /lib64/libc.so.6
#1  0x000000000050f726 in kvm_run (kvm=0x11b15010, vcpu=0) at libkvm.c:903
#2  0x00000000004e9426 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:218
#3  0x00000000004e9700 in ap_main_loop (_env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:407
#4  0x000000368ca062e7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000368bece3bd in clone () from /lib64/libc.so.6

Thread 3 (Thread 1087498560 (LWP 17007)):
#0  0x000000368bec6fa7 in ioctl () from /lib64/libc.so.6
#1  0x000000000050f726 in kvm_run (kvm=0x11b15010, vcpu=1) at libkvm.c:903
#2  0x00000000004e9426 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:218
#3  0x00000000004e9700 in ap_main_loop (_env=<value optimized out>) at /usr/src/kvm-71/qemu/qemu-kvm.c:407
#4  0x000000368ca062e7 in start_thread () from /lib64/libpthread.so.0
#5  0x000000368bece3bd in clone () from /lib64/libc.so.6

Thread 2 (Thread 1949133120 (LWP 17014)):
#0  0x000000368ca0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003692202ee5 in handle_fildes_io () from /lib64/librt.so.1
#2  0x000000368ca062e7 in start_thread () from /lib64/libpthread.so.0
#3  0x000000368bece3bd in clone () from /lib64/libc.so.6

Thread 1 (Thread 47523282295136 (LWP 16990)):
#0  0x000000368bec7922 in select () from /lib64/libc.so.6
#1  0x00000000004094b2 in main_loop_wait (timeout=<value optimized out>) at /usr/src/kvm-71/qemu/vl.c:7545
#2  0x00000000004e9342 in kvm_main_loop () at /usr/src/kvm-71/qemu/qemu-kvm.c:587
#3  0x0000000000411662 in main (argc=20, argv=0x7fffca7a9b38) at /usr/src/kvm-71/qemu/vl.c:7705
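A gdb backtrace of all threads, like the one above, is easier to read with one line per thread. A minimal Python sketch (the helper `summarize_threads` is a hypothetical name) that groups the frames under each `Thread N (... (LWP n))` header and lists the function names innermost first; applied to this dump it would show, for example, that the two threads going through `kvm_run()` (vcpu=0 and vcpu=1) are both inside `ioctl()`, while thread 1 sits in `select()`:

```python
import re

# "Thread 4 (Thread 1938626880 (LWP 17006)):" -> gdb thread number and LWP id.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread \d+ \(LWP (\d+)\)\):")
# "#0  0x... in ioctl () from ..." -> function name. The "#" is optional in
# case it was lost when pasting, and the "0x... in " part is optional because
# gdb omits it for some frames.
FRAME_RE = re.compile(r"^#?\d+\s+(?:0x[0-9a-f]+ in )?(\S+)")


def summarize_threads(text):
    """Map gdb thread number -> (LWP, [function names, innermost frame first])."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = (int(m.group(2)), [])
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current][1].append(m.group(1))
    return threads


if __name__ == "__main__":
    sample = """\
Thread 4 (Thread 1938626880 (LWP 17006)):
#0  0x000000368bec6fa7 in ioctl () from /lib64/libc.so.6
#1  0x000000000050f726 in kvm_run (kvm=0x11b15010, vcpu=0) at libkvm.c:903
Thread 2 (Thread 1949133120 (LWP 17014)):
#0  0x000000368ca0a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003692202ee5 in handle_fildes_io () from /lib64/librt.so.1
"""
    for num, (lwp, funcs) in sorted(summarize_threads(sample).items()):
        print(f"thread {num} (LWP {lwp}): {' <- '.join(funcs)}")
```

A single snapshot only shows where each thread was at that moment; attaching a few times in a row would show whether the vCPU threads keep returning to the same place.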

Discussion

  • Rafal Wijata

    Rafal Wijata - 2008-08-11

    Logged In: YES
    user_id=996150
    Originator: YES

    Update_1:
    While the guest was panicking, I saw a SEGV for its qemu process on the host. No core file, though; I'll try to get one next time.
    It happened on kvm72 as well.

     
  • Rafal Wijata

    Rafal Wijata - 2008-08-13

    Logged In: YES
    user_id=996150
    Originator: YES

    [guest] And finally the device goes offline:
    [guest] sd 0:0:0:0: rejecting I/O to offline device
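The guest messages in this report arrive as a sequence: repeated `ABORT operation started` / `ABORT operation timed-out` pairs, then `rejecting I/O to offline device`, and finally the panic. When collecting logs for a reproduction, a small script can count these and note where each first appears. A minimal Python sketch, assuming the guest log is available as plain text lines (the names `PATTERNS`, `scan` and `summarize` are hypothetical); the read-only remount message is left out because its exact wording was not captured here:

```python
import re
from collections import Counter

# Guest kernel messages quoted in this report, keyed by a short event name.
# The SCSI address (host:channel:id:lun, e.g. 0:0:0:0) is captured as "dev".
DEV = r"sd (?P<dev>\d+:\d+:\d+:\d+): "
PATTERNS = [
    ("abort_started", re.compile(DEV + r"ABORT operation started")),
    ("abort_timed_out", re.compile(DEV + r"ABORT operation timed-out")),
    ("offline_reject", re.compile(DEV + r"rejecting I/O to offline device")),
    ("panic", re.compile(r"Kernel panic - not syncing")),
]


def scan(lines):
    """Yield (line_number, event, device or None) for each recognised message."""
    for n, line in enumerate(lines, 1):
        for event, pattern in PATTERNS:
            m = pattern.search(line)
            if m:
                yield n, event, m.groupdict().get("dev")
                break


def summarize(lines):
    """Count each event and record the line number where it first appears."""
    counts = Counter()
    first_seen = {}
    for n, event, _dev in scan(lines):
        counts[event] += 1
        first_seen.setdefault(event, n)
    return counts, first_seen


if __name__ == "__main__":
    log = [
        "kernel: sd 0:0:0:0: ABORT operation started.",
        "kernel: sd 0:0:0:0: ABORT operation timed-out.",
        "kernel: sd 0:0:0:0: ABORT operation started.",
        "kernel: sd 0:0:0:0: ABORT operation timed-out.",
        "sd 0:0:0:0: rejecting I/O to offline device",
        "<0>Kernel panic - not syncing: Fatal exception",
    ]
    counts, first_seen = summarize(log)
    for event, _pattern in PATTERNS:
        print(f"{event}: {counts[event]} time(s), first at line {first_seen.get(event)}")
```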

    Is it possible that these problems come from the fact that I have configured raw devices as kvm disks? E.g.:
    -drive media=disk,if=scsi,boot=on,file=/dev/sdb2 -drive media=disk,if=scsi,boot=off,file=/dev/sdc2 ...

     
  • Rafal Wijata

    Rafal Wijata - 2008-08-13

    Logged In: YES
    user_id=996150
    Originator: YES

    Another crash, with the guest backtrace. Please advise: how can I debug this?

    R13: ffff8100dd107000 R14: ffffffff80077090 R15: ffffffff80418e80
    FS: 0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00002aec1f42e000 CR3: 00000000d49e4000 CR4: 00000000000006e0

    Call Trace:
    <IRQ> [<ffffffff8003eadd>] dev_watchdog+0x98/0xc0
    [<ffffffff800953c2>] run_timer_softirq+0x133/0x1af
    [<ffffffff80011ed2>] __do_softirq+0x5e/0xd6
    [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
    [<ffffffff8006c6e4>] do_softirq+0x2c/0x85
    [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
    <EOI> [<ffffffff800d403a>] drain_array+0x28/0xc0
    [<ffffffff800d4aea>] cache_reap+0x0/0x219
    [<ffffffff800d4b8f>] cache_reap+0xa5/0x219
    [<ffffffff8004cea9>] run_workqueue+0x94/0xe4
    [<ffffffff800497be>] worker_thread+0x0/0x122
    [<ffffffff800498ae>] worker_thread+0xf0/0x122
    [<ffffffff8008ad76>] default_wake_function+0x0/0xe
    [<ffffffff8003253d>] kthread+0xfe/0x132
    [<ffffffff8005dfb1>] child_rip+0xa/0x11
    [<ffffffff8003243f>] kthread+0x0/0x132
    [<ffffffff8005dfa7>] child_rip+0x0/0x11

     
  • Jes Sorensen

    Jes Sorensen - 2010-11-30

    This looks like a duplicate of https://bugs.launchpad.net/qemu/+bug/587993

    If you can reproduce this problem, it would be great if you could add the info to the bug in Launchpad.

    Thanks,
    Jes

     
