host: kvm71, 64-bit 2.6.18-92.1.6.el5, 16 GB RAM, 2 x X5450 (8 cores)
guest: 64-bit 2.6.18-92.1.6.el5, 3.5 GB RAM, 2 CPUs, 5 HDDs on raw partitions (!).
In the guest, I quite often get messages like:
kernel: sd 0:0:0:0: ABORT operation started.
kernel: sd 0:0:0:0: ABORT operation timed-out.
[many times like that]
[there were more messages about the device being offline, but I lost them; will update if it happens again]
Then the filesystem gets remounted read-only, and the kernel panics with the following message (only part of it; that's all I got on the screen):
FS: 0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000400000013 CR3: 0000000000201000 CR4: 00000000000006e0
Process sshd (pid: 23911, threadinfo ffff81006f53a000, task ffff8100dc2ca0c0)
Stack:  ffffffff800075dc ffff8100dc1ba960 ffff8100dc1ba688 ffff810096b52300
 ffff8100dd15acc0 ffff8100dc1ba758 ffff8100dc1ba758 ffff810003f2a680
 ffffffff8000d11c 0000000000000008 0000000000000008 ffff8100dd15acc0
Call Trace:
[<ffffffff800075dc>] kmem_cache_free+0x13c/0x1dd
[<ffffffff8000d11c>] dput+0xf6/0x114
[<ffffffff800125f3>] __fput+0x16c/0x198
[<ffffffff8001a6a7>] remove_vma+0x3d/0x64
[<ffffffff80039c60>] exit_mmap+0xcf/0xf3
[<ffffffff8003bd73>] mmput+0x30/0x83
[<ffffffff800151b6>] do_exit+0x28b/0x8d0
[<ffffffff80048a1c>] cpuset_exit+0x0/0x6c
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: f0 ff 0f 0f 88 6c 01 00 00 c3 f0 81 2f 00 00 00 01 74 05 e8
RIP [<ffffffff80064a2d>] _spin_lock+0x0/0xa
RSP <ffff81006f53be10>
CR2: 0000000400000013
<0>Kernel panic - not syncing: Fatal exception
Even though the guest kernel had panicked, the kvm process was still taking 100% CPU. gdb shows the following info; no clue whether it's helpful in any way.
Thread 4 (Thread 1938626880 (LWP 17006)):
Thread 3 (Thread 1087498560 (LWP 17007)):
Thread 2 (Thread 1949133120 (LWP 17014)):
Thread 1 (Thread 47523282295136 (LWP 16990)):
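The thread list above carries no stack frames, so a full per-thread backtrace would probably be more useful. A minimal sketch of how it could be captured on the host, assuming gdb is installed there and is allowed to attach to the kvm process; `dump_threads` is a hypothetical helper name, not something shipped with kvm:

```shell
# Hypothetical helper: print a backtrace for every thread of a running process.
# Assumes gdb is installed on the host and may attach to the target PID.
dump_threads() {
    # Reject a missing or non-numeric PID before calling gdb.
    case "$1" in
        ''|*[!0-9]*) echo "usage: dump_threads PID" >&2; return 2 ;;
    esac
    # -batch makes gdb exit after running the -ex command, which detaches
    # from the process again.
    gdb -p "$1" -batch -ex "thread apply all bt"
}
```

With the PID taken from the LWP list above this would be `dump_threads 16990 > kvm-threads.txt`. Debug symbols for the qemu binary make the frames readable, if they are available.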
Logged In: YES
user_id=996150
Originator: YES
Update_1:
While the guest was panicking, I saw a SEGV for its qemu process on the host. No core file, though; I'll try to get one next time.
Happened on kvm72 as well.
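One likely reason for the missing core file is a core size limit of 0 in the shell that starts qemu. A minimal sketch of a wrapper that raises the limit for one command only; `run_with_cores` is a hypothetical name, and where the core ends up still depends on the host's `kernel.core_pattern` setting and on the working directory being writable:

```shell
# Hypothetical wrapper: run a command with core dumps enabled.
# The subshell keeps the changed limit from leaking into the calling shell.
run_with_cores() {
    (
        # Raising the soft limit can fail if the hard limit is lower;
        # warn and carry on rather than refuse to start the guest.
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise core size limit" >&2
        exec "$@"
    )
}
```

The guest would then be started as `run_with_cores` followed by the usual qemu command line.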
Logged In: YES
user_id=996150
Originator: YES
[guest] And finally the device goes offline:
[guest] sd 0:0:0:0: rejecting I/O to offline device
Is it possible that these problems come from the fact that I have configured raw devices as kvm disks? E.g.:
-drive media=disk,if=scsi,boot=on,file=/dev/sdb2 -drive media=disk,if=scsi,boot=off,file=/dev/sdc2 ...
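To tell whether all five disks are affected or only one, the guest's kernel log could be summarized per SCSI device. A rough sketch, assuming the messages keep the exact wording quoted in this report and that the log is available as a plain file (for example a copy of the guest's /var/log/messages); `summarize_scsi_errors` is a hypothetical helper name:

```shell
# Hypothetical helper: count SCSI abort and offline messages per device.
# Matches only the three message texts quoted in this report.
summarize_scsi_errors() {
    awk '
        {
            # Look for "sd H:C:T:L:" anywhere in the line.
            for (i = 1; i < NF; i++) {
                if ($i == "sd" && $(i + 1) ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+:$/) {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)          # drop the trailing colon
                    seen[dev] = 1
                    if ($0 ~ /ABORT operation started/)          started[dev]++
                    if ($0 ~ /ABORT operation timed-out/)        timedout[dev]++
                    if ($0 ~ /rejecting I\/O to offline device/) offline[dev]++
                    break
                }
            }
        }
        END {
            for (dev in seen)
                printf "%s started=%d timed-out=%d offline=%d\n",
                       dev, started[dev], timedout[dev], offline[dev]
        }
    ' "$@" | sort
}
```

Run as `summarize_scsi_errors /var/log/messages` inside the guest, it prints one line per device that logged any of these messages.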
Logged In: YES
user_id=996150
Originator: YES
Another crash, this time with a guest backtrace. Please advise how to debug this.
R13: ffff8100dd107000 R14: ffffffff80077090 R15: ffffffff80418e80
FS: 0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002aec1f42e000 CR3: 00000000d49e4000 CR4: 00000000000006e0
Call Trace:
<IRQ> [<ffffffff8003eadd>] dev_watchdog+0x98/0xc0
[<ffffffff800953c2>] run_timer_softirq+0x133/0x1af
[<ffffffff80011ed2>] __do_softirq+0x5e/0xd6
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006c6e4>] do_softirq+0x2c/0x85
[<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff800d403a>] drain_array+0x28/0xc0
[<ffffffff800d4aea>] cache_reap+0x0/0x219
[<ffffffff800d4b8f>] cache_reap+0xa5/0x219
[<ffffffff8004cea9>] run_workqueue+0x94/0xe4
[<ffffffff800497be>] worker_thread+0x0/0x122
[<ffffffff800498ae>] worker_thread+0xf0/0x122
[<ffffffff8008ad76>] default_wake_function+0x0/0xe
[<ffffffff8003253d>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8003243f>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
This looks like a duplicate of https://bugs.launchpad.net/qemu/+bug/587993
If you can reproduce this problem, it would be great if you could add the info to the bug in Launchpad.
Thanks,
Jes