From: Mike T. <mh...@us...> - 2003-03-04 19:54:26
Robert Dorn wrote:
> (1) I use EVMS RAID1+LVM. Bad sectors on a disk cause repeatable kernel
> panics (evms_md_error seems to be the culprit).
>
> (2) Unfortunately the replacement disk is 30MB smaller. I seem to be stuck,
> as evmsn gives a lot of warnings and no longer recognizes my LVM partitions,
> even though they are still in memory and working. Any ideas?
>
> ----------------------
> The culprit for the panic seems to be evms_md_error() in [md_core.c], in the
> LOG_ERROR call that uses __builtin_return_address(3).
> If -fomit-frame-pointer is used, IMHO only __builtin_return_address(0) can
> be safely used. There are two non-arch-specific places in the kernel where
> this is violated: drivers/md/md.c and drivers/evms/md_core.c.
> The assembler output shows the problematic lines
> [__builtin_return_address(3) is translated to]:
>    movl 12(%esp),%edx  ; return address to edx
>    movl (%edx),%eax    ; deref. -- by luck it is "add $8,%esp jmp .L4437"
>    movl (%eax),%eax    ; <=========== CRASH (Unable to handle kernel paging
>                        ;   request at virtual address eb08c483)
>    movl 4(%eax),%eax
>    pushl %eax
> I am __very__ surprised that this problem has not occurred before.
>
You're right. When I ported the Linux md code, I did not catch this. Since
the error log message already contains the MD array index and the troubled
device, the caller address is not needed. See the patch below.
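To see why only level 0 is safe, it may help to model the frame-pointer chain that __builtin_return_address(n) follows for n > 0. This is a simplified sketch, not GCC's implementation: `struct frame` and `return_address_at()` are hypothetical names, and the layout (caller's saved %ebp at offset 0, return address at offset 4) is the classic i386 convention for code compiled *with* frame pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified i386 frame when built WITH frame pointers: after
 * "push %ebp; mov %esp,%ebp", 0(%ebp) holds the caller's saved %ebp
 * and 4(%ebp) holds the return address into the caller. */
struct frame {
    struct frame *prev;   /* caller's saved %ebp */
    uintptr_t ret;        /* return address into the caller */
};

/* Roughly what __builtin_return_address(level) does for level > 0:
 * follow 'level' saved frame pointers, then read the return address.
 * Nothing validates the chain. If a function in between was built with
 * -fomit-frame-pointer, 'prev' may hold unrelated data (even code bytes,
 * as in the disassembly above) and the next dereference can fault. */
static uintptr_t return_address_at(const struct frame *fp, int level)
{
    while (level-- > 0)
        fp = fp->prev;    /* no checks: a bad link becomes a wild pointer */
    return fp->ret;
}
```

With a well-formed chain every level resolves; the oops is the case where one link is not a frame at all, which is why the patch drops levels 1 to 3 entirely.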
>
> ksymoops:
> -----------------------------
> Unable to handle kernel paging request at virtual address eb08c483
> 802bc275
> *pde = 00000000
> Oops: 0000 2.4.20aa1 #12 Mon Feb 17 12:17:40 CET 2003
> CPU: 0
> EIP: 0010:[<802bc275>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010002
> eax: eb08c483 ebx: 82387980 ecx: 82392d00 edx: 802bc1ff
> esi: 82387980 edi: 82387880 ebp: 00000008 esp: 80441e74
> ds: 0018 es: 0018 ss: 0018
> Process swapper (pid: 0, stackpage=80441000)
> Stack: 82387980 00001602 81f46c80 802bc1ff 82387980 82387880 82374d00 82374d18
>        802bfc85 82387980 00001602 804b3dfc 8022950a 82374d18 00000000 81f46c80
>        00000092 bff15880 804b3db8 80241205 81f46c80 00000000 804b3ef0 81f46c80
> Call Trace: [<802bc1ff>] [<802bfc85>] [<8022950a>] [<80241205>] [<802451d0>]
>   [<80241f7c>] [<80247500>] [<80242d59>] [<80247460>] [<80109b7e>] [<80109ce6>]
>   [<80105000>] [<80105019>]
> Code: 8b 00 8b 40 04 50 8b 02 8b 40 04 50 8b 42 04 50 8b 44 24 18
>
> >>EIP; 802bc275 <evms_md_error+41/f4> <=====
>
> >>edx; 802bc1ff <evms_md_error_dev+23/58>
> >>esp; 80441e74 <init_task_union+1e74/2000>
>
> Trace; 802bc1ff <evms_md_error_dev+23/58>
> Trace; 802bfc85 <end_sync_read+1d/3c>
> Trace; 8022950a <end_that_request_first+ae/10c>
> Trace; 80241205 <ide_end_request+5d/a0>
> Trace; 802451d0 <default_end_request+10/14>
> Trace; 80241f7c <ide_error+138/184>
> Trace; 80247500 <ide_dma_intr+a0/a8>
> Trace; 80242d59 <ide_intr+bd/110>
> Trace; 80247460 <ide_dma_intr+0/a8>
> Trace; 80109b7e <handle_IRQ_event+32/5c>
> Trace; 80109ce6 <do_IRQ+6a/a8>
> Trace; 80105000 <_stext+0/0>
> Trace; 80105019 <rest_init+19/1c>
>
> Code; 802bc275 <evms_md_error+41/f4>
> 00000000 <_EIP>:
> Code; 802bc275 <evms_md_error+41/f4> <=====
> 0: 8b 00 mov (%eax),%eax <=====
> Code; 802bc277 <evms_md_error+43/f4>
> 2: 8b 40 04 mov 0x4(%eax),%eax
> Code; 802bc27a <evms_md_error+46/f4>
> 5: 50 push %eax
> Code; 802bc27b <evms_md_error+47/f4>
> 6: 8b 02 mov (%edx),%eax
> Code; 802bc27d <evms_md_error+49/f4>
> 8: 8b 40 04 mov 0x4(%eax),%eax
> Code; 802bc280 <evms_md_error+4c/f4>
> b: 50 push %eax
> Code; 802bc281 <evms_md_error+4d/f4>
> c: 8b 42 04 mov 0x4(%edx),%eax
> Code; 802bc284 <evms_md_error+50/f4>
> f: 50 push %eax
> Code; 802bc285 <evms_md_error+51/f4>
> 10: 8b 44 24 18 mov 0x18(%esp,1),%eax
>
> <0>Kernel panic: Aiee, killing interrupt handler!
> ---------------------
> Error message immediately before panic:
> --------------------
> 8
> end_request: I/O error, dev 21:00 (hde), sector 5364848
> --------------------
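The ksymoops annotations above have the form `symbol+offset/size`, both in hex. As a worked example, the EIP 802bc275 is 0x41 bytes into evms_md_error (a 0xf4-byte function), so the function starts at 0x802bc275 - 0x41 = 0x802bc234. The small parser below is a sketch for checking such arithmetic; `parse_sym()` and `func_start()` are hypothetical helpers, not part of ksymoops.

```c
#include <stdlib.h>
#include <string.h>

/* Parse "name+off/size" (off and size in hex, as ksymoops prints them).
 * Copies the symbol name into 'name' and returns 0 on success, -1 if
 * the string does not have that shape or the name does not fit. */
static int parse_sym(const char *s, char *name, size_t name_len,
                     unsigned long *off, unsigned long *size)
{
    const char *plus = strrchr(s, '+');
    const char *slash = strrchr(s, '/');
    char *end;
    size_t n;

    if (!plus || !slash || slash < plus)
        return -1;
    n = (size_t)(plus - s);
    if (n >= name_len)
        return -1;
    memcpy(name, s, n);
    name[n] = '\0';

    *off = strtoul(plus + 1, &end, 16);
    if (end == plus + 1 || end != slash)
        return -1;
    *size = strtoul(slash + 1, &end, 16);
    if (end == slash + 1 || *end != '\0')
        return -1;
    return 0;
}

/* Start address of the function containing 'eip'. */
static unsigned long func_start(unsigned long eip, unsigned long off)
{
    return eip - off;
}
```

The same arithmetic applied to edx (802bc1ff, evms_md_error_dev+23/58) gives a start of 0x802bc1dc.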
Please try the following EVMS kernel patch; it should get rid of the panic.
--- linux-2.4.20a/drivers/evms/md_core.c Tue Mar 4 13:15:46 2003
+++ linux-2.4.20b/drivers/evms/md_core.c Tue Mar 4 13:25:48 2003
@@ -2763,14 +2763,12 @@
 	mdk_rdev_t * rrdev;
 	/* check for NULL first */
-	if (!mddev) {
+	if (!mddev || !node) {
 		MD_BUG();
 		return 0;
 	}
-	LOG_ERROR("evms_md_error dev:(md%d), node:(%s), (caller: %p,%p,%p,%p).\n",
-		mdidx(mddev), node->name,
-		__builtin_return_address(0),__builtin_return_address(1),
-		__builtin_return_address(2),__builtin_return_address(3));
+	LOG_ERROR("%s: dev:(md%d), node:(%s)\n",
+		__FUNCTION__, mdidx(mddev), evms_md_partition_name(node));
 	rrdev = evms_md_find_rdev_from_node(mddev, node);
 	if (!rrdev || rrdev->faulty)
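The pattern the patch adopts (reject NULL arguments before dereferencing them, and name the reporting function instead of walking the stack) can be sketched in user space. This is a hedged illustration: `struct md_dev`, `struct md_node` and `md_error_msg()` are hypothetical stand-ins for the EVMS types and LOG_ERROR, and `__func__` is the C99 spelling of GCC's `__FUNCTION__`.

```c
#include <stdio.h>

/* Hypothetical stand-ins for mddev and node in the patch. */
struct md_dev  { int index; };
struct md_node { const char *name; };

/* Format the error line into buf. Returns 0 without touching either
 * argument if one of them is NULL (the kernel code calls MD_BUG() at
 * that point); otherwise returns snprintf()'s result. Only the current
 * function's name is recorded; no __builtin_return_address(n > 0). */
static int md_error_msg(char *buf, size_t size,
                        const struct md_dev *mddev, const struct md_node *node)
{
    if (!mddev || !node)
        return 0;
    return snprintf(buf, size, "%s: dev:(md%d), node:(%s)\n",
                    __func__, mddev->index, node->name);
}
```

For md0 and node "hde" this produces "md_error_msg: dev:(md0), node:(hde)", which carries the same array index and device information the old message did, without the unsafe caller addresses.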
>
>
> After replacing the defective drive, the output of evms was as follows
> [consecutive blank lines removed]:
> -------------------
> EVMS Command Line Interpreter Version 1.2.0
>
> MDRaid1RegMgr: Error building region md/md0. Missing member object 0
>
> MDRaid1RegMgr: Region md/md0 missing raid array object 0. Possible identifier
> of missing object is Major=0 Minor=0
>
> MDRaid1RegMgr: Region md/md0 disks array not zeroed
> [Repeated 25 times]
>
> MDRaid1RegMgr: Region md/md0 disk counts incorrect
>
> MDRaid1RegMgr: MD region md/md0 has inconsistent metadata. Any missing
> objects will be *PERMANENTLY* removed from the region and all super blocks
> will be updated. If you elect not to fix the region at this time, you may
> do so later. Changes will not be written to disk until you select to commit
> the changes.
>
> The following responses are available:
>
> *1 = Don't Fix
> 2 = Fix
>
> The default choice is marked with an *.
> Please enter the number corresponding to your choice:
>
Did you choose the "Fix" option?
>
> Engine: WARNING: Volume "/dev/evms/usr" was exported by the EVMS kernel but
> was not discovered by the EVMS Engine. The kernel's in memory copy of the
> volume is scheduled to be deleted when changes are committed. Deleting the
> kernel's in memory copy of the volume will not change any data on the disks.
>
> If the volume truly exists, the kernel will discover it after the changes
> have been committed.
>
> Engine: WARNING: Volume "/dev/evms/root" was exported by the EVMS kernel but
> was not discovered by the EVMS Engine. The kernel's in memory copy of the
> volume is scheduled to be deleted when changes are committed. Deleting the
> kernel's in memory copy of the volume will not change any data on the disks.
>
> If the volume truly exists, the kernel will discover it after the changes
> have been committed.
>
> [Same for other volumes]
>
I guess that the failed disk had other partitions belonging to other
volumes, so the same message was displayed for those volumes as well.
Thanks,
Mike T.