Re: [Evms-devel] (no subject)

Robert Dorn wrote:

> (1)  I use evms RAID1+lvm. Bad sectors on a disk cause repeatable kernel
> panics. (evms_md_error seems to be guilty)
>
> (2)  Unfortunately, the replacement disk is 30MB smaller. I seem to be stuck,
> as evmsn gives a lot of warnings and no longer recognizes my lvm partitions
> while they are still in memory and working. Any ideas?
>
> ----------------------
> Guilty for the panic seems to be [md_core.c]: evms_md_error() .... LOG_ERROR
> .... __builtin_return_address(3)
> If -fomit-frame-pointer is used, only __builtin_return_address(0) may IMHO be
> safely used.
> There are two non-arch-specific places in the kernel where this is violated:
> drivers/md/md.c and drivers/evms/md_core.c
> The assembler output shows the problematic lines:
> [__builtin_return_address(3) is translated to]
> movl 12(%esp),%edx ; return address to edx
> movl (%edx),%eax   ; deref. -- luck: it is "add $8,%esp    jmp .L4437"
> movl (%eax),%eax   ; <===========CRASH (Unable to handle kernel paging
>                    ; request at virtual address eb08c483)
> movl 4(%eax),%eax
> pushl %eax
> I am __very__ surprised that this problem has not occurred before.
>

You're right.  When I ported the Linux md code, I did not catch this.  Since
the error log message already contains the MD array index and the troubled
device, the caller address is not needed.  See the patch below.
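
As a side note, the faulting address in the oops can be read back as
instruction bytes, which supports the analysis above. This is a minimal
sketch, not kernel code; demo_le_bytes is a hypothetical helper name. It
splits a 32-bit value into the byte order it has in i386 (little-endian)
memory. For eb08c483 that gives 83 c4 08 eb, i.e. the encoding of
"add $8,%esp" followed by the opcode byte of a short jmp, so the bogus
"frame pointer" was really code at the return address.

```c
#include <stdint.h>

/*
 * Split a 32-bit value into the bytes it occupies in little-endian memory,
 * lowest address first. Hypothetical helper for illustration only.
 */
void demo_le_bytes(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}
```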


>
> ksymoops:
> -----------------------------
> Unable to handle kernel paging request at virtual address eb08c483
> 802bc275
> *pde = 00000000
> Oops: 0000 2.4.20aa1 #12 Mon Feb 17 12:17:40 CET 2003
> CPU:    0
> EIP:    0010:[<802bc275>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010002
> eax: eb08c483   ebx: 82387980   ecx: 82392d00   edx: 802bc1ff
> esi: 82387980   edi: 82387880   ebp: 00000008   esp: 80441e74
> ds: 0018   es: 0018   ss: 0018
> Process swapper (pid: 0, stackpage=80441000)
> Stack: 82387980 00001602 81f46c80 802bc1ff 82387980 82387880 82374d00 82374d18
>        802bfc85 82387980 00001602 804b3dfc 8022950a 82374d18 00000000 81f46c80
>        00000092 bff15880 804b3db8 80241205 81f46c80 00000000 804b3ef0 81f46c80
> Call Trace:    [<802bc1ff>] [<802bfc85>] [<8022950a>] [<80241205>] [<802451d0>]
>   [<80241f7c>] [<80247500>] [<80242d59>] [<80247460>] [<80109b7e>] [<80109ce6>]
>   [<80105000>] [<80105019>]
> Code: 8b 00 8b 40 04 50 8b 02 8b 40 04 50 8b 42 04 50 8b 44 24 18
>
> >>EIP; 802bc275 <evms_md_error+41/f4>   <=====
>
> >>edx; 802bc1ff <evms_md_error_dev+23/58>
> >>esp; 80441e74 <init_task_union+1e74/2000>
>
> Trace; 802bc1ff <evms_md_error_dev+23/58>
> Trace; 802bfc85 <end_sync_read+1d/3c>
> Trace; 8022950a <end_that_request_first+ae/10c>
> Trace; 80241205 <ide_end_request+5d/a0>
> Trace; 802451d0 <default_end_request+10/14>
> Trace; 80241f7c <ide_error+138/184>
> Trace; 80247500 <ide_dma_intr+a0/a8>
> Trace; 80242d59 <ide_intr+bd/110>
> Trace; 80247460 <ide_dma_intr+0/a8>
> Trace; 80109b7e <handle_IRQ_event+32/5c>
> Trace; 80109ce6 <do_IRQ+6a/a8>
> Trace; 80105000 <_stext+0/0>
> Trace; 80105019 <rest_init+19/1c>
>
> Code;  802bc275 <evms_md_error+41/f4>
> 00000000 <_EIP>:
> Code;  802bc275 <evms_md_error+41/f4>   <=====
>    0:   8b 00                     mov    (%eax),%eax   <=====
> Code;  802bc277 <evms_md_error+43/f4>
>    2:   8b 40 04                  mov    0x4(%eax),%eax
> Code;  802bc27a <evms_md_error+46/f4>
>    5:   50                        push   %eax
> Code;  802bc27b <evms_md_error+47/f4>
>    6:   8b 02                     mov    (%edx),%eax
> Code;  802bc27d <evms_md_error+49/f4>
>    8:   8b 40 04                  mov    0x4(%eax),%eax
> Code;  802bc280 <evms_md_error+4c/f4>
>    b:   50                        push   %eax
> Code;  802bc281 <evms_md_error+4d/f4>
>    c:   8b 42 04                  mov    0x4(%edx),%eax
> Code;  802bc284 <evms_md_error+50/f4>
>    f:   50                        push   %eax
> Code;  802bc285 <evms_md_error+51/f4>
>   10:   8b 44 24 18               mov    0x18(%esp,1),%eax
>
>  <0>Kernel panic: Aiee, killing interrupt handler!
> ---------------------
> Error message immediately before panic:
> --------------------
> 8
> end_request: I/O error, dev 21:00 (hde), sector 5364848
> --------------------

Please try the following EVMS kernel patch; it should get rid of the panic.

--- linux-2.4.20a/drivers/evms/md_core.c Tue Mar  4 13:15:46 2003
+++ linux-2.4.20b/drivers/evms/md_core.c Tue Mar  4 13:25:48 2003
@@ -2763,14 +2763,12 @@
  mdk_rdev_t * rrdev;

  /* check for NULL first */
- if (!mddev) {
+ if (!mddev || !node) {
   MD_BUG();
   return 0;
  }
- LOG_ERROR("evms_md_error dev:(md%d), node:(%s), (caller: %p,%p,%p,%p).\n",
-     mdidx(mddev), node->name,
-     __builtin_return_address(0),__builtin_return_address(1),
-     __builtin_return_address(2),__builtin_return_address(3));
+ LOG_ERROR("%s: dev:(md%d), node:(%s)\n",
+     __FUNCTION__, mdidx(mddev), evms_md_partition_name(node));

  rrdev = evms_md_find_rdev_from_node(mddev, node);
  if (!rrdev || rrdev->faulty)



>
>
> After replacement of the defective drive, output of evms: [consecutive blank
> lines removed]
> -------------------
> EVMS Command Line Interpreter Version 1.2.0
>
> MDRaid1RegMgr: Error building region md/md0. Missing member object 0
>
> MDRaid1RegMgr: Region md/md0 missing raid array object 0. Possible identifier
> of missing object is Major=0 Minor=0
>
> MDRaid1RegMgr: Region md/md0 disks array not zeroed
> [Repeated 25 times]
>
> MDRaid1RegMgr: Region md/md0 disk counts incorrect
>
> MDRaid1RegMgr: MD region md/md0 has inconsistent metadata.  Any missing
> objects will be *PERMANENTLY* removed from the region and all super blocks
> will be updated.  If you elect not to fix the region at this time, you may do
> so later.  Changes will not be written to disk until you select to commit the
> changes.
>
> The following responses are available:
>
> *1 = Don't Fix
>  2 = Fix
>
> The default choice is marked with an *.
> Please enter the number corresponding to your choice:
>

Did you choose the "Fix" option?


>
> Engine: WARNING: Volume "/dev/evms/usr" was exported by the EVMS kernel but
> was not discovered by the EVMS Engine.  The kernel's in memory copy of the
> volume is scheduled to be deleted when changes are committed.  Deleting the
> kernel's in memory copy of the volume will not change any data on the disks.
>
> If the volume truly exists, the kernel will discover it after the changes
> have been committed.
>
> Engine: WARNING: Volume "/dev/evms/root" was exported by the EVMS kernel but
> was not discovered by the EVMS Engine.  The kernel's in memory copy of the
> volume is scheduled to be deleted when changes are committed.  Deleting the
> kernel's in memory copy of the volume will not change any data on the disks.
>
> If the volume truly exists, the kernel will discover it after the changes
> have been committed.
>
> [Same for other volumes]
>

I guess that the failed disk has other partitions that belong to other
volumes, so the same message was displayed for those volumes as well.

Thanks,
Mike T.