[SSI-devel] Re: SSI-1.2.2 CVS May21 cfs failover oops <<cfs_restart_read>>
From: Roger T. <rog...@gm...> - 2005-05-24 03:33:14
Never mind, I managed to kill them all. It took a while. :)

-Roger

On 5/23/05, Roger Tsang <rog...@gm...> wrote:
> Additionally, with this kernel I ran into an immortal `rm -f` process
> while compiling on a CFS chard mounted filesystem. I can't kill the
> `rm -f drbd_actlog.o drbd_bitmap.o drbd_buildtag.o drbd_fs.o drbd_mai`
> process even with kill -9. It is stuck in the Running state in the
> process table. Running ls in the directory where rm -f was run
> produces no output and leaves ls stuck in the defunct state.
>
> I'm going to update the kernel with the fixes Laura checked in
> today, so I'll write back if this happens again.
>
> It seems rm is expected to finish in "make clean" before doing "make
> all", but the following indicates that both "were" running at the same
> time. Odd. I issued "make clean all".
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 75682 root      17   0  1032 1032  736 R  0.7  0.1   0:01.51 top
> 73650 root      18   0   380  380  320 R  0.0  0.0   0:00.00 rm -f drbd_actlog.o drbd_bitmap.o drbd_buildtag.o drbd_fs.o drbd_mai
> 73748 root      16   0   648  648  516 D  0.0  0.1   0:00.00 make -C /backup/cvs.stable/openssi/drbd/drbd fastdep
> 74159 root      16   0   592  592  488 D  0.0  0.1   0:00.00 ls --color=tty
>
> -Roger
>
>
> On 5/23/05, Roger Tsang <rog...@gm...> wrote:
> > Hi John,
> >
> > I tried to do a failover just now and got a cfs oops. This time
> > everything is definitely synced to the May 21st developer CVS
> > OPENSSI-1-2-FC-STABLE branch.
> >
> > -Roger
> >
> > node2 login: drbd2: short read expecting header on sock: r=-512
> > drbd0: PingAck did not arrive in time.
> > drbd0: short read expecting header on sock: r=-512
> > drbd1: PingAck did not arrive in time.
> > drbd1: short read expecting header on sock: r=-512
> > Taking over master from node 1.
> > Node 1 has gone down!!!
> > read handler down off 0 len 10000
> > read handler down off 10000 len 10000
> > read handler down off 20000 len 10000
> > read handler down off 30000 len 10000
> > read handler down off 40000 len 10000
> > read handler down off 50000 len 10000
> > passed the first scan in ipcname_pull_data
> > num_objects[MSG] = 0
> > num_objects[SEM] = 37
> > num_objects[SHM] = 27
> > ipcnameserver ready completed
> > drbd0: drbd_nodedown: Signaling receiver thread.
> > drbd1: drbd_nodedown: Signaling receiver thread.
> > drbd2: drbd_nodedown: Signaling receiver thread.
> > drbd0: Doing CLMS nodedown callback for service 9
> > drbd2: Doing CLMS nodedown callback for service 11
> > drbd1: Doing CLMS nodedown callback for service 10
> > fsck 1.35 (28-Feb-2004)
> > e2fsck 1.35 (28-Feb-2004)
> > fsck.ext3: No such file or directory while trying to open /dev/drbd/0
> >
> > The superblock could not be read or does not describe a correct ext2
> > filesystem. If the device is valid and it really contains an ext2
> > filesystem (and not swap or ufs or something else), then the superblock
> > is corrupt, and you might try running e2fsck with an alternate superblock:
> >     e2fsck -b 8193 <device>
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000138
> > printing eip:
> > c0228ac5
> > *pde = 09763067
> > *pte = 00000000
> > Oops: 0000
> > loop cls_u32 sch_sfq sch_htb softdog nfsd ip_vs_sed ip_conntrack_tftp
> > ip_conntrack_ftp ip_nat_ftp iptable_nat ipt_REJECT ipt_LOG ipt_limit
> > ipt_multiport ipt_s
> > CPU: 0
> > EIP: 0060:[<c0228ac5>] Not tainted
> > EFLAGS: 00210246
> >
> > EIP is at cfs_restart_read [kernel] 0x65 (2.4.22-1.2199.nptl_ssi_10up)
> > eax: 00000000 ebx: 00000000 ecx: c9481488 edx: c9481700
> > esi: c94816f8 edi: c9481700 ebp: ce879e88 esp: ce879e6c
> > ds: 0068 es: 0068 ss: 0068
> > Process cfs failover (pid: 166648, stackpage=ce879000)
> > Stack: 00000000 d6c8991c d6c89924 c9481488 d6c89880 00000000 d6b34400 ce879ea4
> >        c023f380 d6c89880 ce879ea4 d6b34400 ce879ef8 fffffffe ce879ed0 c02401fe
> >        d6b34400 ce879ec0 c054ba40 00000000 00000002 00000008 d2b0f1e0 d2b0f180
> > Call Trace:
> > [<c023f380>] cfs_sb_do_flush [kernel] 0x70 (0xce879e8c)
> > [<c02401fe>] cfsd_rb_phase_flush_0 [kernel] 0x3e (0xce879ea8)
> > [<c023c575>] cfs_send_rb_phase_flush [kernel] 0xe5 (0xce879ed4)
> > [<c02408ec>] cfs_rb_phase_flush [kernel] 0x1c (0xce879f18)
> > [<c0240693>] cfs_rb_do_phase [kernel] 0x183 (0xce879f28)
> > [<c01877ec>] ext3_setup_super [kernel] 0x14c (0xce879f34)
> > [<c0240024>] cfsd_rb_0 [kernel] 0xb4 (0xce879f60)
> > [<c02408d0>] cfs_rb_phase_flush [kernel] 0x0 (0xce879f6c)
> > [<c023c287>] cfs_send_rb [kernel] 0xe7 (0xce879f7c)
> > [<c0168de4>] mnt_fix_devname [kernel] 0x54 (0xce879f98)
> > [<c023d180>] cfs_root_failover [kernel] 0x140 (0xce879fb8)
> > [<c023ca90>] cfs_nodedown_thread [kernel] 0x0 (0xce879fd0)
> > [<c023caea>] cfs_nodedown_thread [kernel] 0x5a (0xce879fe0)
> > [<c01076c9>] kernel_thread_helper [kernel] 0x5 (0xce879ff0)
> >
> > Code: 8b 83 38 01 00 00 89 34 24 83 c0 28 89 44 24 04 e8 c6 91 01
> >
> > Entering kdb (current=0xce878000, pid 166648) Oops: Oops
> > due to oops @ 0xc0228ac5
> > eax = 0x00000000 ebx = 0x00000000 ecx = 0xc9481488 edx = 0xc9481700
> > esi = 0xc94816f8 edi = 0xc9481700 esp = 0xce879e6c eip = 0xc0228ac5
> > ebp = 0xce879e88 xss = 0xc0350068 xcs = 0x00000060 eflags = 0x00210246
> > xds = 0xc9480068 xes = 0x00000068 origeax = 0xffffffff &regs = 0xce879e38
> > kdb>
> > kdb> bt
> > Stack traceback for pid 166648
> > 0xce878000 166648 2 1 0 R 0xce878350 *cfs failover
> > EBP EIP Function (args)
> > 0xce879e88 0xc0228ac5 cfs_restart_read+0x65 (0xd6c89880, 0xce879ea4, 0xd6b34400, 0xce879ef8, 0xfffffffe)
> >     kernel .text 0xc0100000 0xc0228a60 0xc0228b30
> > 0xce879ea4 0xc023f380 cfs_sb_do_flush+0x70 (0xd6b34400, 0xce879ec0, 0xc054ba40, 0x0, 0x2)
> >     kernel .text 0xc0100000 0xc023f310 0xc023f410
> > 0xce879ed0 0xc02401fe cfsd_rb_phase_flush_0+0x3e (0xce879ef8, 0x0, 0x214, 0xce879f24, 0xd2fd4880)
> >     kernel .text 0xc0100000 0xc02401c0 0xc0240240
> > 0xce879f14 0xc023c575 cfs_send_rb_phase_flush+0xe5 (0x2, 0x22)
> >     kernel .text 0xc0100000 0xc023c490 0xc023c580
> > 0xce879f24 0xc02408ec cfs_rb_phase_flush+0x1c (0xd6b34400, 0x2, 0xc01877ec, 0xc0373039, 0x22e960)
> >     kernel .text 0xc0100000 0xc02408d0 0xc02408f0
> > 0xce879f5c 0xc0240693 cfs_rb_do_phase+0x183 (0xd6b34400, 0x0, 0xc02408d0, 0xd6c89880, 0xc15a7b00)
> >     kernel .text 0xc0100000 0xc0240510 0xc0240770
> > 0xce879f78 0xc0240024 cfsd_rb_0+0xb4 (0xce879fa2, 0xce879fa4, 0xc15a7600, 0xd7cc4e58, 0x1cb)
> >     kernel .text 0xc0100000 0xc023ff70 0xc0240060
> > 0xce879fb4 0xc023c287 cfs_send_rb+0xe7 (0x2, 0x22, 0x0, 0x0, 0xc15a7b00)
> >     kernel .text 0xc0100000 0xc023c1a0 0xc023c290
> > 0xce879fdc 0xc023d180 cfs_root_failover+0x140 (0xd6b34400, 0xffffffff)
> >     kernel .text 0xc0100000 0xc023d040 0xc023d200
> > 0xce879fec 0xc023caea cfs_nodedown_thread+0x5a
> > more>
> >     kernel .text 0xc0100000 0xc023ca90 0xc023cb10
> > 0xc01076c9 kernel_thread_helper+0x5
> >     kernel .text 0xc0100000 0xc01076c4 0xc01076d0
> > kdb>
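
A note on reading the oops: the register dump and the "Code:" bytes look consistent with a NULL structure pointer in ebx being dereferenced at offset 0x138 inside cfs_restart_read. The Python sketch below is not part of the original report; it decodes only the first instruction of the "Code:" line and compares the resulting address with the reported fault address. It assumes that the first "Code:" byte is the instruction at EIP, and the helper name decode_mov_load is hypothetical. It says nothing about which CFS structure the pointer belongs to.

```python
# Sketch: check that the first instruction in the oops "Code:" line, applied
# to the dumped registers, yields the reported faulting address.
# Assumption: the first byte of "Code:" is the instruction at EIP.

# x86 32-bit register numbering used by the ModRM byte.
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Values copied from the oops register dump.
REGS = {
    "eax": 0x00000000, "ebx": 0x00000000, "ecx": 0xC9481488, "edx": 0xC9481700,
    "esi": 0xC94816F8, "edi": 0xC9481700, "ebp": 0xCE879E88, "esp": 0xCE879E6C,
}

# First six bytes of "Code: 8b 83 38 01 00 00 ...".
CODE = bytes.fromhex("8b8338010000")

# "NULL pointer dereference at virtual address 00000138".
FAULT_ADDRESS = 0x00000138


def decode_mov_load(code):
    """Decode `mov r32, [base + disp32]` (opcode 0x8B, ModRM mod=10, no SIB)."""
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("unsupported encoding for this sketch")
    disp = int.from_bytes(code[2:6], "little")
    return REG_NAMES[reg], REG_NAMES[rm], disp


dest, base, disp = decode_mov_load(CODE)
effective = (REGS[base] + disp) & 0xFFFFFFFF
print(f"mov {dest}, [{base} + {disp:#x}] -> address {effective:#010x}")
```

With ebx = 0, the computed address equals the displacement itself, which matches the fault address in the oops. That pattern usually points at a pointer that was never set, or was already torn down, by the time the failover thread reached this code.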