Re: [SSI-devel] Re: [DRBD-SSI] patch-drbd-0.7.8-ssi_rc3
From: En C. L. <en...@in...> - 2005-03-16 13:49:39
Hi Roger,

I finally got down to building and testing failover with your latest patch, and I've run into a problem similar to the one you described earlier. I'm running 0.7.10 patched with patch-drbd-0.7.8-ssi_rc3 on a 1.2.1 cluster. The cluster comes up fine with the drbd root device. But when the initnode is switched off (both nodes are Consistent), I get something like this on the takeover node:

<snip>...
ipcnameserver ready completed
drbd0: drbd_nodedown: Signaling receiver thread.
drbd0: short read expecting header on sock: r=-512
drbd0: Doing CLMS nodedown callback for service 9
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
fsck.ext3: No such file or directory while trying to open /dev/drbd/0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

/etc/rc.d/rc.sysrecover running
ssi-ntpsetrefclk: ntpd is not running; not setting refclk
INIT: version 2.85-SSI reloading
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"errno 66
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"errno 66
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"errno 66
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"errno 66
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"errno 66
INIT: cannot execute "/sbin/mingetty"
INIT: cannot execute "/sbin/mingetty"errno 66
INIT: Sending processes the TERM signal
INIT: Sending processes the KILL signal
INIT: Pid 132116 [id siR] seems to hang
fsck 1.35 (28-Feb-2004)
<snip>...

The failover continues with rc.nodedown and completes 'successfully'. However, when I try to boot the original initnode back into the cluster, it just stops at "Starting init".

The problems are:

1. /dev/drbd/0 is not available when ckroot.ssi is run from cfs_nodedown.
2. init thinks that rc.sysrecover is hung. However, ps shows that the pid has gone away, and the output suggests that the last command of rc.sysrecover has already run.

I'm seeing this problem on a pair of 1.8GHz UP machines. Was this the problem you saw? If so, how did it go away?

Thanks,
En Chiang

Roger Tsang wrote:
> Hi En Chiang and Aneesh,
>
> Though I haven't tested drbd-0.7.10 with this patch, I don't see any
> problems applying this patch. However, I suggest the following tests just
> to be sure before releasing to users:
>
> DRBD-SSI
> -Failure of initnode in Consistent state.
> -Failure of non-initnode in Consistent state.
> -Failure of non-initnode during boot before it gets marked UP.
> -Failure of initnode during boot before it gets marked UP.
> -Failure of initnode with drbd split-brain in Consistent state.
> -Failure of non-initnode with drbd split-brain in Consistent state.
> -Failure of any node while doing background sync.
>
> To induce split-brain while connected, disconnect the secondary node
> before doing force primary. Then reconnect. Both nodes should detect
> split-brain and go to StandAlone. Now you can try the failover tests with
> split-brain.
>
> -Roger
>
>
>>Hi Roger,
>>
>>The 1.2.1 release of Debian has drbd included in it. However, it is
>>based on drbd-0.7.10. It's a little older than your latest changes
>>(from before the fsck problems started showing up).
>>
>>I'll try your patch with drbd-0.7.10, because our CVS repository is based on
>>it. And if it works, I'll release an rpm for 1.2.1 based on it. Do you
>>see any problems with this?
>>
>>En Chiang
>>
>>Roger Tsang wrote:
>>
>>>Use this one instead. It includes a bug fix for drbd-ssi on
>>>kernel-2.6 (SSI-1.9.x). I also tested the force kernel panic feature
>>>on drbd split-brain detection during CFS root failover.
>>>
>>>This supersedes all previous drbd-ssi patches, including
>>>drbd-0.7.10-ssi_rc* patches.
>>>Let me know if you require any last
>>>minute fixes. Otherwise I think this is pretty much stable - until we
>>>spot something I've missed. Thanks.
>>>
>>>-Roger
>>>
>>>
>>>On Tue, 8 Mar 2005 22:46:05 -0500, Roger Tsang <rog...@gm...>
>>>wrote:
>>>
>>>
>>>>DRBD-SSI is a contributed project that adds support for HA-CFS using
>>>>DRBD as shared storage devices.
>>>>
>>>>This patch against drbd-0.7.8 adds:
>>>>- Force kernel panic if failing over to a StandAlone drbd device.
>>>>Assumes split-brain occurred.
>>>>- Reduced locking in algorithm.
>>>>- Reduced code path (in code).
>>>>
>>>>Limitations (as of drbd-0.7.8-ssi_rc2):
>>>>- Until further announcement, not compatible with non-SSI drbd peer
>>>>nodes. Do not connect to non-SSI drbd nodes.
>>>>- In order to successfully fail over CFS, drbd requires its receiver
>>>>thread to exist - e.g. WFConnection or Connected.
>>>>- Issues carried over from the original DRBD.
>>>>
>>>>Improvements since the drbd-ssi-1.2.0-fc2.i386.tar release, in addition to
>>>>the above:
>>>>- More aggressive CLMS priority on node down event.
>>>>- Force CLMS caller to wait until DRBD completes nodedown (HA-CFS
>>>>failover preparation).
>>>>- Disable sizeof drbd struct checks following Aneesh's recommendation
>>>>(for Debian).
>>>>- Increased messages to console so users can see DRBD progress during
>>>>node down.
>>>>- Improved drbd.conf template.
>>>>- Adds SSIfailover.patch
>>>>- Adds rc.sysrecover.patch
>>>>
>>>>I've successfully done failover (and failback) tests for root and
>>>>non-root devices on failure of initnode and non-initnode on SSI-1.2.1
>>>>(CVS March 5, 2005) on Fedora Core 2 and a cluster of UP machines.
>>>>
>>>>At the moment I have not tested the force kernel panic feature for
>>>>failover to StandAlone devices. To simulate drbd split-brain, use the drbd
>>>>userspace utilities to force primary on your secondary drbd node.
>>>>
>>>>Maybe we can consider releasing this for use with SSI-1.2.1-FC after
>>>>someone has tested this on a cluster of SMP machines.
>>>>
>>>>Enjoy.
>>>>
>>>>-Roger
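Roger's split-brain recipe (disconnect the secondary, force it primary, reconnect, and expect both sides to go StandAlone) and the rc3 rule of panicking when failover lands on a StandAlone device can be modelled as a tiny state machine, which may help when walking through the test matrix above. The Python below is a hypothetical sketch, not DRBD-SSI code: `Node`, `disconnect`, `force_primary`, `reconnect` and `nodedown_action` are invented names, and the per-node `primary_while_split` flag is a stand-in assumption, not how DRBD itself detects split-brain.

```python
"""Hypothetical model of the DRBD-SSI split-brain test and failover rule.

Simplified illustration only: all names are invented for this sketch and
the flag-based check is an assumption, not DRBD's real detection logic.
"""
from dataclasses import dataclass
from enum import Enum


class ConnState(Enum):
    # Subset of DRBD connection-state names mentioned in the thread.
    STANDALONE = "StandAlone"
    WF_CONNECTION = "WFConnection"
    CONNECTED = "Connected"


@dataclass
class Node:
    name: str
    primary: bool
    state: ConnState = ConnState.CONNECTED
    # True if this node acted as primary while the link was down.
    primary_while_split: bool = False


def disconnect(a: Node, b: Node) -> None:
    """Drop the link; each side waits for its peer to come back."""
    for n in (a, b):
        n.state = ConnState.WF_CONNECTION
        n.primary_while_split = n.primary


def force_primary(n: Node) -> None:
    """Promote a node, remembering whether the link was down at the time."""
    n.primary = True
    if n.state is not ConnState.CONNECTED:
        n.primary_while_split = True


def reconnect(a: Node, b: Node) -> None:
    """Both sides primary while split -> split-brain -> both StandAlone."""
    if a.primary_while_split and b.primary_while_split:
        a.state = b.state = ConnState.STANDALONE
        return
    a.state = b.state = ConnState.CONNECTED
    a.primary_while_split = b.primary_while_split = False


def nodedown_action(survivor: Node) -> str:
    """What the surviving node does when its peer goes down."""
    if survivor.state is ConnState.STANDALONE:
        # rc3 rule: StandAlone is taken to mean split-brain, so panic
        # (assumed rationale: avoid failing over onto diverged data).
        return "panic"
    # WFConnection/Connected: a receiver thread exists to be signalled.
    return "failover"


if __name__ == "__main__":
    init = Node("initnode", primary=True)
    peer = Node("peer", primary=False)
    disconnect(init, peer)   # disconnect the secondary first...
    force_primary(peer)      # ...then force it primary...
    reconnect(init, peer)    # ...then reconnect
    print(peer.state.value, nodedown_action(peer))  # prints: StandAlone panic
```

Skipping the `force_primary` call leaves both nodes Connected, and `nodedown_action` then returns `"failover"`, which corresponds to the plain "Failure of initnode in Consistent state" case.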