Thread: RE: [SSI-users] Xen Cluster & DRBD


Once the master node has gone, I can no longer ping the CVIP address -
even if I bring that node back up again as secondary. The only way to
get it back is to restart the entire cluster.

I thought the 'fsck.ext3: No such file or directory while trying to open
/dev/drbd0.........' looked serious enough to be the cause. However,
I've now tried editing some test files and crashing nodes and the drbd
setup seems to be working ok. Thanks for the wake-up call!!
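
Since both fsck errors in the log below are "No such file or directory" (for /dev/null and for /dev/drbd0), one quick check after failover is whether those device nodes exist at all on the surviving node. A minimal sketch, assuming Python is available there; the helper name describe_node is made up for illustration, and the paths are simply the ones mentioned in this thread:

```python
import os
import stat


def describe_node(path):
    """Return a short description of what exists at `path` (hypothetical helper)."""
    try:
        mode = os.stat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        # Same condition fsck reports as "No such file or directory".
        return "missing"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    return "exists, but not a device node"


if __name__ == "__main__":
    # Paths taken from the log and from the drbd.conf/fstab variants tried.
    for path in ("/dev/null", "/dev/drbd0", "/dev/drbd/0"):
        print(f"{path}: {describe_node(path)}")
```

This only reports whether the nodes are present at the moment it runs; it says nothing about why they might have been missing when rc.sysrecover ran fsck.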

My cvip.conf has <director_node> and <real_server_node> sections for the
second node already.

Any ideas as to where else I should look for the problem?

Owen

-----Original Message-----
From: Roger Tsang [mailto:rog...@gm...]
Sent: 25 April 2006 03:03
To: Owen Campbell
Cc: ssi...@li...
Subject: Re: [SSI-users] Xen Cluster & DRBD

What kind of problem are you having? Your log shows that nodedown completed.

Roger

On 4/24/06, Owen Campbell <ow...@em...> wrote:
>
>
> Can anyone help to get my cluster of Xen virtual machines to failover on
> failure of the root node?......
>
> This is a debian sarge based system (both the dom0 and domU's).
>
> The initrd was created with devices labeled as /dev/drbd/0 in drbd.conf and
> fstab. drbd.conf was then put back to using /dev/drbd0. I've tried both
> formats in fstab, but with no difference to the results.
>
> I've also tried editing the initrd to remove all trace of /dev/drbd/0, but
> it also made no difference.
>
> Everything works fine, except failover when the root node goes down. Then I
> get:
>
> drbd0: PingAck did not arrive in time.
>
> drbd0: drbd0_asender [131278]: cstate Connected --> NetworkFailure
>
> drbd0: asender terminated
>
> drbd0: drbd0_receiver [131271]: cstate NetworkFailure --> BrokenPipe
>
> drbd0: short read expecting header on sock: r=-512
>
> drbd0: worker terminated
>
> drbd0: drbd0_receiver [131271]: cstate BrokenPipe --> Unconnected
>
> drbd0: Connection lost.
>
> drbd0: drbd0_receiver [131271]: cstate Unconnected --> WFConnection
>
> Taking over master from node 1.
>
> Node 1 has gone down!!!
>
> passed the first scan in ipcname_pull_data
>
> num_objects[MSG] = 0
>
> num_objects[SEM] = 0
>
> num_objects[SHM] = 0
>
> ipcnameserver ready completed
>
> drbd0: drbd_nodedown: Signaling receiver thread.
>
> drbd0: drbd_set_state: (mdev->this_bdev->bd_contains == 0) in
> drivers/block/drbd/drbd_fs.c:702
>
> drbd0: Secondary/Unknown --> Primary/Unknown
>
> drbd0: Doing CLMS nodedown callback for service 9
>
> EXT3-fs: INFO: recovery required on readonly filesystem.
>
> EXT3-fs: write access will be enabled during recovery.
>
> write handler down off 470000 len 10000
>
> kjournald starting.  Commit interval 5 seconds
>
> EXT3-fs: recovery complete.
>
> EXT3-fs: mounted filesystem with ordered data mode.
>
> fsck 1.35 (28-Feb-2004)
>
> ERROR: Couldn't open /dev/null (No such file or directory)
>
> e2fsck 1.35 (28-Feb-2004)
>
> fsck.ext3: No such file or directory while trying to open /dev/drbd0
>
> The superblock could not be read or does not describe a correct ext2
>
> filesystem.  If the device is valid and it really contains an ext2
>
> filesystem (and not swap or ufs or something else), then the superblock
>
> is corrupt, and you might try running e2fsck with an alternate superblock:
>
>     e2fsck -b 8193 <device>
>
> EXT3 FS on drbd0, internal journal
>
> /etc/init.d/rc.sysrecover  running
>
> ssi-ntpsetrefclk: ntpd is not running; not setting refclk
>
> INIT: version 2.86-SSI reloading
>
> INIT: cannot execute "/sbin/getty"
>
> INIT: Sending processes the TERM signal
>
> INIT: Sending processes the KILL signal
>
> INIT: Pid 131747 [id siR] seems to hang
>
> /etc/init.d/rc.nodedown 1 running
>
> fsck 1.35 (28-Feb-2004)
>
> INIT: +++ nodedown completed on node 1
>
> Any help, much appreciated!!!!!
>
>
> Owen
>
>
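
To make the DRBD part of the quoted log easier to follow, the connection-state changes can be pulled out on their own. A rough sketch, assuming the log lines keep the "cstate OLD --> NEW" format shown above; the regular expression and the function name cstate_transitions are illustrative and not part of DRBD or OpenSSI:

```python
import re

# Matches lines such as:
#   drbd0: drbd0_asender [131278]: cstate Connected --> NetworkFailure
CSTATE_RE = re.compile(
    r"^(?P<dev>drbd\d+): \S+ \[\d+\]: cstate (?P<old>\w+) --> (?P<new>\w+)$"
)


def cstate_transitions(lines):
    """Return (device, old_state, new_state) for each cstate line found."""
    found = []
    for line in lines:
        # Drop mail quoting ("> ") and surrounding whitespace before matching.
        text = line.lstrip("> ").strip()
        match = CSTATE_RE.match(text)
        if match:
            found.append((match["dev"], match["old"], match["new"]))
    return found


if __name__ == "__main__":
    sample = [
        "> drbd0: drbd0_asender [131278]: cstate Connected --> NetworkFailure",
        "> drbd0: asender terminated",
        "> drbd0: drbd0_receiver [131271]: cstate NetworkFailure --> BrokenPipe",
        "> drbd0: drbd0_receiver [131271]: cstate BrokenPipe --> Unconnected",
        "> drbd0: drbd0_receiver [131271]: cstate Unconnected --> WFConnection",
    ]
    for dev, old, new in cstate_transitions(sample):
        print(f"{dev}: {old} -> {new}")
```

On the lines quoted above this yields the chain Connected, NetworkFailure, BrokenPipe, Unconnected, WFConnection, which is the surviving node dropping the link and waiting for its peer to reconnect.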
