Can anyone help me get my cluster of Xen virtual machines to fail over when the root node fails?
This is a Debian sarge-based system (both the dom0 and the domUs).
The initrd was created with the devices named /dev/drbd/0 in both drbd.conf and fstab. I then changed drbd.conf back to /dev/drbd0. I’ve tried both forms in fstab, with no difference in the results.
I’ve also tried editing the initrd to remove every trace of /dev/drbd/0, but that made no difference either.
Everything works fine, except failover when the root node goes down. Then I get:
drbd0: PingAck did not arrive in time.
drbd0: drbd0_asender : cstate Connected --> NetworkFailure
drbd0: asender terminated
drbd0: drbd0_receiver : cstate NetworkFailure --> BrokenPipe
drbd0: short read expecting header on sock: r=-512
drbd0: worker terminated
drbd0: drbd0_receiver : cstate BrokenPipe --> Unconnected
drbd0: Connection lost.
drbd0: drbd0_receiver : cstate Unconnected --> WFConnection
Taking over master from node 1.
Node 1 has gone down!!!
passed the first scan in ipcname_pull_data
num_objects[MSG] = 0
num_objects[SEM] = 0
num_objects[SHM] = 0
ipcnameserver ready completed
drbd0: drbd_nodedown: Signaling receiver thread.
drbd0: drbd_set_state: (mdev->this_bdev->bd_contains == 0) in drivers/block/drbd/drbd_fs.c:702
drbd0: Secondary/Unknown --> Primary/Unknown
drbd0: Doing CLMS nodedown callback for service 9
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
write handler down off 470000 len 10000
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
fsck 1.35 (28-Feb-2004)
ERROR: Couldn't open /dev/null (No such file or directory)
e2fsck 1.35 (28-Feb-2004)
fsck.ext3: No such file or directory while trying to open /dev/drbd0
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
EXT3 FS on drbd0, internal journal
ssi-ntpsetrefclk: ntpd is not running; not setting refclk
INIT: version 2.86-SSI reloading
INIT: cannot execute "/sbin/getty"
INIT: Sending processes the TERM signal
INIT: Sending processes the KILL signal
INIT: Pid 131747 [id siR] seems to hang
/etc/init.d/rc.nodedown 1 running
fsck 1.35 (28-Feb-2004)
INIT: +++ nodedown completed on node 1
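The log above complains about three paths: /dev/null, /dev/drbd0 and /sbin/getty. A minimal sketch for checking whether they are present on the node that takes over is below. The names `check_paths` and `PATHS` are made up for illustration and are not part of DRBD or any cluster tooling; only the paths themselves come from the log.

```python
import os

# Paths the failover log reports as missing or not executable.
# Copied from the log; adjust if your devices are named differently.
PATHS = ["/dev/null", "/dev/drbd0", "/sbin/getty"]


def check_paths(paths):
    """Return a dict mapping each path to True if it exists, else False."""
    return {p: os.path.exists(p) for p in paths}


if __name__ == "__main__":
    for path, present in check_paths(PATHS).items():
        print("%-12s %s" % (path, "present" if present else "MISSING"))
```

Running this before and after a failover would show whether the device nodes disappear at takeover or were never there on the surviving node. It only tests existence, not whether the node is a valid block or character device.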
Any help would be much appreciated!