Can anyone help me get my cluster of Xen virtual machines to fail over when the root node fails?

This is a Debian Sarge based system (both the dom0 and the domUs).

The initrd was created with the devices labelled as /dev/drbd/0 in drbd.conf and fstab. drbd.conf was then changed back to use /dev/drbd0. I've tried both formats in fstab, but it made no difference to the results.

I've also tried editing the initrd to remove all trace of /dev/drbd/0, but that made no difference either.

Everything works fine, except failover when the root node goes down. Then I get:

drbd0: PingAck did not arrive in time.

drbd0: drbd0_asender [131278]: cstate Connected --> NetworkFailure

drbd0: asender terminated

drbd0: drbd0_receiver [131271]: cstate NetworkFailure --> BrokenPipe

drbd0: short read expecting header on sock: r=-512

drbd0: worker terminated

drbd0: drbd0_receiver [131271]: cstate BrokenPipe --> Unconnected

drbd0: Connection lost.

drbd0: drbd0_receiver [131271]: cstate Unconnected --> WFConnection

Taking over master from node 1.

Node 1 has gone down!!!

passed the first scan in ipcname_pull_data

num_objects[MSG] = 0

num_objects[SEM] = 0

num_objects[SHM] = 0

ipcnameserver ready completed

drbd0: drbd_nodedown: Signaling receiver thread.

drbd0: drbd_set_state: (mdev->this_bdev->bd_contains == 0) in drivers/block/drbd/drbd_fs.c:702

drbd0: Secondary/Unknown --> Primary/Unknown

drbd0: Doing CLMS nodedown callback for service 9

EXT3-fs: INFO: recovery required on readonly filesystem.

EXT3-fs: write access will be enabled during recovery.

write handler down off 470000 len 10000

kjournald starting.  Commit interval 5 seconds

EXT3-fs: recovery complete.

EXT3-fs: mounted filesystem with ordered data mode.

fsck 1.35 (28-Feb-2004)

ERROR: Couldn't open /dev/null (No such file or directory)

e2fsck 1.35 (28-Feb-2004)

fsck.ext3: No such file or directory while trying to open /dev/drbd0

The superblock could not be read or does not describe a correct ext2

filesystem.  If the device is valid and it really contains an ext2

filesystem (and not swap or ufs or something else), then the superblock

is corrupt, and you might try running e2fsck with an alternate superblock:

    e2fsck -b 8193 <device>

EXT3 FS on drbd0, internal journal

/etc/init.d/rc.sysrecover  running

ssi-ntpsetrefclk: ntpd is not running; not setting refclk

INIT: version 2.86-SSI reloading

INIT: cannot execute "/sbin/getty"

INIT: Sending processes the TERM signal

INIT: Sending processes the KILL signal

INIT: Pid 131747 [id siR] seems to hang

/etc/init.d/rc.nodedown 1 running

fsck 1.35 (28-Feb-2004)

INIT: +++ nodedown completed on node 1
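
The two "No such file or directory" errors in the log (for /dev/null and /dev/drbd0) suggest that the device nodes may simply not exist on the node that takes over. A minimal sketch of a check that could be run there, assuming a POSIX shell is available; the function name check_nodes is hypothetical and not part of DRBD or OpenSSI:

```shell
# check_nodes: report whether each path given as an argument exists.
# Returns 0 if every path is present, 1 if at least one is missing.
check_nodes() {
    missing=0
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "present: $dev"
        else
            echo "missing: $dev"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (paths taken from the errors in the log):
#   check_nodes /dev/null /dev/drbd0 /dev/drbd/0
```

If /dev/drbd0 shows up as missing while /dev/drbd/0 is present (or the other way round), that would at least show which naming scheme the failover node ends up with.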

Any help would be much appreciated!

Owen