Re: [SSI-users] DRBD Root Failover with Fedora Core 2


As a follow-up to my previous post, I have determined that Fedora Core 2 
refuses to start the drbd service on Node2.

Why is that, and what can I do to correct it?  Specifically, "service 
drbd start" returns an error about "rc.nodeinfo".  This file is present, 
and once I manually add the line "drbd   all   Y" to it, I can issue 
"onnode 2 service drbd start" (which then attempts to start the service 
on *every* node!).  But when it did start on Node2, it began to sync with 
Node1!  Yay!
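
For anyone repeating the manual rc.nodeinfo edit described above, here is a 
minimal sketch that adds the line only if the service has no entry yet.  The 
helper name ensure_nodeinfo_entry and the default path /etc/rc.nodeinfo are 
my own assumptions for illustration; only the "drbd   all   Y" line itself 
comes from what I typed by hand.

```shell
#!/bin/sh
# Sketch only: append "<service>   all   Y" to a nodeinfo-style file unless
# an entry for that service is already there.
# ensure_nodeinfo_entry is a hypothetical helper, and the default path
# below is an assumption; pass the real location as the second argument.
ensure_nodeinfo_entry() {
    service="$1"
    file="${2:-/etc/rc.nodeinfo}"

    # An entry is any line whose first whitespace-separated field
    # equals the service name.
    if awk -v s="$service" '$1 == s { found = 1 } END { exit !found }' "$file"; then
        echo "entry for $service already present in $file"
    else
        printf '%s   all   Y\n' "$service" >> "$file"
        echo "added entry for $service to $file"
    fi
}

# Example (as root, on the cluster; commented out on purpose):
#   ensure_nodeinfo_entry drbd /etc/rc.nodeinfo
```

Running it twice leaves a single entry, so it should be safe to call from a 
setup script.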

So, here are my final hurdles:

1) How can I make Fedora start drbd services automatically at boot?
2) How can I make a service, like httpd, start on a different node (say, 
node 3)?  Is that necessary, or will bash_ll handle this for me?
3) If node 3 dies, or gets abducted by aliens, will node 2 fire httpd 
back up automatically, since the process fell off?

John David wrote:

> Hi all,
>
> I have a 2-node cluster up and running using OpenSSI-1.2.2 and Fedora 
> Core 2, but I am running into a few bumps getting drbd to do anything:
>
> 1.  I have installed drbd successfully on Node1, and it boots properly 
> (although it does do a lot of waiting and counting before actually 
> booting).  For example, it says that it is waiting for a root node for 
> a very long time, then when that is over, it waits for 120 seconds 
> more to see a peer node, (which I can escape by typing "yes").  
> Although this is not a problem right now, Node2's etherboot fails 
> after waiting so long.  Node2 will connect if I reboot it after Node 1 
> has finished, but that's awfully temperamental for an HA server, eh?  
> That essentially means a power outage will always result in Node1 
> being the only one up.  I'm hoping that one of my other problems is 
> the reason for this delay in booting up.
>
> 2.  Node1 likes drbd, but Node2 doesn't have any idea what drbd is:
>
> Node1 offers information through /proc/drbd:
>   [root@node1 root]# cat /proc/drbd
>   version: 0.7.11 (api:77/proto:74)
>   SVN Revision: 1799 build by root@node1, 2005-11-22 19:23:24
>   0: cs:WFConnection st:Primary/Unknown ld:Consistent
>   ns:0 nr:0 dw:59196 dr:31353 al:0 bm:63 lo:0 pe:0 ua:0 ap:0
>
> Node2 remains angry and uncooperative:
>   [root@node2 root]# cat /proc/drbd
>   cat: /proc/drbd: No such file or directory
>
> Node1 appears to have mounted the drbd device properly (but I have no 
> idea how this is supposed to look).  The results for the "mount" 
> command are identical for both nodes:
>   /dev/1/drbd/0 on / type ext3 (rw,chard)
>   none on /proc type proc (rw)
>   none on /sys type sysfs (rw)
>   none on /dev/pts type devpts (rw,gid=5,mode=620)
>   devfs on /dev type devfs (rw)
>   /dev/1/hda1 on /boot type ext3 (rw)
>
> Node1 has a /dev/drbd/0:
>  [root@node1 root]# ls -asl /dev/drbd/0
>  0 brw-------  1 root root 147, 0 Dec 31  1969 /dev/drbd/0
>
> Node2 does not:
>  [root@node2 root]# ls -asl /dev/drbd/0
>  ls: /dev/drbd/0: No such file or directory
>
> This is really strange to me, because if they are booting using the 
> same initrd image (updated by ssi-ksync), why isn't Node2 loading the 
> drbd module and behaving more like Node1?  Moreover, if this node is 
> supposed to perform root failover, it would need to provide drbd 
> service to the other nodes -- which it obviously isn't ready to do if 
> it doesn't have a /dev/drbd/0 device on boot.
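
The /proc/drbd comparison quoted above can be scripted so that the same 
check runs on each node.  This is only a sketch: drbd_conn_state is a 
hypothetical helper name, and it assumes the "cs:" field layout shown in the 
Node1 output for drbd 0.7.11.

```shell
#!/bin/sh
# Sketch only: report the drbd connection state(s) from a /proc/drbd-style
# file, or say that the module does not appear to be loaded.
# drbd_conn_state is a hypothetical helper; the "cs:" field format is
# taken from the Node1 output quoted in this thread.
drbd_conn_state() {
    proc_file="${1:-/proc/drbd}"

    if [ ! -r "$proc_file" ]; then
        echo "drbd not loaded (no $proc_file)"
        return 1
    fi

    # Print the value of the cs: field for every device line that has one.
    sed -n 's/.*cs:\([^ ]*\).*/\1/p' "$proc_file"
}

# Example (run on each node; commented out on purpose):
#   drbd_conn_state
```

On a node in the state Node1 shows above this would print WFConnection, and 
on a node like Node2 it would report that drbd is not loaded.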