Thread: [SSI-devel] Re: [SSI-users] Full HA with only 2 computers ?? --- Drbd root-failover HowTo
From: Jaideep D. <Jai...@hp...> - 2004-05-21 22:42:23
|
I have tried drbd-root failover successfully. I have compiled a tarball that includes a How-to, sample configuration files and the OpenSSI-enabled drbd code. The process does require manual tweaking of the ramdisk, since it is not yet integrated with mkinitrd and installation, but the steps are pretty straightforward and outlined in the How-to. The tarball is available at http://www.openssi.org/contrib/. I am working on an RPM that should install the modules and drbd utilities on an OpenSSI cluster; right now the tarball contains code that needs to be compiled and installed. Let me know if there are any questions, and let us all know how it goes for you if you do end up doing drbd-failover :-).

Jai.

> > Eric Piollet wrote:
> >
> >> I have only 2 computers:
> >> Computer n°1: openldap + sendmail (or later postfix) + imap + DNS + LAMPP on RH 9 (groupware applis).
> >> I would like to have full OpenSSI with my 2 computers:
> >> Services: I can get some benefit from using 2 nodes instead of one.
> >> HA: replication from computer n°1 to computer n°2 -> *without a shared disk*, but a little like a drbd system.
> >> So if computer n°1 is down, computer n°2 can reboot with its own disk without losing my data.
> >>
> >> Is it possible at this time?
> >
> > I don't have a good answer for you, but I can tell you what I've tried so far, and hopefully some others on the list with more knowledge of OpenSSI will chime in.
> >
> > My first approach was to use DRBD to mirror the root filesystem (and another filesystem) to the second node. However, I was never able to figure out how to get the boot sequence to handle the mounting of a root filesystem on a DRBD device, because the timing of the boot process didn't match the timing of the DRBD device becoming available. I know several people on the list are working on this approach, but I haven't heard anything recently about the status of their efforts. I also don't have a clear picture of how the failover would work.
> >
> > My intent was to keep the root filesystem mirrored so that in the case of the primary node's failure, the secondary would boot from its copy of the primary's root filesystem (instead of booting from an Etherboot CDROM, as it does otherwise), and should come up as though it were the primary node. However, this still seems to have the problem that the MAC addresses in /etc/clustertab would reflect the NICs in the old primary. Nevertheless, this seems to be the best long-term approach, and any comments from others on the list who are working on this would be welcome.
> >
> > I've also considered using either iSCSI or Lustre with a separate (probably non-SSI) machine as the root filesystem, but this represents a single point of failure. I'm also not clear whether Lustre offers any advantage over iSCSI here - it seems to add an unnecessary level of complexity to the boot process.
> >
> > My current thinking is to mirror the primary's root filesystem to the secondary via periodic rsyncs. I may be able to get away with this because the systems should be fairly static once they're configured, and there isn't much critical application data stored on the root. Obviously this approach won't work for every application. The advantage I see of doing it this way is that I don't have to deal with the complexity of getting DRBD involved in the boot sequence, and I can exclude the few files (/etc/clustertab is all I know about so far) that should be kept un-mirrored on the secondary. I might still use DRBD for non-root filesystems if I needed real mirroring.
> >
> > While this probably gets me a backup primary that can be brought up fairly quickly in the case of a total failure of the original primary, I'm still not clear on what I need to do to automate the failover. I assume I need to modify the heartbeat scripts, and probably other boot scripts, to force a reboot of the secondary node and restart processes. Any pointers on which files I should be looking at would be appreciated.
> >
> > -------------------------------------------------------
> > This SF.Net email is sponsored by: Oracle 10g
> > Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE.
> > http://ads.osdn.com/?ad_id149&alloc_id66&op=click
> > _______________________________________________
> > Ssic-linux-users mailing list
> > Ssi...@li...
> > https://lists.sourceforge.net/lists/listinfo/ssic-linux-users
|
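The periodic-rsync approach described in the quoted message could be sketched roughly as below. This is only an illustration, not a command from the thread: the hostname `node2` and everything in the exclude list except /etc/clustertab are assumptions.

```sh
# Mirror the primary's root filesystem to the secondary, excluding the
# few files that must stay node-specific (/etc/clustertab is the one
# named in this thread) plus pseudo-filesystems. Could be run
# periodically from cron.
rsync -aHx --delete \
    --exclude=/etc/clustertab \
    --exclude=/proc/ --exclude=/sys/ --exclude=/tmp/ \
    / root@node2:/
```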
From: Thorn R. <tr...@in...> - 2004-05-21 23:38:16
|
Thanks very much for posting the tarball. However, I'm not sure how to proceed with my existing cluster, which was installed without root failover enabled. Can I re-run the install script? Or are there just a few changes needed in the init scripts to enable failover?

Also, which version of drbd did you use?

And regarding the size of the index partition - is there a reason you recommend 198 instead of 128 MB, as I recall the drbd documentation states? Should this number be multiplied by the number of filesystems I want to mirror? And should the index partition exist on the secondary node also?

Thanks again for your help.

Jaideep Dharap wrote:
> I have tried drbd-root failover successfully. I have compiled a tarball that includes a How-to, sample configuration files and the OpenSSI-enabled drbd code. The process does require manual tweaking of the ramdisk since it is not yet integrated with mkinitrd and installation. But the steps are pretty straightforward and outlined in the How-to. The tarball is available at http://www.openssi.org/contrib/. I am working on an RPM that should install the modules and drbd utilities on an OpenSSI cluster. Right now the tarball contains code that needs to be compiled and installed. Let me know if there are any questions, and let us all know how it goes for you if you do end up doing drbd-failover :-).
> Jai.
|
From: Jaideep D. <Jai...@hp...> - 2004-05-22 00:21:51
|
Thorn Roby wrote:
> Thanks very much for posting the tarball. However, I'm not sure how to proceed with my existing cluster, which was installed without root failover enabled. Can I re-run the install script? Or are there just a few changes needed in the init scripts to enable failover?

You don't need to run the install script again. There are basically five steps involved in changing a non-failover cluster to a failover cluster:
1. Run ssi-chnode to turn the secondary node into a takeover CLMS master.
2. Add the chard option to the root line in /etc/fstab.
3. Run mkinitrd to create a new initrd.
4. Follow the How-to to engineer the ramdisk to do drbd-failover.
5. Run ssi-ksync to propagate the new initrd to all nodes.

> Also, which version of drbd did you use?

drbd-0.7pre5

> And regarding the size of the index partition - is there a reason you recommend 198 instead of 128 MB as I recall the drbd documentation states?

Sorry, my bad. It is 128 MB.

> And should this number be multiplied by the number of filesystems I want to mirror?

I believe that is correct. I haven't tried mirroring more than one partition myself, but each partition would require an index partition, I would think.

> And should the index partition exist on the secondary node also?

Yes. The secondary node requires an index partition too.

Thanks for your feedback :-). I will include the steps for converting a non-failover cluster to a failover cluster in the How-to. That should definitely be useful.

Jai.

>> I have tried drbd-root failover successfully. I have compiled a tarball that includes a How-to, sample configuration files and the OpenSSI-enabled drbd code. The process does require manual tweaking of the ramdisk since it is not yet integrated with mkinitrd and installation. But the steps are pretty straightforward and outlined in the How-to. The tarball is available at http://www.openssi.org/contrib/. I am working on an RPM that should install the modules and drbd utilities on an OpenSSI cluster. Right now the tarball contains code that needs to be compiled and installed. Let me know if there are any questions, and let us all know how it goes for you if you do end up doing drbd-failover :-).
>> Jai.
|
From: Jaideep D. <Jai...@hp...> - 2004-05-22 00:38:17
|
Jaideep Dharap wrote:
> Thorn Roby wrote:
>> Thanks very much for posting the tarball. However, I'm not sure how to proceed with my existing cluster, which was installed without root failover enabled. Can I re-run the install script? Or are there just a few changes needed in the init scripts to enable failover?
>
> You don't need to run the install script again. There are basically five steps involved in changing a non-failover cluster to a failover cluster:
> 1. Run ssi-chnode to turn the secondary node into a takeover CLMS master.
> 2. Add the chard option to the root line in /etc/fstab.
> 3. Run mkinitrd to create a new initrd.
> 4. Follow the How-to to engineer the ramdisk to do drbd-failover.
> 5. Run ssi-ksync to propagate the new initrd to all nodes.

Oh, just as an example of what I mean by step 2 above, here's my fstab entry with the "chard" option:

UUID=5de23b74-2690-48c1-af1e-42ec0eca2782 / ext3 chard,defaults,node=1:2 1 1
|
From: Brian J. W. <Bri...@hp...> - 2004-05-22 00:41:27
|
Jaideep Dharap wrote:
> Thorn Roby wrote:
>> Thanks very much for posting the tarball. However, I'm not sure how to proceed with my existing cluster, which was installed without root failover enabled. Can I re-run the install script? Or are there just a few changes needed in the init scripts to enable failover?
>
> You don't need to run the install script again. There are basically five steps involved in changing a non-failover cluster to a failover cluster:
> 1. Run ssi-chnode to turn the secondary node into a takeover CLMS master.

As part of this, you'll need to select a partition on your second node as a local boot device. Then whenever ssi-ksync is run, a copy of the kernel, ramdisk, and grub.conf will be put there, so that the node can boot by itself without any other nodes being available. I think ssi-chnode will force you to do this before it will let you make the node a potential CLMS master (i.e., a failover initnode).

Brian
|
From: Kilian C. <kil...@st...> - 2004-06-01 09:41:00
|
On Saturday 22 May 2004 02:21, Jaideep Dharap wrote:
>> Also, which version of drbd did you use?
>
> drbd-0.7pre5

Hi,

I have some problems with root drbd failover. I followed Jaideep's How-to to transform my current 5-node Debian cluster into a drbd-enabled root-failover setup, but I have some hanging points:

1. The ssi-chnode script does not propose to change the init status of a node. It just asks for IP, MAC, and {Ether/PXE}Boot. So what do I have to change in my config files in order to activate a second init node (other than putting 1 in the 'init' column of /etc/clustertab)?

2. How do I have to modify /etc/fstab? In Debian there's usually no UUID, but a plain "/dev/sda1 / ext3...", e.g. So do I have to replace /dev/sda1 by /dev/nbd/0, or by the UUID of the device?

3. Enabling the nbd device, I get plenty of "attempt to access beyond end of device" error messages, which result in an incomplete mount of the partition. I read that I have to add a "disk-size" option in drbd.conf (both in the initrd and in /etc). I did it, and got "disk-size: unknown option" error messages at boot, and no mount at all. Is it related to the drbd version used here? How can I specify the size of my device?

For the rest, drbd synchronization works well, with my 'half-mounted' partition. :)

TIA, Regards,
--
Kilian CAVALOTTI
Ingénieur Systèmes & Réseaux
Laboratoire STIX, École Polytechnique
F-91128 Palaiseau
Tel: +33 1 69 33 34 95
|
From: En C. L. <en...@in...> - 2004-06-02 09:39:38
|
Hi.

> 1. The ssi-chnode script does not propose to change the init status of a node. It just asks for IP, MAC, and {Ether/PXE}Boot. So what do I have to change in my config files in order to activate a second init node (other than putting 1 in the 'init' column of /etc/clustertab)?

After you put 1 in the init column of clustertab, you need to update the ramdisk:

/sbin/mkinitrd --tabonly <initrd> <kernel ver>

> 2. How do I have to modify /etc/fstab? In Debian there's usually no UUID, but a plain "/dev/sda1 / ext3...", e.g. So do I have to replace /dev/sda1 by /dev/nbd/0, or by the UUID of the device?

I've used /dev/nbd/0 in my fstab, but I think Jaideep tried it with the UUID. Jaideep?

> 3. Enabling the nbd device, I get plenty of "attempt to access beyond end of device" error messages, which result in an incomplete mount of the partition. I read that I have to add a "disk-size" option in drbd.conf (both in the initrd and in /etc). I did it, and got "disk-size: unknown option" error messages at boot, and no mount at all. Is it related to the drbd version used here? How can I specify the size of my device?

I'm not completely sure, but I think drbd 0.7 does not allow the disk-size option. drbd decides the size of the device by taking the minimum of the two devices specified in drbd.conf. So the error message you are seeing usually appears if the second device syncs after mkfs and is smaller than the original device. Can you check whether that is the case? Also, if you are layering drbd over an existing ext3 fs, then ensure that you have specified an external meta-disk.

I hope that helps.

En Chiang
|
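To illustrate the external meta-disk point, here is a minimal sketch of what a drbd-0.7-style resource with external meta-data might look like. The hostnames, IP addresses, and partition names are assumptions for illustration only, and option names vary between drbd versions, so check the drbd.conf documentation shipped with the tarball's drbd before using anything like this.

```conf
resource r0 {
  protocol C;
  on node1 {
    device    /dev/nbd/0;     # device name used in this thread's fstab examples
    disk      /dev/sda2;      # lower-level device holding the root fs (assumed)
    address   10.0.0.1:7788;  # assumed interconnect address
    meta-disk /dev/sda3[0];   # external index partition, ~128 MB per mirrored device
  }
  on node2 {
    device    /dev/nbd/0;
    disk      /dev/sda2;
    address   10.0.0.2:7788;
    meta-disk /dev/sda3[0];   # the secondary needs its own index partition too
  }
}
```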
From: Andreas <ro...@co...> - 2004-05-24 13:45:51
|
Hello,

Jaideep Dharap wrote:
> I have tried drbd-root failover successfully. I have compiled a tarball that includes a How-to, sample configuration files and the OpenSSI-enabled drbd code. The process does require manual tweaking of the ramdisk since it is not yet integrated with mkinitrd and installation. But the steps are pretty straightforward and outlined in the How-to. The tarball is available at http://www.openssi.org/contrib/. I am working on an RPM that should install the modules and drbd utilities on an OpenSSI cluster. Right now the tarball contains code that needs to be compiled and installed. Let me know if there are any questions, and let us all know how it goes for you if you do end up doing drbd-failover :-).
> Jai.

I was able to follow the instructions in the How-to, but now I have a problem. I have a configuration with two nodes (Debian); node 1 is the initnode. After node 2 boots, the resync process is started, and after that the cluster works fine. The problem I have is with the failover. After I turn off node 1, node 2 takes over. While recovering, it starts the script rc.sysrecover; I think that script must be updated too, because for DEVICE it still calls findfs. I changed that line to DEVICE=/dev/nbd/0 and it works fine. Before I did that, /etc/mtab was wrong because fix_mtab wasn't called. The output of df was:

NOTAVAIL 3842376 3113847 256799 90% /

But like I said, that was easy to fix. Was that correct?

The next problem I have is that after the failover, when I try to reboot node 2 (the last remaining node in the cluster), I get a kernel panic. That occurs when the system tries to unmount the local filesystems.

Another problem I have is with the boot manager. I still use lilo, but after the sync with node 1, lilo no longer works. I think during the synchronization the MBR of node 2's disk is changed, so that lilo cannot work. After I start node 2 with a Knoppix CD and call lilo again (after chroot), it works again.

I hope somebody has some ideas that can help.

Andreas

>> Eric Piollet wrote:
>>> I have only 2 computers:
>>> Computer n°1: openldap + sendmail (or later postfix) + imap + DNS + LAMPP on RH 9 (groupware applis).
>>> I would like to have full OpenSSI with my 2 computers:
>>> Services: I can get some benefit from using 2 nodes instead of one.
>>> HA: replication from computer n°1 to computer n°2 -> *without a shared disk*, but a little like a drbd system.
>>> So if computer n°1 is down, computer n°2 can reboot with its own disk without losing my data.
>>>
>>> Is it possible at this time?
>>
>> I don't have a good answer for you, but I can tell you what I've tried so far, and hopefully some others on the list with more knowledge of OpenSSI will chime in.
>>
>> My first approach was to use DRBD to mirror the root filesystem (and another filesystem) to the second node. However, I was never able to figure out how to get the boot sequence to handle the mounting of a root filesystem on a DRBD device, because the timing of the boot process didn't match the timing of the DRBD device becoming available. I know several people on the list are working on this approach, but I haven't heard anything recently about the status of their efforts. I also don't have a clear picture of how the failover would work.
>>
>> My intent was to keep the root filesystem mirrored so that in the case of the primary node's failure, the secondary would boot from its copy of the primary's root filesystem (instead of booting from an Etherboot CDROM, as it does otherwise), and should come up as though it were the primary node. However, this still seems to have the problem that the MAC addresses in /etc/clustertab would reflect the NICs in the old primary. Nevertheless, this seems to be the best long-term approach, and any comments from others on the list who are working on this would be welcome.
>>
>> I've also considered using either iSCSI or Lustre with a separate (probably non-SSI) machine as the root filesystem, but this represents a single point of failure. I'm also not clear whether Lustre offers any advantage over iSCSI here - it seems to add an unnecessary level of complexity to the boot process.
>>
>> My current thinking is to mirror the primary's root filesystem to the secondary via periodic rsyncs. I may be able to get away with this because the systems should be fairly static once they're configured, and there isn't much critical application data stored on the root. Obviously this approach won't work for every application. The advantage I see of doing it this way is that I don't have to deal with the complexity of getting DRBD involved in the boot sequence, and I can exclude the few files (/etc/clustertab is all I know about so far) that should be kept un-mirrored on the secondary. I might still use DRBD for non-root filesystems if I needed real mirroring.
>>
>> While this probably gets me a backup primary that can be brought up fairly quickly in the case of a total failure of the original primary, I'm still not clear on what I need to do to automate the failover. I assume I need to modify the heartbeat scripts, and probably other boot scripts, to force a reboot of the secondary node and restart processes. Any pointers on which files I should be looking at would be appreciated.
|
From: Jaideep D. <Jai...@hp...> - 2004-05-24 22:59:07
|
Andreas wrote:
> I was able to follow the instructions in the How-to, but now I have a problem. I have a configuration with two nodes (Debian); node 1 is the initnode. After node 2 boots, the resync process is started, and after that the cluster works fine. The problem I have is with the failover. After I turn off node 1, node 2 takes over. While recovering, it starts the script rc.sysrecover; I think that script must be updated too, because for DEVICE it still calls findfs. I changed that line to DEVICE=/dev/nbd/0 and it works fine. Before I did that, /etc/mtab was wrong because fix_mtab wasn't called. The output of df was:
>
> NOTAVAIL 3842376 3113847 256799 90% /
>
> But like I said, that was easy to fix. Was that correct?

That is correct. Thanks for pointing it out; I will add that to the How-to :-).

> The next problem I have is that after the failover, when I try to reboot node 2 (the last remaining node in the cluster), I get a kernel panic. That occurs when the system tries to unmount the local filesystems.

Do you have the panic trace that we can look at? I haven't seen this problem yet.

> Another problem I have is with the boot manager. I still use lilo, but after the sync with node 1, lilo no longer works. I think during the synchronization the MBR of node 2's disk is changed, so that lilo cannot work. After I start node 2 with a Knoppix CD and call lilo again (after chroot), it works again.

OpenSSI supports grub, and that's what I have been using here. If there is no particular reason you are running lilo, I would suggest changing to grub. If you absolutely have to use lilo for some reason, try running /sbin/lilo on the second node manually after the sync.

The other thing that I should mention is related to /boot in general:
1. Preferably, /boot should be its own partition, separate from the drbd-mirrored root partition.
2. If /boot is part of the drbd-mirrored partition, I would suggest trying the following: in /etc/clustertab, remove the boot device from all nodes and leave the field vacant. This is because ssi-ksync has a particular way of syncing the boot partitions that won't work too well with a drbd-mirrored /boot.

Jai.
|
From: Andreas <ro...@co...> - 2004-05-25 07:49:27
|
Jaideep Dharap wrote:
> Andreas wrote:
>> The next problem I have is that after the failover, when I try to reboot node 2 (the last remaining node in the cluster), I get a kernel panic. That occurs when the system tries to unmount the local filesystems.
>
> Do you have the panic trace that we can look at? I haven't seen this problem yet.

Sorry, no. And the last few times I tried the reboot it worked fine.

>> Another problem I have is with the boot manager. I still use lilo, but after the sync with node 1, lilo no longer works. I think during the synchronization the MBR of node 2's disk is changed, so that lilo cannot work. After I start node 2 with a Knoppix CD and call lilo again (after chroot), it works again.
>
> OpenSSI supports grub, and that's what I have been using here. If there is no particular reason you are running lilo, I would suggest changing to grub. If you absolutely have to use lilo for some reason, try running /sbin/lilo on the second node manually after the sync.

I suppose it is time to change to grub. I only use lilo because I have always used it.

> The other thing that I should mention is related to /boot in general:
> 1. Preferably, /boot should be its own partition, separate from the drbd-mirrored root partition.

There shouldn't be a problem with using a different partition for /boot. Thank you.

> 2. If /boot is part of the drbd-mirrored partition, I would suggest trying the following: in /etc/clustertab, remove the boot device from all nodes and leave the field vacant. This is because ssi-ksync has a particular way of syncing the boot partitions that won't work too well with a drbd-mirrored /boot.

There is another problem. If I boot the second node and the sync starts, I get the following errors.

On node 2:
----------
drbd0: Resync started as target (need to sync 372748 KB).
drbd0: logic error: sock_recvmsg returned 836
drbd0: receive_Data: (!e) in drbd_receive.c:948
drbd0: error receiving Data, I: 4112!
drbd0: asender terminated
drbd0: worker terminated
drbd0: Connection lost.
drbd0: Connection established.
drbd0: Resync started as target (need to sync 284164 KB).
bug: kernel timer added twice at c908c4f6.
bug: kernel timer added twice at c908c4f6.
bug: kernel timer added twice at c908c4f6.
bug: kernel timer added twice at c908c4f6.
bug: kernel timer added twice at c908c4f6.
bug: kernel timer added twice at c908c4f6.
bug: kernel timer added twice at c908c4f6.

On node 1:
----------
drbd0: Resync started as source (need to sync 372748 KB).
drbd0: _drbd_send_page: size=4096 len=3260 sent=-4
drbd0: sock_endmsg returned -4
drbd0: short sent WriteHint size=8 sent=0
drbd0: short read expecting header on sock: r=-512
drbd0: asender terminated
drbd0: worker terminated
drbd0: Connection lost.
Set_State Primary
drbd0: Connection established.
drbd0: Resync started as target (need to sync 284232 KB).
drbd0: elapsed = 911 in drbd_main.c:753

Andreas
|