[SSI-devel] Re: [SSI-users] Full HA with only 2 computers ?? --- Drbd root-failover HowTo
From: Jaideep D. <Jai...@hp...> - 2004-05-21 22:42:23
I have tried drbd-root failover successfully. I have compiled a tarball that includes a how-to, sample configuration files, and the OpenSSI-enabled drbd code. The process does require manual tweaking of the ramdisk, since it is not yet integrated with mkinitrd and the installation, but the steps are pretty straightforward and outlined in the how-to. The tarball is available at http://www.openssi.org/contrib/. I am working on an rpm that should install the modules and drbd utilities on an OpenSSI cluster; right now the tarball contains code that needs to be compiled and installed. Let me know if there are any questions, and let us all know how it goes for you if you do end up doing drbd-failover :-).

Jai.

> Eric Piollet wrote:
>
>> I have only 2 computers:
>>
>> Computer n°1: openldap + sendmail (or later postfix) + imap + DNS + LAMPP on RH 9 (groupware applications).
>>
>> I would like to have full OpenSSI with my 2 computers:
>>
>> Services: I can get some benefit from using 2 nodes instead of one.
>>
>> HA: replication from computer n°1 to computer n°2 -> *without a shared disk*, but a little like the drbd system. So if computer n°1 is down, computer n°2 can reboot with its own disk without losing my data.
>>
>> Is this possible at the moment?
>
> I don't have a good answer for you, but I can tell you what I've tried so far, and hopefully some others on the list with more knowledge of OpenSSI will chime in.
>
> My first approach was to use DRBD to mirror the root filesystem (and another filesystem) to the second node. However, I was never able to figure out how to get the boot sequence to handle mounting a root filesystem on a DRBD device, because the timing of the boot process didn't match the timing of the DRBD device becoming available. I know several people on the list are working on this approach, but I haven't heard anything recently about the status of their efforts. I also don't have a clear picture of how the failover would work.
> My intent was to keep the root filesystem mirrored so that in the case of the primary node's failure, the secondary would boot from its copy of the primary's root filesystem (instead of booting from an Etherboot CD-ROM, as it does otherwise), and should come up as though it were the primary node. However, this still seems to have the problem that the MAC addresses in /etc/clustertab would reflect the NICs in the old primary. Nevertheless, this seems to be the best long-term approach, and any comments from others on the list who are working on this would be welcome.
>
> I've also considered using either iSCSI or Lustre with a separate (probably non-SSI) machine as the root filesystem, but this represents a single point of failure. I'm also not clear whether Lustre offers any advantage over iSCSI here - it seems to add an unnecessary level of complexity to the boot process.
>
> My current thinking is to mirror the primary's root filesystem to the secondary via periodic rsyncs. I may be able to get away with this because the systems should be fairly static once they're configured, and there isn't much critical application data stored on the root. Obviously this approach won't work for every application. The advantage I see of doing it this way is that I don't have to deal with the complexity of getting DRBD involved in the boot sequence, and I can exclude the few files (/etc/clustertab is all I know about so far) that should be kept un-mirrored on the secondary. I might still use DRBD for non-root filesystems if I needed real mirroring.
>
> While this probably gets me a backup primary that can be brought up fairly quickly in the case of a total failure of the original primary, I'm still not clear on what I need to do to automate the failover. I assume I need to modify the heartbeat scripts, and probably other boot scripts, to force a reboot of the secondary node and restart processes.
> Any pointers on which files I should be looking at would be appreciated.
>
> _______________________________________________
> Ssic-linux-users mailing list
> Ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-users
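[Editor's note: for the non-root filesystems where the reply above still considers DRBD, a resource is declared as a stanza in /etc/drbd.conf. The fragment below is an illustrative drbd 0.7-style sketch; hostnames, devices, and addresses are invented, config syntax varied between drbd versions in this era, and the sample configuration files in the OpenSSI tarball are authoritative for the version it ships.]

```
# Illustrative drbd 0.7-style resource for a non-root filesystem.
# Hostnames ("primary"/"secondary"), disks, and addresses are made up.
resource r0 {
  protocol C;            # synchronous replication

  on primary {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.1.1:7788;
    meta-disk internal;
  }

  on secondary {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.1.2:7788;
    meta-disk internal;
  }
}
```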
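[Editor's note: the periodic-rsync root mirroring described in the reply above can be sketched as follows. All paths and names here are illustrative, not from the thread (only /etc/clustertab is named there); the demo syncs between local directories so it is self-contained, whereas in practice SRC would be / on the primary and DEST an ssh target on the secondary such as root@secondary:/, with the script run from cron.]

```shell
#!/bin/sh
# Sketch of the periodic-rsync mirroring idea, assuming illustrative
# local demo paths. In a real setup SRC would be / on the primary and
# DEST an ssh target on the secondary (e.g. root@secondary:/).
SRC="${SRC:-/tmp/ssi-mirror-demo/primary-root/}"
DEST="${DEST:-/tmp/ssi-mirror-demo/secondary-root/}"

# Fake a tiny "root filesystem" so the sketch runs end to end.
mkdir -p "${SRC}etc" "$DEST"
echo "hypothetical node entry"  > "${SRC}etc/clustertab"
echo "shared app config"        > "${SRC}app.conf"

# -a        preserve permissions, ownership, times, symlinks
# --delete  remove files on the secondary that vanished on the primary
# --exclude /etc/clustertab keeps the secondary's node-specific copy
# un-mirrored, as suggested in the thread
rsync -a --delete --exclude=/etc/clustertab "$SRC" "$DEST"
```

Because the exclude pattern starts with a slash, it is anchored at the transfer root (SRC), so only the top-level etc/clustertab is skipped, not every file of that name.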