[SSI-devel] Re: [SSI-users] Full HA with only 2 computers ?? --- Drbd root-failover HowTo
From: Jaideep D. <Jai...@hp...> - 2004-05-21 22:42:23
I have tried drbd-root failover successfully. I have compiled a tarball that includes a how-to, sample configuration files, and the OpenSSI-enabled drbd code. The process does require manual tweaking of the ramdisk, since it is not yet integrated with mkinitrd and the installation, but the steps are pretty straightforward and outlined in the how-to. The tarball is available at http://www.openssi.org/contrib/. I am working on an rpm that should install the modules and drbd utilities on an OpenSSI cluster; right now the tarball contains code that needs to be compiled and installed. Let me know if there are any questions, and let us all know how it goes for you if you do end up doing drbd-failover :-).

Jai.

> Eric Piollet wrote:
>
>> I have only 2 computers:
>>
>> Computer n°1: openldap + sendmail (or later postfix) + imap + DNS + LAMPP on RH 9 (groupware applications).
>>
>> I would like to have full OpenSSI with my 2 computers:
>>
>> Services: I can get some benefit from using 2 nodes instead of one.
>>
>> HA: replication from computer n°1 to computer n°2 -> *without a shared disk*, but a little like the drbd system. So if computer n°1 is down, computer n°2 can reboot with its own disk without losing my data.
>>
>> Is this possible at the moment?
>
> I don't have a good answer for you, but I can tell you what I've tried so far, and hopefully some others on the list with more knowledge of OpenSSI will chime in.
>
> My first approach was to use DRBD to mirror the root filesystem (and another filesystem) to the second node. However, I was never able to figure out how to get the boot sequence to handle mounting a root filesystem on a DRBD device, because the timing of the boot process didn't match the timing of the DRBD device becoming available. I know several people on the list are working on this approach, but I haven't heard anything recently about the status of their efforts. I also don't have a clear picture of how the failover would work.
> My intent was to keep the root filesystem mirrored so that in the case of the primary node's failure, the secondary would boot from its copy of the primary's root filesystem (instead of booting from an Etherboot CD-ROM, as it does otherwise), and should come up as though it were the primary node. However, this still seems to have the problem that the MAC addresses in /etc/clustertab would reflect the NICs in the old primary. Nevertheless, this seems to be the best long-term approach, and any comments from others on the list who are working on this would be welcome.
>
> I've also considered using either iSCSI or Lustre with a separate (probably non-SSI) machine as the root filesystem, but this represents a single point of failure. I'm also not clear whether Lustre offers any advantage over iSCSI here - it seems to add an unnecessary level of complexity to the boot process.
>
> My current thinking is to mirror the primary's root filesystem to the secondary via periodic rsyncs. I may be able to get away with this because the systems should be fairly static once they're configured, and there isn't much critical application data stored on the root. Obviously this approach won't work for every application. The advantage I see of doing it this way is that I don't have to deal with the complexity of getting DRBD involved in the boot sequence, and I can exclude the few files (/etc/clustertab is all I know about so far) that should be kept un-mirrored on the secondary. I might still use DRBD for non-root filesystems if I needed real mirroring.
>
> While this probably gets me a backup primary that can be brought up fairly quickly in the case of a total failure of the original primary, I'm still not clear on what I need to do to automate the failover. I assume I need to modify the heartbeat scripts, and probably other boot scripts, to force a reboot of the secondary node and restart processes.
> Any pointers on which files I should be looking at would be appreciated.
>
> _______________________________________________
> Ssic-linux-users mailing list
> Ssi...@li...
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-users
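[Editor's note: for the non-root filesystems where the reply above still considers DRBD, a resource is declared as a stanza in /etc/drbd.conf. The fragment below is an illustrative drbd 0.7-style sketch; hostnames, devices, and addresses are invented, config syntax varied between drbd versions in this era, and the sample configuration files in the OpenSSI tarball are authoritative for the version it ships.]

```
# Illustrative drbd 0.7-style resource for a non-root filesystem.
# Hostnames ("primary"/"secondary"), disks, and addresses are made up.
resource r0 {
  protocol C;            # synchronous replication

  on primary {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.1.1:7788;
    meta-disk internal;
  }

  on secondary {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.1.2:7788;
    meta-disk internal;
  }
}
```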
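[Editor's note: the periodic-rsync root mirroring described in the reply above can be sketched as follows. All paths and names here are illustrative, not from the thread (only /etc/clustertab is named there); the demo syncs between local directories so it is self-contained, whereas in practice SRC would be / on the primary and DEST an ssh target on the secondary such as root@secondary:/, with the script run from cron.]

```shell
#!/bin/sh
# Sketch of the periodic-rsync mirroring idea, assuming illustrative
# local demo paths. In a real setup SRC would be / on the primary and
# DEST an ssh target on the secondary (e.g. root@secondary:/).
SRC="${SRC:-/tmp/ssi-mirror-demo/primary-root/}"
DEST="${DEST:-/tmp/ssi-mirror-demo/secondary-root/}"

# Fake a tiny "root filesystem" so the sketch runs end to end.
mkdir -p "${SRC}etc" "$DEST"
echo "hypothetical node entry"  > "${SRC}etc/clustertab"
echo "shared app config"        > "${SRC}app.conf"

# -a        preserve permissions, ownership, times, symlinks
# --delete  remove files on the secondary that vanished on the primary
# --exclude /etc/clustertab keeps the secondary's node-specific copy
# un-mirrored, as suggested in the thread
rsync -a --delete --exclude=/etc/clustertab "$SRC" "$DEST"
```

Because the exclude pattern starts with a slash, it is anchored at the transfer root (SRC), so only the top-level etc/clustertab is skipped, not every file of that name.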