From: Chris R. <chr...@ne...> - 2004-05-11 18:38:36
I also have the problem with the segfault in evmsgui with a very similar setup. I ran "evmsgui -d debug" and have attached that log. I am running EVMS v2.3.2 and kernel 2.4.24.

-----Original Message-----
From: evm...@li... [mailto:evm...@li...] On Behalf Of Steve Dobbelstein
Sent: Tuesday, May 11, 2004 9:21 AM
To: Alexander Kordecki
Cc: evm...@li...; evm...@li...
Subject: Re: [Evms-cluster] evms_failover problems

Alexander Kordecki <al...@ko...> wrote on 05/11/2004 07:23:43 AM:

> Hi Steve !
>
> Thanks for your reply.
>
> > You don't need admin_mode to do a failover. Which version of heartbeat
> > are you using? Earlier versions had a problem with producing a valid
> > cluster membership when one node died. The membership services said
> > there was no membership even when one node was still alive. The EVMS
> > Cluster Segment Manager (CSM) won't make changes to its containers
> > unless it has a valid membership (or admin_mode is yes). If there is no
> > valid membership, the CSM can't safely make changes since it can't
> > coordinate the changes with the other nodes in the cluster.
>
> I use heartbeat 1.2.0 ... Is this enough ?

That's what I'm using on one of my test clusters. Once in a while I won't get a membership when I run evmsgui. Sometimes if I close and then restart evmsgui I get a membership.

> > If you do not run in admin_mode, any containers that are deported or
> > don't belong to the node will not be produced by the CSM.
> >
> > On my test cluster, if I run without admin_mode, the directory
> > /dev/evms/.nodes/priv1 goes away on an "evms_failover priv1 stop".
> > If I run in admin_mode, the directory remains.
>
> That's the same on my cluster, but in non-admin mode I couldn't get the
> container back if it is deported. Neither on the node where I run the
> "evms_failover container stop" nor on the other node.
> Every time I try it, I get the message: "WARNING: Container <name>
> failed to start"
>
> After this EVMS seems to behave curiously ... It hangs, couldn't connect
> to the other cluster node, etc.

I found a bug in the 2.3.2 version of evms_failover when I was trying to reproduce your problem. The end symptom was that containers couldn't be started. Sounds like your problem. Try this patch.

(See attached file: evms_failover.patch)

> > Hmmm. I can use evmsgui to change the container from deported to
> > private, with or without admin_mode. Can you run "evmsgui -d debug"
> > and then send me the log, /var/log/evms-engine.log, (gzipped, since
> > it's big) after the segfault. I'll see if it has any clues as to what
> > segfaulted. You could also run evmsgui under gdb and then get a back
> > trace after the segfault.
>
> I attached the log ... But sorry, I have no gdb installed.

I'm looking through the log. I'll let you know if I find anything.

Steve D.