Re: [OSR-users] Problem with Service configuration


On Wednesday 14 January 2009 15:55:12 Stefano Elmopi wrote:
> Hi,
>
> I have managed to create a 1 node cluster
>
> [root@clu01 cluster]# clustat
> Cluster Status for cluster01 @ Wed Jan 14 17:16:47 2009
> Member Status: Quorate
>
>   Member Name                        ID   Status
>   ------ ----                        ---- ------
>   clu01                                 1 Online, Local
>
> but now I have problems with the service configuration.
> Before getting to the service problem, I have another question.
> Another guide says to run this command:
>
> com-mkcdsl -r /mnt/newroot -a /etc/sysconfig/network-scripts/ifcfg-eth0
>
> whereas your guide says to delete the file. Which is right?
If the interface is started in the initrd (that is, it is referenced in the
cluster configuration under com_info), you should delete this file. That is
the best approach. Where are we talking about
com-mkcdsl -r /mnt/newroot -a /etc/sysconfig/network-scripts/ifcfg-eth0? That
should be removed.
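
To see which ifcfg files this rule applies to, one could list the eth entries
under com_info and map each to its ifcfg path. The following is only a minimal
Python sketch, assuming the cluster.conf layout shown in this thread; the
function name initrd_ifcfg_files and the reduced SAMPLE document are made up
for illustration and are not part of any comoonics tool.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the cluster.conf from this thread (illustration only).
SAMPLE = """
<cluster config_version="2" name="cluster01">
  <clusternodes>
    <clusternode name="clu01" votes="1" nodeid="1">
      <com_info>
        <eth name="eth0" mac="00:15:60:56:75:FD"
             ip="10.43.100.203" mask="255.255.0.0" gateway=""/>
      </com_info>
    </clusternode>
  </clusternodes>
</cluster>
"""

IFCFG_PREFIX = "/etc/sysconfig/network-scripts/ifcfg-"


def initrd_ifcfg_files(xml_text):
    """Return the ifcfg paths of all interfaces referenced under com_info."""
    root = ET.fromstring(xml_text)
    names = set()
    # Only <eth> elements that are direct children of <com_info> count.
    for eth in root.findall(".//com_info/eth"):
        name = eth.get("name")
        if name:
            names.add(name)
    return [IFCFG_PREFIX + name for name in sorted(names)]


if __name__ == "__main__":
    for path in initrd_ifcfg_files(SAMPLE):
        print("candidate for removal:", path)
```

The sketch only reports candidates; whether a file should really be deleted
still depends on the interface being started in the initrd.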
>
> For the problem with Service configuration my cluster.conf is:
>
> <?xml version="1.0"?>
> <!DOCTYPE cluster SYSTEM "/opt/atix/comoonics-cs/xml/rh-cluster.dtd">
> <cluster config_version="2" name="cluster01">
>          <cman expected_votes="1" two_node="0">
>            <multicast addr="10.43.100.203"/>
|--------------------------------------^
What does this mean?
>          </cman>
>
>          <fence_daemon clean_start="1" post_fail_delay="0"
> post_join_delay="3"/>
>
>          <clusternodes>
>                  <clusternode name="clu01" votes="1" nodeid="1">
>                          <com_info>
>                                  <syslog name="clu01"/>
>                                  <rootvolume name="/dev/cciss/c0d0p8"
> fstype="ext3" mountopts="ro"/>
>                                  <eth name="eth0"
> mac="00:15:60:56:75:FD" ip="10.43.100.203" mask="255.255.0.0
> " gateway=""/>
>                                  <multicast addr="10.43.100.203"
> interface="eth0"/>
and this ?
>                          </com_info>
>                  </clusternode>
>          </clusternodes>
>
>          <rm log_level="7" log_facility="local4">
>              <failoverdomains>
>                  <failoverdomain name="failover" ordered="0">
>                          <failoverdomainnode name="clu01" priority="1"/>
>                  </failoverdomain>
>              </failoverdomains>
>              <resources>
>                  <ip address="10.43.100.203" monitor_link="1"/>
and this?
>                  <script file="/etc/init.d/httpd" name="httpd"/>
>              </resources>
>             <service autostart="0" domain="failover" name="HTTPD">
>                  <ip ref="10.43.100.203"/>
>                  <script ref="httpd"/>
>             </service>
>          </rm>
>
> </cluster>
>
> but when I start rgmanager (/etc/init.d/rgmanager start), after a
> few seconds the server reboots!
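I think it's because you are configuring the same IP both on the node and as
an rgmanager resource. The first thing rgmanager does is to stop the IP on all
nodes. Your log shows this: rgmanager removes 10.43.100.203/16 from eth0, and
openais then reports that the network interface is down. This causes the
cluster to "reboot".
I would propose a cluster.conf as follows: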
<?xml version="1.0"?>
<!DOCTYPE cluster SYSTEM "/opt/atix/comoonics-cs/xml/rh-cluster.dtd">
<cluster config_version="2" name="cluster01">
         <cman expected_votes="1" two_node="0"/>

         <fence_daemon clean_start="1" post_fail_delay="0"
                       post_join_delay="3"/>

         <clusternodes>
                 <clusternode name="clu01" votes="1" nodeid="1">
                         <com_info>
                                 <syslog name="clu01"/>
                                 <rootvolume name="/dev/cciss/c0d0p8"
                                             fstype="ext3" mountopts="ro"/>
                                 <eth name="eth0" mac="00:15:60:56:75:FD"
                                      ip="10.43.100.203" mask="255.255.0.0"
                                      gateway=""/>
                         </com_info>
                 </clusternode>
         </clusternodes>

         <rm log_level="7" log_facility="local4">
             <failoverdomains>
                 <failoverdomain name="failover" ordered="0">
                         <failoverdomainnode name="clu01" priority="1"/>
                 </failoverdomain>
             </failoverdomains>
             <resources>
<!-- Use a different IP here. This is the service IP, and it must differ from
the one used by cluster node clu01. -->
<!--                 <ip address="10.43.100.203" monitor_link="1"/>-->
                 <script file="/etc/init.d/httpd" name="httpd"/>
             </resources>
            <service autostart="0" domain="failover" name="HTTPD">
<!--                 <ip ref="10.43.100.203"/>-->
                 <script ref="httpd"/>
            </service>
         </rm>

</cluster>
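
A collision like this can be caught before rgmanager is started by comparing
the node addresses under com_info with the ip resources under rm. What follows
is a minimal Python sketch under the assumption that the file has the layout
shown in this thread; conflicting_service_ips is a hypothetical helper name and
SAMPLE is a reduced copy of the original configuration, not a complete one.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the original (problematic) cluster.conf, illustration only.
SAMPLE = """
<cluster config_version="2" name="cluster01">
  <clusternodes>
    <clusternode name="clu01" votes="1" nodeid="1">
      <com_info>
        <eth name="eth0" ip="10.43.100.203" mask="255.255.0.0"/>
      </com_info>
    </clusternode>
  </clusternodes>
  <rm>
    <resources>
      <ip address="10.43.100.203" monitor_link="1"/>
    </resources>
    <service autostart="0" domain="failover" name="HTTPD">
      <ip ref="10.43.100.203"/>
    </service>
  </rm>
</cluster>
"""


def conflicting_service_ips(xml_text):
    """Return addresses used both as a node IP and as an rgmanager IP."""
    root = ET.fromstring(xml_text)

    # Addresses the nodes themselves use (set up in the initrd via com_info).
    node_ips = {
        eth.get("ip")
        for eth in root.findall(".//com_info/eth")
        if eth.get("ip")
    }

    # Addresses managed by rgmanager, defined either inline or by reference.
    service_ips = set()
    for rm in root.iter("rm"):
        for ip in rm.iter("ip"):
            addr = ip.get("address") or ip.get("ref")
            if addr:
                service_ips.add(addr)

    return sorted(node_ips & service_ips)


if __name__ == "__main__":
    for addr in conflicting_service_ips(SAMPLE):
        print("node IP is also managed by rgmanager:", addr)
```

For the original configuration the check reports 10.43.100.203; with the ip
lines commented out as proposed above, the result should be empty.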

> Below a log of reboot:
>
> Jan 14 17:21:23 clu01 clurgmgrd[31140]: <notice> Resource Group
> Manager Starting
> Jan 14 17:21:23 clu01 clurgmgrd[31140]: <info> Loading Service Data
> Jan 14 17:21:23 clu01 clurgmgrd[31140]: <debug> Loading Resource Rules
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> 22 rules loaded
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> Building Resource Trees
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> 3 resources defined
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> Loading Failover Domains
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> 1 domains defined
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> 101 events defined
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <info> Initializing Services
> Jan 14 17:21:24 clu01 clurgmgrd[31140]: <debug> Initializing
> service:HTTPD
> Jan 14 17:21:24 clu01 clurgmgrd: [31140]: <info> Executing /etc/init.d/
> httpd stop
> Jan 14 17:21:24 clu01 clurgmgrd: [31140]: <info> Removing IPv4 address
> 10.43.100.203/16 from eth0
> Jan 14 17:21:24 clu01 openais[2474]: [TOTEM] Could not set traffic
> priority. (Bad file descriptor)
> Jan 14 17:21:24 clu01 openais[2474]: [TOTEM] The network interface is
> down.
> Jan 14 17:21:24 clu01 openais[2474]: [TOTEM] entering GATHER state
> from 15.
> Jan 14 17:21:29 clu01 openais[2474]: [TOTEM] entering GATHER state
> from 0.
> Jan 14 17:21:34 clu01 clurgmgrd[31140]: <info> Services Initialized
> Jan 14 17:23:31 clu01 openais[2470]: [MAIN ] AIS Executive Service
> RELEASE 'subrev 1358 version 0.80.3'
> Jan 14 17:23:31 clu01 openais[2470]: [MAIN ] Copyright (C) 2002-2006
> MontaVista Software, Inc and contributor
> s.
> Jan 14 17:23:31 clu01 openais[2470]: [MAIN ] Copyright (C) 2006 Red
> Hat, Inc.
> Jan 14 17:23:31 clu01 openais[2470]: [MAIN ] AIS Executive Service:
> started and ready to provide service.
> Jan 14 17:23:31 clu01 openais[2470]: [MAIN ] openais component
> openais_cpg loaded.
> Jan 14 17:23:31 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais cluster closed process grou
> p service v1.01'
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_cfg loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais configuration service'
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_msg loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais message service B.01.01'
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_lck loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais distributed locking service
>   B.01.01'
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_evt loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais event service B.01.01'
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_ckpt loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais checkpoint service B.01.01'
>
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_amf loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais availability management fra
> mework B.01.01'
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] openais component
> openais_clm loaded.
> Jan 14 17:23:32 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais cluster membership service
> B.01.01'
> Jan 14 17:23:33 clu01 openais[2470]: [MAIN ] openais component
> openais_evs loaded.
> Jan 14 17:23:33 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais extended virtual synchrony
> service'
> Jan 14 17:23:33 clu01 openais[2470]: [MAIN ] openais component
> openais_cman loaded.
> Jan 14 17:23:33 clu01 openais[2470]: [MAIN ] Registering service
> handler 'openais CMAN membership service 2.0
> 1'
> Jan 14 17:23:33 clu01 openais[2470]: [TOTEM] Token Timeout (10000 ms)
> retransmit timeout (495 ms)
> Jan 14 17:23:33 clu01 openais[2470]: [TOTEM] token hold (386 ms)
> retransmits before loss (20 retrans)
> Jan 14 17:23:33 clu01 openais[2470]: [TOTEM] join (60 ms) send_join (0
> ms) consensus (4800 ms) merge (200 ms)
>
> Jan 14 17:23:33 clu01 openais[2470]: [TOTEM] downcheck (1000 ms) fail
> to recv const (50 msgs)
> Jan 14 17:23:33 clu01 openais[2470]: [TOTEM] seqno unchanged const (30
> rotations) Maximum network MTU 1500
> Jan 14 17:23:33 clu01 openais[2470]: [TOTEM] window size per rotation
> (50 messages) maximum messages per rota
> tion (17 messages)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] send threads (0 threads)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] RRP token expired timeout
> (495 ms)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] RRP token problem counter
> (2000 ms)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] RRP threshold (10 problem
> count)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] RRP mode set to none.
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM]
> heartbeat_failures_allowed (0)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] max_network_delay (50 ms)
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] HeartBeat is Disabled. To
> enable set heartbeat_failures_allowed > 0
>
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] Receive multicast socket
> recv buffer size (262142 bytes).
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] Transmit multicast socket
> send buffer size (262142 bytes).
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] The network interface
> [10.43.100.203] is now up.
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] Created or loaded
> sequence id 164.10.43.100.203 for this ring.
> Jan 14 17:23:34 clu01 openais[2470]: [TOTEM] entering GATHER state
> from 15.
> Jan 14 17:23:34 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais extended virtual synchrony
>   service'
> Jan 14 17:23:34 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais cluster membership service
>   B.01.01'
> Jan 14 17:23:34 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais availability management fr
> amework B.01.01'
> Jan 14 17:23:34 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais checkpoint service B.01.01
> '
> Jan 14 17:23:34 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais event service B.01.01'
> Jan 14 17:23:35 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais distributed locking servic
> e B.01.01'
> Jan 14 17:23:35 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais message service B.01.01'
> Jan 14 17:23:35 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais configuration service'
> Jan 14 17:23:35 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais cluster closed process gro
> up service v1.01'
> Jan 14 17:23:35 clu01 openais[2470]: [SERV ] Initialising service
> handler 'openais CMAN membership service 2.
> 01'
> Jan 14 17:23:35 clu01 openais[2470]: [CMAN ] CMAN 2.0.84 (built Oct  5
> 2008 13:08:55) started
> Jan 14 17:23:35 clu01 openais[2470]: [SYNC ] Not using a virtual
> synchrony filter.
> Jan 14 17:23:35 clu01 openais[2470]: [TOTEM] Creating commit token
> because I am the rep.
> Jan 14 17:23:35 clu01 openais[2470]: [TOTEM] Saving state aru 0 high
> seq received 0
> Jan 14 17:23:35 clu01 openais[2470]: [TOTEM] Storing new sequence id
> for ring a8
> Jan 14 17:23:35 clu01 openais[2470]: [TOTEM] entering COMMIT state.
> Jan 14 17:23:35 clu01 openais[2470]: [TOTEM] entering RECOVERY state.
> Jan 14 17:23:36 clu01 openais[2470]: [TOTEM] position [0] member
> 10.43.100.203:
> Jan 14 17:23:36 clu01 openais[2470]: [TOTEM] previous ring seq 164 rep
> 10.43.100.203
> Jan 14 17:23:36 clu01 openais[2470]: [TOTEM] aru 0 high delivered 0
> received flag 1
> Jan 14 17:23:36 clu01 openais[2470]: [TOTEM] Did not need to originate
> any messages in recovery.
> Jan 14 17:23:36 clu01 openais[2470]: [TOTEM] Sending initial ORF token
> Jan 14 17:23:36 clu01 openais[2470]: [CLM  ] CLM CONFIGURATION CHANGE
> Jan 14 17:23:36 clu01 openais[2470]: [CLM  ] New Configuration:
> Jan 14 17:23:36 clu01 openais[2470]: [CLM  ] Members Left:
> Jan 14 17:23:36 clu01 openais[2470]: [CLM  ] Members Joined:
> Jan 14 17:23:36 clu01 openais[2470]: [CLM  ] CLM CONFIGURATION CHANGE
> Jan 14 17:23:36 clu01 openais[2470]: [CLM  ] New Configuration:
> Jan 14 17:23:37 clu01 openais[2470]: [CLM  ]    r(0) ip(10.43.100.203)
> Jan 14 17:23:37 clu01 openais[2470]: [CLM  ] Members Left:
> Jan 14 17:23:37 clu01 openais[2470]: [CLM  ] Members Joined:
> Jan 14 17:23:37 clu01 openais[2470]: [CLM  ]    r(0) ip(10.43.100.203)
> Jan 14 17:23:37 clu01 openais[2470]: [SYNC ] This node is within the
> primary component and will provide servi
> ce.
> Jan 14 17:23:37 clu01 openais[2470]: [TOTEM] entering OPERATIONAL state.
> Jan 14 17:23:37 clu01 openais[2470]: [CMAN ] quorum regained, resuming
> activity
> Jan 14 17:23:37 clu01 openais[2470]: [CLM  ] got nodejoin message
> 10.43.100.203
>
> Thanks !!
>
>
> Ing. Stefano Elmopi
> Gruppo Darco - Area ICT Sistemi
> Via Ostiense 131/L Corpo B, 00154 Roma
>
> cell. 3466147165
> tel.  0657060500
> email:ste...@so...
>
>
> _______________________________________________
> Open-sharedroot-users mailing list
> Ope...@li...
> https://lists.sourceforge.net/lists/listinfo/open-sharedroot-users


-- 
Gruss / Regards,

Marc Grimme
http://www.atix.de/               http://www.open-sharedroot.org/