Re: [SSI] Converting a non-shared root SSI cluster to use CFS
From: David B. Z. <dav...@hp...> - 2002-06-10 17:59:59
Below is something I wrote up last week, but I was waiting for Bruce to comment on it before sending it out. Now that I see what you've done with /etc/init.d/clusterinit, I thought I'd send it out; I will examine what you've done today.

Last week I did something similar. I wasn't as concerned with the dependent-node networking, but I wanted to replace rc.sysinit for dependent nodes only. I copied the Red Hat rc.sysinit to rc.sysinit.nodeup and removed all the things which the dependent nodes should not be duplicating. I also removed the execution of rc for runlevel 3 from rc.nodeup. Keep in mind that only the first booting node runs rc.sysinit, just like base Linux. Since only dependent nodes run rc.nodeup, only the dependent nodes run rc.sysinit.nodeup.

---------

You've brought up an important architectural issue. Once there is a single root, running duplicate services requires clusterization. One way to clusterize things is to add context-dependent links (i.e. /var/run, as you proposed for the *.pid files).

The current setup of having rc.nodeup call rc.sysinit and then run complete runlevel 3 processing was fine when we had a non-shared root. Now with CFS and GFS we really need to NOT do this. Looking at rc.sysinit on a Red Hat install, I see that it does all sorts of things which should NOT be done again on a joining node in the shared-root case.

In a cluster there are generally two kinds of services. The first kind is a single instance of the service (a single process, or a set of processes on one node) running with a keepalive to restart it on node failure. The second kind is a cluster-aware service, where processes can exist on multiple nodes but cooperate with each other. In NonStop Clusters we parallelized inetd, for example: it maintained processes on all nodes and kept a list of pids which it updated as nodes came and went.
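To make the split concrete, here is a rough sketch of the decision rc.nodeup would embody. The helper function and paths are my own illustration, not the actual OpenSSI scripts: node 1 (the first booting node) runs the stock rc.sysinit, while every later joiner runs only the stripped-down copy, and skips the runlevel 3 pass entirely.

```shell
#!/bin/sh
# Illustrative sketch only -- pick_sysinit and the paths are assumptions,
# not the real rc.nodeup code.

# Decide which sysinit script a node should run.  The first booting node
# gets the full rc.sysinit; dependent (joining) nodes get the trimmed
# rc.sysinit.nodeup, which omits fsck, module loading, and anything else
# the shared root has already done once.
pick_sysinit() {
    node_num=$1
    if [ "$node_num" -eq 1 ]; then
        echo /etc/rc.d/rc.sysinit
    else
        echo /etc/rc.d/rc.sysinit.nodeup
    fi
}

# Example: node 3 joins the cluster and runs only the trimmed script.
# rc.nodeup would also NOT re-run "rc 3" for such a node, so runlevel
# services on the shared root are not started a second time.
pick_sysinit 3
```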
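The keepalive side of that first kind of service could look roughly like the sketch below: check the pid recorded in the /var/run pid file, and if that process is gone, remove the stale file and restart the service so it can write a fresh pid. This is only an illustration of the idea; the function names, the use of `kill -0`, and the echo stand-in for an init-script restart are all my assumptions.

```shell
#!/bin/sh
# Sketch of a pid-file-driven keepalive check (illustrative, not OpenSSI code).

# Return 0 if the pid recorded in the given file belongs to a live process.
pid_alive() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Restart a dead service: remove the stale pid file first, so that on
# (re)start the service can create the file again with the new pid.
restart_if_dead() {
    name=$1
    pidfile=$2    # e.g. /var/run/$name.pid
    if ! pid_alive "$pidfile"; then
        rm -f "$pidfile"
        echo "restarting $name"    # stand-in for /etc/init.d/$name start
    fi
}
```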
The whole /var/run/service_name.pid mechanism I would propose is used only for non-cluster-aware services which are restricted to running on the root node, but may be restarted on node failure. It is assumed that to restart the service we might have to remove the .pid file; on (re)start the service would then create the file again with the new pid.

Aneesh Kumar K.V wrote:
> Hi,
>
> I guess we need to have a node-specific /var/run directory also.
>
> Otherwise on Debian some services may not come up on node 2. They check
> the /var/run/service_name.pid file to see whether the service is already
> running or not.
>
> To make that work on Debian, /etc/init.d/rcS adds these lines before doing
> the for loop shown below:
>
> #
> # Cluster specific remounts.
> #
> mount --bind /etc/network-`/usr/sbin/clusternode_num` /etc/network
> mount --bind /run-`/usr/sbin/clusternode_num` /var/run
>
> #
> # Call all parts in order.
> #
> for i in /etc/rcS.d/S??*
>
> -aneesh

--
David B. Zafman          | Hewlett-Packard Company
Linux Kernel Developer   | Open SSI Clustering Project
mailto:dav...@hp...      | http://www.hp.com
"Thus spake the master programmer:
 When you have learned to snatch the error code from the trap frame,
 it will be time for you to leave."