Hi,
OK, that sounds like a good idea. For the cluster we will ship upgraded
versions of the initscripts package on Red Hat and the sysvinit package
on Debian (maybe named initscripts-cluster and sysvinit-cluster). We can
also ask the user to install these as part of creating the cluster.
Regarding /var/run/service_name.pid: the general logic used by many of
the xinetd services is as follows.
Check whether service_name.pid exists. If it does, read the PID and call
kill(pid,0) to see whether the service is really running. If it is, exit;
if not, recreate the service_name.pid file with the new PID.
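
To make that concrete, here is a rough shell sketch of the check as I
understand it. The function names and the PIDFILE_DIR variable are made
up for illustration and are not taken from any real init script; kill -0
is the shell equivalent of kill(pid,0).

```shell
#!/bin/sh
# Illustrative sketch only: PIDFILE_DIR, service_is_running and
# claim_pidfile are hypothetical names, not part of any existing package.
PIDFILE_DIR="${PIDFILE_DIR:-/var/run}"

# service_is_running NAME
# Returns 0 if NAME.pid exists and names a process we can signal.
service_is_running() {
    pidfile="$PIDFILE_DIR/$1.pid"
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile" 2>/dev/null)
    # Treat an empty or non-numeric pid file as stale.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 delivers no signal; it only tests whether the pid exists
    # and may be signalled by the caller.
    kill -0 "$pid" 2>/dev/null
}

# claim_pidfile NAME PID
# Returns 1 if a live instance already owns the pid file; otherwise
# (re)creates the file with PID and returns 0.
claim_pidfile() {
    if service_is_running "$1"; then
        return 1
    fi
    echo "$2" > "$PIDFILE_DIR/$1.pid"
}
```

A service would call something like "claim_pidfile service_name $$" at
startup and exit if it returns non-zero. With a shared /var/run and
clusterwide signaling, that call on node2 would find node1's PID alive.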
In our case, with clusterwide signaling, we will be able to signal any
application running on other nodes. Now if I start a multi-instance
service (that is, one running on all the nodes, such as a load-balanced
web server), the kill(pid,0) check on node2 would succeed against the
PID written by node1, and the second instance would exit. So we need to
make sure that the server on node1 and the server on node2 read
different service_name.pid files, or else change the logic explained
above (which I guess is going to be a tough job).
I guess we need to make sure that each node sees a different /var/run/
directory so that we can start these servers on all the nodes at the
same time without modifying any server code.
-aneesh
On Mon, 2002-06-10 at 23:23, David B. Zafman wrote:
>
> Below is something I wrote up last week, but was waiting for Bruce to
> comment on it before sending it out. Now that I see what you've done
> with /etc/init.d/clusterinit, I thought I'd send this out. I will
> examine what you've done today.
>
> Last week I did something similar. I wasn't as concerned with the
> dependent node networking, but I wanted to replace rc.sysinit for
> dependent nodes only. I copied the redhat rc.sysinit to
> rc.sysinit.nodeup and removed all the things which the dependent nodes
> should not be duplicating. I also removed the execution of rc for run
> level 3 from rc.nodeup. Keep in mind that only the first booting node
> runs rc.sysinit just like base linux. Since only dependent nodes run
> rc.nodeup, only the dependent nodes run rc.sysinit.nodeup.
>
> ---------
>
> You've brought up an important architectural issue. Once there is a
> single root it requires clusterization to have duplicate services
> running. One way to clusterize things is adding context dependent links
> (i.e. /var/run as you proposed for the *.pid files).
>
> The current set-up of having rc.nodeup call rc.sysinit then running
> complete rc 3 runlevel processing was fine when we had a non-shared
> root. Now with CFS and GFS we really need to NOT do this. Looking at
> rc.sysinit on a redhat install, I see that it does all sorts of stuff
> which should NOT be done again on a joining node in the shared-root case.
>
> In a cluster there would generally be two kinds of services. The first
> kind is a single instance of the service (single process or set of
> processes on one node) running with keepalive to restart it on node
> failures. The second kind is the service that is cluster aware, so that
> processes could exist on multiple nodes, but they cooperate with each
> other. In non-stop clusters we parallelized inetd, for example. It
> maintained processes on all nodes, and kept a list of pids which it
> updated as nodes came and went.
>
> I would propose that the whole /var/run/service_name.pid mechanism be
> used only for non-cluster-aware services, which are restricted to
> running on the root node but may be restarted on node failure. It is
> assumed that to restart the service we might have to remove the .pid
> file, and then on (re)start the service would create the file again
> with the new pid.
>
>
> Aneesh Kumar K.V wrote:
>
> > Hi,
> >
> > I guess we need to have node specific /var/run directory also.
> >
> > Otherwise on Debian some services may not come up on node2. They check
> > /var/run/service_name.pid file to see whether the service is already
> > running or not.
> >
> > So for Debian, in /etc/init.d/rcS, add these lines before the for
> > loop shown below:
> >
> > #
> > # Cluster specific remounts.
> > #
> > #
> > mount --bind /etc/network-`/usr/sbin/clusternode_num` /etc/network
> > mount --bind /run-`/usr/sbin/clusternode_num` /var/run
> >
> > #
> > # Call all parts in order.
> > #
> > for i in /etc/rcS.d/S??*
> >
> >
> > -aneesh
> >
> >
> >
> >
> >
> > _______________________________________________________________
> >
> > Don't miss the 2002 Sprint PCS Application Developer's Conference
> > August 25-28 in Las Vegas -- http://devcon.sprintpcs.com/adp/index.cfm
> >
> > _______________________________________________
> > ssic-linux-devel mailing list
> > ssic-linux-devel@...
> > https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel
> >
> >
>
>
> --
> David B. Zafman | Hewlett-Packard Company
> Linux Kernel Developer | Open SSI Clustering Project
> mailto:david.zafman@... | http://www.hp.com
> "Thus spake the master programmer: When you have learned to snatch
> the error code from the trap frame, it will be time for you to leave."