[CI] A couple of fundamental questions


Sorry to add to the confusion, but I am having trouble getting a few 
things working.

I am trying out the CI software to see whether it will work in a loosely 
coupled cluster setup I am building. I am using
ci-linux-2.4.18-v0.7.6 and cluster-tools-0.7.6.

I have two identical nodes (for now), both with fresh RH 7.3 installs 
and freshly patched 2.4.18 kernels. The CI kernel patches are installed 
and seem to be working. cluster -V gives:

[root@b root]# cluster -V
Node 1:
	State:  UP
	Previous state:  COMINGUP
	Reason for last transition:  API
	Last transition ID:  4
	Last transition time:  Wed Mar  5 16:49:45.654512 2003
	First transition ID:  3
	First transition time:  Wed Mar  5 16:49:45.604512 2003
	Number of CPUs:  1
	Number of CPUs online:  1
Node 2:
	State:  UP
	Previous state:  COMINGUP
	Reason for last transition:  API
	Last transition ID:  2
	Last transition time:  Wed Mar  5 16:49:37.174512 2003
	First transition ID:  1
	First transition time:  Wed Mar  5 16:49:37.104512 2003
	Number of CPUs:  1
	Number of CPUs online:  1
[root@b root]#

I installed the cluster-tools SSI components spawndaemon and keepalive. 
Here come the questions:

First, the /dev/keepalivecfg pipe was not created as part of the install, 
so I made it by hand. I also had to add the keepalive entry to 
/etc/inittab by hand. Did I miss something in the install?
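For reference, creating the pipe by hand amounts to making a named pipe (FIFO) at that path. Below is a minimal Python sketch of that step. It assumes root privileges and uses a guessed mode of 0600. The helper name ensure_fifo is made up for illustration, and I don't know what permissions or ownership cluster-tools actually expects:

```python
import errno
import os
import stat


def ensure_fifo(path, mode=0o600):
    """Create `path` as a named pipe (FIFO) if nothing exists there yet.

    Hypothetical helper; the 0600 mode is an assumption, not taken from
    cluster-tools. The effective mode is also filtered by the process umask.
    Returns True if the FIFO was created, False if a FIFO was already present.
    Raises FileExistsError if something other than a FIFO occupies the path.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        os.mkfifo(path, mode)  # same effect as `mkfifo path` or `mknod path p`
        return True
    if not stat.S_ISFIFO(st.st_mode):
        raise FileExistsError(errno.EEXIST, "exists but is not a FIFO", path)
    return False
```

Run as root, e.g. ensure_fifo("/dev/keepalivecfg"). The check for an existing non-FIFO matters because a stray regular file at that path would also make the open fail.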

Now, after a reboot, the CI stuff seems to be working OK, but an attempt 
to use spawndaemon leaves the following log entry:

Mar  5 17:06:00 b spawndaemon[1051]: spawndaemon: Could not open pipe 
/dev/keepalivecfg.  Keepalive is not active.  Retrying ...

To get past this, I have to find the running keepalive, kill it, and let 
init restart it; after that, spawndaemon seems to be able to open the 
pipe. Is this a problem with how I am starting keepalive?
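You can check whether keepalive actually has the pipe open at the moment spawndaemon complains. On Linux, opening a FIFO write-only with O_NONBLOCK fails immediately with ENXIO if no process has it open for reading. The sketch below relies on that behavior. It assumes spawndaemon's "Could not open pipe" message means the same thing, which I have not confirmed against the cluster-tools source. The helper name fifo_has_reader is hypothetical:

```python
import errno
import os


def fifo_has_reader(path):
    """Return True if some process currently has the FIFO at `path` open for reading.

    Opening a FIFO with O_WRONLY | O_NONBLOCK succeeds when a reader exists
    and fails with ENXIO when none does. Other errors (e.g. ENOENT if the
    path is missing) are re-raised unchanged.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False  # nobody is holding the read end open
        raise
    os.close(fd)  # we only wanted to know whether the open would succeed
    return True
```

One caveat: the probe briefly opens and then closes a write end. If it was the only writer, a reader blocked in read() will see end-of-file, so try this on a test setup rather than a node you care about.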

Finally, does keepalive run on each node, or only once per cluster? 
Currently I am running keepalive on each node. If I run spawndaemon on 
node 1 and try to register a daemon to run on node 2, the keepalive on 
node 1 reports a failure to start the daemon. The daemon does in fact 
start, but on node 1. Node 2 seems oblivious to the whole thing. 
Hmmmm.

[root@b log]# spawndaemon -L -v human keepalive
running: TRUE
quiesce flag: FALSE
pid: 1058
node number: -1
registered processes: 0
table size: 200
max. possible processes: 200
polling: FALSE
polling interval: 5
primary node: None
secondary node: None
[root@b log]#

It seems incorrect that the node number is -1....

Have I missed something fundamental in the configuration, or am I off in 
some other variable space?

TIA, r.