From: Russ G. <gr...@la...> - 2003-03-06 00:46:43
Sorry for more confusion, but I am having some trouble getting some things working. I am trying out the CI software to see if it will work in a loosely coupled cluster setup I am building.

Using ci-linux-2.4.18-v0.7.6 and cluster-tools-0.7.6. I have two nodes (for now), identical; both have fresh RH7.3 installs and fresh 2.4.18 patched kernels.

The CI kernel patches are installed, and seem to be working. cluster -V gives:

[root@b root]# cluster -V
Node 1:
    State: UP
    Previous state: COMINGUP
    Reason for last transition: API
    Last transition ID: 4
    Last transition time: Wed Mar 5 16:49:45.654512 2003
    First transition ID: 3
    First transition time: Wed Mar 5 16:49:45.604512 2003
    Number of CPUs: 1
    Number of CPUs online: 1
Node 2:
    State: UP
    Previous state: COMINGUP
    Reason for last transition: API
    Last transition ID: 2
    Last transition time: Wed Mar 5 16:49:37.174512 2003
    First transition ID: 1
    First transition time: Wed Mar 5 16:49:37.104512 2003
    Number of CPUs: 1
    Number of CPUs online: 1
[root@b root]#

I installed the cluster-tools ssi components spawndaemon and keepalive. Here come the questions:

At first, the /dev/keepalivecfg pipe was not created as part of the install, so I made it by hand. I also had to add the keepalive section to the inittab by hand. Did I miss something in the install?

Now, after a reboot, the CI stuff seems to be working ok, but an attempt to use spawndaemon leaves the following log entry:

Mar 5 17:06:00 b spawndaemon[1051]: spawndaemon: Could not open pipe /dev/keepalivecfg. Keepalive is not active. Retrying ...

I have to find the running keepalive and kill it, then let init restart it; after that it seems to be able to open the pipe. Is this a problem in how I am starting keepalive?

And finally, does keepalive run on each node, or only once on the cluster? Currently I am running keepalive on each node.
If I run spawndaemon on node 1 and try to register a daemon to run on node 2, the keepalive on node 1 registers a failure to start the daemon, but the daemon does in fact start, only on node 1. Node 2 seems oblivious to the whole thing. Hmmmm.

[root@b log]# spawndaemon -L -v human keepalive running: TRUE
quiesce flag: FALSE
pid: 1058
node number: -1
registered processes: 0
table size: 200
max. possible processes: 200
polling: FALSE
polling interval: 5
primary node: None
secondary node: None
[root@b log]#

It seems incorrect that the node number is -1....

Have I missed something fundamental in the configuration, or am I off in some other variable space?

TIA,
r.
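
For the hand-made /dev/keepalivecfg pipe mentioned above, one thing worth ruling out is that the path exists but is not actually a named pipe. The following is only a minimal sketch of such a check, not anything shipped with cluster-tools: the helper name ensure_fifo and the 0600 mode are assumptions made for illustration, and the permissions keepalive really expects are not stated in this post. The demo runs against a scratch directory rather than /dev.

```python
import os
import stat
import tempfile


def ensure_fifo(path, mode=0o600):
    """Make sure `path` is a named pipe (hypothetical helper).

    Returns "created" if the pipe had to be made, "exists" if a FIFO
    was already there. Raises RuntimeError if some other kind of file
    occupies the path. The default mode is an assumption.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        # Nothing at the path yet: create the FIFO, like mkfifo(1) would.
        os.mkfifo(path, mode)
        return "created"
    if stat.S_ISFIFO(st.st_mode):
        return "exists"
    raise RuntimeError(path + " exists but is not a named pipe")


if __name__ == "__main__":
    # Scratch path for the demo; on a real node this would be
    # /dev/keepalivecfg and would need root.
    with tempfile.TemporaryDirectory() as scratch:
        pipe_path = os.path.join(scratch, "keepalivecfg")
        print(ensure_fifo(pipe_path))
        print(ensure_fifo(pipe_path))
```

Running the same check before and after init restarts keepalive would at least show whether the pipe itself changes between the failing and the working case.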