From: <er...@he...> - 2004-09-03 19:26:27
On Tue, Aug 24, 2004 at 04:01:23PM -0700, Vipul Deokar wrote:
> Folks,
>
> I have a configuration of 4 identical "compute" nodes (with disks)
> and 1 slightly different "master" node. The master node has a
> slightly more powerful CPU, more RAM, an additional 10/100 NIC
> interface to connect to the extranet, a CDROM drive on secondary
> IDE, and a more recent version of BIOS firmware.
>
> With Red Hat 9 (runlevel 3) and Clustermatic4 i386 RPMs installed
> on computeNode#1 as master, I can build a cluster with the other 4
> nodes as slaves. (I see one node rebooting every 5-6 minutes that I
> need to debug.) I am using this cluster currently.
>
> However, with Red Hat 9 (runlevel 3 or 5; more packages) and
> Clustermatic4 i386 RPMs installed on the "master", I cannot get any
> of the "compute" nodes up as slaves. The node_up script fails on
> all nodes with:
>
>     nodeup : Starting 1 child processes.
>     nodeup : Finished creating child processes.
>     nodeup : I/O error talking to child
>     nodeup : Child process for node 1 died with signal 4
>
> The same config, node_up, and config.boot scripts execute in both
> cluster configuration attempts - one succeeds and the other fails.
> Any insight into why this would happen?

Signal 4 = SIGILL. This usually happens when the front end node is a
newer rev of processor than the back end nodes. Red Hat is pretty good
about installing the best glibc it can on the front end (e.g. the i686
version instead of the i386 one). I usually see something like this
when the slave node is some other CPU type that doesn't qualify as
i686. The issue is really that the library gets loaded on the front
end, and those instructions turn out not to be valid on the
destination node.

My workaround is to downgrade the glibc on the front end to one that
will work on all nodes. In other words, load the i386 one instead of
the i686 one.

- Erik
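
For anyone hitting the same symptom, one way to confirm the mismatch
Erik describes is to compare the CPU feature flags on each machine
with the architecture of the glibc installed on the front end. Below
is a minimal Python sketch of that check (the script name check_i686.py
is just illustrative; it assumes a Linux /proc/cpuinfo and an rpm-based
install, and it uses cmov as the usual distinguishing i686 feature):

    #!/usr/bin/env python
    # check_i686.py -- illustrative helper, not part of Clustermatic/bproc.
    # Reports whether this CPU looks i686-capable (the i686 glibc build
    # relies on the cmov instruction) and which glibc arch rpm installed.
    import os

    def cpu_flags():
        # Collect the CPU feature flags from /proc/cpuinfo.
        flags = set()
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
        return flags

    def glibc_arch():
        # Ask rpm for the architecture of the installed glibc package.
        return os.popen("rpm -q --queryformat '%{ARCH}' glibc").read().strip()

    if __name__ == "__main__":
        print("CPU supports cmov (i686-capable):", "cmov" in cpu_flags())
        print("Installed glibc arch:", glibc_arch())

If the front end reports an i686 glibc while a compute node's CPU
lacks cmov (checked by booting that node standalone, for instance),
that library's code will fault with SIGILL when it runs on the node,
which matches the "died with signal 4" message above, and downgrading
to the i386 glibc as Erik suggests should clear it.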