Re: [BProc] boot.img never comes on clustermatic


Below I have copied two messages that made it work for me, though I am
not totally sure it is the same problem as yours.

Lothar


I just saw something like this on a cluster here and I fixed it.  If
you're having the same problem, the following patch should fix it for
you.  This only modifies the "beoserv" binary (which runs on the
master and does things like serve boot images and send RARP responses) so
you won't have to change the slave boot images or anything.

Index: mcsend.c
===================================================================
RCS file: /users/hendriks/repository/beoboot/mcsend.c,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26

--- mcsend.c	27 Aug 2002 16:25:13 -0000	1.25
+++ mcsend.c	17 Dec 2002 21:49:13 -0000	1.26
@@ -28,7 +28,7 @@
  * negligence or otherwise) arising in any way out of the use of this
  * software, even if advised of the possibility of such damage.
  *
- *  $Id: mcsend.c,v 1.25 2002/08/27 16:25:13 hendriks Exp $
+ *  $Id: mcsend.c,v 1.26 2002/12/17 21:49:13 hendriks Exp $
  *--------------------------------------------------------------------*/
 #include <sys/time.h>
 #include <sys/types.h> 
@@ -1029,6 +1029,10 @@
 		}
 		break;
 	    case SND_TIME_WAIT:
+		if (ifc->sendok > 0 &&
+		    !FD_ISSET(ifc->fd, wset) && sender_ready(s))
+		    FD_SET(ifc->fd, wset);
+
 		timeleft = SENDER_TIMEOUT - (now.tv_sec - s->lastuse);
 		if (timeleft <= 0) {
 		    sender_discard(s);



You'll have to rebuild beoboot.  Grab the source RPM (included in
Clustermatic 3) and apply this patch to it.

You'll have to do something like:

rpm -i beoboot-....src.rpm

rpmbuild -bp /usr/src/redhat/SPECS/beoboot.spec

cd  /usr/src/redhat/BUILD/beoboot-....

patch -p1 < patchfile

make beoserv


Then replace the beoserv in /usr/sbin with the one built there.  See
your local Linux guru for more help on building stuff. :)

- Erik

> Erik A. Hendriks wrote:
> 
> >On Mon, Dec 16, 2002 at 10:05:11AM -0800, lo...@tr... wrote:
> >  
> >
> >>Well that's how it goes. Looks to me as if the problem is
> >>on the master side....but no idea what.
> >>    



Lothar


Florent Calvayrac wrote:

> Hi
>
> I gave Clustermatic a try today, on a test cluster
> made of 8 nodes with dual Pentium III processors,
> ServerWorks chipset with integrated eepro100, plus one eepro
> on the master, Myrinet 2000 boards and switch.
>
> Either with fast ethernet or myrinet the nodes
> hang on the boot.img fetch after they got an
> IP from RARP. I tried the mcastbcast hack,
> crossover cables instead of our Cisco switch,
> and boot over myrinet (with different MACs )  :
>  all the same, nothing happens
> and tcpdump does not show anything after the RARP
> resolution.
>
> I chose addresses in the 192.168.33 range in order
> not to interfere with the campus network here.
>
> Did I forget something in the Red Hat 8.0 security
> settings? I set no firewall and started all services...
>
> Please help
>
> thanks in advance
>
>