From: <lo...@tr...> - 2003-01-22 23:05:31
I'm copying below two messages that made it work, though I am not
totally sure it's your problem.

Lothar

I just saw something like this on a cluster here and I fixed it. If
you're having the same problem, the following patch should fix it for
you. This only modifies the "beoserv" binary (which runs on the master
and does things like serve boot images and RARP responses) so you
won't have to change the slave boot images or anything.

Index: mcsend.c
===================================================================
RCS file: /users/hendriks/repository/beoboot/mcsend.c,v
retrieving revision 1.25
retrieving revision 1.26
diff -u -r1.25 -r1.26
--- mcsend.c    27 Aug 2002 16:25:13 -0000    1.25
+++ mcsend.c    17 Dec 2002 21:49:13 -0000    1.26
@@ -28,7 +28,7 @@
  * negligence or otherwise) arising in any way out of the use of this
  * software, even if advised of the possibility of such damage.
  *
- * $Id: mcsend.c,v 1.25 2002/08/27 16:25:13 hendriks Exp $
+ * $Id: mcsend.c,v 1.26 2002/12/17 21:49:13 hendriks Exp $
  *--------------------------------------------------------------------*/
 #include <sys/time.h>
 #include <sys/types.h>
@@ -1029,6 +1029,10 @@
                 }
                 break;
             case SND_TIME_WAIT:
+                if (ifc->sendok > 0 &&
+                    !FD_ISSET(ifc->fd, wset) && sender_ready(s))
+                    FD_SET(ifc->fd, wset);
+
                 timeleft = SENDER_TIMEOUT - (now.tv_sec - s->lastuse);
                 if (timeleft <= 0) {
                     sender_discard(s);

You'll have to rebuild beoboot. Grab the source RPM (included in
Clustermatic 3) and apply this patch to it. You'll have to do
something like:

    rpm -i beoboot-....src.rpm
    rpmbuild -bp /usr/src/redhat/SPECS/beoboot.spec
    cd /usr/src/redhat/BUILD/beoboot-....
    patch -p1 < patchfile
    make beoserv

Then replace the beoserv in /usr/sbin with the one built there. See
local Linux guru for more help on building stuff. :)

- Erik

> Erik A. Hendriks wrote:
>
> >On Mon, Dec 16, 2002 at 10:05:11AM -0800, lo...@tr... wrote:
> >
> >
> >>Well that's how it goes. Looks to me as if the problem is
> >>on the master side....but no idea what.
> >>
> >> Lothar

Florent Calvayrac wrote:

> Hi
>
> I gave a try to clustermatic today, on a test cluster
> made of 8 nodes with dual pentium 3 processors,
> serverwork chipset with integrated eepro100, + one eepro
> on the master, myrinet 2000 boards and switch.
>
> Either with fast ethernet or myrinet the nodes
> hang on the boot.img fetch after they got an
> IP from RARP. I tried the mcastbcast hack,
> crossover cables instead of our Cisco switch,
> and boot over myrinet (with different MACs):
> all the same, nothing happens
> and tcpdump does not show anything after the RARP
> resolution.
>
> I chose addresses in the 192.168.33 range in order
> not to interfere with the campus here.
>
> Did I forget something in the redhat 8.0 security
> settings? I set no firewall and started all services...
>
> Please help
>
> thanks in advance
>
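
For reference, the check that Erik's patch adds to the SND_TIME_WAIT
case can be read as a small standalone helper. This is only a minimal
sketch, not code from beoboot: "struct iface", "arm_write_fd" and the
"ready" flag (standing in for the result of sender_ready(s)) are
hypothetical names made up for illustration. Only the field names fd
and sendok and the condition itself come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical stand-in for the interface structure used in mcsend.c.
 * Only the field names fd and sendok are taken from the patch. */
struct iface {
    int fd;      /* socket descriptor for this interface */
    int sendok;  /* the patch requires this to be > 0 before arming fd */
};

/* Add ifc->fd to the write set under the same condition as the patch.
 * `ready` stands in for the result of sender_ready(s).
 * Returns true if the descriptor was newly added to wset. */
bool arm_write_fd(const struct iface *ifc, fd_set *wset, bool ready)
{
    /* FD_SET is undefined for descriptors outside [0, FD_SETSIZE). */
    assert(ifc->fd >= 0 && ifc->fd < FD_SETSIZE);

    if (ifc->sendok > 0 && !FD_ISSET(ifc->fd, wset) && ready) {
        FD_SET(ifc->fd, wset);
        return true;
    }
    return false;
}
```

select() only reports writability for descriptors that are present in
the write set passed to it, so a descriptor that is left out of the set
is never reported as writable. The helper returns false when the
descriptor is already in the set, when sendok is not positive, or when
the sender is not ready.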