From: Florent C. <Flo...@un...> - 2003-01-24 17:50:50
Thank you very much, Lothar.

I do not think your patch helped, because this is how I managed to make
Clustermatic work: since we are building a diskless cluster at the same
time, I was wearing out floppy disks in my attempts to make a working
boot disk, and I was tired of waiting 2 minutes on every boot attempt,
I decided to try PXElinux. After some problems (the switch enabled the
Ethernet ports to the nodes more slowly than PXE/DHCP would give up,
thanks to the very boot speed I was looking for; I solved it by setting
spanning tree to fast and channel to off, as suggested by Cisco), I
managed to load a phase 2 image directly over PXE.

So everything seems to work...

Thanks anyway.

ot...@tr... wrote:

> I am copying below two messages that made it work, though I am not
> totally sure it's your problem.
>
> Lothar
>
>
> I just saw something like this on a cluster here and I fixed it. If
> you're having the same problem, the following patch should fix it for
> you. This only modifies the "beoserv" binary (which runs on the
> master and does things like serve boot images and RARP responses) so
> you won't have to change the slave boot images or anything.
>
> Index: mcsend.c
> ===================================================================
> RCS file: /users/hendriks/repository/beoboot/mcsend.c,v
> retrieving revision 1.25
> retrieving revision 1.26
> diff -u -r1.25 -r1.26
> --- mcsend.c 27 Aug 2002 16:25:13 -0000 1.25
> +++ mcsend.c 17 Dec 2002 21:49:13 -0000 1.26
> @@ -28,7 +28,7 @@
>   * negligence or otherwise) arising in any way out of the use of this
>   * software, even if advised of the possibility of such damage.
>   *
> - * $Id: mcsend.c,v 1.25 2002/08/27 16:25:13 hendriks Exp $
> + * $Id: mcsend.c,v 1.26 2002/12/17 21:49:13 hendriks Exp $
>   *--------------------------------------------------------------------*/
>  #include <sys/time.h>
>  #include <sys/types.h>
> @@ -1029,6 +1029,10 @@
>          }
>          break;
>      case SND_TIME_WAIT:
> +        if (ifc->sendok > 0 &&
> +            !FD_ISSET(ifc->fd, wset) && sender_ready(s))
> +            FD_SET(ifc->fd, wset);
> +
>          timeleft = SENDER_TIMEOUT - (now.tv_sec - s->lastuse);
>          if (timeleft <= 0) {
>              sender_discard(s);
>
>
> You'll have to rebuild beoboot. Grab the source RPM (included in
> Clustermatic 3) and apply this patch to it.
>
> You'll have to do something like:
>
>     rpm -i beoboot-....src.rpm
>     rpmbuild -bp /usr/src/redhat/SPECS/beoboot.spec
>     cd /usr/src/redhat/BUILD/beoboot-....
>     patch -p1 < patchfile
>     make beoserv
>
> Then replace the beoserv in /usr/sbin with the one built there. See
> local Linux guru for more help on building stuff. :)
>
> - Erik
>
>> Erik A. Hendriks wrote:
>>
>> >On Mon, Dec 16, 2002 at 10:05:11AM -0800, lo...@tr... wrote:
>> >
>> >>Well that's how it goes. Looks to me as if the problem is
>> >>on the master side....but no idea what.
>>
>
> Lothar
>
>
> Florent Calvayrac wrote:
>
>> Hi
>>
>> I gave a try to clustermatic today, on a test cluster
>> made of 8 nodes with dual pentium 3 processors,
>> serverwork chipset with integrated eepro100, + one eepro
>> on the master, myrinet 2000 boards and switch.
>>
>> Either with fast ethernet or myrinet the nodes
>> hang on the boot.img fetch after they got an
>> IP from RARP. I tried the mcastbcast hack,
>> crossover cables instead of our Cisco switch,
>> and boot over myrinet (with different MACs):
>> all the same, nothing happens
>> and tcpdump does not show anything after the RARP
>> resolution.
>>
>> I chose addresses in the 192.168.33 range in order
>> not to interfere with the campus here.
>>
>> Did I forget something in the redhat 8.0 security
>> settings? I set no firewall and started all services...
>>
>> Please help
>>
>> thanks in advance
>>
>

-- 
Florent Calvayrac                          | Tel : 02 43 83 26 26
Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18
UMR-CNRS 6087                              | http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences   | 72085 Le Mans Cedex 9
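
For anyone reading the quoted patch: the lines it adds under
SND_TIME_WAIT put the interface socket back into the select() write set
when three conditions hold. Below is a minimal, standalone sketch of just
that condition, not the actual beoserv code. The struct layouts, the
has_data field, the body of sender_ready() and the name rearm_write() are
all hypothetical stand-ins, since the real definitions in mcsend.c are
not shown in this thread.

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical stand-ins: only the field names fd and sendok appear in
 * the quoted patch; the real structures in mcsend.c are not shown. */
struct iface  { int fd; int sendok; };
struct sender { int has_data; };

/* Hypothetical body: the real sender_ready() is not shown in the thread. */
int sender_ready(const struct sender *s)
{
    return s->has_data != 0;
}

/* Mirror the condition the patch adds: if sending is allowed on the
 * interface, its socket is not already in the write set, and the sender
 * has something to send, add the socket to the write set.
 * Returns 1 if the fd was added, 0 otherwise. */
int rearm_write(const struct iface *ifc, const struct sender *s, fd_set *wset)
{
    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(ifc->fd >= 0 && ifc->fd < FD_SETSIZE);

    if (ifc->sendok > 0 && !FD_ISSET(ifc->fd, wset) && sender_ready(s)) {
        FD_SET(ifc->fd, wset);
        return 1;
    }
    return 0;
}
```

Calling rearm_write() twice in a row with the same write set adds the
descriptor only the first time, because of the FD_ISSET() check; it is
never added when sendok is 0 or the sender has nothing ready.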