From: Erik A. H. <er...@he...> - 2002-05-29 15:13:07
On Wed, May 29, 2002 at 03:00:16PM +0100, Miguel Costa wrote:
> Hello.
> I've just subscribed to this mailing list and I'm sorry if this is the
> wrong place to ask this but a glance through the archives suggests my
> problem is on-topic.
>
> I've been using Scyld for a few months and just began trying out
> clustermatic. I already had redhat 7.2 installed on a separate partition
> on the master so I installed Clustermatic according to the instructions
> in README, which went fine.
>
> The problem is that when I boot a node from the CD or a floppy
> (according to README), it hangs, I believe, when starting to download
> the phase 2 image. It sends RARP requests, retrieves the right IP
> address for its MAC address and hangs on the following line, something
> like
>
> RARP: bproc=2223; (something I forgot, sorry);
> file=/var/beowulf/boot.img

Your problem may be the network. Some network switches don't pass
multicast traffic fast enough for the boot image download to work, and
the boot file download uses multicast by default. If you say
"mcastbcast ethX" in /etc/beowulf/config, it will tell beoserv to use
broadcast instead of multicast. I haven't run into a switch that
couldn't handle a lot of broadcast traffic. In general, I'd say as
switches get fancier and more expensive they get more likely to choke
on lots of multicast.

Anyway, given what you've said, that's my guess. You can also run
beoserv with a few "-v" to see what's going on. If you see file
requests over and over again, this is probably it.

Also, if your network speed is different on the front end and the nodes
(e.g. gig-E mixed with 100mbit) you might want to say

    mcastthrottle ethX YY

where ethX is the interface and YY is the desired number of megabits
per second.

Improving this multicast protocol crap is somewhere on the TO-DO list.
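
For reference, a sketch of how the two settings described above might
look in /etc/beowulf/config. The interface name eth0 and the value 80
are placeholders picked for illustration (assuming 100mbit nodes), not
recommended values, and whether you need one line or both depends on
your network.

```text
# /etc/beowulf/config (fragment)
# eth0 and 80 are assumed example values; substitute your own.

# Send the boot image by broadcast instead of multicast on eth0.
mcastbcast eth0

# Limit boot image traffic on eth0 to about 80 megabits per second.
mcastthrottle eth0 80
```
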
> (I did create boot.img for the right kernel according to README)
>
> I'm not using phase 1 from Scyld - that one, which is installed on the
> node's hard drive, doesn't even retrieve the IP address when I boot
> clustermatic on the master

The RARP step has been modified to include some extra information. The
boot file download now uses multicast or broadcast, so the Scyld boot
images and our clustermatic stuff are basically incompatible.

- Erik
--
Erik Arjan Hendriks    Printed On 100 Percent Recycled Electrons
er...@he...            Contents may settle during shipment