From: Joshua J. E. <jj...@sa...> - 2002-10-28 21:30:56
Hello,

//** THE SETUP **

I've got a test cluster that works using bproc-3.1.9 (RH7.2 master node) from the March ClusterMatic CD. I'm trying to build a new master node (RH8.0) from source using bproc-3.2.2 with beoboot-lanl.1.3. Beowulf starts up cleanly. The nodes all boot with LinuxBIOS, so I don't need to muck with a phase 1 kernel. The phase 2 kernel was built with:

  beoboot -2 -n -o vmlinuz-beoboot

//** THE PROBLEM **

When a slave boots, it gets stuck in an infinite loop like this:

while (1) {
    // slave issues DHCP request
    // slave does ARP for master -- master responds
    // DHCP serves up the kernel
    // new in.tftpd process starts up on master
    // slave starts the TFTP download and downloads a few blocks
}

I end up with tons of TFTP daemons all trying to serve a single node, and beoserv never receives a RARP. This seems to be independent of the bproc master: stopping beowulf produces the same effect.

So the question is: has anyone seen this before? What is causing the slave to keep issuing DHCP requests after the first request seemingly succeeds? Everything works fine when using the 3.1.9 master node. Is this merely another SUA (Stupid User Artifact) where the answer should be blindingly obvious?

Thanks for any help,
-JE

-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076