From: steven j. <py...@li...> - 2002-10-28 22:22:16
Greetings,

I've seen similar behavior when a network card or driver had problems. It
can also happen if memory isn't right. Have you tried running memtest86 on
it? (Recent memtest86 builds include an ELF image suitable for netbooting
with Etherboot.) Which chipset/mainboard?

G'day,
sjames

On 28 Oct 2002, Joshua J. England wrote:

> Hello,
>
> //** THE SETUP **
> I've got a test cluster that works using bproc-3.1.9 (RH7.2 master node)
> from the March ClusterMatic CD. I'm trying to build a new master node
> (RH8.0) from source using bproc-3.2.2 with beoboot-lanl.1.3. Beowulf
> starts up clean.
>
> Nodes all boot with linuxbios, so I don't need to muck with a phase 1
> kernel.
>
> The phase 2 kernel was built with:
> 'beoboot -2 -n -o vmlinuz-beoboot'.
>
> //** THE PROBLEM **
> When a slave boots, it gets stuck in an infinite loop like such:
> while (1) {
>     // slave issues dhcp request
>     // slave does arp for master -- master responds
>     // dhcp serves up the kernel
>     // new in.tftpd process starts up on master
>     // slave starts the tftp download and downloads a few blocks
> }
>
> I end up with tons of tftp daemons all trying to serve a single node,
> and beoserv never receives a RARP.
>
> This seems detached from bproc master problems -- stopping beowulf
> produces the same effect.
>
> So the question is: has anyone seen this before? What is causing the
> slave to continue to issue DHCP requests after the first request
> seemingly succeeds? Everything works fine when using the 3.1.9 master
> node. Is this merely another SUA (Stupid User Artifact) where the
> answer should be blindingly obvious?
>
> Thanks for any help,
>
> -JE
> -----------------------------------------------
> Josh England
> Sandia National Laboratory, Livermore, CA
> Distributed Information Systems
> email: jj...@sa...
> phone: (925) 294-2076

-- 
-----------------------------------------------------------------------
steven james, director of research, linux labs
230 peachtree st nw ste 701
atlanta.ga.us 30303
the original linux labs -since 1995
http://www.linuxlabs.com
office 404.577.7747 fax 404.577.7743
-----------------------------------------------------------------------