From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:08:04
|
I think I'm getting very close now. I'm finally catching some RARPs with beoserv when a slave boots, although the slave dies pretty quickly. The last thing seen on the slave is:

  boot: Server IP address: 10.0.4.100
  boot: My IP address : 10.0.4.10
  boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
  bpslave: IO daemon started; pid=11

beoserv on the master shows:

  beoserv: RARP: 00:30:59:00:98:26 == 10.0.4.10
  beoserv: Starting node_up worker for 1 clients.
  nodeup : Child process for node 0 died with signal 4

I'm booting from an elf image created from a standard bproc kernel, along with the initrd created by 'beoboot -2'. Is this considered a badbadthing? Do I need to roll my own initrd and run 'bpslave' from it?

Also, what is the role of the 'bootfile' parameter in /etc/beowulf/config? It looks like beoserv feeds it to a slave after a RARP request, but changing it seems to have no effect.

Sorry for the onslaught of questions, free beer at SC for all who help. :)

-JE
-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076
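The nodeup log above reports only a bare signal number. A minimal sketch, assuming a Linux host (signal numbers are platform-specific), of how a raw waitpid()-style status word can be decoded into a readable reason; describe_wait_status is a hypothetical helper, not part of beoboot or BProc:

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Decode a raw waitpid()-style status word (Unix only)."""
    if os.WIFSIGNALED(status):
        # Low 7 bits hold the terminating signal; map it to its symbolic name.
        name = signal.Signals(os.WTERMSIG(status)).name
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by {name}{core}"
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    return f"unrecognized status {status:#x}"


if __name__ == "__main__":
    print(signal.Signals(4).name)        # SIGILL on Linux
    print(describe_wait_status(4))       # killed by SIGILL
    print(describe_wait_status(1 << 8))  # exited with status 1
```

On Linux, signal 4 is SIGILL ("illegal instruction"), i.e. the child tried to execute an instruction the CPU does not support.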
From: steven j. <py...@li...> - 2002-11-01 00:21:14
|
Greetings,

I regularly use the initrd from beoboot in the elf image. It's not that.

Shooting in the dark: any chance the kernel or modules are compiled for the wrong kind of processor (K7 on a P4, for example)?

If possible, try a serial console on the slave to see if there's an OOPS associated with the failure.

G'day,
sjames

--
-------------------------steven james, director of research, linux labs
... ........ ..... ....     230 peachtree st nw ste 701
the original linux labs     atlanta.ga.us 30303
      -since 1995           http://www.linuxlabs.com
office 404.577.7747         fax 404.577.7743
-----------------------------------------------------------------------
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:33:19
|
I'm looking at the console: no OOPS, and this kernel is compiled for Pentium-Classic to be sure of compatibility, even though these chips are PIIs. I haven't tried commenting stuff out in node_up.conf yet, just because it might make matters worse. There are some references to a bproc-aware nsswitch, but I don't see the libs for that anywhere. Could that possibly be the problem?

-JE
From: <jam...@ab...> - 2002-11-01 00:25:50
|
On 2002.11.01 Joshua J. England wrote:
> boot: Server IP address: 10.0.4.100
> boot: My IP address : 10.0.4.10
> boot: starting bpslave: bpslave -d -i 10.0.4.100 2223

Why with -i ???

--
J.A. Magallon <jam...@ab...>      \   Software is like sex:
werewolf.able.es                   \     It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-rc1-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-2mdk))
From: <er...@he...> - 2002-11-01 00:30:03
|
On Fri, Nov 01, 2002 at 01:25:43AM +0100, J.A. Magallón wrote:
> On 2002.11.01 Joshua J. England wrote:
> > boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
>
> Why with -i ???

That way the decision to ignore version mismatches is up to the master node. Otherwise, if you wanted to ignore a mismatch, you'd have to say -i on bpmaster and then go modify all your slave nodes.

- Erik
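Erik's reasoning can be captured in a toy model. This is a hypothetical sketch, not BProc's actual handshake code: it assumes each end may refuse a version mismatch unless it was started with -i, so a mismatched connection survives only when both ends ignore it. With the slave's -i always set, the outcome depends only on the master. The version strings are made up.

```python
def connection_allowed(master_version: str, slave_version: str,
                       master_ignores: bool, slave_ignores: bool) -> bool:
    """Toy model of a two-sided version check (hypothetical, not BProc's code)."""
    if master_version == slave_version:
        return True
    # On a mismatch either end can refuse, so both must have been told to ignore it.
    return master_ignores and slave_ignores


if __name__ == "__main__":
    SLAVE_IGNORES = True  # per the thread, boot.c always passes -i to bpslave
    for master_ignores in (False, True):
        ok = connection_allowed("3.2.0", "3.1.9", master_ignores, SLAVE_IGNORES)
        print(f"bpmaster -i={master_ignores}: connection allowed={ok}")
```

Because the slave side is pinned to "ignore", flipping a single setting on the master is enough; nothing on the slave images has to change.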
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:35:56
|
That's hard-coded in boot.c, don't ask me.

-JE

On Thu, 2002-10-31 at 16:25, J.A. Magallón wrote:
> > boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
>
> Why with -i ???
From: <er...@he...> - 2002-11-01 00:28:42
|
On Thu, Oct 31, 2002 at 04:04:52PM -0800, Joshua J. England wrote:
> I think I'm getting very close now. I'm finally catching some RARPs
> with beoserv when a slave boots, although the slave dies pretty
> quickly. The last thing seen on the slave is:
>
> boot: Server IP address: 10.0.4.100
> boot: My IP address : 10.0.4.10
> boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
> bpslave: IO daemon started; pid=11
>
> beoserv on the master shows:
>
> beoserv: RARP: 00:30:59:00:98:26 == 10.0.4.10
> beoserv: Starting node_up worker for 1 clients.
> nodeup : Child process for node 0 died with signal 4
>
> I'm booting from an elf image created from a standard bproc kernel,
> along with the initrd created by 'beoboot -2'. Is this considered a
> badbadthing? Do I need to roll my own initrd and run 'bpslave' from it?

This is just the node setup program from beoboot. BProc is running and it appears to be at least mostly happy.

Try this:

  /usr/lib/beoboot/bin/node_up -s ##

This is the way to run the node setup program in interactive mode. It will let you muck around with it without having to reboot all the time.

SIGILL sounds like there might be a migration problem of some kind. Did I just say BProc appeared happy? Whups.

Here come the questions:

Are there mixed architectures between the slave and the front end (e.g. a P4 front end and an Athlon slave node)? If so, you need to make sure that the libraries you have installed will run on both nodes. I believe Red Hat (and possibly others) have started shipping libraries compiled specifically for i686, etc.

Are there any messages on the slave's console at all? Some kind of mapping failure could be a clue here. Make sure your library list (bplib -l) doesn't include everything in /lib and /usr/lib. Here are the "libraries" lines that I'm using in my /etc/beowulf/config:

  libraries /lib/ld-2* /lib/libc-2* /lib/libm-2* /lib/libcrypt*
  libraries /lib/librt-2* /lib/libpthread-*
  libraries /usr/lib/libbproc* /lib/libtermcap* /lib/libproc*
  libraries /lib/libresolv-2*
  libraries /lib/libpthread*
  libraries /lib/libnss_bproc*
  libraries /lib/libdl-2*
  libraries /lib/libnsl*
  libraries /usr/lib/libncurses*
  libraries /lib/libutil-2*

> Also, what is the role of the 'bootfile' parameter in
> /etc/beowulf/config? It looks like beoserv feeds it to a slave after a
> RARP request, but changing it seems to have no effect.

Hrm. It should have some effect. Make sure you SIGHUP beoserv after modifying the file.

- Erik
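As a rough way to act on Erik's advice about keeping the library list narrow, here is a sketch that pulls the glob patterns out of the "libraries" lines, flags any that look like they would sweep in a whole directory, and expands the rest on the local machine. The parsing rules (the keyword followed by whitespace-separated globs, '#' starting a comment) are assumptions based on the lines above, and too_broad is a hypothetical heuristic, not something bplib itself does.

```python
import glob
import os

# A few of Erik's lines; note that libpthread-* and libpthread* overlap.
SAMPLE_CONFIG = """\
libraries /lib/ld-2* /lib/libc-2* /lib/libm-2* /lib/libcrypt*
libraries /lib/librt-2* /lib/libpthread-*
libraries /lib/libpthread*
"""


def library_patterns(config_text: str) -> list[str]:
    """Collect the glob patterns from every 'libraries' line.

    Assumed format: the keyword 'libraries' followed by whitespace-separated
    globs; anything after '#' is treated as a comment (an assumption).
    """
    patterns = []
    for line in config_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == "libraries":
            patterns.extend(fields[1:])
    return patterns


def too_broad(pattern: str) -> bool:
    """Hypothetical heuristic: flag globs such as /lib/* or /usr/lib/lib*."""
    stem = os.path.basename(pattern).strip("*?")
    return stem in ("", "lib")


def expand(patterns: list[str]) -> list[str]:
    """Expand globs on this machine; paths matched twice are listed once."""
    matches = (path for p in patterns for path in sorted(glob.glob(p)))
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    patterns = library_patterns(SAMPLE_CONFIG)
    for p in patterns:
        if too_broad(p):
            print("suspiciously broad:", p)
    for path in expand(patterns):
        print(path)
```

Comparing the expanded list against what `bplib -l` reports is one way to spot an entry that pulls in far more than intended.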
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:43:15
|
Uh-oh. I think you might have hit it. I'm running RH8.0 on a PIII as the master for smartcore PII slaves. I think i686 libs might not be happy on the PIIs.

What to do? Install i386 libs in a separate partition, or scrap the master and go with an identical arch?

-JE
From: steven j. <py...@li...> - 2002-11-01 12:36:03
|
Greetings,

I don't think I would mix library versions. OTOH, the PIII should be able to run libs and a kernel targeted for PII without problem. It's just a question of convincing RH to install that way. Worst case, put the HD in a PII box, do the install, then move the HD to the PIII.

G'day,
sjames
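The idea behind steven's point is that a newer CPU's feature set is normally a superset of an older one's, so code built for the PII runs on the PIII but not necessarily the other way around. A minimal sketch of that comparison using the "flags" field of /proc/cpuinfo from each machine; the flag strings in the demo are hypothetical, trimmed examples, and a non-empty result is only a warning sign, not proof that a given binary will die with SIGILL.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_target(build_host: set[str], target: set[str]) -> set[str]:
    """Features the build host advertises that the target CPU lacks.

    Binaries tuned for the build host *may* use these instructions and
    fail with SIGILL on the target.
    """
    return build_host - target


if __name__ == "__main__":
    # Hypothetical, trimmed excerpts (not captured from real machines).
    master = cpu_flags("processor\t: 0\nflags\t\t: fpu tsc cx8 cmov mmx fxsr sse\n")
    slave = cpu_flags("processor\t: 0\nflags\t\t: fpu tsc cx8 cmov mmx fxsr\n")
    print(sorted(missing_on_target(master, slave)))  # ['sse']
    print(sorted(missing_on_target(slave, master)))  # []
```

An empty result in the slave-to-master direction matches steven's observation: anything the older chip can run, the newer one can too.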