From: <ha...@no...> - 2002-11-07 19:31:14

When the kernel contains certain filesystems built in, the romfs format of the initrd is not recognized during beoboot. Perhaps beoboot should enforce the romfs choice via kernel parameters?

This seems to be the case with the FAT filesystems and cramfs: the kernel complains about them but does not try romfs afterwards. This happens with 2.4.19, bproc 3.2.2, and beoboot 1.3 (with two fixes discussed here recently, kindly sent to me by Luiz Otavio).

A working .config seems to be:

CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_UMSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_CRAMFS=m
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ROMFS_FS=y

i.e. use modules for most alternatives to romfs.

Regards

Vaclav
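
The observation above can be checked mechanically before a boot image is built. What follows is only a minimal sketch in Python, not part of beoboot or bproc: it scans the text of a kernel .config and reports which of the filesystem options named in the post are built in (=y) rather than modular. The function name `builtin_suspects` and the `SUSPECT_OPTIONS` list are assumptions made for illustration; the list covers only the options mentioned in the post, and other filesystems may or may not behave the same way.

```python
# Hypothetical helper, not part of beoboot or bproc.
# The options below are the alternatives to romfs named in the post;
# whether other filesystems cause the same problem is not known here.
SUSPECT_OPTIONS = (
    "CONFIG_FAT_FS",
    "CONFIG_MSDOS_FS",
    "CONFIG_UMSDOS_FS",
    "CONFIG_VFAT_FS",
    "CONFIG_CRAMFS",
)


def builtin_suspects(config_text):
    """Return the suspect options that are set to 'y' in a .config text."""
    found = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            # Skip blank lines and '# CONFIG_X is not set' comments.
            continue
        name, sep, value = line.partition("=")
        if sep and name in SUSPECT_OPTIONS and value == "y":
            found.append(name)
    return found
```

Feeding it the contents of the .config used for the slave kernel should return an empty list for a configuration like the working one above; any name it returns is a candidate for switching from =y to =m.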
From: <ha...@no...> - 2002-11-01 19:56:57

Could you please recommend which versions of:

- bproc kernel patch
- bproc
- beoboot

are the 'low-adventure' choice with stock 2.4.19?

bproc-3.1.10 has just a 2.4.18 patch with it. bproc-3.2.2 goes well with 2.4.19 but probably (* see below) needs beoboot-lanl.1.3, which does not compile out of the box (** see below).

So what do you use with 2.4.19?

A)
- 2.4.19 patch from bproc-3.2.2
- bproc-3.2.2
- somehow fixed beoboot-lanl.1.3

B)
- 2.4.19 patch from bproc-3.2.2
- bproc-3.1.10
- beoboot-lanl.1.2

C)
- 2.4.18 patch from bproc-3.1.10 (hope it fits on 2.4.19)
- bproc-3.1.10
- beoboot-lanl.1.2

D) some other combination, say mixing in something from the -jam patch collection??

Any advice more than welcome.

(*) bproc-3.2.2 + 2.4.19 + old beoboot from the March Clustermatic leads to a funny end in stage 2 boot when it comes to the ramdisk:

FAT: bogus logical sector size ...

(At least I attribute this to a non-matching beoboot.)

(**) make of beoboot-lanl.1.3 reports 'malformed floating constant' :) because the Makefile contains:

VERSION:=lanl.1.3

and it should contain:

VERSION:="lanl.1.3"

and there are other quirks:

rarpserv.c:48:20: cmconf.h: No such file or directory

- I should probably install something?

Best Regards

Vaclav Hanzl
From: steven j. <py...@li...> - 2002-11-01 12:36:03

Greetings,

I don't think I would mix library versions. OTOH, the PIII should be able to run libs and kernel targeted for PII without problem. It's just a question of convincing RH to install that way.

Worst case, put HD in a PII box, do install, then move HD to PIII.

G'day,
sjames

On 31 Oct 2002, Joshua J. England wrote:

> Uh-Oh. I think you might have hit it. I'm running RH8.0 on a PIII as
> the master for smartcore PII slaves. I think i686 libs might not be
> happy on the PIIs.
>
> What to do? Install i386 libs in a separate partition or scrap the
> master and go with an identical arch?
>
> -JE
>
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users

--
steven james, director of research, linux labs
the original linux labs, since 1995
230 peachtree st nw ste 701, atlanta.ga.us 30303
http://www.linuxlabs.com
office 404.577.7747 fax 404.577.7743
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:43:15

Uh-Oh. I think you might have hit it. I'm running RH8.0 on a PIII as the master for smartcore PII slaves. I think i686 libs might not be happy on the PIIs.

What to do? Install i386 libs in a separate partition or scrap the master and go with an identical arch?

-JE

On Thu, 2002-10-31 at 17:20, er...@he... wrote:
> Are there mixed architectures between the slave and the front end?
> (e.g. a P4 front end and an athlon slave node) If so, you need to make
> sure that the libraries you have installed will run on both nodes. I
> believe Red Hat (and possibly others) have started shipping libraries
> compiled specifically for i686, etc.
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:35:56

That's hard-coded in boot.c, don't ask me.

-JE
-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076

On Thu, 2002-10-31 at 16:25, J.A. Magallón wrote:
>
> On 2002.11.01 Joshua J. England wrote:
> >
> > I think I'm getting very close now. I'm finally catching some RARPs
> > with beoserv when a slave boots, although the slave dies pretty
> > quickly. The last thing seen on the slave is:
> >
> > boot: Server IP address: 10.0.4.100
> > boot: My IP address : 10.0.4.10
> > boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
>
> Why with -i ???
>
> --
> J.A. Magallon <jam...@ab...> \ Software is like sex:
> werewolf.able.es \ It's better when it's free
> Mandrake Linux release 9.1 (Cooker) for i586
> Linux 2.4.20-rc1-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-2mdk))
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:33:19

I'm looking on the console -- no OOPS, and this kernel is compiled for Pentium-Classic to be sure of compatibility even though these chips are PII.

I haven't tried commenting stuff out in node_up.conf yet, just because it might make matters worse. There are some references to a bproc-aware nsswitch, but I don't see the libs for that anywhere. Could that possibly be the problem?

-JE
-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076

On Thu, 2002-10-31 at 04:20, steven james wrote:
> Greetings,
>
> I regularly use the initrd from beoboot in the elf image. It's not that.
>
> Shooting in the dark: any chance the kernel or modules are compiled for
> the wrong kind of processor (K7 on a P4 for example)?
>
> If possible, try serial console on the slave to see if there's an OOPS
> associated with the failure.
>
> G'day,
> sjames
From: <er...@he...> - 2002-11-01 00:30:03

On Fri, Nov 01, 2002 at 01:25:43AM +0100, J.A. Magallón wrote:
>
> On 2002.11.01 Joshua J. England wrote:
> >
> > I think I'm getting very close now. I'm finally catching some RARPs
> > with beoserv when a slave boots, although the slave dies pretty
> > quickly. The last thing seen on the slave is:
> >
> > boot: Server IP address: 10.0.4.100
> > boot: My IP address : 10.0.4.10
> > boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
>
> Why with -i ???

That way the decision to ignore version mismatches is up to the master node. Otherwise, if you wanted to ignore a mismatch you'd have to say -i on bpmaster and then go modify all your slave nodes.

- Erik
From: <er...@he...> - 2002-11-01 00:28:42

On Thu, Oct 31, 2002 at 04:04:52PM -0800, Joshua J. England wrote:
>
> I think I'm getting very close now. I'm finally catching some RARPs
> with beoserv when a slave boots, although the slave dies pretty
> quickly. The last thing seen on the slave is:
>
> boot: Server IP address: 10.0.4.100
> boot: My IP address : 10.0.4.10
> boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
> bpslave: IO daemon started; pid=11
>
> beoserv on the master shows:
>
> beoserv: RARP: 00:30:59:00:98:26 == 10.0.4.10
> beoserv: Starting node_up worker for 1 clients.
> nodeup : Child process for node 0 died with signal 4
>
>
> I'm booting from an elf image created from a standard bproc kernel,
> along with the initrd created by 'beoboot -2'. Is this considered a
> badbadthing? Do I need to roll my own initrd and run 'bpslave' from it?

This is just the node setup program from beoboot. BProc is running and it appears to be at least mostly happy.

Try this:

/usr/lib/beoboot/bin/node_up -s ##

This is the way to run the node setup program in interactive mode. This will let you muck around with it without having to reboot all the time.

SIGILL sounds like there might be a migration problem of some kind. Did I just say BProc appeared happy? Whups.

Here come the questions:

Are there mixed architectures between the slave and the front end? (e.g. a P4 front end and an athlon slave node) If so, you need to make sure that the libraries you have installed will run on both nodes. I believe Red Hat (and possibly others) have started shipping libraries compiled specifically for i686, etc.

Are there any messages on the slave's console at all? Some kind of mapping failure could be a clue here. Make sure your library list (bplib -l) doesn't include everything in /lib and /usr/lib.

Here are the "libraries" lines that I'm using in my /etc/beowulf/config

libraries /lib/ld-2* /lib/libc-2* /lib/libm-2* /lib/libcrypt*
libraries /lib/librt-2* /lib/libpthread-*
libraries /usr/lib/libbproc* /lib/libtermcap* /lib/libproc*
libraries /lib/libresolv-2*
libraries /lib/libpthread*
libraries /lib/libnss_bproc*
libraries /lib/libdl-2*
libraries /lib/libnsl*
libraries /usr/lib/libncurses*
libraries /lib/libutil-2*

> Also, what is the role of the 'bootfile' parameter in
> /etc/beowulf/config? It looks like beoserv feeds it to a slave after a
> RARP request, but changing it seems to have no effect.

Hrm. It should have some effect. Make sure you SIGHUP beoserv after modifying the file.

- Erik
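
The advice about keeping the library list narrow can be sanity-checked offline. The Python sketch below is an illustration under stated assumptions, not anything shipped with BProc: it pulls the glob patterns out of `libraries` lines in a config text shaped like the excerpt above and flags patterns that would match a whole directory. The names `library_patterns` and `too_broad`, and the definition of "too broad", are hypothetical; the real config parser may treat the file differently.

```python
# Hypothetical helpers for inspecting "libraries" lines; not part of BProc.

def library_patterns(config_text):
    """Collect every pattern listed on a line whose first word is 'libraries'."""
    patterns = []
    for raw in config_text.splitlines():
        fields = raw.split()
        if fields and fields[0] == "libraries":
            patterns.extend(fields[1:])
    return patterns


def too_broad(pattern):
    """True for patterns like /lib, /usr/lib/ or /lib/* that cover a whole directory."""
    stripped = pattern.rstrip("/")
    if stripped in ("/lib", "/usr/lib"):
        return True
    last = stripped.rsplit("/", 1)[-1]
    # A final component made only of '*' matches everything in the directory.
    return last.replace("*", "") == ""
```

A quick use is `[p for p in library_patterns(text) if too_broad(p)]`, where `text` is the content of the config file; with the lines quoted above it should come back empty, since every pattern there names a specific library prefix.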
From: <jam...@ab...> - 2002-11-01 00:25:50

On 2002.11.01 Joshua J. England wrote:
>
> I think I'm getting very close now. I'm finally catching some RARPs
> with beoserv when a slave boots, although the slave dies pretty
> quickly. The last thing seen on the slave is:
>
> boot: Server IP address: 10.0.4.100
> boot: My IP address : 10.0.4.10
> boot: starting bpslave: bpslave -d -i 10.0.4.100 2223

Why with -i ???

--
J.A. Magallon <jam...@ab...> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-rc1-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-2mdk))
From: steven j. <py...@li...> - 2002-11-01 00:21:14

Greetings,

I regularly use the initrd from beoboot in the elf image. It's not that.

Shooting in the dark: any chance the kernel or modules are compiled for the wrong kind of processor (K7 on a P4 for example)?

If possible, try serial console on the slave to see if there's an OOPS associated with the failure.

G'day,
sjames

On 31 Oct 2002, Joshua J. England wrote:

> beoserv on the master shows:
>
> beoserv: RARP: 00:30:59:00:98:26 == 10.0.4.10
> beoserv: Starting node_up worker for 1 clients.
> nodeup : Child process for node 0 died with signal 4
>
> I'm booting from an elf image created from a standard bproc kernel,
> along with the initrd created by 'beoboot -2'. Is this considered a
> badbadthing? Do I need to roll my own initrd and run 'bpslave' from it?

--
steven james, director of research, linux labs
the original linux labs, since 1995
230 peachtree st nw ste 701, atlanta.ga.us 30303
http://www.linuxlabs.com
office 404.577.7747 fax 404.577.7743
From: Joshua J. E. <jj...@sa...> - 2002-11-01 00:08:04

I think I'm getting very close now. I'm finally catching some RARPs with beoserv when a slave boots, although the slave dies pretty quickly. The last thing seen on the slave is:

boot: Server IP address: 10.0.4.100
boot: My IP address : 10.0.4.10
boot: starting bpslave: bpslave -d -i 10.0.4.100 2223
bpslave: IO daemon started; pid=11

beoserv on the master shows:

beoserv: RARP: 00:30:59:00:98:26 == 10.0.4.10
beoserv: Starting node_up worker for 1 clients.
nodeup : Child process for node 0 died with signal 4

I'm booting from an elf image created from a standard bproc kernel, along with the initrd created by 'beoboot -2'. Is this considered a badbadthing? Do I need to roll my own initrd and run 'bpslave' from it?

Also, what is the role of the 'bootfile' parameter in /etc/beowulf/config? It looks like beoserv feeds it to a slave after a RARP request, but changing it seems to have no effect.

Sorry for the onslaught of questions, free beer at SC for all who help. :)

-JE
-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076
From: <er...@he...> - 2002-10-31 17:29:02

On Thu, Oct 31, 2002 at 08:50:51AM -0800, Joshua J. England wrote:
> On Thu, 2002-10-31 at 09:23, er...@he... wrote:
> > On Wed, Oct 30, 2002 at 06:26:35PM -0300, Luiz Otávio de Lima Rodrigues wrote:
> > > Another problem is when I compel beoboot. An error happens that I
> > > do not obtain to arrange of form some:
> > >
> > > (cat config.boot.in; /mkpcitable </usr/share/hwdata/pcitable) > config.boot
> > > Split Loop, < STDIN > line 10. make: * * * [ config.boot ] Error 255
> >
> > I saw this too as soon as I tried Red Hat 8.0. There seems to be some
> > new brain damage related to internationalization. I said "unset LANG"
> > and it worked again. The script is so simple I can't imagine there's
> > actually a bug in it.
>
> perl 5.8 claims that implicitly splitting into the @_ array is
> deprecated as it can clobber any subroutine arguments. I experienced
> the same problem, but fixed it by splitting into an explicit array:

<SARCASTIC RANT>
Oh, of course! I don't know why I didn't parse "Split loop" as implicit use of @_ is deprecated. Silly me. I'm sure if I were a real 'l3et perl HaX0r, I'd know that deprecated meant "unset LANG". Stupid perl.
</SARCASTIC RANT>

Seriously, though, thanks for the fix. Incidentally, explicitly doing "@_ = split /[\t\n]/;" makes it work fine too.

- Erik
From: Joshua J. E. <jj...@sa...> - 2002-10-31 16:54:27

On Thu, 2002-10-31 at 09:23, er...@he... wrote:
> On Wed, Oct 30, 2002 at 06:26:35PM -0300, Luiz Otávio de Lima Rodrigues wrote:
> > Another problem is when I compel beoboot. An error happens that I
> > do not obtain to arrange of form some:
> >
> > (cat config.boot.in; /mkpcitable </usr/share/hwdata/pcitable) > config.boot
> > Split Loop, < STDIN > line 10. make: * * * [ config.boot ] Error 255
>
> I saw this too as soon as I tried Red Hat 8.0. There seems to be some
> new brain damage related to internationalization. I said "unset LANG"
> and it worked again. The script is so simple I can't imagine there's
> actually a bug in it.

perl 5.8 claims that implicitly splitting into the @_ array is deprecated as it can clobber any subroutine arguments. I experienced the same problem, but fixed it by splitting into an explicit array:

#!/usr/bin/perl
#
# Create a config.boot from a redhat style pci table.
#
# $Id: mkpcitable,v 1.4 2002/06/04 14:07:50 packager Exp $
while (<STDIN>) {
    s/\#.*//;                   # Strip comments
    @a = split /[\t\n]/;
    if ($#a == 3 || $#a == 5) {
        ($vendor, $device, $driver) = (hex($a[0]), hex($a[1]), $a[$#a - 1]);
        $driver =~ s/\"//g;
        printf("pci 0x%04lx 0x%04lx %s\n", $vendor, $device, $driver)
            unless ($driver =~ m/^unknown/i ||
                    $driver =~ m/^ignore/i ||
                    $driver =~ m/^Card:/i ||
                    $driver =~ m/^Server:/i );
    }
}

-JE
-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076
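
For readers who want to follow the filtering logic of the Perl script above without a Perl interpreter at hand, here is a rough Python rendering of the same steps. It is a sketch for illustration, not a replacement for mkpcitable: the function name `pcitable_to_config` is made up, the input is assumed to be tab-separated pcitable lines as the script expects, and it does not reproduce or explain the "Split loop" error itself. One known difference: Python's `int(..., 16)` raises ValueError on a malformed hex field, where Perl's `hex()` would carry on.

```python
import re

# Drivers the Perl script refuses to emit (matched case-insensitively at the start).
SKIP_DRIVER = re.compile(r"^(unknown|ignore|Card:|Server:)", re.IGNORECASE)


def pcitable_to_config(lines):
    """Turn pcitable lines into 'pci 0xVVVV 0xDDDD driver' config lines."""
    out = []
    for line in lines:
        line = re.sub(r"#.*", "", line)        # strip comments
        fields = re.split(r"[\t\n]", line)     # explicit list, like '@a = split'
        while fields and fields[-1] == "":     # Perl's split drops trailing empty fields
            fields.pop()
        if len(fields) not in (4, 6):          # same test as $#a == 3 || $#a == 5
            continue
        vendor = int(fields[0], 16)
        device = int(fields[1], 16)
        driver = fields[-2].replace('"', "")   # second-to-last field, quotes removed
        if SKIP_DRIVER.match(driver):
            continue
        out.append("pci 0x%04x 0x%04x %s" % (vendor, device, driver))
    return out
```

Passing it the lines of a pcitable file (for example via `open(path).readlines()`, with `path` chosen by the reader) yields the lines that would be appended to config.boot.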
From: <er...@he...> - 2002-10-31 16:31:25

On Wed, Oct 30, 2002 at 06:26:35PM -0300, Luiz Otávio de Lima Rodrigues wrote:
>
> Wilton and All,
>
> I find that I discovered the problem in the beoboot fase 2 image, I
> did not generate it with the correct commands, then I go to test a little
> more. However appeared another one doubts, is that in the package cmtools
> has two patches for the MPICH, which of the two I applies? e in which
> order?
>
> Another problem is when I compel beoboot. An error happens that I
> do not obtain to arrange of form some:
>
> (cat config.boot.in; /mkpcitable </usr/share/hwdata/pcitable) > config.boot
> Split Loop, < STDIN > line 10. make: * * * [ config.boot ] Error 255

I saw this too as soon as I tried Red Hat 8.0. There seems to be some new brain damage related to internationalization. I said "unset LANG" and it worked again. The script is so simple I can't imagine there's actually a bug in it.

> Others two errors I arranged of the following form:
>
> In line 5 of the Makefile I modified it to: VERSION:="lanl.1.3"

That's the right fix. My version number stamper will be fixed for the next rev of just about everything.

> In line 76 of node_up/ethreads.c it to: //#undef errno

This isn't quite the right answer. You need to #undef errno and then add "extern int errno;". This is because things may actually be linked against the static errno declaration, not __errno_location (e.g. libmodutils on Red Hat 7.3). So node_up tells ethreads to use that errno before calling modprobe_main.

> Despite the error solved when typing Make the compilation it does
> not finish again successfully and I obtain to install beoboot.

What are the other problems?

- Erik
From: <er...@he...> - 2002-10-31 16:10:17

On Wed, Oct 30, 2002 at 05:14:43PM -0800, Joshua J. England wrote:
>
> My slaves are finally booting from a phase 2 image! However, they die
> while apparently executing code in the ramdisk with:
>
> boot: LANL beoboot version lanl.1.3
> boot: System boot phase 2 in progress.
> boot: Reading config file from: config.boot
> Failed to get BProc version.
>
> A fatal error has occurred.
>
>
> The bproc modules are loaded and everything is running on the master. I
> followed the code up until a syscall(), but don't know where to look
> from there.
>
> Is there any obvious reason why this would be failing?

Usually this means that the BProc module didn't load correctly on the slave node. I would check that the bproc module is actually on the boot image. You can gunzip and loop back mount the initrd image to look.

If it's not there, then make sure there's a vmadump.o and bproc.o that matches the kernel you're using in the appropriate modules directory in /lib/modules/.

If it is there, make sure it actually loads properly on the kernel you're using.

Just in case: the "beoboot" kernel is for phase 1 *ONLY*. It does not support BProc. You should probably use the same kernel that you're running on your front end.

- Erik
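
The "is the module actually there" check described above can be scripted once the directory to inspect is known. The following Python sketch assumes only what the message states, namely that bproc.o and vmadump.o should exist somewhere under the relevant modules directory; the function name `missing_modules` is hypothetical, and the script says nothing about whether a module that is present will actually load on the running kernel.

```python
import os

# Module file names taken from the message above.
REQUIRED = ("bproc.o", "vmadump.o")


def missing_modules(modules_dir, required=REQUIRED):
    """Return the required module files not found anywhere under modules_dir."""
    present = set()
    for _root, _dirs, files in os.walk(modules_dir):
        present.update(name for name in files if name in required)
    return [name for name in required if name not in present]
```

It can be pointed either at the modules directory for the kernel in question under /lib/modules/ on the front end, or at the mount point of a loop-mounted initrd; both locations are for the reader to supply. An empty list means both files were found.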
From: Joshua J. E. <jj...@sa...> - 2002-10-31 01:17:30

My slaves are finally booting from a phase 2 image! However, they die while apparently executing code in the ramdisk with:

boot: LANL beoboot version lanl.1.3
boot: System boot phase 2 in progress.
boot: Reading config file from: config.boot
Failed to get BProc version.

A fatal error has occurred.

The bproc modules are loaded and everything is running on the master. I followed the code up until a syscall(), but don't know where to look from there.

Is there any obvious reason why this would be failing?

-JE
-----------------------------------------------
Josh England
Sandia National Laboratory, Livermore, CA
Distributed Information Systems
email: jj...@sa...
phone: (925) 294-2076
From: <jam...@ab...> - 2002-10-30 23:45:08

On 2002.10.30 ha...@no... wrote:
> > Is it hard to reconcile bproc with highmem ?
> >
> > I found J.A. Magallon's patch collection 2.4.20-pre10-jam1...
>
> Grrrrrrr, just recompiled everything with highmem off and still have
> unresolved kmap_pagetable in vmadump.o.
>
> Definitely I screwed something up - maybe my thoughts in previous post?
>
> How is J.A. Magallon's bproc compile ;-) ??
>

Because I also added #include <linux/highmem.h> in vmadump.c. ;)

But I had not posted the corrected bproc sources on the same site, sorry (task changes, highmem, commenting out the scheduling in slave.c...). I will also look at what Andrea says in that mail...thanks for the pointer.

I would not trust my tree too much wrt bproc (yes, it compiles and seems to work). We have an older version running in a system used to solve FEM problems with MPI, and it runs well. But as I want the updated VM in the -aa tree, and -aa is not supported by bproc, my kernel will always be a monster...

BTW, I have just installed 2.4.20-rc1-jam0 (with bproc-3.2.2) and it seems to work. When I can test it seriously I will 'officially' release jam1 and the modified userspace tree.

Erik, would you accept a patch along the lines of

#ifdef NEW_O1_SCHED
    set_user_nice(current, xxxx);
#else
    current->nice = xxxx;
#endif

TIA

--
J.A. Magallon <jam...@ab...> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.20-rc1-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-2mdk))
From: <ll...@st...> - 2002-10-30 21:27:21

Wilton and All,

I think I found the problem with the beoboot phase 2 image: I did not generate it with the correct commands, so I will test a little more. However, another question came up: the cmtools package has two patches for MPICH. Which of the two do I apply, and in which order?

Another problem occurs when I compile beoboot. I get an error that I cannot fix at all:

(cat config.boot.in ; ./mkpcitable < /usr/share/hwdata/pcitable) > config.boot
Split loop, <STDIN> line 10.
make: *** [config.boot] Error 255

I fixed two other errors as follows:

In line 5 of the Makefile I changed it to:
VERSION:="lanl.1.3"

In line 76 of node_up/ethreads.c I changed it to:
//#undef errno

Despite the unresolved error, when I run make again the compilation finishes successfully and I am able to install beoboot.

Thanks a lot,

Luiz Otávio

-----Original Message-----
From: Wilton Wong [mailto:ww...@ha...]
Sent: Wednesday, 30 October 2002 03:55 pm
To: Luiz Otavio de Lima Rodrigues
Cc: 'bpr...@li...'
Subject: Re: [BProc] Beoboot troubles

On Tue, 29 Oct 2002, Luiz Otavio de Lima Rodrigues wrote:

What exactly is the error it is returning when it doesn't load /var/beowulf/boot.img ?

> Another one doubts is in relation to the access, in diverse
> applications is used rsh to have access in the slaves, however these images

rsh is not used.. it is replaced by a similar command called bpsh. I also believe there is an enhanced version of lam-mpi that was done by clubmask, http://clubmask.sourceforge.net. I personally haven't looked at it yet, so I'm not sure what has been done.

> "Beoboot no longer requires a modified C library. The dynamic linker patch
> was only for demand loading of libraries which we're trying to get away
> from."

You can continue using the libs you are using now; it should not affect anything. The only difference between stock glibc and the bproc (3.1) glibc is that the loader was modified to request libraries from the master node if a library is not found locally.

- Wilton

----[ Wilton William Wong ]----
11060-166 Avenue, Edmonton, Alberta, T5X 1Y3, Canada
Ph : 01-780-456-9771
FAX: 01-780-456-9772
URL: http://www.harddata.com
High Performance UNIX and Linux Solutions
----[ Hard Data Ltd. ]----
From: steven j. <py...@li...> - 2002-10-30 20:38:49
|
Greetings,

I had missed the LinuxBIOS part when I first saw your message. Which method are you using through LinuxBIOS? It sounds like the problem is that LinuxBIOS wants an ELF image rather than a beoboot formatted image. In that case, you'll need to tell beoboot to generate a separate kernel and ramdisk, then use mkelfImage to put them together.

G'day,
sjames

On 30 Oct 2002, Joshua J. England wrote:

> Thanks.
>
> The slave nodes are booting from linuxBIOS, so I guess I need to look at
> that side of the house for the solution.
>
> -JE
>
> On Wed, 2002-10-30 at 09:52, er...@he... wrote:
> > On Tue, Oct 29, 2002 at 03:04:04PM -0800, Joshua J. England wrote:
> > > OK, the problem is definitely with the kernel image.
> > >
> > > The slave nodes complain:
> > > 'Loading 10.0.4.100:/bproc/vmlinuz-beoboot error: not a valid image'
> > >
> > > This image was created with 'beoboot -2 -n -o vmlinuz-beoboot' from a
> > > bproc 2.4.19 kernel. What could be wrong?
> >
> > -n creates a network boot image that only works with the beoboot
> > phase1 (two kernel monte) stuff. I don't know what boot loader
> > you're using here but it's not beoboot.
> >
> > Most likely what you want to do is create images with -i. That will
> > create a kernel and initial ram disk image and tell you what command
> > line to use. Then you can load those with whatever mechanism your
> > boot loader likes.
> >
> > - Erik

--
-------------------------steven james, director of research, linux labs
... ........ ..... ....                     230 peachtree st nw ste 701
the original linux labs                             atlanta.ga.us 30303
-since 1995                                    http://www.linuxlabs.com
                                   office 404.577.7747 fax 404.577.7743
----------------------------------------------------------------------- |
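One quick way to tell an ELF image from some other image format, before handing the file to a boot loader, is to look at its first four bytes: every ELF file starts with the byte 0x7F followed by the letters "ELF". The sketch below only illustrates that check. The helper name `is_elf_image` is made up for this example, and passing the check says nothing about whether LinuxBIOS will actually accept the image.

```python
ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def is_elf_image(path):
    """Return True if the file at `path` starts with the ELF magic number.

    Hypothetical helper for illustration; it checks the magic bytes only
    and does not validate the rest of the ELF header.
    """
    with open(path, "rb") as f:
        return f.read(len(ELF_MAGIC)) == ELF_MAGIC
```

Calling `is_elf_image("vmlinuz-beoboot")` on the image from this thread would be one way to confirm the suspicion that it is not in ELF format.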
From: Nicholas H. <he...@se...> - 2002-10-30 19:47:13
|
On Wed, 30 Oct 2002 11:54:52 -0700 Wilton Wong <ww...@ha...> wrote:

> On Tue, 29 Oct 2002, Luiz Otavio de Lima Rodrigues wrote:
>
> What exactly is the error it is returning when it doesn't load
> /var/beowulf/boot.img ?
>
> > Another one doubts is in relation to the access, in diverse
> > applications is used rsh to have access in the slaves, however these
> > images
>
> rsh is not used.. it is replaced by a similar command called bpsh, I
> also believe there is an enhanced version of lam-mpi that was done by
> clubmask http://clubmask.sourceforge.net, I personally haven't looked
> at it yet so I'm not sure what has been done.

Yes, we do have LAM compiled for bproc. I have placed the RPMs on the web. The 6.6b2 RPM is really the latest CVS snapshot and changes without notice. The nice part is that the CVS version of LAM comes with bproc support enabled.

www.liniac.upenn.edu/~henken/lam

Nic
--
Nicholas Henke
Linux Cluster Systems Programmer |
From: Wilton W. <ww...@ha...> - 2002-10-30 18:55:05
|
On Tue, 29 Oct 2002, Luiz Otavio de Lima Rodrigues wrote:

What exactly is the error it is returning when it doesn't load /var/beowulf/boot.img ?

> Another one doubts is in relation to the access, in diverse
> applications is used rsh to have access in the slaves, however these images

rsh is not used; it is replaced by a similar command called bpsh. I also believe there is an enhanced version of lam-mpi that was done by clubmask http://clubmask.sourceforge.net, I personally haven't looked at it yet so I'm not sure what has been done.

> "Beoboot no longer requires a modified C library. The dynamic linker patch
> was only for demand loading of libraries which we're trying to get away
> from."

You can continue using the libs you are using now, it should not affect anything. The only difference between stock glibc and the bproc (3.1) glibc is that the loader was modified to request libraries from the master node if a library is not found locally.

- Wilton

----[ Wilton William Wong ]---------------------------------------------
11060-166 Avenue        Ph : 01-780-456-9771      High Performance UNIX
Edmonton, Alberta       FAX: 01-780-456-9772      and Linux Solutions
T5X 1Y3, Canada         URL: http://www.harddata.com
-------------------------------------------------------[ Hard Data Ltd. ]---- |
From: Joshua J. E. <jj...@sa...> - 2002-10-30 17:10:00
|
Thanks.

The slave nodes are booting from linuxBIOS, so I guess I need to look at that side of the house for the solution.

-JE

On Wed, 2002-10-30 at 09:52, er...@he... wrote:
> On Tue, Oct 29, 2002 at 03:04:04PM -0800, Joshua J. England wrote:
> > OK, the problem is definitely with the kernel image.
> >
> > The slave nodes complain:
> > 'Loading 10.0.4.100:/bproc/vmlinuz-beoboot error: not a valid image'
> >
> > This image was created with 'beoboot -2 -n -o vmlinuz-beoboot' from a
> > bproc 2.4.19 kernel. What could be wrong?
>
> -n creates a network boot image that only works with the beoboot
> phase1 (two kernel monte) stuff. I don't know what boot loader
> you're using here but it's not beoboot.
>
> Most likely what you want to do is create images with -i. That will
> create a kernel and initial ram disk image and tell you what command
> line to use. Then you can load those with whatever mechanism your
> boot loader likes.
>
> - Erik |
From: <er...@he...> - 2002-10-30 16:59:47
|
On Tue, Oct 29, 2002 at 03:04:04PM -0800, Joshua J. England wrote:
> OK, the problem is definitely with the kernel image.
>
> The slave nodes complain:
> 'Loading 10.0.4.100:/bproc/vmlinuz-beoboot error: not a valid image'
>
> This image was created with 'beoboot -2 -n -o vmlinuz-beoboot' from a
> bproc 2.4.19 kernel. What could be wrong?

-n creates a network boot image that only works with the beoboot phase1 (two kernel monte) stuff. I don't know what boot loader you're using here but it's not beoboot.

Most likely what you want to do is create images with -i. That will create a kernel and initial ram disk image and tell you what command line to use. Then you can load those with whatever mechanism your boot loader likes.

- Erik |
From: <er...@he...> - 2002-10-30 16:56:22
|
On Wed, Oct 30, 2002 at 05:20:22PM +0100, ha...@no... wrote:
> Is it hard to reconcile bproc with highmem ?
>
> I found J.A. Magallon's patch collection 2.4.20-pre10-jam1 and wanted
> to use it on Abit-IT7-MAX2 based cluster (patches contain both HPT374
> support and bproc kernel patch; this made me happy).
>
> I did few replacements in bproc (using J.A.M.'s hints):
>
> nice = current->nice   -----> nice = task_nice(current)
> current->nice = nice   -----> set_user_nice(current,nice)
> DEF_NICE               -----> (0)
>
> and bproc-3.2.2 compiled OK. However depmod complains about unresolved
> symbol kmap_pagetable in vmadump.o.
>
> This is caused by new highmem (1G and more memory support) and there
> already was lot of buzz regarding many drivers broken by highmem.

The 2.4.19 highmem is a non-issue for BProc. I've been running boxes with >= 1GB of ram and highmem turned on for about a year and I haven't seen any problems.

What I've seen of highmem so far indicates that it's only an issue when one thread wants to access user space memory from another thread. You can see some complication in ptrace as a result of this. As long as you're operating on "current" there are no changes. Vmadump always operates on "current" so I didn't have to do anything there. The same is true for BProc.

If all of that gets changed around in 2.4.20, then I guess I'll have to revisit it at that point.

As far as other kernel patches are concerned.... I unfortunately don't have the time to worry about working with anything other than the stock kernel.

- Erik |
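The three replacements quoted above are plain textual substitutions, so they could in principle be applied mechanically to a source file. The sketch below shows one way to do that with regular expressions. The function name `rewrite_nice` and the pattern list are made up for this example, the actual edits made to bproc-3.2.2 are not shown in the thread, and a purely textual rewrite like this can miss or mangle code that is formatted differently.

```python
import re

# (pattern, replacement) pairs taken from the three substitutions quoted
# in the message. Whitespace around "=" is matched loosely; the rest is
# literal. Hypothetical names, for illustration only.
NICE_REWRITES = [
    (re.compile(r"\bcurrent->nice\s*=\s*nice\b"), "set_user_nice(current,nice)"),
    (re.compile(r"\bnice\s*=\s*current->nice\b"), "nice = task_nice(current)"),
    (re.compile(r"\bDEF_NICE\b"), "(0)"),
]


def rewrite_nice(source):
    """Apply the three nice-related substitutions to a string of C source."""
    for pattern, replacement in NICE_REWRITES:
        source = pattern.sub(replacement, source)
    return source
```

The word boundaries keep the patterns from touching longer identifiers that merely contain `nice` or `DEF_NICE`.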
From: <ha...@no...> - 2002-10-30 16:35:27
|
> Is it hard to reconcile bproc with highmem ?
>
> I found J.A. Magallon's patch collection 2.4.20-pre10-jam1...

Grrrrrrr, just recompiled everything with highmem off and still have unresolved kmap_pagetable in vmadump.o. Definitely I screwed something up - maybe my thoughts in the previous post? How does J.A. Magallon's bproc compile ;-) ??

Regards

Vaclav |