Archive activity (messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2001 |     |     |     |     |     |     |     |     |     | 25  |     | 22  |
| 2002 | 13  | 22  | 39  | 10  | 26  | 23  | 38  | 20  | 27  | 76  | 32  | 11  |
| 2003 | 8   | 23  | 12  | 39  | 1   | 48  | 35  | 15  | 60  | 27  | 9   | 32  |
| 2004 | 8   | 16  | 40  | 25  | 12  | 33  | 49  | 39  | 26  | 47  | 26  | 36  |
| 2005 | 29  | 15  | 22  | 1   | 8   | 32  | 11  | 17  | 9   | 7   | 15  |     |
From: Greg W. <gw...@la...> - 2005-06-16 18:52:47
I don't know of anyone using it, but it seems to me that the simplest way would be to NFS mount a filesystem on the nodes. That way valgrind will be able to find the executable. Otherwise you could copy the foo executable to the node using bpcp.

Greg

On Jun 16, 2005, at 12:30 PM, Julian Seward wrote:

> Valgrind is a GPL'd tool suite for doing memory debugging and profiling
> on x86-linux and amd64-linux. We are looking into the issue of making
> Valgrind work well on BProc and hence on MPI.
>
> I'd like to re-ask Ceri's question: has anyone used or tried to use
> Valgrind over BProc? If so, what did you have to do to make it work?
>
> Thanks,
> J
>
>> From: Cerion Armour-Brown <cerion@op...>
>> Subject: valgrind & bproc
>> Date: 2005-05-30 01:40
>> Hi,
>> I'm a developer working on Valgrind, and I'm trying to work out the best way
>> to use valgrind with bproc.
>>
>> Does anyone already do this? (Directly with bproc - not via mpi).
>> If so, I'd really appreciate some details on how you've set this up.
>>
>> I understand that using valgrind with mpirun is fairly straightforward (though
>> I haven't set up mpi yet to try it out). From what I've read, it seems
>> valgrind must be accessible from the nodes (nfs, or whatever), but the
>> program to run is migrated from the master, yes?
>>
>> Using bpsh, I don't see how I can avoid needing both valgrind, and the
>> program to run, accessible from the nodes, since running
>>   $ bpsh -a valgrind foo
>> will migrate valgrind, then, on the nodes, valgrind will look for foo.
>> Valgrind is unlike gdb in that you cannot 'attach' it once the program has
>> started. So the trick used for gdb (bpsh -a foo, find foo's pid, attach gdb)
>> won't work.
>>
>> Any pointers much appreciated,
>> Cerion
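A minimal sketch of the two suggestions above, assuming a target program named foo, a valgrind installation reachable from the nodes over NFS at /opt/valgrind, and rcp-style node:path arguments for bpcp (all three are illustrative assumptions, not verified paths or syntax):

```sh
# Option 1: run valgrind from an NFS-mounted path so the node can find both
# the tool and the program it is asked to run.
bpsh 0 /opt/valgrind/bin/valgrind --tool=memcheck /opt/apps/foo

# Option 2: copy foo out to the node first with bpcp, then point valgrind at it.
bpcp ./foo 0:/tmp/foo
bpsh 0 /opt/valgrind/bin/valgrind --tool=memcheck /tmp/foo
```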
From: Julian S. <ju...@va...> - 2005-06-16 18:30:07
Valgrind is a GPL'd tool suite for doing memory debugging and profiling on x86-linux and amd64-linux. We are looking into the issue of making Valgrind work well on BProc and hence on MPI.

I'd like to re-ask Ceri's question: has anyone used or tried to use Valgrind over BProc? If so, what did you have to do to make it work?

Thanks,
J

> From: Cerion Armour-Brown <cerion@op...>
> Subject: valgrind & bproc
> Date: 2005-05-30 01:40
> Hi,
> I'm a developer working on Valgrind, and I'm trying to work out the best way
> to use valgrind with bproc.
>
> Does anyone already do this? (Directly with bproc - not via mpi).
> If so, I'd really appreciate some details on how you've set this up.
>
> I understand that using valgrind with mpirun is fairly straightforward (though
> I haven't set up mpi yet to try it out). From what I've read, it seems
> valgrind must be accessible from the nodes (nfs, or whatever), but the
> program to run is migrated from the master, yes?
>
> Using bpsh, I don't see how I can avoid needing both valgrind, and the
> program to run, accessible from the nodes, since running
>   $ bpsh -a valgrind foo
> will migrate valgrind, then, on the nodes, valgrind will look for foo.
> Valgrind is unlike gdb in that you cannot 'attach' it once the program has
> started. So the trick used for gdb (bpsh -a foo, find foo's pid, attach gdb)
> won't work.
>
> Any pointers much appreciated,
> Cerion
From: Jan H. <hue...@un...> - 2005-06-15 14:24:45
My fault, the client kernel didn't have NFSv3 compiled in.

Jan.
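For anyone chasing the same symptom, a quick way to check whether the node kernel was built with the NFSv3 client (the kernel build path is an example, not the one used here):

```sh
# The node kernel's config should contain CONFIG_NFS_V3=y; if it is unset,
# enable NFSv3 client support, rebuild the kernel, and regenerate the
# beoboot image for the nodes.
grep CONFIG_NFS_V3 /usr/src/linux-2.6.x/.config
```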
From: Jan H. <jan...@iw...> - 2005-06-15 12:58:04
Once again, additional information. After I changed the /etc/clustermatic/fstab file and created a new initrd for the nodes with beoboot, the nodes display the following error:

  nfs warning: mount version older than kernel
  NFS: NFSv3 not supported

Jan.
From: Jan H. <hue...@un...> - 2005-06-15 12:31:16
I have set up bproc 4.0.0pre8 successfully: the nodes boot, I can use bpsh to execute commands on them, and they see the master node, of course. Now I tried to mount a directory from the master by adding it to the /etc/clustermatic/fstab file, but the nodes hang if I run the node_up script on them. The same happens if I do it manually:

  bpsh 0 mkdir /home_local
  bpsh 0 /bin/mount -t nfs -o rsize=8192,wsize=8192,nolock,ro n-1:/home_local /home_local

It just hangs here. Any suggestions appreciated. Oh, yes, the NFS server is running and /etc/exports is up to date. I can export the directory to other machines, and the master is able to export directories from other machines. It doesn't seem to make a difference whether I call the master n-1, master (as named in the config file), or by its IP number.

Jan.
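For comparison, a sketch of the corresponding /etc/clustermatic/fstab entry; the export name and mount point mirror the commands above, but treating the file as ordinary fstab syntax is an assumption:

```sh
# Run on the master: add an NFS mount of the master's /home_local for the nodes.
# "master" should resolve to the front end from the nodes' point of view.
cat >> /etc/clustermatic/fstab <<'EOF'
master:/home_local  /home_local  nfs  rsize=8192,wsize=8192,nolock,ro  0 0
EOF
```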
From: Erik H. <eah...@gm...> - 2005-06-15 04:17:11
At first glance it looks to me like support for 4-level page tables showed up. It shouldn't be a horrible patch, I think. I'll try and take a look at it when I get a chance.

- Erik

On 6/14/05, Daryl W. Grunau <dw...@la...> wrote:
> Hi, has anyone successfully built kmonte.ko from beoboot-cm1.10 on a recent
> kernel? I get as far as:
>
> make -C monte libmonte.a
> make[1]: Entering directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
> gcc -Wall -g -DPACKAGE_VERSION='"cm1.10"' -c -o libmonte.o libmonte.c
> ar rcs libmonte.a libmonte.o
> make[1]: Leaving directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
> ld -melf_i386 -r -o init1.o boot1.o rarp.o recv.o module.o cmconf.o -Lmonte -lmonte
> make -C monte LINUX=/lib/modules/2.6.11-1.BProc_FC3beoboot/build EXTRAKDEFS="" kmonte.ko
> make[1]: Entering directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
>   LD      /usr/src/redhat/BUILD/beoboot-cm1.10/monte/built-in.o
>   CC [M]  /usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.o
> /usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c: In function `get_phys_addr_':
> /usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c:132: error: request for member `pgd' in something not a structure or union
> /usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c: In function `monte_restart':
> /usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c:630: warning: implicit declaration of function `remap_page_range'
> make[3]: *** [/usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.o] Error 1
> make[2]: *** [_module_/usr/src/redhat/BUILD/beoboot-cm1.10/monte] Error 2
> make[1]: *** [kmonte.ko] Error 2
> make[1]: Leaving directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
> make: *** [monte/kmonte.ko] Error 2
>
> It appears the kernel gurus have moved the pgd_t around (sigh). Any
> help/info appreciated!
>
> Daryl
From: Daryl W. G. <dw...@la...> - 2005-06-14 14:54:55
Hi, has anyone successfully built kmonte.ko from beoboot-cm1.10 on a recent kernel? I get as far as:

make -C monte libmonte.a
make[1]: Entering directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
gcc -Wall -g -DPACKAGE_VERSION='"cm1.10"' -c -o libmonte.o libmonte.c
ar rcs libmonte.a libmonte.o
make[1]: Leaving directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
ld -melf_i386 -r -o init1.o boot1.o rarp.o recv.o module.o cmconf.o -Lmonte -lmonte
make -C monte LINUX=/lib/modules/2.6.11-1.BProc_FC3beoboot/build EXTRAKDEFS="" kmonte.ko
make[1]: Entering directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
  LD      /usr/src/redhat/BUILD/beoboot-cm1.10/monte/built-in.o
  CC [M]  /usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.o
/usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c: In function `get_phys_addr_':
/usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c:132: error: request for member `pgd' in something not a structure or union
/usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c: In function `monte_restart':
/usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.c:630: warning: implicit declaration of function `remap_page_range'
make[3]: *** [/usr/src/redhat/BUILD/beoboot-cm1.10/monte/kmonte.o] Error 1
make[2]: *** [_module_/usr/src/redhat/BUILD/beoboot-cm1.10/monte] Error 2
make[1]: *** [kmonte.ko] Error 2
make[1]: Leaving directory `/usr/src/redhat/BUILD/beoboot-cm1.10/monte'
make: *** [monte/kmonte.ko] Error 2

It appears the kernel gurus have moved the pgd_t around (sigh). Any help/info appreciated!

Daryl
From: Greg W. <gw...@la...> - 2005-06-14 13:37:59
Jan,

On Jun 14, 2005, at 6:40 AM, Jan Huelsberg wrote:

> Now I am a little bit further than in the last posting.
> As Erik pointed out bproc is not a communication layer, so I
> configured mpich without the option --with-comm=bproc.
> The compiling went fine.
> I set the RSHCOMMAND to bpsh and the examples seem to run.
> But I don't get the prompt back, something hangs.
>
> Furthermore I was surprised that I don't have to copy the
> executables to the nodes at all.
> Is bproc handling this all by itself?

That's the whole point of bproc. The nodes have no executables at all, but the process is migrated from the front end to the node in order to run.

> I just changed to the examples directory, compiled the cpi.c
> program, used the mpirun that was shipped with the cmtools and set
> up the machines.LINUX file.
> That's it.

Bproc provides its own mechanism for obtaining the node names. machines.LINUX is not used.

> Did I miss something?
>
> Jan.
From: Jan H. <hue...@un...> - 2005-06-14 12:41:00
Now I am a little bit further than in the last posting. As Erik pointed out, bproc is not a communication layer, so I configured mpich without the option --with-comm=bproc. The compiling went fine. I set RSHCOMMAND to bpsh and the examples seem to run, but I don't get the prompt back; something hangs.

Furthermore, I was surprised that I don't have to copy the executables to the nodes at all. Is bproc handling this all by itself?

I just changed to the examples directory, compiled the cpi.c program, used the mpirun that was shipped with the cmtools, and set up the machines.LINUX file. That's it.

Did I miss something?

Jan.
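A rough sketch of the sequence being described, for readers following along; the configure flags beyond RSHCOMMAND, the source path, and the process count are illustrative assumptions rather than the exact setup used here:

```sh
# Build MPICH 1.2.6 with the default P4 device, substituting bpsh for rsh.
cd /usr/local/src/mpich-1.2.6
RSHCOMMAND=bpsh ./configure --with-device=ch_p4 --prefix=/usr/local/mpich
make && make install

# Run the cpi example; with the Clustermatic mpirun the node list comes from
# bproc itself, so no machines.LINUX file should be needed.
cd examples/basic && make cpi
mpirun -np 4 ./cpi
```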
From: Erik H. <eah...@gm...> - 2005-06-13 16:33:44
On 6/13/05, Andrew Shewmaker <ag...@gm...> wrote:
> You can get the tar ball from the source rpm with the following command:
>
> rpm -ivh <source rpm>

rpm2cpio is handy too. The tarballs won't be pre-patched, but the tarballs, patches and instructions (in the form of an rpm spec file) will be there.

- Erik
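As an illustration of the rpm2cpio route (the package filename is a placeholder):

```sh
# Unpack a source rpm into the current directory without installing it;
# this leaves the pristine tarball, the patches, and the .spec file behind.
rpm2cpio mpich-1.2.6-x.src.rpm | cpio -idmv
```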
From: Andrew S. <ag...@gm...> - 2005-06-13 16:04:20
You can get the tar ball from the source rpm with the following command:

rpm -ivh <source rpm>

Then look in /usr/src/redhat/SOURCES (or /usr/src/packages/SOURCES on SuSE).

Note that in general, source rpms will patch the original tarball. To get a patched source from the source rpm you would follow 'rpm -i' with:

cd /usr/src/redhat
rpmbuild -bp SPECS/<package name>.spec

The -bp option stands for the build preparation stage. You will find the patched source in /usr/src/redhat/BUILD. You should also look in the spec file to see how the package is being built.

Finally, you should be able to use any compiler you want with the rpmbuild process and a proper spec file. I highly recommend learning more about rpms if you manage any number of systems.

Andrew

On 6/13/05, Rene Salmon <rs...@tu...> wrote:
> Hi,
>
> Is there just a patched mpich tar ball available for download?
> I am not very familiar with building RPMS from source.
>
> We installed the MPICH binary RPMS but they are not doing exactly what
> we want. For example we would like to compile MPICH with the intel
> compilers and also add the intel fortran to mpif77.
>
> Thanks
> Rene

--
Andrew Shewmaker
From: Rene S. <rs...@tu...> - 2005-06-13 15:38:08
Hi,

Is there just a patched mpich tar ball available for download? I am not very familiar with building RPMS from source.

We installed the MPICH binary RPMS but they are not doing exactly what we want. For example we would like to compile MPICH with the intel compilers and also add the intel fortran to mpif77.

Thanks
Rene

Erik Hendriks wrote:
> BProc isn't a communication layer - it's a process management system.
> Clusters running BProc use P4 or GM or whatever is appropriate with
> mpich. The trick with BProc is the process start up. You'll need a
> patched mpich and a special mpirun program. This stuff is available
> at www.clustermatic.org. I suggest grabbing the packages from
> Clustermatic 5.
>
> - Erik
>
> On 6/13/05, Jan Huelsberg <hue...@un...> wrote:
>
>> Sorry for the lack of information.
>> Here it comes
>>
>> I tried to install MPICH 1.2.6 on a bproc-4.0.0pre8 system
>> The configure was done with
>>
>> --with-comm=bproc
>>
>> When I compile I run into the following error
>>
>> /usr/local/src/mpich-1.2.6/lib/libmpich.a(p4_sock_cr.o)(.text+0x8fd): In
>> function 'net_create_slave':
>> :undefined reference to 'bproc_gethostbyname'
>> collect2: ld returned 1 exit status
>>
>> After a few more errors the compiling ends unsuccessfully.
>>
>> Any help here would be appreciated.
>>
>> Jan.

--
Rene Salmon
Tulane University
Center for Computational Science
Richardson Building 310
New Orleans, LA 70118
http://www.ccs.tulane.edu
Tel 504-862-8393  Fax 504-862-8392
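For the Intel-compiler half of the question, a hedged sketch of pointing the MPICH 1.2.6 build at icc/ifort; flag and variable spellings vary between MPICH releases, so check them against ./configure --help and the spec file rather than taking this as the supported recipe:

```sh
# Rebuild the (Clustermatic-patched) MPICH source so mpicc/mpif77 wrap the
# Intel compilers instead of gcc/g77.
cd /usr/src/redhat/BUILD/mpich-1.2.6
RSHCOMMAND=bpsh CC=icc FC=ifort F90=ifort \
    ./configure --with-device=ch_p4 --prefix=/usr/local/mpich-intel
make && make install
```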
From: Erik H. <eah...@gm...> - 2005-06-13 14:25:06
BProc isn't a communication layer - it's a process management system. Clusters running BProc use P4 or GM or whatever is appropriate with mpich. The trick with BProc is the process start up. You'll need a patched mpich and a special mpirun program. This stuff is available at www.clustermatic.org. I suggest grabbing the packages from Clustermatic 5.

- Erik

On 6/13/05, Jan Huelsberg <hue...@un...> wrote:
> Sorry for the lack of information.
> Here it comes
>
> I tried to install MPICH 1.2.6 on a bproc-4.0.0pre8 system
> The configure was done with
>
> --with-comm=bproc
>
> When I compile I run into the following error
>
> /usr/local/src/mpich-1.2.6/lib/libmpich.a(p4_sock_cr.o)(.text+0x8fd): In
> function 'net_create_slave':
> :undefined reference to 'bproc_gethostbyname'
> collect2: ld returned 1 exit status
>
> After a few more errors the compiling ends unsuccessfully.
>
> Any help here would be appreciated.
>
> Jan.
From: Jan H. <hue...@un...> - 2005-06-13 10:15:15
Sorry for the lack of information. Here it comes.

I tried to install MPICH 1.2.6 on a bproc-4.0.0pre8 system. The configure was done with

--with-comm=bproc

When I compile I run into the following error:

/usr/local/src/mpich-1.2.6/lib/libmpich.a(p4_sock_cr.o)(.text+0x8fd): In function 'net_create_slave':
:undefined reference to 'bproc_gethostbyname'
collect2: ld returned 1 exit status

After a few more errors the compiling ends unsuccessfully.

Any help here would be appreciated.

Jan.
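For completeness: the unresolved bproc_gethostbyname symbol is something the --with-comm=bproc device expects the BProc environment to provide, so if that build were pursued anyway, one obvious thing to try is putting the BProc library on the link line. Whether libbproc actually supplies this symbol is an assumption, and the advice elsewhere in this thread (use the Clustermatic-patched MPICH instead) is the safer route.

```sh
# Speculative retry of the bproc comm device with libbproc on the link line;
# this is a sketch, not a confirmed fix for this MPICH release.
cd /usr/local/src/mpich-1.2.6
make distclean
LIBS="-lbproc" ./configure --with-comm=bproc
make
```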
From: Jan H. <hue...@un...> - 2005-06-13 10:06:43
I tried to install MPICH 1.2.6 on a bproc-4.0.0pre8 system. The configure was done with

--with-comm=bproc

When I compile I run into the following error:

undefined reference to 'bproc_gethostbyname'
collect2: ld returned 1 exit status

After a few more errors the compiling ends unsuccessfully.

Any help here would be appreciated.

Jan.
From: Cerion Armour-B. <ce...@op...> - 2005-05-30 08:40:41
Hi,

I'm a developer working on Valgrind, and I'm trying to work out the best way to use valgrind with bproc.

Does anyone already do this? (Directly with bproc - not via mpi). If so, I'd really appreciate some details on how you've set this up.

I understand that using valgrind with mpirun is fairly straightforward (though I haven't set up mpi yet to try it out). From what I've read, it seems valgrind must be accessible from the nodes (nfs, or whatever), but the program to run is migrated from the master, yes?

Using bpsh, I don't see how I can avoid needing both valgrind, and the program to run, accessible from the nodes, since running

  $ bpsh -a valgrind foo

will migrate valgrind, then, on the nodes, valgrind will look for foo. Valgrind is unlike gdb in that you cannot 'attach' it once the program has started. So the trick used for gdb (bpsh -a foo, find foo's pid, attach gdb) won't work.

Any pointers much appreciated,
Cerion
From: Dale H. <ro...@ma...> - 2005-05-12 19:50:31
On Thu, May 12, 2005 at 10:27:43AM -0700, Erik Hendriks elucidated:
> <crazyidea>
> Write a plugin that would let beoboot use stuff like system's ifconfig
> program directly. For example a line like this:
>
> useutil ifconfig myri0 mtu 4000

It would be useful, I think. FWIW, dropping the MTU down to 4000 worked for me.

I'd probably apply some of the patches if it wasn't that tomorrow (Friday) is my last day at work; I'm being laid off. Such is the way of academia when the grants dry up.

Dale
From: Erik H. <eah...@gm...> - 2005-05-12 17:34:28
Whups, looks like I forgot to reply to all.

---------- Forwarded message ----------
From: Erik Hendriks <eah...@gm...>
Date: May 12, 2005 8:59 AM
Subject: Re: [BProc] page allocation problems
To: Sean <se...@la...>

On 5/11/05, Sean <se...@la...> wrote:
> HI
> I wrote the attached patch to ifdup.c that lets you set the mtu in
> beoboot quite some time ago. I guess I should check it in.

I believe it made it in as part of ifdup, just not part of the ifconfig plugin. IIRC, ifdup doesn't handle the case where you're not trying to copy another interface's configuration. Maybe ifdup isn't an appropriate place for it at all. Maybe the more appropriate thing would be to expect ifdup followed by ifconfig to tweak parameters like that. Ideally, ifconfig would be just like the real ifconfig, but that would take work.

<crazyidea>
Write a plugin that would let beoboot use stuff like the system's ifconfig program directly. For example, a line like this:

useutil ifconfig myri0 mtu 4000

The useutil plugin would do bproc_execdump ifconfig with those options and then go undump it on the slave node. It should work as long as this plugin is called after vmadlib to set up the libraries. It would make the size of the setup process a bit bigger, but I think it'll likely be worth it for the flexibility it provides.
</crazyidea>

- Erik
From: Erik H. <eah...@gm...> - 2005-05-12 09:54:18
On 5/7/05, Dale Harris <ro...@ma...> wrote:
>
> Hey I'm seeing some page allocation errors. Has anyone seen anything
> like this. Of course it's a 2.6.9 vanilla kernel patched for bproc, and
> myrinet GM driver is running.

I don't think this has anything to do with BProc per se. I've seen stuff like this before whenever I turn on jumbo frames on any machine and start shoving a lot of data through. I'm not 100% sure I'm right about what's going on, but here's my guess:

Once you start using jumbo frame sizes (~9k) it gets harder to allocate skbuffs in the kernel. These are the buffers that hold network packets. Pages for this kind of stuff are allocated in powers of two, so 9k will require 4 pages. Also, since it's kernel-space stuff that will be used for DMA buffers, it will want 4 contiguous pages. That can be hard to find since memory gets fragmented. Normally it would be possible for the swapper to page some stuff out (e.g. disk blocks), but kmalloc was called from an interrupt in this case, which makes that impossible. The allocator has no options left, so it gives up and the allocation fails.

Allocation failures in cases like this shouldn't be treated as a major problem. Network drivers need to be able to deal with this sort of thing - and they do. I think the message below is supposed to be a helpful debugging aid; it's considered a warning.

I don't know if BProc is causing this to happen in some subtle way. If anything, I would expect that BProc could cause bigger traffic bursts than would normally be experienced by most servers.

- Erik

> swapper: page allocation failure. order:2, mode:0x20
> [<c013e28e>] __alloc_pages+0x1b3/0x358
> [<c013e458>] __get_free_pages+0x25/0x3f
> [<c01416dc>] kmem_getpages+0x21/0xc9
> [<c01423bb>] cache_grow+0xab/0x14d
> [<c01425d1>] cache_alloc_refill+0x174/0x219
> [<c0142a24>] __kmalloc+0x85/0x8c
> [<c0219da1>] alloc_skb+0x47/0xe0
> [<f8febfb8>] gmip_recv_interrupt+0x216/0x4c7 [gm]
> [<c011a76a>] load_balance+0x15c/0x170
> [<f8ff1691>] __gm_ethernet_wake_callback+0x6a/0x9c [gm]
> [<f8fde584>] gm_handle_claimed_interrupt+0x580/0x62e [gm]
> [<c025d0ae>] udp_queue_rcv_skb+0x174/0x2a4
> [<c025d6c6>] udp_rcv+0x164/0x407
> [<c023b6d8>] ip_defrag+0x112/0x1bf
> [<c0239d4d>] ip_local_deliver+0xe8/0x279
> [<c023a26d>] ip_rcv+0x38f/0x510
> [<c021ffe3>] netif_receive_skb+0x1c7/0x2a1
> [<c022013b>] process_backlog+0x7e/0x10b
> [<c022023f>] net_rx_action+0x77/0xf6
> [<f8fea40e>] gm_linux_intr+0x9c/0xac [gm]
> [<c010899d>] handle_IRQ_event+0x31/0x65
> [<c0108d0b>] do_IRQ+0x9e/0x130
> [<c0106b6c>] common_interrupt+0x18/0x20
> [<c010401e>] default_idle+0x0/0x2c
> [<c0104047>] default_idle+0x29/0x2c
> [<c01040bc>] cpu_idle+0x3f/0x58
> [<c034a895>] start_kernel+0x197/0x1d5
> [<c034a336>] unknown_bootoption+0x0/0x15c
[ snip ]
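To make the arithmetic concrete, a small sketch of why a 9000-byte MTU turns into an order-2 request while 1500 stays at order 0; the ~256-byte skb overhead figure and 4 KB page size are assumptions for illustration:

```sh
# Rough allocation-order estimate for a few MTUs.
page=4096
for mtu in 1500 4000 9000; do
  need=$((mtu + 256))                     # payload plus approximate skb overhead
  pages=$(( (need + page - 1) / page ))   # pages needed for one packet buffer
  order=0; span=1
  while [ $span -lt $pages ]; do span=$((span * 2)); order=$((order + 1)); done
  echo "MTU $mtu -> ~$need bytes -> order $order ($span contiguous pages)"
done
```

Dropping the MTU to 4000, as suggested elsewhere in the thread, brings the request down to order 1, which is much easier for the allocator to satisfy once memory is fragmented.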
From: Erik H. <eah...@gm...> - 2005-05-11 18:29:28
On 5/9/05, Dale Harris <ro...@ma...> wrote:
> On Mon, May 09, 2005 at 10:54:16AM -0700, Erik Hendriks elucidated:
> > On 5/7/05, Dale Harris <ro...@ma...> wrote:
> > >
> > > Hey I'm seeing some page allocation errors. Has anyone seen anything
> > > like this. Of course it's a 2.6.9 vanilla kernel patched for bproc, and
> > > myrinet GM driver is running.
> >
> > I don't think this has anything to do with BProc per se. I've seen
> > stuff like this before whenever I turn on jumbo frames on any machine
> > and start shoving a lot of data through. I'm not 100% sure I'm right
> > about what's going on but here's my guess:
> >
> > Once you start using jumbo frame sizes (~9k) it gets harder to
> > allocate skbuffs in the kernel. These are the buffers that hold
> > network packets. Pages for this kind of stuff are allocated in powers
> > of two. 9k will require 4 pages. Also, since it's kernel space stuff
>
> Okay, so this does make sense. Myrinet is using jumbo frames and set
> itself to a default MTU of 9000. Myrinet suggested dropping the
> frame size down to 4000.
>
> With bproc can I just put a:
>
> plugin ifconfig myri0 mtu 4000
>
> to set the MTU when the nodes boot? Or does it support setting the MTU?

It doesn't, although it certainly should... There's code in the ifdup beoboot module to do it. That could probably be copied to the ifconfig module without too much trouble.

- Erik
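Until the beoboot ifconfig plugin grows an mtu argument, one hedged workaround (untested here, and assuming the shared libraries ifconfig needs are already set up on the nodes) is to let bpsh migrate the master's ifconfig out to the nodes after they boot:

```sh
# Set the Myrinet MTU on all up nodes from the front end; bpsh migrates
# /sbin/ifconfig from the master, so nothing extra has to live on the nodes.
bpsh -a /sbin/ifconfig myri0 mtu 4000
bpsh -a /sbin/ifconfig myri0 | grep -i mtu    # verify
```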
From: Sean <se...@la...> - 2005-05-11 18:09:09
HI

I wrote the attached patch to ifdup.c that lets you set the mtu in beoboot quite some time ago. I guess I should check it in.

Sean

Dale Harris wrote:
> On Mon, May 09, 2005 at 10:54:16AM -0700, Erik Hendriks elucidated:
>> On 5/7/05, Dale Harris <ro...@ma...> wrote:
>>> Hey I'm seeing some page allocation errors. Has anyone seen anything
>>> like this. Of course it's a 2.6.9 vanilla kernel patched for bproc, and
>>> myrinet GM driver is running.
>>
>> I don't think this has anything to do with BProc per se. I've seen
>> stuff like this before whenever I turn on jumbo frames on any machine
>> and start shoving a lot of data through. I'm not 100% sure I'm right
>> about what's going on but here's my guess:
>>
>> Once you start using jumbo frame sizes (~9k) it gets harder to
>> allocate skbuffs in the kernel. These are the buffers that hold
>> network packets. Pages for this kind of stuff are allocated in powers
>> of two. 9k will require 4 pages. Also, since it's kernel space stuff
>
> Okay, so this does make sense. Myrinet is using jumbo frames and set
> itself to a default MTU of 9000. Myrinet suggested dropping the
> frame size down to 4000.
>
> With bproc can I just put a:
>
> plugin ifconfig myri0 mtu 4000
>
> to set the MTU when the nodes boot? Or does it support setting the MTU?
>
> Dale
From: Dale H. <ro...@ma...> - 2005-05-09 20:26:53
On Mon, May 09, 2005 at 10:54:16AM -0700, Erik Hendriks elucidated:
> On 5/7/05, Dale Harris <ro...@ma...> wrote:
> >
> > Hey I'm seeing some page allocation errors. Has anyone seen anything
> > like this. Of course it's a 2.6.9 vanilla kernel patched for bproc, and
> > myrinet GM driver is running.
>
> I don't think this has anything to do with BProc per se. I've seen
> stuff like this before whenever I turn on jumbo frames on any machine
> and start shoving a lot of data through. I'm not 100% sure I'm right
> about what's going on but here's my guess:
>
> Once you start using jumbo frame sizes (~9k) it gets harder to
> allocate skbuffs in the kernel. These are the buffers that hold
> network packets. Pages for this kind of stuff are allocated in powers
> of two. 9k will require 4 pages. Also, since it's kernel space stuff

Okay, so this does make sense. Myrinet is using jumbo frames and set itself to a default MTU of 9000. Myrinet suggested dropping the frame size down to 4000.

With bproc, can I just put a:

plugin ifconfig myri0 mtu 4000

to set the MTU when the nodes boot? Or does it support setting the MTU?

Dale
From: Dale H. <ro...@ma...> - 2005-05-08 00:22:25
Hey I'm seeing some page allocation errors. Has anyone seen anything like this? Of course it's a 2.6.9 vanilla kernel patched for bproc, and the myrinet GM driver is running.

swapper: page allocation failure. order:2, mode:0x20
[<c013e28e>] __alloc_pages+0x1b3/0x358
[<c013e458>] __get_free_pages+0x25/0x3f
[<c01416dc>] kmem_getpages+0x21/0xc9
[<c01423bb>] cache_grow+0xab/0x14d
[<c01425d1>] cache_alloc_refill+0x174/0x219
[<c0142a24>] __kmalloc+0x85/0x8c
[<c0219da1>] alloc_skb+0x47/0xe0
[<f8febfb8>] gmip_recv_interrupt+0x216/0x4c7 [gm]
[<c011a76a>] load_balance+0x15c/0x170
[<f8ff1691>] __gm_ethernet_wake_callback+0x6a/0x9c [gm]
[<f8fde584>] gm_handle_claimed_interrupt+0x580/0x62e [gm]
[<c025d0ae>] udp_queue_rcv_skb+0x174/0x2a4
[<c025d6c6>] udp_rcv+0x164/0x407
[<c023b6d8>] ip_defrag+0x112/0x1bf
[<c0239d4d>] ip_local_deliver+0xe8/0x279
[<c023a26d>] ip_rcv+0x38f/0x510
[<c021ffe3>] netif_receive_skb+0x1c7/0x2a1
[<c022013b>] process_backlog+0x7e/0x10b
[<c022023f>] net_rx_action+0x77/0xf6
[<f8fea40e>] gm_linux_intr+0x9c/0xac [gm]
[<c010899d>] handle_IRQ_event+0x31/0x65
[<c0108d0b>] do_IRQ+0x9e/0x130
[<c0106b6c>] common_interrupt+0x18/0x20
[<c010401e>] default_idle+0x0/0x2c
[<c0104047>] default_idle+0x29/0x2c
[<c01040bc>] cpu_idle+0x3f/0x58
[<c034a895>] start_kernel+0x197/0x1d5
[<c034a336>] unknown_bootoption+0x0/0x15c

swapper: page allocation failure. order:2, mode:0x20
[<c013e28e>] __alloc_pages+0x1b3/0x358
[<c013e458>] __get_free_pages+0x25/0x3f
[<c01416dc>] kmem_getpages+0x21/0xc9
[<c01423bb>] cache_grow+0xab/0x14d
[<c01425d1>] cache_alloc_refill+0x174/0x219
[<c0142a24>] __kmalloc+0x85/0x8c
[<c0219da1>] alloc_skb+0x47/0xe0
[<f8febfb8>] gmip_recv_interrupt+0x216/0x4c7 [gm]
[<c011a76a>] load_balance+0x15c/0x170
[<f8ff1691>] __gm_ethernet_wake_callback+0x6a/0x9c [gm]
[<f8fde584>] gm_handle_claimed_interrupt+0x580/0x62e [gm]
[<c025d0ae>] udp_queue_rcv_skb+0x174/0x2a4
[<c025d6c6>] udp_rcv+0x164/0x407
[<c023b6d8>] ip_defrag+0x112/0x1bf
[<c0239d4d>] ip_local_deliver+0xe8/0x279
[<c023a26d>] ip_rcv+0x38f/0x510
[<c021ffe3>] netif_receive_skb+0x1c7/0x2a1
[<c022013b>] process_backlog+0x7e/0x10b
[<c022023f>] net_rx_action+0x77/0xf6
[<f8fea40e>] gm_linux_intr+0x9c/0xac [gm]
[<c010899d>] handle_IRQ_event+0x31/0x65
[<c0108d0b>] do_IRQ+0x9e/0x130
[<c0106b6c>] common_interrupt+0x18/0x20
[<c010401e>] default_idle+0x0/0x2c
[<c0104047>] default_idle+0x29/0x2c
[<c01040bc>] cpu_idle+0x3f/0x58
[<c034a895>] start_kernel+0x197/0x1d5
[<c034a336>] unknown_bootoption+0x0/0x15c

[<c013e28e>] __alloc_pages+0x1b3/0x358
[<c013e458>] __get_free_pages+0x25/0x3f
[<c01416dc>] kmem_getpages+0x21/0xc9
[<c0142239>] alloc_slabmgmt+0x55/0x5f
[<c01423bb>] cache_grow+0xab/0x14d
[<c01425d1>] cache_alloc_refill+0x174/0x219
[<c0142a24>] __kmalloc+0x85/0x8c
[<c0219da1>] alloc_skb+0x47/0xe0
[<f8febfb8>] gmip_recv_interrupt+0x216/0x4c7 [gm]
[<c0106b6c>] common_interrupt+0x18/0x20
[<f8ff1691>] __gm_ethernet_wake_callback+0x6a/0x9c [gm]
[<f8fde584>] gm_handle_claimed_interrupt+0x580/0x62e [gm]
[<c0219206>] sock_alloc_send_skb+0x2f/0x33
[<c023ea78>] ip_append_data+0x7f7/0x8a7
[<c021997f>] release_sock+0x1b/0x71
[<c025c3d5>] udp_sendmsg+0x2eb/0x759
[<c023e1c4>] ip_generic_getfrag+0x0/0xbd
[<f8fea40e>] gm_linux_intr+0x9c/0xac [gm]
[<c010899d>] handle_IRQ_event+0x31/0x65
[<c0108d0b>] do_IRQ+0x9e/0x130
[<c0106b6c>] common_interrupt+0x18/0x20
[<c021007b>] input_devices_read+0x42c/0x593
[<c0146c9d>] page_address+0x19/0xa5
[<c0219458>] sock_no_sendpage+0x35/0x85
[<c01579e7>] do_readv_writev+0x1ec/0x273
[<c025c952>] udp_sendpage+0x10f/0x136
[<c0118e76>] task_rq_lock+0x36/0x66
[<c026426b>] inet_sendpage+0x9b/0xca
[<fa0b25db>] svc_sendto+0xac/0x29e [sunrpc]
[<fa1129e9>] nfsd_cache_update+0x8c/0x14c [nfsd]
[<fa0b2cbb>] svc_udp_sendto+0x1e/0x3d [sunrpc]
[<fa0b3f0c>] svc_send+0xb9/0xfc [sunrpc]
[<fa0b1b58>] svc_process+0x2b4/0x788 [sunrpc]
[<fa1094b2>] nfsd+0x1f3/0x39e [nfsd]
[<fa1092bf>] nfsd+0x0/0x39e [nfsd]
[<c010428d>] kernel_thread_helper+0x5/0xb

1 warning and 1 error issued. Results may not be reliable.

--
Dale Harris
ro...@ma...  /.-)
From: <bc...@au...> - 2005-03-31 06:39:54
Hi, just wondering if anybody has tested/run cm5 with Fedora Core 3?

Thanks
From: Erik H. <eah...@gm...> - 2005-03-29 23:22:52
On Fri, 25 Mar 2005 11:02:03 -0500, Luke Schierer <lu...@ac...> wrote:
> Attached is a simple Perl script that I can use to tank the system.
> The script uses blocking NFS file locking (a great, simple way to
> coordinate jobs across a cluster), and works fine
> on other computers. For example, if you spawn a bunch of them at once
>
> for i in `seq 1 8` ; do filelocktest name_of_existing_file & done
>
> the last script will finish 8 seconds later, each script taking
> a turn holding the lock on the file for 1 second. It also
> works across multiple (non-clustermatic) machines if the
> name_of_existing_file is on a commonly NFS mounted directory.
>
> However, if you try the script on our cluster (where all
> the nodes have /home NFS mounted and /proc/sys/bproc/shell_hack
> is off):
>
> bpsh 1-30 filelocktest name_of_existing_file
>
> it does not run in 30 seconds as expected. The locks are obtained
> much more slowly than 1/sec, and after a little while the whole
> system freezes up and dumps the message that I sent earlier.
> Note that while using ~10 nodes takes much longer than 10 seconds,
> it usually succeeds after a certain amount of time, and doesn't crash.
> 30 nodes and more crashes pretty reliably.
>
> On another note, our final piece of cluster weirdness that I've
> detected is also NFS related, though not as important.
> When I read a file off a master NFS server drive from a node I get
> 50 MB/s, which is how fast the drive goes (Yay! The 2.4 kernel maxed
> out at ~20 MB/s over NFS for a single client.) But then I read the
> same file from the master NFS server again from a different node,
> now that it is cached on the server, and I get only 10 MB/s.
> To make certain that I'm not nuts I read the same file over NFS from
> a non-clustermatic computer and I get 100 MB/s, the legal gigabit limit
> (Sweet!).
> Summary: NFS to clustermatic nodes is much slower if the file is
> cached in the master NFS server.
>
> It seems very odd that I'm getting these NFS problems. Shouldn't that
> be pretty much independent of the bproc changes to the kernel?
> Would having an NFS server separate from the bproc master fix things?

Yeah, that is weird. BProc doesn't touch NFS code at all, and it shouldn't get in the way of scheduling or the RPC threads or anything like that.

Are you running a lockd and/or statd on the nodes? (node_up doesn't deal with that kind of stuff right now, which prevents locking from working.) I would expect you to just get errors in that case, though.

Other than that, the only thing I can think to look for would be to make sure that all the mount options and server options are the same. It's possible that node_up won't have the same defaults as the normal mount program.

- Erik
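The Perl script was an attachment and isn't reproduced in the archive; as a rough stand-in, the behaviour Luke describes can be approximated with util-linux's flock (this is a reconstruction under that assumption, not his actual script, and flock(2)-style locks may be purely local on NFS with older kernels, whereas a Perl fcntl-based lock would go through lockd):

```sh
#!/bin/sh
# filelocktest stand-in: take a blocking exclusive lock on the existing file
# named in $1, hold it for one second, then exit (releasing the lock).
flock "$1" -c 'sleep 1'
```

Spawning eight copies against the same NFS-mounted file should finish in roughly eight seconds if blocking NFS locks behave, which requires lockd/statd to be reachable from every client - exactly the piece node_up does not currently set up on the slave nodes.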