From: <er...@he...> - 2004-05-12 15:57:38
|
On Sat, May 08, 2004 at 02:46:35PM -0500, Jim Phillips wrote:
> Hi,
>
> I'm trying to use Clustermatic 4 to boot a Sun V60x dual Xeon 3.06 GHz. I
> tested this with a demo 2.8 GHz machine and it worked fine. Now with the
> new machine I boot off of the floppy and everything looks good right up to
> "monte: restarting system in 2 seconds..." but then it just hangs.
>
> Did anything change with the newer processors? There is probably a
> different BIOS in the new machine as well, so is there something to
> disable that might be interfering? Might this be addressed in a newer
> version of beoboot? Could I try to PXE boot directly into phase 2?

I haven't tried one of those machines myself at this point. It could
certainly be a cranky BIOS of some kind.

One possibility is if you're using a different possibly larger kernel
libmonte might just be failing to load it correctly. Newer kernels
(especially 2.6) are overflowing a counter in the kernel headers which
is supposed to tell you how big the kernel is.

Monte's output should indicate whether it's doing something sensible
or not. Look for the line that says something like:

monte: region: 372 pages at 0x100000
               ^^^ this number (* 4096) should roughly match the size
                   of your bzImage.

If it doesn't then the loader is screwing up. The patch below should
fix libmonte so that it ignores the counter.

Obviously PXE boot straight to phase 2 might be an easier short term
solution.

- Erik

Index: libmonte.c
===================================================================
RCS file: /home/repository/beoboot/monte/libmonte.c,v
retrieving revision 1.13
retrieving revision 1.14
diff -u -r1.13 -r1.14
--- libmonte.c 20 Sep 2002 21:39:53 -0000 1.13
+++ libmonte.c 5 Dec 2003 18:15:53 -0000 1.14
@@ -19,7 +19,7 @@
  * along with this program; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
  *
- * $Id: libmonte.c,v 1.13 2002/09/20 21:39:53 hendriks Exp $
+ * $Id: libmonte.c,v 1.14 2003/12/05 18:15:53 hendriks Exp $
  *--------------------------------------------------------------------*/
 #define _GNU_SOURCE /* needed for mremap() */
 #include <stdio.h>
@@ -238,7 +238,6 @@

 int monte_load_linux_kernel(struct monte_boot_t *boot,
                             const void *buffer, long size){
-    int len;
     void *setup_data, *kernel_data;
     struct monte_region_t *region;
     struct kernel_setup_t *stmp;
@@ -275,21 +274,14 @@
     buffer += (stmp->setup_sects+1)*512; /* update buffer pointers */
     size -= (stmp->setup_sects+1)*512;

-    /* Load the kernel code itself */
-    len = boot->setup->kernel_para*16;
-    if (len > size) {
-        if (len - size >= 16) {
-            fprintf(stderr, "monte: not enough kernel data."
-                    " want=%d; got=%ld\n", len, size);
-            return -1;
-        }
-        len = size;
-    }
+    /* The number of kernel "paragraphs" is getting overflowed by
+     * todays kernels. Ignore it and just load the rest of the data
+     * we have. */
     region = region_new(boot, (void*)boot->setup->start);
-    kernel_data = region_size(region, len);
-    memcpy(kernel_data, buffer, len);
+    kernel_data = region_size(region, size);
+    memcpy(kernel_data, buffer, size);
     printf("monte: kernel code : %8d bytes at %p\n",
-           len, (void *) boot->setup->start);
+           size, (void *) boot->setup->start);

     if (boot->param.flags & MONTE_PROTECTED) {
         if (save_old_setup(boot)) return -1;
|
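The page-count check Erik describes can be sketched in C. This is only an illustration and not part of libmonte: the function names, the 10% tolerance in the usage note, and the idea that the header counter is a 16-bit count of 16-byte paragraphs are all assumptions (the thread says only that the counter overflows, not how wide it is).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096L /* page size implied by "(* 4096)" in the message */

/* Number of whole pages needed to hold `bytes` bytes (rounds up). */
long pages_for_size(long bytes)
{
    assert(bytes >= 0);
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Returns 1 if the page count printed by monte is within `tolerance_pct`
 * percent of what the bzImage size implies, 0 otherwise. */
int region_matches_image(long region_pages, long image_bytes, int tolerance_pct)
{
    long expected = pages_for_size(image_bytes);
    long diff = region_pages > expected ? region_pages - expected
                                        : expected - region_pages;
    return diff * 100 <= expected * tolerance_pct;
}

/* ASSUMPTION: models the header counter as a 16-bit count of 16-byte
 * paragraphs, to show how a large kernel could report a too-small size. */
long size_from_wrapped_paragraphs(long image_bytes)
{
    uint16_t paragraphs = (uint16_t)((image_bytes + 15) / 16); /* may wrap */
    return (long)paragraphs * 16;
}
```

With the 372-page figure from the message, `region_matches_image(372, 372 * 4096L, 10)` reports a match, while a size that has passed through the hypothetical 16-bit counter no longer does, which is the symptom the patch works around by loading all remaining data.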
From: Matt L. L. <mll...@hp...> - 2004-05-09 20:46:01
|
We have a BProc IB cluster up and running (so does LANL). The main
issue is that there are several open source IB software stacks. We are
involved in the OpenIB project (www.openib.org) which is unifying these
stacks. Currently the biggest issue is that the stacks are difficult to
build. We are working through these issues with the openib developers.

The folks at LANL and Sandia are working on getting LinuxBIOS working
over IB (i.e. booting over IB instead of ethernet). I suspect this will
take a few more weeks to work out the kinks.

For those of you interested in IB please join the openib mail list. I'm
hosting the openib.org site so we can set things up to be able to cross
post messages to the openib and bproc users lists.

- Matt

On Sun, 2004-05-09 at 10:10, Yinghai Lu wrote:
> One month ago, some said there will be one announcement on using IB with
> bproc.
>
> Is there any progress?
>
> YH
>
> -----Original Message-----
> From: bpr...@li...
> [mailto:bpr...@li...] On Behalf Of Greg Watson
> Sent: May 9, 2004 6:25
> To: BProc users list
> Subject: [BProc] Clustermatic tutorial at LCI
>
> If anyone is interested in getting some hands-on experience with
> Clustermatic (including LinuxBIOS), we will be running a tutorial at
> the LCI conference in Austin, TX on 17 May. For more details, see
> http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/tutorials.html#clustermatic.
> This is a great opportunity to meet some of the people that developed
> Clustermatic and find out exactly why things work the way they do!
>
> Regards,
>
> Greg
>
> -------------------------------------------------------
> This SF.Net email is sponsored by Sleepycat Software
> Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver
> higher performing products faster, at low TCO.
> http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users
|
From: Yinghai L. <yh...@ty...> - 2004-05-09 17:20:33
|
One month ago, some said there will be one announcement on using IB
with bproc.

Is there any progress?

YH

-----Original Message-----
From: bpr...@li...
[mailto:bpr...@li...] On Behalf Of Greg Watson
Sent: May 9, 2004 6:25
To: BProc users list
Subject: [BProc] Clustermatic tutorial at LCI

If anyone is interested in getting some hands-on experience with
Clustermatic (including LinuxBIOS), we will be running a tutorial at
the LCI conference in Austin, TX on 17 May. For more details, see
http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/tutorials.html#clustermatic.
This is a great opportunity to meet some of the people that developed
Clustermatic and find out exactly why things work the way they do!

Regards,

Greg
|
From: Greg W. <gw...@la...> - 2004-05-09 13:24:48
|
If anyone is interested in getting some hands-on experience with
Clustermatic (including LinuxBIOS), we will be running a tutorial at
the LCI conference in Austin, TX on 17 May. For more details, see
http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/tutorials.html#clustermatic.
This is a great opportunity to meet some of the people that developed
Clustermatic and find out exactly why things work the way they do!

Regards,

Greg
|
From: Steven J. <py...@li...> - 2004-05-09 12:56:24
|
Greetings,

Normally, I just PXE boot (or etherboot when using LinuxBIOS, see
www.linuxbios.org) directly into stage 2. I've found monte to be a bit
fragile.

You could also consider kexec:
http://developer.osdl.org/rddunlap/kexec/

Ideally, flash parts on mainboards will get big enough to use a small
kernel with kexec as the bootloader, but PXE and etherboot work in the
meanwhile.

G'day,
sjames

-------------------------steven james, director of research, linux labs
... ........ ..... ....                    230 peachtree st nw ste 2701
the original linux labs                             atlanta.ga.us 30303
-since 1995                                    http://www.linuxlabs.com
                                   office & fax 866.545.6306
-----------------------------------------------------------------------

On Sat, 8 May 2004, Jim Phillips wrote:

> Hi,
>
> I'm trying to use Clustermatic 4 to boot a Sun V60x dual Xeon 3.06 GHz. I
> tested this with a demo 2.8 GHz machine and it worked fine. Now with the
> new machine I boot off of the floppy and everything looks good right up to
> "monte: restarting system in 2 seconds..." but then it just hangs.
>
> Did anything change with the newer processors? There is probably a
> different BIOS in the new machine as well, so is there something to
> disable that might be interfering? Might this be addressed in a newer
> version of beoboot? Could I try to PXE boot directly into phase 2?
>
> -Jim
|
From: Jim P. <ji...@ks...> - 2004-05-08 19:46:38
|
Hi,

I'm trying to use Clustermatic 4 to boot a Sun V60x dual Xeon 3.06 GHz. I
tested this with a demo 2.8 GHz machine and it worked fine. Now with the
new machine I boot off of the floppy and everything looks good right up to
"monte: restarting system in 2 seconds..." but then it just hangs.

Did anything change with the newer processors? There is probably a
different BIOS in the new machine as well, so is there something to
disable that might be interfering? Might this be addressed in a newer
version of beoboot? Could I try to PXE boot directly into phase 2?

-Jim
|
From: <er...@he...> - 2004-04-28 19:16:10
|
I've just posted some ports of BProc and Beoboot on sourceforge.
They're still rough but the number of oopses and explosions is probably
low enough for other people to not go insane. I've only had time to do
x86 and AMD64 at this point.

BProc:
http://sourceforge.net/project/showfiles.php?group_id=24453&package_id=16594&release_id=234569

Beoboot:
http://sourceforge.net/project/showfiles.php?group_id=24453&package_id=32140&release_id=234571

Release notes are attached below.

Enjoy,
Erik

BProc 4.0.0pre4 ------------------------------------------------------------------

The big thing in this release is a port to Linux 2.6.5.

This port is still a work in progress. It's still rough and there are
a lot of things that won't work or won't work quite right. Some known
issues include:

- execve hook is currently unimplemented
- daemon connection management isn't working quite right. Ping
  timeouts and the like are broken right now.
- only part of the ptrace interface is implemented. (it grew by a lot)

All the hooks are probably not in quite the right places yet.

Most of the time was spent reworking process movement. Process
movement is atomic now from the point of view of other processes. This
gets rid of a whole lot of confusion and weird flags when a process is
involved in moving itself. The ptrace interface should be a lot more
solid as a result.

So far the port has been done for x86 and AMD64. The architecture
specific bits haven't been done yet for Alpha or PPC.

Beoboot beoboot-cm 1.9 -------------------------------------------------------

This version should be used with BProc version 4.0.0pre4+
This version requires cmtools version 1.1+

WARNING: This is a barely tested release for Linux 2.6.5.

This version has been ported to Linux 2.6.5. This affected mostly the
module gathering and loading code.

Romfs has been dropped in favor of Linux 2.6's initramfs stuff. Doing
that allowed us to get rid of genromfs and all the pivot_root() related
code in the boot up program.

Unfortunately, Linux 2.6.5 requires a small patch for the romfs file
systems to be recognized. It's included as
"initramfs-search-for-init.patch". Hopefully, that one will make it
into future linux releases.

The name of the two kernel monte patch has changed. It is now called
"linux-2.6.5-save_boot_params.patch" and can be found in the monte
directory.

Monte still requires sys_call_table to be exported (this will be fixed
at some point). If the BProc patch isn't in the phase 1 kernel, you
will have to add two lines of code somewhere to export it. I added
them to the end of kernel/sys.c like this:

    extern void *sys_call_table[];
    EXPORT_SYMBOL(sys_call_table);

So far, this has been tested on x86 and AMD64 only.

The 'kmod' plugin for node_up is broken in this release.

For information on other changes see the ChangeLog.
|
From: <er...@he...> - 2004-04-21 22:47:55
|
On Tue, Apr 20, 2004 at 02:41:05PM -0400, Daniel Gruner wrote:
> Hi
>
> Is there any kind of documentation on the BProc api changes between 3.x and
> 4.0? I am interested in getting lam-mpi to work, and it requires porting.
> It complains that a bunch of functions are no longer there:
>
> bproc_getnodebyname
> bproc_nodenumber
>
> Could someone point me to the proper replacement functions?

Also, look at the release notes for 4.0.0pre1 for some of the hidden
gotchas.

- Erik
|
From: <er...@he...> - 2004-04-21 22:43:54
|
On Tue, Apr 20, 2004 at 02:41:05PM -0400, Daniel Gruner wrote:
> Hi
>
> Is there any kind of documentation on the BProc api changes between 3.x and
> 4.0? I am interested in getting lam-mpi to work, and it requires porting.
> It complains that a bunch of functions are no longer there:
>
> bproc_getnodebyname
> bproc_nodenumber
>
> Could someone point me to the proper replacement functions?

I haven't written any docs yet. Those two are missing but could easily
go back in.

I think bproc_nodenumber got axed because it was really slow. It's
implemented as bproc_nodelist followed by a search to find the
specified IP address. This got compounded by the fact that the people
using it here called it many times in rapid succession. That caused
bproc_nodelist to get called a lot. Performance sucked and bpmaster
got hammered by the requests in the meantime.

Obviously, the reasonable way to handle this would be to get the
nodelist once and then scan it multiple times. This is hard to hide in
the library - you don't really want to cache since the information can
change. If it just becomes an external scan function, there's hardly
any reason to have it at all. That was my conclusion anyway. The bpfs
file system stuff got written to fix the slowness of stuff that used to
ask the master daemon for machine state.

bproc_getnodebyname was a trivial little crutch. It basically amounts
to strtol(). The theory here was that it was going to be replaced by
the nodeset + nodefilter stuff. That stuff doesn't currently
understand stuff like n10 though.

I think the API could probably be hashed out a little more. I feel
like there are too many variants on the move calls. I was planning to
solicit input on it when I get some time to work on it again. In any
case, I wouldn't count on it staying exactly the same until the "pre"
disappears from the version number.

- Erik
|
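Since bproc_getnodebyname "basically amounts to strtol()", a stand-in is easy to sketch. The helper below is hypothetical and not part of the BProc library: the name `node_number_from_name`, the acceptance of an optional "n" prefix (as in "n10"), and the choice to return -1 for anything that is not a plain non-negative decimal number are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a node name such as "10" or "n10" into a
 * node number. Returns the number, or -1 if the name is not valid. */
int node_number_from_name(const char *name)
{
    char *end;
    long value;

    assert(name != NULL);
    if (name[0] == 'n')               /* optional "n" prefix (assumption) */
        name++;
    if (*name < '0' || *name > '9')   /* reject empty strings, signs, spaces */
        return -1;

    errno = 0;
    value = strtol(name, &end, 10);
    if (errno != 0 || *end != '\0' || value > INT_MAX)
        return -1;                    /* overflow or trailing garbage */
    return (int)value;
}
```

A caller porting code such as LAM might try this first and fall back to a host name lookup when it returns -1. Special node names like "master" or "self" are deliberately not handled here.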
From: Greg W. <gw...@la...> - 2004-04-21 01:01:25
|
On 20/04/2004, at 6:37 PM, Brian W. Barrett wrote:

> On Apr 20, 2004, at 3:06 PM, Greg Watson wrote:
>
>> None that I know of. The file clients/sys/bproc.h documents all the
>> interfaces that are available.
>>
>> In order to get the node number from the IP address, you now use:
>
> <snip>
>
>> In order to look up a node by its name, you first convert the name
>> to an IP address, then use the above. If you're running the latest
>> beonss then you can look up nodes using the names "master", "self",
>> "n0", "n1", etc. using the standard gethostbyname() routines.
>
> Ok, that makes sense and I think I could fix up the bproc support in
> LAM to deal with these changes no problem. Can I assume that anyone
> running bproc 4.0 is using the beonss package? If not, is there any
> way to convert from name to IP address other than the functions that
> disappeared for bproc 4.0? I guess what I'm asking is if it is safe
> to look for bproc_getnodebyname() and if I don't find that, assume I
> can call gethostbyname("n0") and get a reasonable result?

I think it's a reasonable assumption. If they're not using beonss,
then providing the mapping in /etc/hosts will be equally effective.

BTW, does LAM need to do the lookup on the nodes or on the frontend?
Currently, beonss only allows a node to look up its own name or the
name of the frontend, but not the names of other nodes.

>
> Daniel, We (the LAM team) don't have any access to BProc 4.0 clusters.
> Given that this looks fairly simple to hack up, I can probably fix
> something up for LAM 7.1, but would need a volunteer to do some
> testing (hint, hint).
>

I'm going to need LAM on a bproc cluster in the next month or so. I'll
be able to help out if this timeframe works for you.

Regards,

Greg
|
From: Brian W. B. <brb...@la...> - 2004-04-21 00:37:35
|
On Apr 20, 2004, at 3:06 PM, Greg Watson wrote:

> None that I know of. The file clients/sys/bproc.h documents all the
> interfaces that are available.
>
> In order to get the node number from the IP address, you now use:

<snip>

> In order to look up a node by its name, you first convert the name to
> an IP address, then use the above. If you're running the latest beonss
> then you can look up nodes using the names "master", "self", "n0",
> "n1", etc. using the standard gethostbyname() routines.

Ok, that makes sense and I think I could fix up the bproc support in
LAM to deal with these changes no problem. Can I assume that anyone
running bproc 4.0 is using the beonss package? If not, is there any
way to convert from name to IP address other than the functions that
disappeared for bproc 4.0? I guess what I'm asking is if it is safe to
look for bproc_getnodebyname() and if I don't find that, assume I can
call gethostbyname("n0") and get a reasonable result?

Daniel, We (the LAM team) don't have any access to BProc 4.0 clusters.
Given that this looks fairly simple to hack up, I can probably fix
something up for LAM 7.1, but would need a volunteer to do some testing
(hint, hint).

Brian

--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/
|
From: Greg W. <gw...@la...> - 2004-04-20 22:06:52
|
Daniel,

None that I know of. The file clients/sys/bproc.h documents all the
interfaces that are available.

In order to get the node number from the IP address, you now use:

    /* addr holds the IP address being looked up; it is declared by
     * the surrounding function, which returns the node number. */
    struct bproc_node_set_t list;
    int i;

    if (bproc_nodelist(&list) == -1) {
        perror("bproc_nodelist");
        exit(1);
    }
    for (i = 0; i < list.size; i++) {
        struct sockaddr_in *sin = (struct sockaddr_in *)&list.node[i].addr;
        if (memcmp(&addr, &sin->sin_addr, sizeof(addr)) == 0) {
            return list.node[i].node;
        }
    }

In order to look up a node by its name, you first convert the name to
an IP address, then use the above. If you're running the latest beonss
then you can look up nodes using the names "master", "self", "n0",
"n1", etc. using the standard gethostbyname() routines.

Regards,

Greg

On 20/04/2004, at 12:41 PM, Daniel Gruner wrote:

> Hi
>
> Is there any kind of documentation on the BProc api changes between
> 3.x and 4.0? I am interested in getting lam-mpi to work, and it
> requires porting.
> It complains that a bunch of functions are no longer there:
>
> bproc_getnodebyname
> bproc_nodenumber
>
> Could someone point me to the proper replacement functions?
>
> Thanks,
> Daniel
> --
>
> Dr. Daniel Gruner                        dg...@ti...
> Dept. of Chemistry                       dan...@ut...
> University of Toronto                    phone:  (416)-978-8689
> 80 St. George Street                     fax:    (416)-978-5325
> Toronto, ON  M5S 3H6, Canada             finger for PGP public key
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: IBM Linux Tutorials
> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> GenToo technologies. Learn everything from fundamentals to system
> administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users
>
|
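The two steps in Greg's reply (name to IP address, then IP address to node number) can be sketched without the BProc library. Everything below is a stand-in for illustration: `struct node_entry`, `resolve_ipv4` and `find_node` are hypothetical names, the node table is a plain array rather than the result of bproc_nodelist(), and getaddrinfo() is swapped in for the gethostbyname() routines mentioned in the message.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for one entry of a node list. */
struct node_entry {
    int node;             /* node number */
    struct in_addr addr;  /* node's IPv4 address */
};

/* Step 1: resolve a name (e.g. "n0" where beonss or /etc/hosts provides
 * it) to its first IPv4 address. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *name, struct in_addr *out)
{
    struct addrinfo hints, *res = NULL;

    assert(name != NULL && out != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(name, NULL, &hints, &res) != 0 || res == NULL)
        return -1;
    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res);
    return 0;
}

/* Step 2: scan a list the caller already holds. Fetching the list once
 * and scanning it for every lookup avoids asking the master each time.
 * Returns the node number, or -1 if no entry has that address. */
int find_node(const struct node_entry *list, int count, struct in_addr addr)
{
    int i;

    for (i = 0; i < count; i++) {
        if (memcmp(&addr, &list[i].addr, sizeof(addr)) == 0)
            return list[i].node;
    }
    return -1;
}
```

Keeping the list in the caller's hands mirrors the design concern raised elsewhere in the thread: the library cannot safely cache it, so the caller decides when it is stale and needs to be fetched again.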
From: Dale H. <ro...@ma...> - 2004-04-20 21:58:16
|
Trying out bjs. What is the proper configuration for a shared policy?
Currently I have

pool default
    policy shared
    nodes 0-126

But bjs dies if I try to submit anything. For example, I was trying:

bjssub -i -n 10 /sbin/ifconfig

which gives me an error like "Failed to read sexp from the bjs."

The strace of the bjs daemon gives me:

write(2, "1", 1) = 1
write(2, "0", 1) = 1
write(2, ")", 1) = 1
write(2, "(", 1) = 1
write(2, "s", 1) = 1
write(2, "e", 1) = 1
write(2, "c", 1) = 1
write(2, "s", 1) = 1
write(2, " ", 1) = 1
write(2, "1", 1) = 1
write(2, ")", 1) = 1
write(2, ")", 1) = 1
write(2, ")", 1) = 1
write(2, "\n", 1) = 1
time(NULL) = 1082498098
wait4(-1, 0xbfffee10, WNOHANG, NULL) = -1 ECHILD (No child processes)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

bjs version 1.2
bproc 3.2.6

Any suggestions?

Something else: bjs appears to set the mode of the nodes to execute by
root only. I suppose that is to keep people from working around it?

--
Dale Harris
ro...@ma...
/.-)
|
From: Daniel G. <dg...@ti...> - 2004-04-20 18:41:18
|
Hi

Is there any kind of documentation on the BProc api changes between 3.x and
4.0? I am interested in getting lam-mpi to work, and it requires porting.
It complains that a bunch of functions are no longer there:

bproc_getnodebyname
bproc_nodenumber

Could someone point me to the proper replacement functions?

Thanks,
Daniel
--

Dr. Daniel Gruner                        dg...@ti...
Dept. of Chemistry                       dan...@ut...
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key
|
From: Thomas E. <eck...@gm...> - 2004-04-16 11:57:11
|
Hi all,

"head" is used in "beoboot" (here: beoboot-cm-1.8) when copying
kernel modules. Recent coreutils (v5.2.0 tested on amd64) issue a
warning that the

    head -N

syntax is obsolete. This is simply fixed by using

    head -n N

The attached (trivial) patch adjusts beoboot to the changed syntax.

Thomas
|
From: Greg W. <gw...@la...> - 2004-04-12 19:55:18
|
It really depends on your application. For small clusters mounting home
via NFS seems to be the best alternative. When you get up to 256+ nodes
or you need to do high-speed parallel I/O, NFS just does not cut it.
For this type of application both Panasas (http://www.panasas.com) and
Lustre (http://www.lustre.org) look like promising alternatives.

Greg

On 12/04/2004, at 7:00 AM, Rigler, Steve wrote:

> Greetings all,
>
> I am a new user to bproc and have begun using it with a small (6-node)
> cluster running Fedora and Clustermatic 4. This is actually our
> first investigation into using clusters for HPC, as our environment
> has primarily used big, shared-memory machines (mainly SGI) in the
> past.
>
> I am curious what other people are doing in the way of filesystems.
> Our environment uses NIS and automounts extensively (home directories,
> software and some data are automounted). We'll probably have some
> storage "locally" attached to the cluster, but we'll probably never
> be able to get away from the need for automounts.
>
> V9fs looks interesting, but it seems to handle ownership in an odd
> fashion; when a regular user creates a file, the file looks like it
> is owned by root although regular users can modify or delete that
> file.
>
> I'm curious what others are doing in these areas.
>
> Thanks,
> Steve
|
From: Rigler, S. <SR...@Ma...> - 2004-04-12 13:01:07
|
Greetings all,

I am a new user to bproc and have begun using it with a small (6-node)
cluster running Fedora and Clustermatic 4. This is actually our
first investigation into using clusters for HPC, as our environment
has primarily used big, shared-memory machines (mainly SGI) in the
past.

I am curious what other people are doing in the way of filesystems.
Our environment uses NIS and automounts extensively (home directories,
software and some data are automounted). We'll probably have some
storage "locally" attached to the cluster, but we'll probably never
be able to get away from the need for automounts.

V9fs looks interesting, but it seems to handle ownership in an odd
fashion; when a regular user creates a file, the file looks like it
is owned by root although regular users can modify or delete that
file.

I'm curious what others are doing in these areas.

Thanks,
Steve
|
From: Greg W. <gw...@la...> - 2004-04-09 22:33:10
|
On 09/04/2004, at 5:37 AM, ha...@no... wrote:

> (Any work done this way? Which way is going v9fs these days? Any
> revival of Ron Minnich's AutoCacher in bproc/v9fs context?)
>

We got swamped with so many other things that v9fs kind of got put on
the back burner. Also, we're waiting to see if Panasas and/or Lustre
will meet our filesystem requirements. It may get revisited sometime
though.

Greg
|
From: <ha...@no...> - 2004-04-09 10:20:48
|
> > >I think a "wrapper" could take the form of a shared library. You > > >could manually LD_PRELOAD yourself or you could modify bpsh to > > >automatically set LD_PRELOAD for the child processes. > ... > > 1. When a process gets certain signal, it VMAdumps itself to the network stream > > and bpmaster stores it into a file on the master. > > ... ... ... > > This could be also used as a general check point/restarting functionality. > > Yeah. I think what you've described here is just a simple > checkpointing mechanism. The only snag is that you'll have to re-open > files after restoring the checkpoint. Which probably means to track opening of files - while in LD_PRELOAD tricks, one could spoof also fopen(), fopen64(), freopen(), freopen64(), open(), open64() etc. etc. - then it could checkpoint at least some class of naive programs. I am just considering LD_PRELOAD as a way of implementing my file caching needs (read-only cachefs) with a vain hope that this way I would work against more stable API than when doing it in kernel (using module which spoofs open(2)). (But I know that similar projects are known to receive hard blows on either front - LD_PRELOAD or kernel module, next 'API' change may well kill the project). Maybe there could be some common LD_PRELOAD toolkit designed to complement bproc in projects like these (caching, checkpointing, ...)? (Any work done this way? Which way is going v9fs these days? Any revival of Ron Minnich's AutoCacher in bproc/v9fs context?) Regards, Vaclav Hanzl |
From: Bill F. <bi...@fe...> - 2004-04-09 02:08:33
|
Excellent, Greg. Congrats!

On Thursday 08 April 2004 10:24, Greg Watson wrote:
> Hi all,
>
> I'm pleased to announce that yesterday Clustermatic 4 won the
> ClusterWorld Excellence in Cluster Technology Award for Open Source
> Software. I'd like to thank everyone who worked so hard to achieve this
> outstanding result. Thanks also to Daniel Gruner and Jim Phillips for
> agreeing to act as reference sites for Clustermatic.
>
> Best regards,
>
> Greg
|
From: <er...@he...> - 2004-04-08 17:11:17
|
On Mon, Apr 05, 2004 at 04:49:12AM +0900, Kimitoshi Takahashi wrote:
> er...@he... wrote:
> >On Thu, Apr 01, 2004 at 03:04:29AM +0900, Kimitoshi Takahashi wrote:
> >> Hi all,
> >>
> >> From what I read in the BProc documents, process migration is voluntary,
> >> meaning bproc_move() must be called from the process to be moved.
> >>
> >> The lovely bpsh seems to wrap a non-bproc program and cause the program
> >> to move involuntarily, using bproc_vexecmove(), only at the beginning.
> >>
> >> I'm wondering if there is any way to cause a non-bproc process to move
> >> involuntarily at any time, at the user's will.
> >>
> >> My colleague uses a heterogeneous cluster where the memory sizes on nodes vary.
> >> He sometimes wants to move a small process off a large memory machine
> >> before he starts an obviously huge process. He is only using bpsh to start processes.
> >>
> >> Is it technically feasible to write something like bpsh which always wraps
> >> a process on slave nodes, and handles a "move now to where" signal ?
> >>
> >> How would you deal with the situation my colleague has ?
> >
> >I think a "wrapper" could take the form of a shared library. You
> >could manually LD_PRELOAD yourself or you could modify bpsh to
> >automatically set LD_PRELOAD for the child processes.
>
> I'm afraid I don't fully understand what you meant,
> probably I need to learn more about the basics of C programming ....
> My guess is that the signal handler is in libc and you suggested to
> preload a signal handler which calls bproc_move() when it gets a certain signal.
> Is that what you meant ?

Not exactly. LD_PRELOAD instructs the dynamic linker to load a library
that it wouldn't otherwise load. It also loads it before the libraries
that it would normally load. This allows it to override functions in
the other libraries. The Electric Fence malloc debugging tool is a
nice example of this kind of thing.

By default, an application doesn't have signal handlers. If it wants
to handle a signal it sets a signal handler. This amounts to telling
the kernel to call a function when the signal arrives. The library
could set up a signal handler without telling the application about it.

> >A signal seems like a good way to get the process's attention but you
> >still need another way to tell it where to move to. I can't think of
> >anything easy for that off the top of my head.
>
> How about making it a two step process:
> 1. When a process gets a certain signal, it VMAdumps itself to the network stream
>    and bpmaster stores it into a file on the master.
> 2. You can then manually restart the process, explicitly specifying where to move.
>
> It's not cool in that the process migration is not peer to peer,
> rather it is origin-master-target.
>
> This could also be used as a general checkpoint/restart functionality.

Yeah. I think what you've described here is just a simple
checkpointing mechanism. The only snag is that you'll have to re-open
files after restoring the checkpoint.

- Erik
|
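The idea of a preloaded library quietly installing a signal handler can be sketched as follows. This is an illustration under stated assumptions, not code from BProc or bpsh: the choice of SIGUSR1, the names `move_requested` and `install_move_handler`, and the use of the GCC/Clang `constructor` attribute are all choices made for the example, and the handler only records the request rather than calling bproc_move().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; meant to be polled from normal (non-signal) code,
 * which is where a real hook would decide whether and where to move. */
volatile sig_atomic_t move_requested = 0;

static void on_move_signal(int signo)
{
    (void)signo;
    move_requested = 1; /* keep the handler async-signal-safe */
}

/* Runs automatically when the library is loaded, before main(), so the
 * application never has to know the handler exists. */
__attribute__((constructor))
void install_move_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_move_signal;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL); /* SIGUSR1 is an arbitrary choice */
    assert(rc == 0);
}
```

Built as a shared object (for example with `gcc -shared -fPIC`, using hypothetical file names such as movehook.c and libmovehook.so) and named in LD_PRELOAD, the constructor would run in every dynamically linked child process. The open question from the thread remains: the signal says "move now" but not where to, so the destination would have to come from somewhere else.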
From: Daniel G. <dg...@ti...> - 2004-04-08 16:28:12
|
Way to go!!!

On Thu, Apr 08, 2004 at 10:24:55AM -0600, Greg Watson wrote:
> Hi all,
>
> I'm pleased to announce that yesterday Clustermatic 4 won the
> ClusterWorld Excellence in Cluster Technology Award for Open Source
> Software. I'd like to thank everyone who worked so hard to achieve this
> outstanding result. Thanks also to Daniel Gruner and Jim Phillips for
> agreeing to act as reference sites for Clustermatic.
>
> Best regards,
>
> Greg

--

Dr. Daniel Gruner                        dg...@ti...
Dept. of Chemistry                       dan...@ut...
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key
|
From: Greg W. <gw...@la...> - 2004-04-08 16:24:59
|
Hi all, I'm pleased to announce that yesterday Clustermatic 4 won the ClusterWorld Excellence in Cluster Technology Award for Open Source Software. I'd like to thank everyone who worked so hard to achieve this outstanding result. Thanks also to Daniel Gruner and Jim Phillips for agreeing to act as reference sites for Clustermatic. Best regards, Greg |
From: Kimitoshi T. <kt...@cl...> - 2004-04-04 19:49:38
|
Sorry for the very slow response.

er...@he... wrote:
>> I still can't understand why I couldn't bpsh renice,
>> since the PID on the slave was obtained by "bpsh 1 ps -ef" and hence it should be the local pid.
>> Would you elaborate a little more ?
>
>When you do a ps on the slave node (via bpsh) you see the same process
>IDs that the front end sees. You'll note that a process doesn't
>appear to change its PID when it moves to the back end. The proc file
>system is modified to show the pids that the front end sees. This way
>everything stays consistent when processes move around. The slave
>node has different process IDs internally but you don't see those.

Thank you very much for the explanation.

>You can see them if you turn off the PID mapping in /proc like this:
>
>bpsh 1 -O /proc/sys/bproc/proc_pid_map echo 0
>
>Putting a zero in that file turns off PID mapping. 1 means map for
>non-root. 2 means map for everybody. It defaults to 2. If you turn
>it off, then you get to see everything on the node. You should be
>able to see what the real pid is in that case. If you use the real
>pid, then I think renice should work.

This worked. Thank you again.

Kimitoshi Takahashi
|