From: <er...@he...> - 2004-03-12 17:04:45
On Thu, Mar 11, 2004 at 09:12:38PM -0800, YhLu wrote:
> Erik,
>
> Any howto or doc that is talking about using bproc with IB?
>
> I saw there is some plug-in option for myerinet ...

I'm not aware of anything like that. I haven't set it up myself. The clusters that I've seen using IB at this point were still using ethernet as the management network. BProc did all its stuff on that network. There was some stuff whacked in to the BProc setup to get IB drivers loaded. That was non-trivial since there were a lot of scripts involved.

I have seen BProc run using IP over IB on little bladed thing once but we cheated and had a local linux install on the slave node to get the IB setup.

Some of the Sandia California guys are on this list. They have a lot more experience with this than I do. Maybe they can comment or recommend how to go about doing this. Matt/Josh/Mitch ???

- Erik
From: Greg W. <gw...@la...> - 2004-03-12 16:36:02
Hi,

Would anyone who is running Clustermatic (and likes it!) be prepared to act as a reference site for the Excellence in Cluster Technology award at Clusterworld? I don't think it's a particularly onerous job, just answer a few questions about the software.

Thanks,
Greg
From: YhLu <Yh...@ty...> - 2004-03-12 05:17:35
Erik,

Any howto or doc that is talking about using bproc with IB?

I saw there is some plug-in option for myerinet ...

Regards
YH.
From: <er...@he...> - 2004-03-11 22:06:30
On Thu, Mar 11, 2004 at 02:38:24PM -0500, Daniel Gruner wrote:
> Hi Erik,
>
> Well, as I am still struggling with the alpha (UX) machines and the scheduling problem that starves the nodes, I would like to try to rebuild the Clustermatic 4 kernel, and test it on my machine. The need to rebuild is due to the fact that my machines use milo, and there is no srm console for them. I know I need to change one parameter in the config file(s) for the kernel(s): CONFIG_ALPHA_LEGACY_START_ADDRESS=y.
>
> Can you tell me what the easiest way to do this is? Do I simply modify the kernel-2.4.22-alpha.config file that got unpacked from the src.rpm and then use rpmbuild on the kernel-2.4.spec file? Do I need to have other packages installed on the machine prior to rebuilding the kernel (e.g. bproc-4.0.0pre3-1)?

That should do it... I'm pretty sure you don't need to have anything funny installed.

- Erik
From: Daniel G. <dg...@ti...> - 2004-03-11 19:49:16
Hi Erik,

Well, as I am still struggling with the alpha (UX) machines and the scheduling problem that starves the nodes, I would like to try to rebuild the Clustermatic 4 kernel, and test it on my machine. The need to rebuild is due to the fact that my machines use milo, and there is no srm console for them. I know I need to change one parameter in the config file(s) for the kernel(s): CONFIG_ALPHA_LEGACY_START_ADDRESS=y.

Can you tell me what the easiest way to do this is? Do I simply modify the kernel-2.4.22-alpha.config file that got unpacked from the src.rpm and then use rpmbuild on the kernel-2.4.spec file? Do I need to have other packages installed on the machine prior to rebuilding the kernel (e.g. bproc-4.0.0pre3-1)?

Thanks in advance,
Daniel

--
Dr. Daniel Gruner                             dg...@ti...
Dept. of Chemistry                            dan...@ut...
University of Toronto                         phone: (416)-978-8689
80 St. George Street                          fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada                   finger for PGP public key
From: <er...@he...> - 2004-03-11 16:15:39
On Thu, Mar 11, 2004 at 11:37:32PM +0900, Kimitoshi Takahashi wrote:
> Hello Erik,
>
> I think it would be nice if the renice command on the master were correctly forwarded to slave nodes so that we are less dependent on queing systemes to prioritize jobs. What do you think ?

It's certainly a good thing to do. It's an unintentional omission. It's on the to-do list of things to fix but there's a lot of things on that list. I'm working on a port to Linux 2.6 in between helping out with other stuff.

> I still can't understand why I couldn't bpsh renice, since the PID on the slave was obtained by "bpsh 1 ps -ef" and hence it should be local pid. Would you elaborate a little more ?

When you do a ps on the slave node (via bpsh) you see the same process IDs that the front end sees. You'll note that a process doesn't appear to change its PID when it moves to the back end. The proc file system is modified to show the pids that the front end sees. This way everything stays consistent when processes move around. The slave node has different process IDs internally but you don't see those. You can see them if you turn off the PID mapping in /proc like this:

    bpsh 1 -O /proc/sys/bproc/proc_pid_map echo 0

Putting a zero in that file turns off PID mapping. 1 means map for non-root. 2 means map for everybody. It defaults to 2. If you turn it off, then you get to see everything on the node. You should be able to see what the real pid is in that case. If you use the real pid, then I think renice should work.

> Anyway, I really like the bproc concept, and hope it will realize true light weight SSI.

Me too :)

- Erik
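A simpler way around the missing renice forwarding, implied by the discussion above, is for the job to lower its own priority with setpriority(2) before it starts working; since the process names itself, no PID translation is involved. A minimal sketch (illustrative only, not code from the original thread):

    /* self_nice.c - illustrative sketch, not part of the original thread.
     * A job started with bpsh drops its own priority instead of being
     * renice'd from the front end, so no BProc PID mapping is needed. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        /* who == 0 means "the calling process", which works whether or
         * not /proc/sys/bproc/proc_pid_map is set. 19 is the weakest
         * (nicest) priority. */
        if (setpriority(PRIO_PROCESS, 0, 19) == -1) {
            fprintf(stderr, "setpriority: %s\n", strerror(errno));
            return 1;
        }

        /* ... CPU-intensive work would go here ... */
        return 0;
    }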
From: Kimitoshi T. <kt...@cl...> - 2004-03-11 14:48:25
Hello Erik,

I think it would be nice if the renice command on the master were correctly forwarded to slave nodes so that we are less dependent on queing systemes to prioritize jobs. What do you think ?

I still can't understand why I couldn't bpsh renice, since the PID on the slave was obtained by "bpsh 1 ps -ef" and hence it should be local pid. Would you elaborate a little more ?

Anyway, I really like the bproc concept, and hope it will realize true light weight SSI. It's a shame that I don't have enough skill to contribute ...

Kimitoshi Takahashi

er...@he... wrote:
>On Tue, Mar 09, 2004 at 11:10:20PM +0900, Kimitoshi Takahashi wrote:
>> Hello all,
>>
>> I want to change priorities of remote jobs on bproc slaves.
>> The normal "renice" command doesn't seem to affect priorities on slaves.
>> (by monitoring /proc/PID/stat)
>>
>> I tried to change priority of a proccess using bpsh.
>>
>> $ bpsh 1 -n sleep 9000 &
>> $ bpstat -p
>> PID     Node
>> 17634   1
>> $ bpsh 1 ps -ef
>> UID        PID  PPID  C STIME TTY          TIME CMD
>> ktaka    17634 17632  0 Mar25 ?        00:00:00 sleep 9000
>> ktaka    17729 17727  0 Mar25 ?        00:00:00 ps -ef
>>
>> $ bpsh 1 renice 20 17634
>> renice: 17634: getpriority: No such process
>>
>> How can I renice remote proccess?
>
>renice is one of those syscalls I missed which means it doesn't do the
>process ID mapping on the remote node or get forwarded correctly...
>
>- Erik
From: <er...@he...> - 2004-03-09 21:06:19
On Tue, Mar 09, 2004 at 11:10:20PM +0900, Kimitoshi Takahashi wrote:
> Hello all,
>
> I want to change priorities of remote jobs on bproc slaves.
> The normal "renice" command doesn't seem to affect priorities on slaves.
> (by monitoring /proc/PID/stat)
>
> I tried to change priority of a proccess using bpsh.
>
> $ bpsh 1 -n sleep 9000 &
> $ bpstat -p
> PID     Node
> 17634   1
> $ bpsh 1 ps -ef
> UID        PID  PPID  C STIME TTY          TIME CMD
> ktaka    17634 17632  0 Mar25 ?        00:00:00 sleep 9000
> ktaka    17729 17727  0 Mar25 ?        00:00:00 ps -ef
>
> $ bpsh 1 renice 20 17634
> renice: 17634: getpriority: No such process
>
> How can I renice remote proccess?

renice is one of those syscalls I missed which means it doesn't do the process ID mapping on the remote node or get forwarded correctly...

- Erik
From: Kimitoshi T. <kt...@cl...> - 2004-03-09 14:19:32
Hello all,

I want to change priorities of remote jobs on bproc slaves.
The normal "renice" command doesn't seem to affect priorities on slaves.
(by monitoring /proc/PID/stat)

I tried to change priority of a proccess using bpsh.

$ bpsh 1 -n sleep 9000 &
$ bpstat -p
PID     Node
17634   1
$ bpsh 1 ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
ktaka    17634 17632  0 Mar25 ?        00:00:00 sleep 9000
ktaka    17729 17727  0 Mar25 ?        00:00:00 ps -ef

$ bpsh 1 renice 20 17634
renice: 17634: getpriority: No such process

How can I renice remote proccess?

Best regards,
Kimitoshi Takahashi
From: <er...@he...> - 2004-03-08 18:09:20
On Sat, Mar 06, 2004 at 12:10:48AM +0000, Steven James wrote:
> Greetings,
>
> I've been working on an app that uses the fcntls F_SETOWN, F_ASYNC, F_SETSIG, and friends on TCP sockets to handle async events.
>
> The program works fine on the master or between unrelated machines, but on a bproc slave, the signals are not delivered.
>
> Digging through the code, I believe I have the reason but want to sanity check my finding:
>
> 1. ghost processes are only on the master.
> 2. getpid (from userspace) returns the PID as seen by the master (the pid of the ghost).
>
> So when fcntl( sock, F_SETOWN, getpid()), I set the ghost processes pid into filep->f_owner.pid
>
> When data arrives, send_sigio in fs/fcntl.c does (on the slave):
>
> 435 void send_sigio(struct fown_struct *fown, int fd, int band)
> 436 {
> 437         struct task_struct * p;
> 438         int pid = fown->pid;
> 439
> 440         read_lock(&tasklist_lock);
> 441         if ( (pid > 0) && (p = find_task_by_pid(pid)) ) {
> 442                 send_sigio_to_task(p, fown, fd, band);
> 443                 goto out;
> 444         }
> 445         for_each_task(p) {
> 446                 int match = p->pid;
> 447                 if (pid < 0)
> 448                         match = -p->pgrp;
> 449                 if (pid != match)
> 450                         continue;
> 451                 send_sigio_to_task(p, fown, fd, band);
> 452         }
> 453 out:
> 454         read_unlock(&tasklist_lock);
> 455 }
>
> And since pid isn't there, it does nothing at all.
>
> The easy answer would be to hack do_fcntl to translate the pid passed in to the real pid on the slave. However, I'm not certain that that wouldn't be just a band-aid on a more general issue.
>
> Thoughts?

That sounds like it would work out ok. You might want to require the pid argument to exist (i.e. the mapping is successful) before doing the assignment of the mapped pid in the f_owner structure. That might be good enough to avoid some shenanigans that could result in sending signals across a process spaces. This would make it work on local processes only and sending to a process group would probably still be busted. It sounds like it might work for a lot of things though.

The more general issue is signals being sent by things other than processes. The file descriptor really needs a process space context along with it so that it can look up PIDs in the right process space. Then the mapping could be done when the signal is actually sent and it could be forwarded if necessary, etc. I can't think of a reasonable case where anybody would want a file descriptor on one machine to send a signal to a process on another machine but I guess that's what you'd have to support to be strictly correct. TTY generated signals have the same problem.

I don't much like the idea of adding extra goop to the file structure but it's probably necessary to get this right.

- Erik
From: <er...@he...> - 2004-03-08 17:27:52
On Thu, Mar 04, 2004 at 05:31:20PM -0600, Jim Phillips wrote:
> Hi,
>
> Can anyone explain (and tell me how to eliminate) this difference between a normal Red Hat 9 machine:
>
> jim@belfast>ldd /bin/ls
>         libtermcap.so.2 => /lib/libtermcap.so.2 (0x40026000)
>         libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
>         /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
>
> and a Clustermatic 4 machine:
>
> jim@delhi>ldd /bin/ls
>         libtermcap.so.2 => /lib/libtermcap.so.2 (0x40027000)
>         libc.so.6 => /lib/libc.so.6 (0x4002b000)
>         /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
>
> Why does Clustermatic ignore the /lib/tls library versions? This doesn't seem to be a ldconfig or LD_LIBRARY_PATH option. For some reason, /lib/libpthread.so.0 instead of /lib/tls/libpthread.so.0 seems hard wired. Is there a way to change this?

I'm not sure what "tls" is in this case but I'll assume it's got something to do with the new threading stuff in 2.6 and TLS means "thread local storage."

Linux 2.6 has a bunch of new stuff in it to better support threading. In particular the futex stuff and it's also got some TLS stuff. Red Hat has back ported some of this stuff to Linux 2.4. The Clustermatic kernels do not have these features since they are stock kernel.org kernels plus patches for BProc and couple of other small things.

While tinkering on Linux 2.6, I came across the TLS stuff. The C library uses this stuff for itself even when there isn't any threading involved. I presume the linker or the C library is detecting whether or not this feature is present and choosing a C library based on that.

FWIW, that's my best guess...

- Erik
From: <er...@he...> - 2004-03-06 22:47:40
On Thu, Mar 04, 2004 at 11:08:36PM -0500, Daniel Gruner wrote:
> Erik,
>
> Thanks for the comments. I will do my best to work on the testing, but I have another question: Have you tried either clustermatic 3 or 4 with a machine like mine? They are UX (ruffian) boards, made by Samsung, with EV56 @600 MHz.

I haven't. I only have Compaq ES40s, API CS20s. There might be the occasional DS10 kicking around too. These are all tsunami variants.

- Erik
From: Steven J. <py...@li...> - 2004-03-06 00:18:05
Greetings,

I've been working on an app that uses the fcntls F_SETOWN, F_ASYNC, F_SETSIG, and friends on TCP sockets to handle async events.

The program works fine on the master or between unrelated machines, but on a bproc slave, the signals are not delivered.

Digging through the code, I believe I have the reason but want to sanity check my finding:

1. ghost processes are only on the master.
2. getpid (from userspace) returns the PID as seen by the master (the pid of the ghost).

So when fcntl( sock, F_SETOWN, getpid()), I set the ghost processes pid into filep->f_owner.pid

When data arrives, send_sigio in fs/fcntl.c does (on the slave):

435 void send_sigio(struct fown_struct *fown, int fd, int band)
436 {
437         struct task_struct * p;
438         int pid = fown->pid;
439
440         read_lock(&tasklist_lock);
441         if ( (pid > 0) && (p = find_task_by_pid(pid)) ) {
442                 send_sigio_to_task(p, fown, fd, band);
443                 goto out;
444         }
445         for_each_task(p) {
446                 int match = p->pid;
447                 if (pid < 0)
448                         match = -p->pgrp;
449                 if (pid != match)
450                         continue;
451                 send_sigio_to_task(p, fown, fd, band);
452         }
453 out:
454         read_unlock(&tasklist_lock);
455 }

And since pid isn't there, it does nothing at all.

The easy answer would be to hack do_fcntl to translate the pid passed in to the real pid on the slave. However, I'm not certain that that wouldn't be just a band-aid on a more general issue.

Thoughts?

G'day,
sjames

-------------------------steven james, director of research, linux labs
... ........ ..... ....        230 peachtree st nw ste 2701
the original linux labs        atlanta.ga.us 30303
-since 1995                    http://www.linuxlabs.com
                               office 404.577.7747 fax 404.577.7743
-----------------------------------------------------------------------
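For readers who have not used this fcntl interface, a minimal sketch of the usual O_ASYNC/F_SETOWN/F_SETSIG setup on an already-connected socket looks roughly like the following. This is only an illustration of the pattern Steven describes, not his actual code; `sock` is assumed to be a connected TCP socket descriptor.

    /* Illustrative sketch of async socket notification via F_SETOWN/F_SETSIG.
     * Not Steven's code; `sock` is assumed to be a connected TCP socket. */
    #define _GNU_SOURCE             /* for F_SETSIG and O_ASYNC on glibc */
    #include <fcntl.h>
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static void on_io(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        char buf[512];
        /* With F_SETSIG in effect, si->si_fd says which descriptor is ready. */
        ssize_t n = read(si->si_fd, buf, sizeof(buf));
        if (n > 0)
            write(STDOUT_FILENO, buf, (size_t)n);
    }

    int setup_async(int sock)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_io;
        sa.sa_flags = SA_SIGINFO;
        if (sigaction(SIGRTMIN, &sa, NULL) == -1)
            return -1;

        /* Tell the kernel which process should receive the signal.  Under
         * BProc this getpid() is the front-end PID, which is exactly the
         * mismatch discussed in this thread. */
        if (fcntl(sock, F_SETOWN, getpid()) == -1)
            return -1;
        /* Deliver a real-time signal (with siginfo) instead of plain SIGIO. */
        if (fcntl(sock, F_SETSIG, SIGRTMIN) == -1)
            return -1;
        /* Turn on O_ASYNC so readiness generates the signal at all. */
        int flags = fcntl(sock, F_GETFL);
        return fcntl(sock, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
    }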
From: Daniel G. <dg...@ti...> - 2004-03-05 04:15:27
Erik,

Thanks for the comments. I will do my best to work on the testing, but I have another question: Have you tried either clustermatic 3 or 4 with a machine like mine? They are UX (ruffian) boards, made by Samsung, with EV56 @600 MHz.

Perhaps somebody else on the list has tried these?

Thanks,
Daniel

On Thu, Mar 04, 2004 at 06:13:15PM -0700, er...@he... wrote:
> On Thu, Mar 04, 2004 at 02:16:05PM -0500, Daniel Gruner wrote:
> > > Anyway, a few things you can do to try to figure out that's what's going on... (I tried this on our alphas real quick and it doesn't seem to be happening here.)
> > >
> > > - Comment out this stuff in the slave daemon:
> > >
> > >     /* bump our priority to RT to avoid getting hosed by errant
> > >      * stuff that gets run on our node */
> > >     p.sched_priority = 1;
> > >     if (sched_setscheduler(0, SCHED_FIFO, &p))
> > >         syslog(LOG_NOTICE, "Failed to set real-time scheduling for"
> > >                " slave daemon.\n");
> > >
> > >   and rebuild the slave daemon. (also reinstall libbpslave.a and rebuild the phase 2 boot image if you're using the rest of clustermatic)
> >
> > I will try, if I get some time...
> >
> > > - bpsh other stuff (e.g. uptime) to the node while this job is running but before the node dies. Is it responsive or is it just completely dead? Is it really slow?
> >
> > Nothing else runs. "bpsh 1 uptime" just hangs. The process table on the master does not get updated, and the waster does not appear on "top" at all. The node dies (is killed, actually, because it times out) although the waster job manages to finish. I guess even the bpctl does not get through...
>
> Well, it sounds like the the starvation of the slave daemon is more or less complete in that case. It's pretty much got to be a scheduler problem. If commenting out the bit of code above and this tid bit in kernel/slave.c fixes it, then that will confirm it.
>
>     /* Knock our priority back to default */
>     current->policy = SCHED_OTHER;
>     current->nice = DEF_NICE;
>     current->rt_priority = 0;
>
> If that fixes it I'm pretty sure that either the kernel you're using has some scheduler related patch in it or my code is subtly buggy in a way that doesn't crop up on the kernels that I've built.
>
> If you're running as root, you can also try to do sched_setscheduler(0, SCHED_OTHER, ...) in your program to try and confirm it.
>
> > > - Finally, if everything seems to be working but slow or something like that you can up the ping timeout by adding a line like this to your /etc/beowulf/config.
> > >
> > >     pingtimeout 120
> > >
> > >   120 is the timeout in seconds. The default is 30.
> >
> > Not a solution yet...
> >
> > Could it be something to do with the network driver? I am using eepro100 cards. Here is the list of loaded modules on the nodes:
>
> I seriously doubt it. You're still getting output from the remote process which means the network and TCP are all still working fine.
>
> - Erik

--
Dr. Daniel Gruner                             dg...@ti...
Dept. of Chemistry                            dan...@ut...
University of Toronto                         phone: (416)-978-8689
80 St. George Street                          fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada                   finger for PGP public key
From: <er...@he...> - 2004-03-05 01:40:08
On Thu, Mar 04, 2004 at 02:16:05PM -0500, Daniel Gruner wrote:
> > Anyway, a few things you can do to try to figure out that's what's going on... (I tried this on our alphas real quick and it doesn't seem to be happening here.)
> >
> > - Comment out this stuff in the slave daemon:
> >
> >     /* bump our priority to RT to avoid getting hosed by errant
> >      * stuff that gets run on our node */
> >     p.sched_priority = 1;
> >     if (sched_setscheduler(0, SCHED_FIFO, &p))
> >         syslog(LOG_NOTICE, "Failed to set real-time scheduling for"
> >                " slave daemon.\n");
> >
> >   and rebuild the slave daemon. (also reinstall libbpslave.a and rebuild the phase 2 boot image if you're using the rest of clustermatic)
>
> I will try, if I get some time...
>
> > - bpsh other stuff (e.g. uptime) to the node while this job is running but before the node dies. Is it responsive or is it just completely dead? Is it really slow?
>
> Nothing else runs. "bpsh 1 uptime" just hangs. The process table on the master does not get updated, and the waster does not appear on "top" at all. The node dies (is killed, actually, because it times out) although the waster job manages to finish. I guess even the bpctl does not get through...

Well, it sounds like the the starvation of the slave daemon is more or less complete in that case. It's pretty much got to be a scheduler problem. If commenting out the bit of code above and this tid bit in kernel/slave.c fixes it, then that will confirm it.

    /* Knock our priority back to default */
    current->policy = SCHED_OTHER;
    current->nice = DEF_NICE;
    current->rt_priority = 0;

If that fixes it I'm pretty sure that either the kernel you're using has some scheduler related patch in it or my code is subtly buggy in a way that doesn't crop up on the kernels that I've built.

If you're running as root, you can also try to do sched_setscheduler(0, SCHED_OTHER, ...) in your program to try and confirm it.

> > - Finally, if everything seems to be working but slow or something like that you can up the ping timeout by adding a line like this to your /etc/beowulf/config.
> >
> >     pingtimeout 120
> >
> >   120 is the timeout in seconds. The default is 30.
>
> Not a solution yet...
>
> Could it be something to do with the network driver? I am using eepro100 cards. Here is the list of loaded modules on the nodes:

I seriously doubt it. You're still getting output from the remote process which means the network and TCP are all still working fine.

- Erik
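As a concrete version of the last suggestion above, the test program could explicitly put itself back in the normal scheduling class at startup. A minimal sketch (illustrative only, not code from the thread):

    /* Illustrative sketch of the sched_setscheduler(0, SCHED_OTHER, ...)
     * test Erik suggests above; not code from the original thread. */
    #include <errno.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        struct sched_param sp;
        memset(&sp, 0, sizeof(sp));
        sp.sched_priority = 0;   /* SCHED_OTHER requires a static priority of 0 */

        /* pid 0 means the calling process itself. */
        if (sched_setscheduler(0, SCHED_OTHER, &sp) == -1)
            fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));

        /* ... the CPU-intensive "waster" work would follow here ... */
        return 0;
    }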
From: Jim P. <ji...@ks...> - 2004-03-04 23:37:46
Hi,

Can anyone explain (and tell me how to eliminate) this difference between a normal Red Hat 9 machine:

jim@belfast>ldd /bin/ls
        libtermcap.so.2 => /lib/libtermcap.so.2 (0x40026000)
        libc.so.6 => /lib/tls/libc.so.6 (0x42000000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

and a Clustermatic 4 machine:

jim@delhi>ldd /bin/ls
        libtermcap.so.2 => /lib/libtermcap.so.2 (0x40027000)
        libc.so.6 => /lib/libc.so.6 (0x4002b000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Why does Clustermatic ignore the /lib/tls library versions? This doesn't seem to be a ldconfig or LD_LIBRARY_PATH option. For some reason, /lib/libpthread.so.0 instead of /lib/tls/libpthread.so.0 seems hard wired. Is there a way to change this?

-Jim
From: Daniel G. <dg...@ti...> - 2004-03-04 19:22:25
Thanks for the quick response, Erik.

On Thu, Mar 04, 2004 at 10:51:09AM -0700, er...@he... wrote:
> On Thu, Mar 04, 2004 at 11:54:36AM -0500, Daniel Gruner wrote:
> > Hi
> >
> > I have experienced a strange phenomenon on my alpha cluster. It is running bproc 3.2.6, on alpha UX machines. For the most part the cluster behaves quite normally, allowing me to run jobs, and all the normal stuff.
> >
> > However, I am testing a fairly short, highly cpu-intensive job, simply to have a way of submitting many jobs using bjs and learn its functioning, and it appears that the job is so cpu-intensive that the node appears dead to the master (i.e. it does not respond or something like that), and it dies before the job is completed. Well, actually the job manages to complete, but the node is reset anyway. If I do "ps" on the master I don't see the job (actually, it says it has not used up any time), nor do I see it appear in "top". Is it possible that the node gets TOO busy with the computation? I append two files: The program itself (waster.cpp), and its output (junk). The command line to run the program was:
> >
> > bpsh 1 -I /dev/null ./waster > & junk &
> >
> > The program ends with:
> > [1]  Exit 255    bpsh 1 -I /dev/null ./waster >& junk
> > and the node is reset. From the /var/log/messages file I get:
> > Mar 4 11:29:57 racaille bpmaster: ping timeout on slave 1
> >
> > It looks like the node is too busy computing to even respond to pings...
>
> That shouldn't be possible but maybe something is going wrong with priorities or something. The slave daemon is supposed to run with an elevated priority to avoid these starvation issues. I saw this kind of behavior once when somebody decided to start 1500 cpu intensive processes on a slave node once. In any case, sharing with one other process shouldn't be a problem. A possible problem could come up if the slave daemon failed to reset priorities for the child processes it created. I'm not seeing this problem on our systems here so I suspect that that's not it.
>
> First a few questions:
>
> What kernel version?
> Are you starting with a kernel.org kernel?
> Any other patches other than BProc?
> How many cpus?

2.4.18-27.7hdbp.ux.0

It is basically a stock kernel that has been patched for BProc, and made to run on the UX (ruffian) board. These are single cpu machines, with EV56 at 600 MHz.

Here is the list of packages installed:

beoboot-cm.1.5-hddcs.2.alpha.rpm
beoboot-modules-cm.1.5-22.4.18_27.7hdbp.ux.0.alpha.rpm
beonss-1.0.12-lanl.2.1.alpha.rpm
bjs-1.2-hd.3.alpha.rpm
bproc-3.2.6-hddcs.2.alpha.rpm
bproc-devel-3.2.6-hddcs.2.alpha.rpm
bproc-libs-3.2.6-hddcs.2.alpha.rpm
bproc-modules-3.2.6-2.k2.4.18_27.7hdbp.ux.0.alpha.rpm
cmtools-1.1-1.alpha.rpm
cmtools-devel-1.1-1.alpha.rpm
kernel-2.4.18-27.7hdbp.ux.0.alpha.rpm
kernel-beoboot-2.4.18-27.7hdbp.ux.0.alpha.rpm
kernel-doc-2.4.18-27.7hdbp.ux.0.alpha.rpm
kernel-source-2.4.18-27.7hdbp.ux.0.alpha.rpm
supermon-1.4-hddcs.3.alpha.rpm
supermon-modules-1.4-3.k2.4.18_27.7hdbp.ux.0.alpha.rpm

It was built by the good folks at HardData (Michal Jaegermann).

> Anyway, a few things you can do to try to figure out that's what's going on... (I tried this on our alphas real quick and it doesn't seem to be happening here.)
>
> - Comment out this stuff in the slave daemon:
>
>     /* bump our priority to RT to avoid getting hosed by errant
>      * stuff that gets run on our node */
>     p.sched_priority = 1;
>     if (sched_setscheduler(0, SCHED_FIFO, &p))
>         syslog(LOG_NOTICE, "Failed to set real-time scheduling for"
>                " slave daemon.\n");
>
>   and rebuild the slave daemon. (also reinstall libbpslave.a and rebuild the phase 2 boot image if you're using the rest of clustermatic)

I will try, if I get some time...

> - bpsh other stuff (e.g. uptime) to the node while this job is running but before the node dies. Is it responsive or is it just completely dead? Is it really slow?

Nothing else runs. "bpsh 1 uptime" just hangs. The process table on the master does not get updated, and the waster does not appear on "top" at all. The node dies (is killed, actually, because it times out) although the waster job manages to finish. I guess even the bpctl does not get through...

> - run something else alongside the waster that does something like:
>
>     while(1) { printf("hi\n"); fflush(stdout); sleep(1); }
>
>   does it get starved and stop printing?

In this case both jobs seem to run, but the node dies anyway, since at least the network stuff does not respond. Even when the job that just prints "hi" is running alone on the node I don't see it updating the process table. You'd think it would work if only because it spends most of its time sleep()ing...

> - Finally, if everything seems to be working but slow or something like that you can up the ping timeout by adding a line like this to your /etc/beowulf/config.
>
>     pingtimeout 120
>
>   120 is the timeout in seconds. The default is 30.

Not a solution yet...

Could it be something to do with the network driver? I am using eepro100 cards. Here is the list of loaded modules on the nodes:

racaille:dgruner{116}> bpsh 1 /sbin/lsmod
Module                  Size  Used by    Not tainted
vfat                   18816   0  (unused)
fat                    52176   0  [vfat]
ext3                   94152   0  (unused)
jbd                    74840   0  [ext3]
nfs                   140328   2
lockd                  84488   0  [nfs]
sunrpc                110736   1  [nfs lockd]
sym53c8xx_2           101438   0  (unused)
de4x5                  66452   0  (unused)
eepro100               29688   1
bproc                  99736   2
vmadump                19320   0  [bproc]

Daniel

--
Dr. Daniel Gruner                             dg...@ti...
Dept. of Chemistry                            dan...@ut...
University of Toronto                         phone: (416)-978-8689
80 St. George Street                          fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada                   finger for PGP public key
From: <er...@he...> - 2004-03-04 18:17:48
On Thu, Mar 04, 2004 at 11:54:36AM -0500, Daniel Gruner wrote:
> Hi
>
> I have experienced a strange phenomenon on my alpha cluster. It is running bproc 3.2.6, on alpha UX machines. For the most part the cluster behaves quite normally, allowing me to run jobs, and all the normal stuff.
>
> However, I am testing a fairly short, highly cpu-intensive job, simply to have a way of submitting many jobs using bjs and learn its functioning, and it appears that the job is so cpu-intensive that the node appears dead to the master (i.e. it does not respond or something like that), and it dies before the job is completed. Well, actually the job manages to complete, but the node is reset anyway. If I do "ps" on the master I don't see the job (actually, it says it has not used up any time), nor do I see it appear in "top". Is it possible that the node gets TOO busy with the computation? I append two files: The program itself (waster.cpp), and its output (junk). The command line to run the program was:
>
> bpsh 1 -I /dev/null ./waster > & junk &
>
> The program ends with:
> [1]  Exit 255    bpsh 1 -I /dev/null ./waster >& junk
> and the node is reset. From the /var/log/messages file I get:
> Mar 4 11:29:57 racaille bpmaster: ping timeout on slave 1
>
> It looks like the node is too busy computing to even respond to pings...

That shouldn't be possible but maybe something is going wrong with priorities or something. The slave daemon is supposed to run with an elevated priority to avoid these starvation issues. I saw this kind of behavior once when somebody decided to start 1500 cpu intensive processes on a slave node once. In any case, sharing with one other process shouldn't be a problem. A possible problem could come up if the slave daemon failed to reset priorities for the child processes it created. I'm not seeing this problem on our systems here so I suspect that that's not it.

First a few questions:

What kernel version?
Are you starting with a kernel.org kernel?
Any other patches other than BProc?
How many cpus?

Anyway, a few things you can do to try to figure out that's what's going on... (I tried this on our alphas real quick and it doesn't seem to be happening here.)

- Comment out this stuff in the slave daemon:

      /* bump our priority to RT to avoid getting hosed by errant
       * stuff that gets run on our node */
      p.sched_priority = 1;
      if (sched_setscheduler(0, SCHED_FIFO, &p))
          syslog(LOG_NOTICE, "Failed to set real-time scheduling for"
                 " slave daemon.\n");

  and rebuild the slave daemon. (also reinstall libbpslave.a and rebuild the phase 2 boot image if you're using the rest of clustermatic)

- bpsh other stuff (e.g. uptime) to the node while this job is running but before the node dies. Is it responsive or is it just completely dead? Is it really slow?

- run something else alongside the waster that does something like:

      while(1) { printf("hi\n"); fflush(stdout); sleep(1); }

  does it get starved and stop printing?

- Finally, if everything seems to be working but slow or something like that you can up the ping timeout by adding a line like this to your /etc/beowulf/config.

      pingtimeout 120

  120 is the timeout in seconds. The default is 30.

- Erik
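Wrapped up as a complete program, the heartbeat test sketched in the third item above is just:

    /* heartbeat.c - complete version of the one-line test suggested above:
     * run it on the node alongside the waster and see whether it keeps
     * printing once a second or gets starved. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        while (1) {
            printf("hi\n");
            fflush(stdout);
            sleep(1);
        }
        return 0;
    }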
From: Daniel G. <dg...@ti...> - 2004-03-04 17:00:59
Hi

I have experienced a strange phenomenon on my alpha cluster. It is running bproc 3.2.6, on alpha UX machines. For the most part the cluster behaves quite normally, allowing me to run jobs, and all the normal stuff.

However, I am testing a fairly short, highly cpu-intensive job, simply to have a way of submitting many jobs using bjs and learn its functioning, and it appears that the job is so cpu-intensive that the node appears dead to the master (i.e. it does not respond or something like that), and it dies before the job is completed. Well, actually the job manages to complete, but the node is reset anyway. If I do "ps" on the master I don't see the job (actually, it says it has not used up any time), nor do I see it appear in "top". Is it possible that the node gets TOO busy with the computation? I append two files: The program itself (waster.cpp), and its output (junk). The command line to run the program was:

bpsh 1 -I /dev/null ./waster > & junk &

The program ends with:
[1]  Exit 255    bpsh 1 -I /dev/null ./waster >& junk
and the node is reset. From the /var/log/messages file I get:
Mar 4 11:29:57 racaille bpmaster: ping timeout on slave 1

It looks like the node is too busy computing to even respond to pings...

Any hints at what may be happening are welcome. I attach my /etc/beowulf/config file, for completeness.

Thanks,
Daniel

--
Dr. Daniel Gruner                             dg...@ti...
Dept. of Chemistry                            dan...@ut...
University of Toronto                         phone: (416)-978-8689
80 St. George Street                          fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada                   finger for PGP public key
From: Gregory S. <gr...@ai...> - 2004-02-27 19:58:28
Hi all,

Before I start with specific questions, I wanted to try this: is there any comprehensive, step-by-step how-to on building the bproc+beoboot system from scratch (I mean _really_ from scratch - i.e. starting from how to get a kernel source, patch it and compile it).

I am running Debian, but I may be able to "translate" RPM-oriented instructions.

Thanks!

--
Greg Shakhnarovich
AI Lab, MIT NE43-V611
Cambridge, MA 02139
tel (617) 253-8170   fax (617) 258-6287
From: <er...@he...> - 2004-02-23 22:25:59
On Mon, Feb 23, 2004 at 01:24:44PM -0800, Dale Harris wrote:
> On Mon, Feb 23, 2004 at 09:22:47AM -0700, er...@he... elucidated:
> >
> > Hehe. You've discovered the wacky shell script hack that I put in a while ago. The problem with shell scripts is that then the kernel sees '#!/bin/sh' in a script called X it actually does execve("/bin/sh", "X", 0). X isn't going to exist on the nodes most of the time. This is true in my world anyway. BProc, in an attempt to be tricky and get around this, puts the script in the process's memory space and then gives you this file descriptor on fd 3 that magically just reads the file from your own memory space. This made a few perl users very happy a while ago.
>
> I wonder if that is what was getting in the way of PVM on 4.0? That was the file it would complain about.

Hrm. Could be.... Most of those things (PVM, MPI, etc.) tend to be very script-centric...

- Erik
From: Greg W. <g.w...@co...> - 2004-02-23 21:36:55
Could be, if PVM was trying to run a script...

Greg

On 23/02/2004, at 2:24 PM, Dale Harris wrote:
> On Mon, Feb 23, 2004 at 09:22:47AM -0700, er...@he... elucidated:
>>
>> Hehe. You've discovered the wacky shell script hack that I put in a while ago. The problem with shell scripts is that then the kernel sees '#!/bin/sh' in a script called X it actually does execve("/bin/sh", "X", 0). X isn't going to exist on the nodes most of the time. This is true in my world anyway. BProc, in an attempt to be tricky and get around this, puts the script in the process's memory space and then gives you this file descriptor on fd 3 that magically just reads the file from your own memory space. This made a few perl users very happy a while ago.
>>
>
> I wonder if that is what was getting in the way of PVM on 4.0? That was the file it would complain about.
>
> --
> Dale Harris
> ro...@ma...
> /.-)
>
>
> -------------------------------------------------------
> SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> Build and deploy apps & Web services for Linux with
> a free DVD software kit from IBM. Click Now!
> http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click
> _______________________________________________
> BProc-users mailing list
> BPr...@li...
> https://lists.sourceforge.net/lists/listinfo/bproc-users
From: Dale H. <ro...@ma...> - 2004-02-23 21:25:02
On Mon, Feb 23, 2004 at 09:22:47AM -0700, er...@he... elucidated:
>
> Hehe. You've discovered the wacky shell script hack that I put in a while ago. The problem with shell scripts is that then the kernel sees '#!/bin/sh' in a script called X it actually does execve("/bin/sh", "X", 0). X isn't going to exist on the nodes most of the time. This is true in my world anyway. BProc, in an attempt to be tricky and get around this, puts the script in the process's memory space and then gives you this file descriptor on fd 3 that magically just reads the file from your own memory space. This made a few perl users very happy a while ago.
>

I wonder if that is what was getting in the way of PVM on 4.0? That was the file it would complain about.

--
Dale Harris
ro...@ma...
/.-)
From: <er...@he...> - 2004-02-23 20:03:53
On Mon, Feb 23, 2004 at 12:42:43AM +0100, J.A. Magallon wrote:
> Hi all...
>
> I have a problem when launcing remote scripts with bpsh.
> I have a soft that I build with different compilers, and stored like
>
> /opt/aleph/bin/tst                    Shell script
> /opt/aleph/bin/linux-gcc/tst          Binary
> /opt/aleph/bin/linux-icc/tst          Binary
> /opt/aleph/bin/default -> linux-gcc   Deafult build I want to use
>
> The script does something like:
>
> #!/bin/bash
>
> /opt/aleph/bin/default/$(basename $0)
>
> Just try this:
>
> #!/bin/bash
>
> echo $0
>
> The result is:
>
> annwn:~> bpsh 0 tst
> /proc/self/fd/3
>
> Uh ?
> Something is missing in rfork/rexec to set up properly script names ?

Hehe. You've discovered the wacky shell script hack that I put in a while ago. The problem with shell scripts is that then the kernel sees '#!/bin/sh' in a script called X it actually does execve("/bin/sh", "X", 0). X isn't going to exist on the nodes most of the time. This is true in my world anyway. BProc, in an attempt to be tricky and get around this, puts the script in the process's memory space and then gives you this file descriptor on fd 3 that magically just reads the file from your own memory space. This made a few perl users very happy a while ago.

If you don't want this hack, put a zero in /proc/sys/bproc/shell_hack.

- Erik