From: Nicholas H. <he...@se...> - 2003-07-09 15:43:21
For the love of Pete -- sorry bproc'ers -- I posted to the wrong list :(

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: Nicholas H. <he...@se...> - 2003-07-09 15:40:50
On Wed, 2003-07-09 at 11:26, Thomas Clausen wrote:
> Hi,
>
> root@betty:~# telnet localhost 2709
> Trying 127.0.0.1...
> Connected to here.
> Escape character is '^]'.
> S
> (
> )
> ^]
> telnet> quit
> Connection closed.
> root@betty:~#
>
> So supermon reports nothing.

Hmm. Oops... bug on my part. Can you try the attached patch for
lib/python/resource_manager/BprocSupermon.py? You will need to re-run
'python2.2 setup.py install' again. I was taking empty data to mean bad data,
but that is not always the case -- especially when there are no nodes :/
If this fixes it, I will cut 0.5b81 with the fixes.

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
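The patch itself is not included in the archive; purely as an illustration of the fix described above (treating an empty supermon sample as "no nodes" rather than as bad data), here is a minimal Python sketch. The function and exception names are assumptions, not the actual BprocSupermon.py code.

#!/usr/bin/env python
# Hypothetical sketch only: distinguishes "empty but valid" supermon output
# from genuinely malformed output. Names are assumed, not from BprocSupermon.py.

class SupermonError(Exception):
    """Raised only when the supermon stream is genuinely unusable."""
    pass

def parse_supermon_sample(raw):
    """Parse one supermon S-expression sample into a list of per-node records."""
    raw = raw.strip()
    if raw in ("", "()", "( )"):
        # An empty sample is valid: it simply means no nodes are reporting.
        return []
    if not (raw.startswith("(") and raw.endswith(")")):
        # Only malformed data is treated as an error.
        raise SupermonError("malformed supermon data: %r" % raw[:40])
    # ... real parsing of the per-node records would go here ...
    return [{"node": 0, "load": 0.0}]

if __name__ == "__main__":
    print(parse_supermon_sample("( )"))   # [] -- empty cluster, not an error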
From: <er...@he...> - 2003-07-07 19:31:05
On Wed, Jul 02, 2003 at 12:34:12PM -0400, Nicholas Henke wrote:
> On Tue, 2003-07-01 at 18:39, er...@he... wrote:
> >
> > P.S. I've attached a quick port of the 3.2.3 patch to 2.4.20. I
> > think it should work.
>
> Same S@#$t, Different Kernel.

Nic: I have a hunch about what might be going on here. There's some potential
for badness in exit_notify with BProc. kill_pg and is_orphaned_pgrp might end
up setting the process state back to RUNNING instead of ZOMBIE. They could
then get hung up because the ghost is already gone, having exited. I've
attached a revised patch which I think should fix that. Can you try it and
see if it helps?

- Erik
From: Nicholas H. <he...@se...> - 2003-07-03 14:56:29
On Thu, 2003-07-03 at 10:34, Thomas Clausen wrote:
> Hi all,
>
> I'm trying to compile bproc 3.2.5 using kernel 2.4.20. I had this up and
> running, then patched my kernel with the clubmask linux-2.4.17-avenrun.patch
> and linux-2.4.19-mem-swap-syms.patch patches. Now I get the following
> unresolved symbols:
>
> root@betty~ modprobe bproc
> /lib/modules/2.4.20/bproc/bproc.o: unresolved symbol irq_stat_Rsmp_6b40ff0b
> /lib/modules/2.4.20/bproc/bproc.o: insmod /lib/modules/2.4.20/bproc/bproc.o failed
> /lib/modules/2.4.20/bproc/bproc.o: insmod bproc failed

Did you recompile the bproc module after applying the patches? It looks like
it may just be a modversions problem. Apologies if that is the fix -- I should
have noted it in the docs.

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: Thomas C. <tcl...@we...> - 2003-07-03 14:35:24
Hi all,

I'm trying to compile bproc 3.2.5 using kernel 2.4.20. I had this up and
running, then patched my kernel with the clubmask linux-2.4.17-avenrun.patch
and linux-2.4.19-mem-swap-syms.patch patches. Now I get the following
unresolved symbols:

root@betty~ modprobe bproc
/lib/modules/2.4.20/bproc/bproc.o: unresolved symbol irq_stat_Rsmp_6b40ff0b
/lib/modules/2.4.20/bproc/bproc.o: insmod /lib/modules/2.4.20/bproc/bproc.o failed
/lib/modules/2.4.20/bproc/bproc.o: insmod bproc failed
root@betty~

Any help is appreciated.

Thanks,
Thomas
--
 .^.    Thomas Clausen, post doc
 /V\    Physics Department, Wesleyan University, CT
// \\   Tel 860-685-2018, fax 860-685-2031
/( )\
 ^^-^^  Use Linux
From: Nicholas H. <he...@se...> - 2003-07-02 16:34:25
On Tue, 2003-07-01 at 18:39, er...@he... wrote:
>
> P.S. I've attached a quick port of the 3.2.3 patch to 2.4.20. I
> think it should work.

Same S@#$t, Different Kernel.

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: Nicholas H. <he...@se...> - 2003-07-02 14:37:28
On Tue, 2003-07-01 at 18:39, er...@he... wrote:
> I think user-land back-traces are probably useless since this is some
> kind of weird kernel-land problem - and judging by the message
> traces you've sent me before, the procs are getting caught somewhere
> in exit (i.e. signal received and *trying* to exit).

Ahhh.. that would make sense.

> It doesn't look like much changed to me between 2.4.18 and 2.4.19, but
> some of the process tree handling code in exit did. The examples
> you sent me a while back all show several threads/processes being
> killed at once. I have a sneaking suspicion that this is somehow a
> race related to many things exiting and getting re-parented at the
> same time.

Ew -- and that is my official opinion of that.

> I have no idea how that's getting hung up but maybe we can determine
> if it's really such a race or not. To make a long story short, can
> you try the following:

Sure -- I have attached a text file with the results -- slightly more
readable than limiting it to 80 chars in email.

> Kill the threads one at a time and see if they still get hung up in that
> weird state. Half a second in between kills should be more than
> enough. Then maybe bottom->top or top->bottom might be interesting.

Basically:
top->bottom: screwed.
bottom->top + sleep: ok.
bottom->top + no sleep: screwed.

> I apologize if I've asked this before: when the threads are hung, does
> the system seem healthy otherwise? Specifically, no problems creating
> or killing other processes?

Yes it does -- I have no problems ssh'ing or bpsh'ing in and running anything.

> P.S. I've attached a quick port of the 3.2.3 patch to 2.4.20. I
> think it should work.

Thanks! I will see what this produces as well.

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
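The staggered-kill experiment Erik suggests (send SIGKILL to the hung threads one at a time, with a short pause, in either order) is easy to script; here is a minimal Python sketch purely for illustration. The PID list is a placeholder taken from a ps -jxf listing like the one later in this thread, not anything prescribed by the thread.

#!/usr/bin/env python
# Sketch of the kill-ordering experiment: SIGKILL each PID in turn, optionally
# sleeping between kills, top->bottom or bottom->top. PIDs below are placeholders.

import errno, os, signal, time

def staggered_kill(pids, delay=0.5, bottom_up=True):
    """Kill each pid in turn, pausing `delay` seconds between signals."""
    order = list(pids)
    if bottom_up:
        order.reverse()                 # deepest child first
    for pid in order:
        try:
            os.kill(pid, signal.SIGKILL)
            print("sent SIGKILL to %d" % pid)
        except OSError as e:
            if e.errno == errno.ESRCH:
                print("%d already gone" % pid)   # process exited on its own
            else:
                raise
        time.sleep(delay)

if __name__ == "__main__":
    # Parent-to-child chain, e.g. the rpsblast thread group from the ps output.
    staggered_kill([17152, 17153, 17154], delay=0.5, bottom_up=True)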
From: <er...@he...> - 2003-07-01 22:48:29
On Tue, Jul 01, 2003 at 04:03:28PM -0400, Nicholas Henke wrote:
> Ok -- So I have managed to find the change in versions that isolates the
> problem; unfortunately, it is a kernel version change that triggers it,
> not a bproc one.
>
> FYI -- The working combination is 2.4.18 patched for bproc 3.2.3 -- I
> used the diff in the patches to backport the 2.4.19 patch for 3.2.3 to
> 2.4.18.
>
> The 'bad' combination is 2.4.19 with bproc 3.2.3.
>
> So, the behavior that I am seeing now is that a program is bpsh'd to a
> node, where it uses pthreads to create a few threads to do the work. At
> some point the threads hang, and it takes a 'kill -9' to kill them.
> Most of the time this will work, but I have noticed that I will have to
> go to the node and 'kill -9' them there for the process to die all of
> the way; if not, and I kill -9 from the front-end, the processes will be
> removed from the front-end ps output, but when I ssh to the remote node,
> it is still there and needs another kill -9 to kill it. There is also
> the case where the process on the remote node just refuses to die --
> kill -9 will not pull it out of wherever it is stuck.
>
> What else can I provide? Would it be possible to get a patch for bproc
> 3.2.3 for kernel 2.4.20 to see if I get the same behavior there?
>
> Here is a traceback for when the threads hang. This is the same traceback
> as when the process ignores the kill -9.

I think user-land back-traces are probably useless since this is some kind of
weird kernel-land problem - and judging by the message traces you've sent me
before, the procs are getting caught somewhere in exit (i.e. signal received
and *trying* to exit).

It doesn't look like much changed to me between 2.4.18 and 2.4.19, but some of
the process tree handling code in exit did. The examples you sent me a while
back all show several threads/processes being killed at once. I have a
sneaking suspicion that this is somehow a race related to many things exiting
and getting re-parented at the same time.

I have no idea how that's getting hung up, but maybe we can determine whether
it's really such a race or not. To make a long story short, can you try the
following:

Kill the threads one at a time and see if they still get hung up in that weird
state. Half a second in between kills should be more than enough. Then maybe
bottom->top or top->bottom ordering might be interesting.

I apologize if I've asked this before: when the threads are hung, does the
system seem healthy otherwise? Specifically, no problems creating or killing
other processes?

- Erik

P.S. I've attached a quick port of the 3.2.3 patch to 2.4.20. I think it
should work.
From: Nicholas H. <he...@se...> - 2003-07-01 20:44:05
On Tue, 2003-07-01 at 11:05, er...@he... wrote:
> I believe clone works. Most of the interesting stuff with clone is
> local to the node and BProc doesn't get involved at all. So, in
> theory, it should be possible to make Java work.

Ok

> I think there are two things which you are likely to have trouble with:
>
> 1 - Some of the thread group stuff (CLONE_THREAD) may not work. This
>     stuff has been kind of fluid in the 2.4.x kernels so it seems
>     unlikely that many things use it.

Why does it seem likely that Java uses it then --- friggin' Java!

> 2 - You cannot migrate a multi-threaded task. Some of the guys at LBL
>     are working on some extensions to VMADump to handle multi-threaded
>     tasks for some checkpointing work they're doing, but none of this
>     has been combined with BProc at this point. BProc would also have
>     to become aware of these situations.

That would be very cool.

> Migration will end up creating copies of the program. Also, on
> x86, vmadump isn't aware of funky LDT stuff which will also hamper
> migration. Note that this doesn't mean you can't bpsh a
> multi-threaded program.
>
> The other possible funny bit that you're likely to run into is that
> fork/clone is much slower than normal because it involves the front
> end. This could lead to new/interesting races or just poor
> performance in apps that create/clean-up threads a lot.
>
> In terms of what needs to be done, that depends entirely on what
> you're trying to run. I've done some simple pthreads things on nodes
> w/o problems. The first place to look is probably strace output of a
> program that fails. Then try and figure out how what the app is
> seeing differs from what it's expecting.

We have several programs that use pthreads here as well -- and they seem to
run fine (apart from the sigsuspend issue in 2.4.19); it is just that java
seems a bit confused -- I will put together a complete bug report, and I guess
we can go from there.

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: Nicholas H. <he...@se...> - 2003-07-01 20:09:23
Ok -- So I have managed to find the change in versions that isolates the
problem; unfortunately, it is a kernel version change that triggers it, not a
bproc one.

FYI -- The working combination is 2.4.18 patched for bproc 3.2.3 -- I used the
diff in the patches to backport the 2.4.19 patch for 3.2.3 to 2.4.18.

The 'bad' combination is 2.4.19 with bproc 3.2.3.

So, the behavior that I am seeing now is that a program is bpsh'd to a node,
where it uses pthreads to create a few threads to do the work. At some point
the threads hang, and it takes a 'kill -9' to kill them. Most of the time this
will work, but I have noticed that I will have to go to the node and 'kill -9'
them there for the process to die all of the way; if not, and I kill -9 from
the front-end, the processes will be removed from the front-end ps output, but
when I ssh to the remote node, it is still there and needs another kill -9 to
kill it. There is also the case where the process on the remote node just
refuses to die -- kill -9 will not pull it out of wherever it is stuck.

What else can I provide? Would it be possible to get a patch for bproc 3.2.3
for kernel 2.4.20 to see if I get the same behavior there?

Here is a traceback for when the threads hang. This is the same traceback as
when the process ignores the kill -9.

[root@test6 root]# gdb genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast 17154
GNU gdb Red Hat Linux (5.2-2)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast: No such file or directory.
Attaching to process 17154
Reading symbols from /mnt/io1/genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast...done.
Reading symbols from /lib/i686/libm.so.6...done.

[henken@test6 henken]$ ps -jxf
 PPID   PID  PGID   SID TTY   TPGID STAT   UID   TIME COMMAND
17156 17157 17157 17157 pts/0 17200 S    27659   0:00 -bash
17157 17200 17200 17157 pts/0 17200 R    27659   0:00 ps -jxf
  568 17024   568   568 ?        -1 S    27659   0:00 /bin/sh /proc/self/fd/3 /scratch/user/henken/slot_1/result /genomics/share/testsuite/tests/blastSim
17024 17025   568   568 ?        -1 S    27659   0:00 /usr/bin/perl /genomics/share/testsuite/test_software/gus/gushome_06-05-03/bin/blastSimilarity --bl
17025 17151   568   568 ?        -1 S    27659   0:00  \_ sh -c /genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast -d /scratch/user/he
17151 17152   568   568 ?        -1 S    27659   0:00      \_ /genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast -d /scratch/user/henk
17152 17153   568   568 ?        -1 S    27659   0:00          \_ /genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast -d /scratch/user/
17153 17154   568   568 ?        -1 S    27659   0:00              \_ /genomics/share/testsuite/test_software/ncbiblast_2000-10-31/rpsblast -d /scratch/u

[henken@test6 henken]$ strace -p 17154
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
[henken@test6 henken]$ su -
Password:
[root@test6 root]# strace -p 17154
[root@test6 root]# strace -p 17153
getppid()                               = 511
poll([{fd=7, events=POLLIN}], 1, 2000)  = 0
getppid()                               = 511
poll( <unfinished ...>
[root@test6 root]# strace -p 17154

Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libpthread.so.0...done.
[New Thread 1024 (LWP 511)]
Error while reading shared library symbols:
Can't attach LWP 511: No such process
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x40080bb5 in __sigsuspend (set=0x597697bc) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
45      ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
        in ../sysdeps/unix/sysv/linux/sigsuspend.c
(gdb) bt
#0  0x40080bb5 in __sigsuspend (set=0x597697bc) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x400461d9 in __pthread_wait_for_restart_signal (self=0x59769be0) at pthread.c:971
#2  0x40047f49 in __pthread_alt_lock (lock=0x8297ab0, self=0x0) at restart.h:34
#3  0x40044d26 in __pthread_mutex_lock (mutex=0x8297aa0) at mutex.c:120
#4  0x0804b7aa in s_MutexLock ()
#5  0x0804b83d in NlmMutexLockEx ()
#6  0x0817794c in Nlm_GetAppParam ()
#7  0x0817583f in GetAppErrInfo ()
#8  0x08174ba1 in Nlm_ErrSetLogfile ()
#9  0x0804abfa in NlmThreadWrapper ()
#10 0x40043c6f in pthread_start_thread (arg=0x59769be0) at manager.c:284

--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: <er...@he...> - 2003-07-01 15:14:40
On Tue, Jul 01, 2003 at 09:31:23AM -0400, Nicholas Henke wrote:
> Hey Erik~
> I am again faced with pesky Java users who are wanting to use bpsh to
> farm out their tasks. I am running low on ammunition to kill them, so I
> figured I would take a stab at getting the 'clone' system call working
> in bproc. First -- is this going to be possible? Second -- can you give
> me a rough overview of what needs to be done?

I believe clone works. Most of the interesting stuff with clone is local to
the node and BProc doesn't get involved at all. So, in theory, it should be
possible to make Java work.

I think there are two things which you are likely to have trouble with:

1 - Some of the thread group stuff (CLONE_THREAD) may not work. This stuff
    has been kind of fluid in the 2.4.x kernels so it seems unlikely that
    many things use it.

2 - You cannot migrate a multi-threaded task. Some of the guys at LBL are
    working on some extensions to VMADump to handle multi-threaded tasks for
    some checkpointing work they're doing, but none of this has been combined
    with BProc at this point. BProc would also have to become aware of these
    situations.

    Migration will end up creating copies of the program. Also, on x86,
    vmadump isn't aware of funky LDT stuff, which will also hamper migration.
    Note that this doesn't mean you can't bpsh a multi-threaded program.

The other possible funny bit that you're likely to run into is that fork/clone
is much slower than normal because it involves the front end. This could lead
to new/interesting races or just poor performance in apps that create/clean-up
threads a lot.

In terms of what needs to be done, that depends entirely on what you're trying
to run. I've done some simple pthreads things on nodes w/o problems. The first
place to look is probably strace output of a program that fails. Then try and
figure out how what the app is seeing differs from what it's expecting.

- Erik
From: Nicholas H. <he...@se...> - 2003-07-01 13:31:37
Hey Erik~

I am again faced with pesky Java users who are wanting to use bpsh to farm out
their tasks. I am running low on ammunition to kill them, so I figured I would
take a stab at getting the 'clone' system call working in bproc. First -- is
this going to be possible? Second -- can you give me a rough overview of what
needs to be done?

Thanks!
Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: <ha...@no...> - 2003-06-27 08:59:26
> I have a simple python batch/queuing system that up until now has worked for
> me. I looked at sge+bproc - but as far as I can tell you have to manually
> reconfigure sge when nodes become unavailable. It can probably be set up to
> automatically recognize cluster reconfigurations but it's not obvious to me
> how to do it.

It should be easy to do dynamic configuration of SGE when nodes become
available/unavailable. With my approach, where a node looks like a queue on
the master node, one just has to call "qmod -e $N" to enable the queue when
node N becomes available, so this command should probably go at the end of
bproc's node_up script (where N=$1), and "qmod -d $N" to disable the queue
when the node becomes unavailable. I am not sure there is anything like a
node_down script in bproc (I thought there was, but I do not see it in my
cluster just now); if there is, it should start with "qmod -d $N".

We could also test the node's sanity in SGE's prolog and epilog scripts (run
before and after the job) and call "qmod -d $N" there when needed. (The epilog
script could even re-schedule the job when the node died while running the
job, if the job is re-runnable.)

Another simple approach is to run a script doing "bpstat" and then
"qmod -d ..." every 30 seconds or so (on the master).

If all the jobs are written as re-runnable (can be aborted at any moment and
run again on a different node; this usually means that the job does not change
any of its input files), it should be easy to create a node-fault-tolerant
system.

All this is untested, please let me know if you try it.

Best Regards
Vaclav Hanzl
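As a rough illustration of the last approach described above (poll bpstat and enable/disable SGE queues accordingly), here is a minimal Python sketch. The bpstat column layout and the queue-per-node naming passed to qmod are assumptions, not taken from the thread; adjust them to the actual SGE setup.

#!/usr/bin/env python
# Sketch of the "poll bpstat every 30 seconds" idea. Assumes bpstat prints
# "<node> <state> ..." per line and that SGE queues are named after node numbers.

import os, time

POLL_INTERVAL = 30

def node_states():
    """Return {node_number: state} parsed from `bpstat` output."""
    states = {}
    for line in os.popen("bpstat").readlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            states[int(fields[0])] = fields[1]     # e.g. 'up', 'down', 'boot'
    return states

def sync_sge(states):
    """Enable SGE queues for nodes that are up, disable the rest."""
    for node, state in states.items():
        if state == "up":
            os.system("qmod -e %d" % node)
        else:
            os.system("qmod -d %d" % node)

if __name__ == "__main__":
    while True:
        sync_sge(node_states())
        time.sleep(POLL_INTERVAL)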
From: Chong C. <cc...@pl...> - 2003-06-26 22:51:38
The LSF batch system has an integration with bproc. Besides serial jobs, it
also supports parallel jobs natively (mpirun). Additionally, it can handle the
node-unavailable issue automatically -- no need to reconfigure the system. The
mechanism is transparent to the end user. From the user's point of view, when
a node is down, the batch system just decreases the number of available slots.

Chong

-----Original Message-----
From: Thomas Clausen [mailto:tcl...@we...]
Sent: Thursday, June 26, 2003 2:19 PM
To: bpr...@li...
Subject: [BProc] Re: is this a good candidate for bproc?

Hi Russell,

bproc works great. I use it for running batch jobs on a 90 cpu cluster with 20
dual CPU and the rest single CPU machines. I have an occasional mpi job but
mostly it's single processes.

I have a simple python batch/queuing system that up until now has worked for
me. I looked at sge+bproc - but as far as I can tell you have to manually
reconfigure sge when nodes become unavailable. It can probably be set up to
automatically recognize cluster reconfigurations but it's not obvious to me
how to do it.

Thomas
--
 .^.    Thomas Clausen, post doc
 /V\    Physics Department, Wesleyan University, CT
// \\   Tel 860-685-2018, fax 860-685-2031
/( )\
 ^^-^^  Use Linux
From: Thomas C. <tcl...@we...> - 2003-06-26 18:19:57
Hi Russell,

bproc works great. I use it for running batch jobs on a 90 cpu cluster with 20
dual CPU and the rest single CPU machines. I have an occasional mpi job but
mostly it's single processes.

I have a simple python batch/queuing system that up until now has worked for
me. I looked at sge+bproc - but as far as I can tell you have to manually
reconfigure sge when nodes become unavailable. It can probably be set up to
automatically recognize cluster reconfigurations but it's not obvious to me
how to do it.

Thomas
--
 .^.    Thomas Clausen, post doc
 /V\    Physics Department, Wesleyan University, CT
// \\   Tel 860-685-2018, fax 860-685-2031
/( )\
 ^^-^^  Use Linux
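The "simple python batch/queuing system" mentioned above is not shown anywhere in the thread; purely as a hypothetical sketch of that kind of setup, here is a minimal Python farmer that polls bpstat for 'up' nodes and dispatches one pending command per node with bpsh. The bpstat parsing and the one-job-per-node policy are assumptions, not Thomas's actual code.

#!/usr/bin/env python
# Hypothetical minimal bproc job farmer: not the system described above, just
# an illustration. Assumes bpstat prints "<node> <state> ..." per line.

import os, time

def up_nodes():
    """Node numbers that bpstat currently reports as 'up'."""
    nodes = []
    for line in os.popen("bpstat").readlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == "up":
            nodes.append(int(fields[0]))
    return nodes

def run_batch(commands):
    """Run each command (an argv list) on some up node, one job per node."""
    pending = list(commands)
    busy = {}                                   # child pid -> node number
    while pending or busy:
        # Reap any finished jobs without blocking.
        while busy:
            pid, status = os.waitpid(-1, os.WNOHANG)
            if pid == 0:
                break
            busy.pop(pid, None)
        # Dispatch new jobs to idle 'up' nodes.
        idle = [n for n in up_nodes() if n not in busy.values()]
        while pending and idle:
            node = idle.pop(0)
            argv = ["bpsh", str(node)] + pending.pop(0)
            pid = os.spawnvp(os.P_NOWAIT, "bpsh", argv)
            busy[pid] = node
        time.sleep(5)

if __name__ == "__main__":
    run_batch([["hostname"], ["uptime"]])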
From: <er...@he...> - 2003-06-26 17:12:21
On Thu, Jun 26, 2003 at 12:50:43PM -0400, Nicholas Henke wrote:
> When running vfork, it appears that only the first node in the nodelist
> has the stdio redirected correctly. The rest of the nodes' output appears
> to go to /dev/null. Is this the expected behavior?
>
> BTW -- I am running 3.2.0

Umm... Yeah. It's certainly a quirk and it should probably be fixed, but
that's normal. Here's what's going on:

Normally, when you move to a node, bproc will set up a socket connection
between the two processes to move the process information. As a nice hack to
provide basic support for printf, the socket is kept around and attached to
the process's STDOUT and STDERR. This works out because the socket connection
is usually back to the front end, where bproc can do some mostly sane IO
forwarding.

In the vrfork case, only the first process gets the process image from the
front end. The rest of the processes get their process image from one of the
previous processes. This adds parallelism which makes it go faster and blah
blah blah... The upshot is that the sockets which were used for the built-in
forwarding don't go back to the front end anymore, so it doesn't work.

If you look at the bpsh source, you'll see that it provides explicit
instructions to vexecmove on how to wire up STDIN, STDOUT, STDERR. bpsh itself
becomes the IO forwarder in that case. This makes things like bpsh much more
complicated than they might otherwise be. On the bright side, bpsh is a MUCH
better IO forwarder than what's built into BProc at this point.

I've been wanting to get rid of the IO forwarding daemon in BProc since the
very first version. It's one of those things that's lingered because it's a
nice crutch which does an ok job for simple prints.

- Erik
From: Nicholas H. <he...@se...> - 2003-06-26 16:53:30
When running vfork, it appears that only the first node in the nodelist has
the stdio redirected correctly. The rest of the nodes' output appears to go to
/dev/null. Is this the expected behavior?

BTW -- I am running 3.2.0

Cheers!
Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: <er...@he...> - 2003-06-26 14:50:18
On Wed, Jun 25, 2003 at 09:41:43AM -0400, gor...@ph... wrote:
> Has anyone tried bproc on 2.4.21 yet?

The port should be straightforward. It looks like they just renamed
get_empty_inode to new_inode. I'm working some memory handling kinks out of a
new patch now.

- Erik
From: Dale H. <ro...@ma...> - 2003-06-25 23:22:43
On Wed, Jun 25, 2003 at 05:39:54PM -0400, Gregory Shakhnarovich elucidated:
>
> Hi,
>
> We are working on our new BProc-based cluster. The initial setup has been
> nice and smooth, but one major hole we still have is the scheduler/queuing
> system (which is quite important for the intended use of the cluster).

FWIW... you might check out:

http://noel.feld.cvut.cz/magi/sge+bproc.html

SGE and bproc integrated.

--
Dale Harris  ro...@ma...
/.-)
From: Gregory S. <gr...@ai...> - 2003-06-25 22:37:41
Hi,

I failed to mention a detail that may be relevant: we are running a Debian
2.4.18 kernel on our cluster.

Thanks,
--
Greg Shakhnarovich
AI Lab, MIT  NE43-V611
Cambridge, MA 02139
tel (617) 253-8170      fax (617) 258-6287
From: Chong C. <cc...@pl...> - 2003-06-25 22:30:36
Hi,

Just to let you know, we have an LSF integration with bproc. It provides the
full LSF scheduler ability, including fairshare, preemption, ...

Chong

-----Original Message-----
From: Gregory Shakhnarovich [mailto:gr...@ai...]
Sent: Wednesday, June 25, 2003 5:40 PM
To: bpr...@li...
Subject: [BProc] Queueing & scheduling

Hi,

We are working on our new BProc-based cluster. The initial setup has been nice
and smooth, but one major hole we still have is the scheduler/queuing system
(which is quite important for the intended use of the cluster).

I am aware of Clubmask, but as people here have pointed out, the need to do a
full node install makes that solution very undesirable. It looks like the only
(other) immediately available solution is BJS. So, I am trying to figure out
the following (and will appreciate any tips):

1) We have 32 dual-CPU nodes, and would like them to be treated as 64 nodes
for scheduling purposes. How can this be conveyed to BJS?

2) How do we tell BJS not to include the head node in the pool? A related
question - what is the semantics of the indices in the 'nodes' directive?

3) Has anyone implemented any policy modules in addition to 'simple' and
'shared', which could be shared with us?

4) Is there any way to introduce priorities with the existing policies? What
we want ideally is to have 2-3 priority levels (low/med/high) so that jobs get
scheduled and suspended/restarted dynamically based on priority, in addition
to node availability. I.e., if all the nodes are taken by a job L with low
priority, and a job H with high priority arrives, then L is suspended until H
is done. (*Ideally* there would be some anti-starvation mechanism as well,
likely upgrading L after it's been unfinished for a while, but for now we
would be happy without it.)

I will much appreciate any suggestions on how this could be accomplished with
Bproc.

Thanks,
--
Greg Shakhnarovich
AI Lab, MIT  NE43-V611
Cambridge, MA 02139
tel (617) 253-8170      fax (617) 258-6287
From: Nicholas H. <he...@se...> - 2003-06-25 22:12:36
On Wed, 2003-06-25 at 17:39, Gregory Shakhnarovich wrote:
> Hi,
>
> We are working on our new BProc-based cluster. The initial setup has been
> nice and smooth, but one major hole we still have is the scheduler/queuing
> system (which is quite important for the intended use of the cluster).
>
> I am aware of Clubmask, but as people here have pointed out, the need to
> do a full node install makes that solution very undesirable. It looks like
> the only (other) immediately available solution is BJS. So, I am trying to
> figure out the following (and will appreciate any tips):

<clubmask author> Not anymore. I am working on a release now that does away
with Clubmask as an entire cluster installation/management/feed-your-dog
solution. I think it would be pretty easy to put Clubmask on a Clustermatic
cluster, as clubmask is just a simple RPM now. The only requirements we have
for the nodes are that they run a custom mond (from supermon), which can just
be started from node_up.

That said, the release is honestly a month off, as I have a ton of
documentation to write, but the software itself is currently running and
working fine on 3 separate clusters, and we are installing the rest of our
clusters with it during July.

I would be more than happy to try and get you running Clubmask on your
Clustermatic setup, and I will be working with a Clustermatic cluster here in
the near future. Feel free to email me if you would be willing to put in a bit
of leg work. The issues I see cropping up are:

1) Need to patch the kernel with a few symbol exports to make supermon happy.
We can do without this, but you will not get the supermon2ganglia translator
functionality. (Supermon2ganglia is a 'fake' gmond that translates supermon
data into ganglia XML so that you can view the data using the standard Ganglia
web interface. See http://www.liniac.upenn.edu/ganglia for a live example.)
This would be pretty easy, as we have all of the SRPMs and patches that should
be necessary.

2) Recompiling the ZODB, IndexedCatalog, Clubmask, Python 2.2.2, etc. SRPMs
for your target platform. Not really an issue, but it would need to be done.

3) Sanity checking -- well, I guess this goes for any software.

Now that I am done with the scary stuff :P, here are a few questions for you:

1) Would you need ssh access or control to the nodes?
2) What platform would you be running on? RH 9? 8?
3) Timeframe?

Cheers!
Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
From: Gregory S. <gr...@ai...> - 2003-06-25 21:40:07
Hi,

We are working on our new BProc-based cluster. The initial setup has been nice
and smooth, but one major hole we still have is the scheduler/queuing system
(which is quite important for the intended use of the cluster).

I am aware of Clubmask, but as people here have pointed out, the need to do a
full node install makes that solution very undesirable. It looks like the only
(other) immediately available solution is BJS. So, I am trying to figure out
the following (and will appreciate any tips):

1) We have 32 dual-CPU nodes, and would like them to be treated as 64 nodes
for scheduling purposes. How can this be conveyed to BJS?

2) How do we tell BJS not to include the head node in the pool? A related
question - what is the semantics of the indices in the 'nodes' directive?

3) Has anyone implemented any policy modules in addition to 'simple' and
'shared', which could be shared with us?

4) Is there any way to introduce priorities with the existing policies? What
we want ideally is to have 2-3 priority levels (low/med/high) so that jobs get
scheduled and suspended/restarted dynamically based on priority, in addition
to node availability. I.e., if all the nodes are taken by a job L with low
priority, and a job H with high priority arrives, then L is suspended until H
is done. (*Ideally* there would be some anti-starvation mechanism as well,
likely upgrading L after it's been unfinished for a while, but for now we
would be happy without it.)

I will much appreciate any suggestions on how this could be accomplished with
Bproc.

Thanks,
--
Greg Shakhnarovich
AI Lab, MIT  NE43-V611
Cambridge, MA 02139
tel (617) 253-8170      fax (617) 258-6287
From: <gor...@ph...> - 2003-06-25 13:42:09
Some of the oopses people have been seeing in the 2.4.20 kernels may have been
due to an RPC race condition. Here's a kernel thread on the topic, and a
couple of ksymoops from my systems:

http://www.ussg.iu.edu/hypermail/linux/kernel/0302.0/1146.html

These oopses are from systems running 3.2.5, but they also occurred under
3.2.4. They appear to be precipitated by simultaneous spikes in CPU and
network (NFS?) load, which happens regularly on compute clusters. I've
contacted the original poster, who hasn't seen the problem since upgrading to
2.4.21-rc6. Has anyone tried bproc on 2.4.21 yet?

======================================================================================
ksymoops 2.4.8 on i686 2.4.20.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /boot/System.map (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000058
c0303206
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0303206>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 0000002c   ebx: 00000000   ecx: 00000008   edx: 00000001
esi: f7611078   edi: e7bc2480   ebp: f7611000   esp: c2837edc
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c2837000)
Stack: e7bc2480 c03031c0 00000020 00000000 c0304442 e7bc2480 e7bc24d4 c03043c0
       c0123b47 e7bc2480 c2837f0c 00000001 c882a0e4 f7bfbd78 00000000 00000001
       00000020 00000000 c011fafb c0433660 c011f9a1 00000000 00000001 c04095e0
Call Trace: [<c03031c0>] [<c0304442>] [<c03043c0>] [<c0123b47>] [<c011fafb>]
            [<c011f9a1>] [<c011f72b>] [<c010a8ad>] [<c0106e60>] [<c0106e60>]
            [<c0106e60>] [<c0106e60>] [<c0106e8c>] [<c0106f12>] [<c011af6b>]
Code: 8b 40 2c 83 f8 09 0f 4c c8 b8 01 00 00 00 d3 e0 39 c2 7d 16

>>EIP; c0303206 <xprt_timer+46/e0>   <=====
>>esi; f7611078 <_end+371a0bfc/384b3b84>
>>edi; e7bc2480 <_end+27752004/384b3b84>
>>ebp; f7611000 <_end+371a0b84/384b3b84>
>>esp; c2837edc <_end+23c7a60/384b3b84>

Trace; c03031c0 <xprt_timer+0/e0>
Trace; c0304442 <rpc_run_timer+82/90>
Trace; c03043c0 <rpc_run_timer+0/90>
Trace; c0123b47 <timer_bh+2b7/3f0>
Trace; c011fafb <bh_action+4b/80>
Trace; c011f9a1 <tasklet_hi_action+61/a0>
Trace; c011f72b <do_softirq+7b/e0>
Trace; c010a8ad <do_IRQ+dd/f0>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e60 <default_idle+0/40>
Trace; c0106e8c <default_idle+2c/40>
Trace; c0106f12 <cpu_idle+52/70>
Trace; c011af6b <call_console_drivers+eb/100>

Code;  c0303206 <xprt_timer+46/e0>
00000000 <_EIP>:
Code;  c0303206 <xprt_timer+46/e0>   <=====
   0:   8b 40 2c          mov    0x2c(%eax),%eax   <=====
Code;  c0303209 <xprt_timer+49/e0>
   3:   83 f8 09          cmp    $0x9,%eax
Code;  c030320c <xprt_timer+4c/e0>
   6:   0f 4c c8          cmovl  %eax,%ecx
Code;  c030320f <xprt_timer+4f/e0>
   9:   b8 01 00 00 00    mov    $0x1,%eax
Code;  c0303214 <xprt_timer+54/e0>
   e:   d3 e0             shl    %cl,%eax
Code;  c0303216 <xprt_timer+56/e0>
  10:   39 c2             cmp    %eax,%edx
Code;  c0303218 <xprt_timer+58/e0>
  12:   7d 16             jge    2a <_EIP+0x2a> c0303230 <xprt_timer+70/e0>

==============================================================================================
<0>Kernel panic: Aiee, killing interrupt handler!

ksymoops 2.4.8 on i686 2.4.20.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20/ (default)
     -m /boot/System.map (specified)

Unable to handle kernel NULL pointer dereference at virtual address 00000058
c0303206
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0303206>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 0000002c   ebx: 00000000   ecx: 00000008   edx: 00000001
esi: f75d82b8   edi: f423ee40   ebp: f75d8000   esp: f53f3f24
ds: 0018   es: 0018   ss: 0018
Process blastall (pid: 2246, stackpage=f53f3000)
Stack: f423ee40 c03031c0 00000000 00000000 c0304442 f423ee40 f423ee94 c03043c0
       c0123b47 f423ee40 f53f3f54 00000086 c35a7a98 d901a0e4 00000000 00000001
       00000000 00000000 c011fafb c0433660 c011f9a1 00000000 00000001 c04095e0
Call Trace: [<c03031c0>] [<c0304442>] [<c03043c0>] [<c0123b47>] [<c011fafb>]
            [<c011f9a1>] [<c011f72b>] [<c010a8ad>]
Code: 8b 40 2c 83 f8 09 0f 4c c8 b8 01 00 00 00 d3 e0 39 c2 7d 16

>>EIP; c0303206 <xprt_timer+46/e0>   <=====
>>esi; f75d82b8 <_end+37167e3c/384b3b84>
>>edi; f423ee40 <_end+33dce9c4/384b3b84>
>>ebp; f75d8000 <_end+37167b84/384b3b84>
>>esp; f53f3f24 <_end+34f83aa8/384b3b84>

Trace; c03031c0 <xprt_timer+0/e0>
Trace; c0304442 <rpc_run_timer+82/90>
Trace; c03043c0 <rpc_run_timer+0/90>
Trace; c0123b47 <timer_bh+2b7/3f0>
Trace; c011fafb <bh_action+4b/80>
Trace; c011f9a1 <tasklet_hi_action+61/a0>
Trace; c011f72b <do_softirq+7b/e0>
Trace; c010a8ad <do_IRQ+dd/f0>

Code;  c0303206 <xprt_timer+46/e0>
00000000 <_EIP>:
Code;  c0303206 <xprt_timer+46/e0>   <=====
   0:   8b 40 2c          mov    0x2c(%eax),%eax   <=====
Code;  c0303209 <xprt_timer+49/e0>
   3:   83 f8 09          cmp    $0x9,%eax
Code;  c030320c <xprt_timer+4c/e0>
   6:   0f 4c c8          cmovl  %eax,%ecx
Code;  c030320f <xprt_timer+4f/e0>
   9:   b8 01 00 00 00    mov    $0x1,%eax
Code;  c0303214 <xprt_timer+54/e0>
   e:   d3 e0             shl    %cl,%eax
Code;  c0303216 <xprt_timer+56/e0>
  10:   39 c2             cmp    %eax,%edx
Code;  c0303218 <xprt_timer+58/e0>
  12:   7d 16             jge    2a <_EIP+0x2a> c0303230 <xprt_timer+70/e0>

<0>Kernel panic: Aiee, killing interrupt handler!
From: Nicholas H. <he...@se...> - 2003-06-24 12:04:16
On Tue, 2003-06-24 at 07:55, Nicholas Henke wrote:
>
> You may wish to try LAM/MPI ( www.lam-mpi.org ) The beta releases of
> 7.0, and 7.0 when it is release finally all have very nice support for
> Bproc.

Wow -- one should really read their email before hitting 'Send' :) Apparently
English is not my best thing this early in the morning. What I meant to say
was that the 7.0 branch of LAM/MPI has very nice bproc support, including such
features as marking the bpmaster node as 'no-schedule' so MPI processes are
not automatically scheduled on the front end machine.

Cheers!
Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania