From: <er...@he...> - 2004-06-08 21:14:10
On Mon, Jun 07, 2004 at 05:59:04AM -0500, Luke Palmer wrote:
> Hi everyone,
>
> I have a bproc cluster up and running of Fedora core 1. bpsh looks to
> be working. bjs doesn't appear to be doing anything sane, however. I'm
> using v1.5.
>
> In my test configuration I have 4 SMP nodes. Using the "filler" policy,
> bjs will place 4 processes on the master only. Slaves do not get used.

This is the way that bjs is supposed to operate. On a BProc system,
placing processes on the slave nodes is normally handled by something
like mpirun. Jobs usually look like simple scripts (which run on the
front end) that contain commands to start stuff on the slave nodes.
The NODES environment variable should tell the script where to run
things.

> "simple" will start as many jobs as I want, but only on the master.
> bjsctl -r does not kill a running job, either.

bjsctl -r will kill off stuff running on the nodes allocated to the job.

> "shared" just plain doesn't work. Any call to bjssub gives me this:
>
> connect("/tmp/.bjs"): Connection refused
> Failed to connect to scheduler.
> Is bjs not running or is /tmp/.bjs the wrong socket path?

That usually means bjs has died for some reason. It could be that
"shared" is busted. That one doesn't get a lot of testing around here.

- Erik
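
For illustration, a job script of the kind described above (it runs on the front end and starts work on the slave nodes itself) might look roughly like the sketch below. It assumes NODES holds a comma-separated list of node numbers such as "0,1,2"; that format is an assumption here, so check what your bjs version actually exports. The helper names split_nodes and run_on_nodes and the DRY_RUN switch are made up for this example and are not part of bjs or BProc. By default the script only prints the bpsh commands it would run.

```shell
#!/bin/sh
# Hypothetical bjs job script: runs on the front end, starts work on slaves.
# Assumption: NODES is a comma-separated list of node numbers, e.g. "0,1,2".

# Split a comma-separated node list into one node number per line.
split_nodes() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Start a command on every allocated node via bpsh.
# DRY_RUN (hypothetical switch) defaults to 1: print instead of execute.
run_on_nodes() {
    cmd=$1
    for node in $(split_nodes "${NODES:-}"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "bpsh $node $cmd"
        else
            # $cmd is left unquoted on purpose so its arguments are split.
            bpsh "$node" $cmd
        fi
    done
}

run_on_nodes hostname
```

Submitted through bjssub with DRY_RUN=0, a script like this would start hostname on each node in NODES, which is the step bjs leaves to the job.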