From: Luke P. <lo...@du...> - 2004-06-09 04:57:32
Here's a diff for shared.c that I think does the trick. Just an
uninitialized variable and a misplaced control structure.

79c79
< int i, node_count;
---
> int i, node_count=0;
95,97d94
< job_counts[node_count].node = node->node;
< job_counts[node_count].count = 0;
<
99a97,98
> job_counts[node_count].node = node->node;
> job_counts[node_count].count = 0;

-Luke

On Tue, 2004-06-08 at 14:48, er...@he... wrote:
> On Mon, Jun 07, 2004 at 05:59:04AM -0500, Luke Palmer wrote:
> > Hi everyone,
> >
> > I have a bproc cluster up and running of Fedora core 1. bpsh looks to
> > be working. bjs doesn't appear to be doing anything sane, however. I'm
> > using v1.5.
> >
> > In my test configuration I have 4 SMP nodes. Using the "filler" policy,
> > bjs will place 4 processes on the master only. Slaves do not get used.
>
> This is the way that bjs is supposed to operate. On a BProc system
> placing processes on the slave nodes is normally handled by something
> like mpirun. Jobs usually look like simple scripts (which run on the
> front end) that contain commands to start stuff on the slave nodes.
> The NODES environment variable should tell the script where to run
> things.
>
> > "simple" will start as many jobs as I want, but only on the master.
> > bjsctl -r does not kill a running job, either.
>
> bjsctl -r will kill off stuff running on the nodes allocated to the job.
>
> > "shared" just plain doesn't work. Any call to bjssub gives me this:
> >
> > connect("/tmp/.bjs"): Connection refused
> > Failed to connect to scheduler.
> > Is bjs not running or is /tmp/.bjs the wrong socket path?
>
> That usually means bjs has died for some reason. It could be that
> "shared" is busted. That one doesn't get a lot of testing around
> here.
>
> - Erik