From: barton at ualberta.ca (B. Barton) - 2007-04-19 16:32:07
Oops :-[ . My only excuse is that I am tired and have been working on this cluster move for too long. The problem was stale NFS file handles, so all I needed to do was unmount and remount the NFS file systems on the nodes!

Bob Barton wrote:
> I have just re-installed my cluster of IBM e-series 325, 326 machines
> using xcat-1.2.0-RC3, replacing my masternode and moving to Scientific
> Linux 4.4 (from Scientific Linux 4.2).
> I installed torque-1.2.0p1 and maui-3.2.6p11, as I had them working
> under Scientific Linux 4.2.
> I now find that I cannot get jobs to run successfully. When I use qsub,
> the jobs get queued and start, but no results are produced.
> e.g. (server log)
> 04/19/2007 15:45:29;0008;PBS_Server;Job;12.masternode;Job Queued at request of barton@masternode, owner = barton@masternode, job name = TestingPBS, queue = dque
> 04/19/2007 15:45:29;0040;PBS_Server;Svr;masternode;Scheduler sent command new
> 04/19/2007 15:45:30;0008;PBS_Server;Job;12.masternode;Job Modified at request of root@masternode
> 04/19/2007 15:45:30;0008;PBS_Server;Job;12.masternode;Job Run at request of root@masternode
> 04/19/2007 15:45:30;0008;PBS_Server;Job;12.masternode;Job Modified at request of root@masternode
> 04/19/2007 15:45:35;0010;PBS_Server;Job;12.masternode;Exit_status=-2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
> 04/19/2007 15:45:35;000d;PBS_Server;Job;12.masternode;Post job file processing error; job 12.masternode on host node16/0
>
> I always get Exit_status=-2 and I don't know what is causing it.
>

--
Bob Barton <bob...@ua...>
Local Area Administrator (780) 492-5160
564B Chemical & Materials Engineering
University of Alberta, Edmonton Alberta, T6G 2G6
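
For anyone who hits the same symptom, here is a rough sketch of how a node could be checked for stale NFS handles before remounting. It is not part of the original fix, which was simply to unmount and remount by hand. It assumes a Linux node where /proc/mounts is readable; the function names (nfs_mount_points, is_stale, report) are made up for this illustration.

```python
import errno
import os


def nfs_mount_points(mounts_text):
    """Return mount points whose filesystem type is nfs or nfs4.

    Expects text in the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            points.append(fields[1])
    return points


def is_stale(path):
    """Return True if stat() on path fails with ESTALE (stale file handle)."""
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ESTALE
    return False


def report(mounts_file="/proc/mounts"):
    """Print each NFS mount point with a STALE or ok marker.

    Assumption: mounts_file exists and uses the /proc/mounts layout.
    """
    with open(mounts_file) as fh:
        for point in nfs_mount_points(fh.read()):
            print(point, "STALE" if is_stale(point) else "ok")
```

Calling report() on each compute node should list the NFS mounts there. Anything marked STALE would be a candidate for the unmount and remount described above.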