From: barton at ualberta.ca (B. Barton) - 2007-04-19 16:19:04
I have just re-installed my cluster of IBM e-series 325, 326 machines using xcat-1.2.0-RC3, replacing my masternode and moving to Scientific Linux 4.4 (from Scientific Linux 4.2). I installed torque-1.2.0p1 and maui-3.2.6p11 as I had them working under Scientific Linux 4.2.

I now find that I cannot get jobs to run successfully. When I use qsub the jobs get queued and start, but no results are produced, e.g. (server log):

04/19/2007 15:45:29;0008;PBS_Server;Job;12.masternode;Job Queued at request of barton@masternode, owner = barton@masternode, job name = TestingPBS, queue = dque
04/19/2007 15:45:29;0040;PBS_Server;Svr;masternode;Scheduler sent command new
04/19/2007 15:45:30;0008;PBS_Server;Job;12.masternode;Job Modified at request of root@masternode
04/19/2007 15:45:30;0008;PBS_Server;Job;12.masternode;Job Run at request of root@masternode
04/19/2007 15:45:30;0008;PBS_Server;Job;12.masternode;Job Modified at request of root@masternode
04/19/2007 15:45:35;0010;PBS_Server;Job;12.masternode;Exit_status=-2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
04/19/2007 15:45:35;000d;PBS_Server;Job;12.masternode;Post job file processing error; job 12.masternode on host node16/0

I always get Exit_status=-2 and I don't know what is causing it.

--
Bob Barton <bob...@ua...>
Local Area Administrator (780) 492-5160
564B Chemical & Materials Engineering
University of Alberta,
Edmonton Alberta, T6G 2G6
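For anyone who needs to find jobs like this one in a longer server log, here is a minimal Python sketch. It assumes only the semicolon-separated layout visible in the excerpt above (timestamp;event code;daemon;object type;object name;message). The names LogRecord, parse_record and failed_jobs are made up for illustration and are not part of TORQUE.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional


class LogRecord(NamedTuple):
    """One server log line, split on the layout seen in the excerpt."""
    timestamp: str
    event_code: str
    daemon: str
    object_type: str
    object_name: str
    message: str


def parse_record(line: str) -> Optional[LogRecord]:
    """Split a log line into six fields; return None if it does not fit."""
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


# Matches e.g. "Exit_status=-2" anywhere in the message field.
_EXIT_RE = re.compile(r"Exit_status=(-?\d+)")


def failed_jobs(lines: Iterable[str]) -> Dict[str, int]:
    """Map job id -> exit status for every job that reported a non-zero status."""
    failures: Dict[str, int] = {}
    for line in lines:
        rec = parse_record(line)
        if rec is None or rec.object_type != "Job":
            continue
        match = _EXIT_RE.search(rec.message)
        if match and int(match.group(1)) != 0:
            failures[rec.object_name] = int(match.group(1))
    return failures
```

Feeding it the lines from the excerpt would report job 12.masternode with status -2. The path of the log file depends on how TORQUE was installed, so it is left to the caller to open the file and pass the lines in.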
From: barton at ualberta.ca (B. Barton) - 2007-04-19 16:32:07
Oops :-[ . My only excuse is that I am tired and have been working on this move of the cluster for too long. The problem was stale NFS handles, so all I needed to do was unmount and remount the NFS file systems on the nodes!
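A stale handle can be spotted before submitting jobs by trying to stat each NFS mount point on a node and checking for the ESTALE error. The following is a rough Python sketch of that idea, not something from the original setup. The function names and the example paths are hypothetical, and on a hard-mounted file system whose server is unreachable the stat call itself may block.

```python
import errno
import os
from typing import Callable, Iterable, List


def is_stale(path: str, stat: Callable[[str], object] = os.stat) -> bool:
    """Return True only if stat() on path fails with ESTALE.

    Any other outcome (success, missing path, permission denied)
    returns False. The stat argument exists so the check can be
    exercised without a real NFS mount.
    """
    try:
        stat(path)
    except OSError as exc:
        return exc.errno == errno.ESTALE
    return False


def stale_mounts(paths: Iterable[str],
                 stat: Callable[[str], object] = os.stat) -> List[str]:
    """Return the subset of paths that currently report a stale handle."""
    return [p for p in paths if is_stale(p, stat)]


if __name__ == "__main__":
    # Hypothetical mount points; replace with the NFS mounts on your nodes.
    for mount in stale_mounts(["/home", "/usr/local"]):
        print("stale NFS handle:", mount)
```

Any mount point it reports would then need the unmount and remount described above.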
From: egan at sense.n. (E. Ford) - 2007-04-20 07:49:08
So, no problems?