From: Sriram K. <sr...@sd...> - 2009-11-04 16:36:33
Hi Malcolm,

The hard limit is being set programmatically via DRMAA. However, as the note says, only some schedulers support it. We have also noticed that SGE simply ignores this, which is consistent with what you have seen.

As a workaround, you might want to write a cron script that checks each job's age against the hard limit and purges jobs from the queue once they exceed it.

Thanks,
Sriram

On Nov 4, 2009, at 7:27 AM, Malcolm Tobias wrote:

> I'm running Opal 2.0 and using the DRMAAJobManager to interface with Sun Grid
> Engine (SGE). I've got the opal.hard_limit set to 3600s:
>
> # specify in seconds the hard limit for how long a job can run
> # only applicable if either DRMAA or Globus is being used, and if
> # the scheduler supports it
> opal.hard_limit=3600
>
> but occasionally encounter 'run away' jobs that are never killed:
>
> [mtobias@sccne ~]$ qstat -u opal
> job-ID  prior    name        user  state  submit/start at      queue                     slots  ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
> 3058    0.55500  pdb2pqr.py  opal  r      11/02/2009 14:34:19  all.q@compute-0-10.local  1
>
> I've tried looking in the $TOMCAT/webapps/ROOT directory where the job data is
> stored, but I don't see any file that looks like a batch script to examine to
> see if it's limiting the CPU time.
>
> I've also looked at the compute node that the job is running on and examined
> the 'trace' file, which appears to be where SGE is setting up the job. It
> seems like it's setting some ridiculous limits:
>
> 11/02/2009 14:34:19 [400:22267]: setting limits
> 11/02/2009 14:34:19 [400:22267]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
>
> Any ideas on what might be going wrong or how to debug this further?
>
> Malcolm
>
> --
> Malcolm Tobias
> 314.362.1594
>
> _______________________________________________
> Opaltoolkit-users mailing list
> Opa...@li...
> https://lists.sourceforge.net/lists/listinfo/opaltoolkit-users
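
The cron workaround suggested in the reply could be sketched roughly as follows. This is a minimal illustration, not part of Opal: the names `HARD_LIMIT_SECONDS`, `OPAL_USER`, and `overdue_job_ids` are hypothetical, the limit is assumed to mirror `opal.hard_limit`, and the parser assumes the default `qstat -u` column layout shown in the quoted listing. It measures wall-clock time since the job's start, not CPU time, and only considers jobs whose state is exactly `r`.

```python
#!/usr/bin/env python3
"""Sketch of a cron-driven watchdog that deletes SGE jobs running past a limit."""
import shutil
import subprocess
from datetime import datetime, timedelta

HARD_LIMIT_SECONDS = 3600  # assumption: kept in sync with opal.hard_limit by hand
OPAL_USER = "opal"         # assumption: the account that owns the Opal jobs


def overdue_job_ids(qstat_output, now, limit_seconds=HARD_LIMIT_SECONDS):
    """Return IDs of running jobs whose start time is older than the limit."""
    overdue = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; this skips header and separator rows.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        job_id, state = fields[0], fields[4]
        # For non-running jobs the timestamp is the submit time, so ignore them.
        if state != "r":
            continue
        try:
            started = datetime.strptime(
                fields[5] + " " + fields[6], "%m/%d/%Y %H:%M:%S"
            )
        except ValueError:
            continue  # unexpected layout; leave the job alone
        if now - started > timedelta(seconds=limit_seconds):
            overdue.append(job_id)
    return overdue


def main():
    if shutil.which("qstat") is None:
        print("qstat not found on PATH; nothing to do")
        return
    result = subprocess.run(
        ["qstat", "-u", OPAL_USER], capture_output=True, text=True, check=True
    )
    for job_id in overdue_job_ids(result.stdout, datetime.now()):
        # qdel removes the job from the queue; failures are left for the next run.
        subprocess.run(["qdel", job_id], check=False)


if __name__ == "__main__":
    main()
```

Run from the crontab of a user allowed to delete the `opal` account's jobs, with the SGE environment (for example `SGE_ROOT`) set up for cron. Parsing and deletion are kept separate so the age check can be tried on saved `qstat` output before anything is actually deleted.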