#424 get PE SGE interface to submit to specific SGE queue

workingwiki
closed
None
5
2016-12-29
2013-10-10
Lee Worden
No

This is a good feature to have, and it looks like we may need it on our cluster.

Discussion

  • Lee Worden

    Lee Worden - 2013-10-10

    Actually, due to the complexity of the situation, I may have to implement this in order to figure out whether it's a viable option. Hopefully there's an easier way. I'll make some notes here.

    Up till now:

    • PE uses qsub with a specially constructed script file to submit jobs with a customized job name and various other info fields set, but it doesn't specify which job queue to use, so jobs go to the default queue.
    • It uses qstat -f -xml to get a list of actively running jobs (and pending jobs?). It doesn't specify a queue here either. It looks like this gets you all jobs from all queues, and Todd says it does, but I'll have to test that to make sure it provides all the info I need in the form that I'm expecting.
    • It uses qacct -j to get a list of finished jobs. I'll have to find out whether this lists all queues as well.
    • It uses qdel (jobnumber) to kill a job. I'll have to find out whether this works regardless of which queue a job is on.

    Ideally I can just modify the qsub to submit to the right queue, and the other stuff will just work.
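A minimal sketch of that idea, assuming the only change needed is an optional `-q` argument on the qsub call. The function names, the script path, and the job name are hypothetical and are not taken from the PE code; the functions only build argument lists and do not run anything.

```python
from typing import List, Optional


def qsub_command(script_path: str, job_name: str,
                 queue: Optional[str] = None) -> List[str]:
    """Build a qsub argument list, adding -q only when a queue is given."""
    cmd = ["qsub", "-N", job_name]
    if queue is not None:
        # Hypothetical patch: request a specific queue for this job.
        cmd += ["-q", queue]
    # With no -q, the command is the same as an unpatched submission.
    cmd.append(script_path)
    return cmd


def qstat_command() -> List[str]:
    """Listing of running (and pending?) jobs, with no queue specified."""
    return ["qstat", "-f", "-xml"]


def qacct_command() -> List[str]:
    """Listing of finished jobs, with no queue specified."""
    return ["qacct", "-j"]


def qdel_command(job_number: str) -> List[str]:
    """Kill one job by its job number."""
    return ["qdel", job_number]


if __name__ == "__main__":
    # Print the commands for an unpatched and a patched submission.
    print(" ".join(qsub_command("job.sh", "ww-test")))
    print(" ".join(qsub_command("job.sh", "ww-test", queue="q2")))
```

If the other three commands turn out to need a queue as well, the same optional-argument pattern could be applied to them, but that is exactly what the manual tests have to establish first.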

    I guess the thing to do is:

    • Submit a test job using the code as is
    • Submit another test job with the code patched to submit to a different queue
    • Manually test the qstat, qacct, and qdel commands on those jobs
      • make sure the test jobs run for a while, so I can look for them in qstat before they terminate. Then look for them in qacct after they finish.
     
  • Lee Worden

    Lee Worden - 2013-10-10

    I already know that a job submitted as is will show up in the listing whether running or finished (i.e. in qstat and in qacct), and that it will work with qdel as is. So I want to test whether these three operations work as is with a job submitted to a different queue. Then I'll need to consider different things depending on the answer to that.

     
  • Lee Worden

    Lee Worden - 2013-10-10

    It may help me to document the context for this, too.

    To do some system upgrading on the cluster nodes, it's been proposed that we cut out two of the nodes, upgrade them, and run test wikis on them. So we'll have two different wiki systems, one on the upgraded nodes and one on the unchanged nodes. There's a special URL, http://lalashan.mcmaster.ca/dev/*, that will forward to the wikis on the upgraded test nodes, while the usual URLs will go to the regular wikis on the other nodes. To test for interactions between the upgrade and SGE, we have two non-default job queues, "q1" and "q2", for the old and new cluster nodes, respectively. So we want the dev wikis to use q2 and the non-dev wikis to use q1, and neither should use the default queue ("all.q"), because that would send non-dev users' jobs to untested upgraded cluster nodes.
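The queue assignment described above could be sketched as follows. This is only an illustration, assuming the wiki's URL path is the thing that distinguishes dev from non-dev wikis; the function name and the example wiki paths are hypothetical.

```python
from urllib.parse import urlparse

DEV_PATH_PREFIX = "/dev/"   # dev wikis live under http://lalashan.mcmaster.ca/dev/*
OLD_NODES_QUEUE = "q1"      # unchanged cluster nodes
NEW_NODES_QUEUE = "q2"      # upgraded test nodes


def queue_for_wiki_url(url: str) -> str:
    """Pick the SGE queue for a wiki; never returns the default "all.q"."""
    path = urlparse(url).path
    if path.startswith(DEV_PATH_PREFIX):
        return NEW_NODES_QUEUE
    return OLD_NODES_QUEUE


if __name__ == "__main__":
    # "SomeWiki" is a made-up wiki name used only for illustration.
    print(queue_for_wiki_url("http://lalashan.mcmaster.ca/dev/SomeWiki"))
    print(queue_for_wiki_url("http://lalashan.mcmaster.ca/SomeWiki"))
```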

    Additionally, ideally the background jobs listings shouldn't show jobs from one cluster's queue on the other one. For our needs we can gloss over that, since the dev cluster is only for testing. It's a necessary feature, though, if we're going to "support" using non-default queues as a part of WW for third-party users.

     
  • Lee Worden

    Lee Worden - 2013-10-10

    I'm realizing that for the purpose of testing the new nodes, I don't actually need to test the whole set of SGE features by submitting to q2. It should be enough just to verify that the SGE commands work after the upgrade. Since we don't need this feature for anything else, and it seems like kind of a lot of work, it would be great to skip it.

    But to my frustration, I do think I need to keep the non-dev nodes from sending background jobs to the testing nodes. I think I'll ask Todd if he can just temporarily remove the testing nodes from the default queue, because otherwise I have to get the non-dev wikis to submit to q1.

     
  • Lee Worden

    Lee Worden - 2013-10-10
    • status: open --> wont-fix
     
  • Lee Worden

    Lee Worden - 2013-10-10

    The answer to my question for Todd is yes, so I'm retiring this. Reopen in future if a need arises.

     
  • Lee Worden

    Lee Worden - 2016-12-29

    Added a $peSGEQueueName setting in r.1238.

    I'm not clear whether this ticket originally meant to allow the cluster admin to choose a queue for each wiki farm, or to allow users to choose a queue, but I'm going to go with the former and say it's been fulfilled.

     
  • Lee Worden

    Lee Worden - 2016-12-29
    • status: wont-fix --> closed
     
