Actually, due to the complexity of the situation, I may have to implement this in order to figure out whether it's a viable option. Hopefully there's an easier way. I'll make some notes here.
Up till now:
PE uses qsub with a specially constructed script file to submit jobs with a customized job name and various other info fields set, but it doesn't specify which job queue to use, so jobs go to the default queue.
It uses qstat -f -xml to get a list of actively running jobs (and pending jobs?). It doesn't specify a queue here either. It looks like this returns all jobs from all queues, and Todd says it does, but I'll have to test that to make sure it provides all the info I need in the form I'm expecting.
It uses qacct -j to get a list of finished jobs. I'll have to find out whether this covers jobs from all queues as well.
It uses qdel (jobnumber) to kill a job. I'll have to find out whether this works regardless of which queue the job is on.
Ideally I can just modify the qsub call to submit to the right queue, and everything else will just work.
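If that's all it takes, the patch is presumably just a -q flag on the submit command. A minimal sketch, assuming a standard SGE setup (the job and script names here are placeholders):

    # what PE effectively does now: no -q, so the job goes to the default queue (all.q)
    qsub -N some_job job_script.sh

    # what it would do instead: request a specific queue ("q1" here is a placeholder)
    qsub -q q1 -N some_job job_script.sh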
I guess the thing to do is:
Submit a test job using the code as is
Submit another test job with the code patched to submit to a different queue
Manually test the qstat, qacct, and qdel commands on those jobs
Make sure the test jobs run for a while, so I can look for them in qstat before they terminate, then look for them in qacct after they finish. (Concrete commands are sketched below.)
Created a test project for this at http://lalashan.mcmaster.ca/wonder_development/index.php/Qsub_test.
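A sketch of what that manual test might look like, assuming a standard SGE setup; the script name, job names, and job number are placeholders:

    # submit one test job as the code does now (no queue), and one to a non-default queue;
    # test_job.sh should just sleep for a few minutes so the jobs stay visible in qstat
    qsub -N qtest_default test_job.sh
    qsub -q q2 -N qtest_q2 test_job.sh

    # while both are running: do both jobs show up, with the fields PE expects?
    qstat -f -xml

    # does qdel work regardless of queue? (12345 stands in for the q2 job's number)
    qdel 12345

    # after the jobs finish: does accounting show them both?
    qacct -j qtest_default
    qacct -j qtest_q2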
I already know that a job submitted as-is will show up in the listings whether running or finished (i.e. in qstat and in qacct), and that it will work with qdel as-is. So I want to test whether these three operations work, unchanged, on a job submitted to a different queue. What I do next will depend on the answer to that.
It may help me to document the context for this, too.
To do some system upgrading on the cluster nodes, it's been proposed that we pull two of the nodes out, upgrade them, and run test wikis on them. So we'll have two different wiki systems: one on the upgraded nodes and one on the unchanged nodes. There's a special URL, http://lalashan.mcmaster.ca/dev/*, that will forward to the wikis on the upgraded test nodes, while the usual URLs will go to the regular wikis on the other nodes. To test for interactions between the upgrade and SGE, we have two non-default job queues, "q1" and "q2", for the old and new cluster nodes respectively. So we want the dev wikis to use q2 and the non-dev wikis to use q1, and neither should use the default queue ("all.q"), because that would send non-dev users' jobs to untested, upgraded cluster nodes.
Additionally, the background-jobs listing on one cluster ideally shouldn't show jobs from the other cluster's queue. For our purposes we can gloss over that, since the dev cluster is only for testing; it's a necessary feature, though, if we're going to "support" using non-default queues as part of WW for third-party users.
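If we ever do need per-queue listings, qstat can restrict its report to a given queue list, so the fix may be as small as one extra argument on the status query. A sketch, assuming the XML output is otherwise unchanged:

    # list only the jobs in q1, in the same XML form the code already parses
    qstat -q q1 -f -xml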
I'm realizing that, for the purpose of testing the new nodes, I don't actually need to test the whole set of SGE features by submitting to q2; it should be enough just to verify that the SGE commands work after the upgrade. Since we don't need this feature for anything else, and it seems like rather a lot of work, it would be great to skip it.
But, to my frustration, I do think I need to keep the non-dev wikis from sending background jobs to the testing nodes. I think I'll ask Todd if he can just temporarily remove the testing nodes from the default queue; otherwise I have to get the non-dev wikis to submit to q1.
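For reference, removing the test nodes from the default queue should just be an edit to all.q's hostlist, something like the following (node7 and node8 are placeholder hostnames):

    # inspect the default queue's current host list
    qconf -sq all.q | grep hostlist

    # interactively edit all.q and delete the test nodes from the hostlist field
    qconf -mq all.q

    # or remove them one at a time without the editor
    qconf -dattr queue hostlist node7 all.q
    qconf -dattr queue hostlist node8 all.q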
Todd's answer is yes, so I'm retiring this. Reopen in the future if a need arises.
Added a $peSGEQueueName setting in r.1238.
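I haven't traced the code in r.1238 here, but presumably a wiki farm sets $peSGEQueueName in its configuration and PE splices it into the submit command as a -q flag when it's non-empty. Roughly (hypothetical usage; job and script names are placeholders):

    # per the plan above, the non-dev farm would set $peSGEQueueName = 'q1' and the
    # dev farm 'q2', so PE's generated submissions presumably become:
    qsub -q q1 -N some_job job_script.sh    # non-dev wikis
    qsub -q q2 -N some_job job_script.sh    # dev wikis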
I'm not clear whether this ticket originally meant to allow the cluster admin to choose a queue for each wiki farm, or to allow users to choose a queue, but I'm going to go with the former and say it's been fulfilled.