[Queue-developers] Re: GNU queue 1.20.1 fails to start job on remote host
From: W. G. K. <wer...@ya...> - 2000-09-08 21:56:42
Sorry for the long delay; I've been out of town and away from email.

QingLong wrote:

> Hello!
>
> I have a problem which probably is due to lack of up to date docs
> for `queue', so I am trying to get help online.
> IMHO, making a bug mailing list (actually `tips' list certainly is such one)
> closed (`subscribers only') is really impolite.

This is unfortunately necessary to keep out robo-spam. It's actually kept out
quite a bit of spam that would otherwise have gone out to the list. It's easy
to get on and off, and of course there are also the discussion forums for bugs
on SourceForge. The new queue-support list (where these sorts of messages will
begin to go in the future) is not spam-proofed in this way, but it is intended
to have a lower signal-to-noise ratio anyway.

>
> I try to start a job on an other host using queue command like:
>
> queue --queue --spooldir wait --wait --no-pty -- echo X
>
> it just hangs forever. While
>
> queue --immediate --wait --no-pty -- echo X
>
> works fine (I have also tried running heavier jobs).
> When I start job in `wait' queue (configured in a way that makes
> jobs always start on remote end (using pfactor)) I get in supervisorlog
> on remote end:
>
> Aug 30 01:44:23 khvanchkara.bunch.ihep.su wait: cfm448981877: START (delayed 0 min): qinglong (qinglong)
> Aug 30 01:57:31 khvanchkara.bunch.ihep.su wait: cfm448981877: END: cpu 0.0s signal 0 exit 2 (02)
>
> but NOTHING happens (no output), and job (shell job) is still there
> and do not return control to shell.

What platform is this on? Try running queued -D &.

>
>
> I have also tried using NFS-shared queue spool dir, jobs seemed to be
> starting and processing just fine although not on the remote end,

1.20.1 doesn't use NFS, so the queue spool dir should not be NFS shared
(although it will probably work fine if it is, I don't recommend it).

>
> but on local host.
> Here are messages from supervisorlog on local host:
>
> Aug 29 17:30:54 alexandreuli.bunch.ihep.su wait: cfm448507134: START (delayed 1 min): qinglong (qinglong)
> Aug 29 17:30:55 alexandreuli.bunch.ihep.su wait: cfm448507134: END: cpu 1.6s signal 0 exit 0 (00)
> Aug 29 17:32:56 alexandreuli.bunch.ihep.su wait: cfm448508334: START (delayed 2 min): qinglong (qinglong)
> Aug 29 17:32:57 alexandreuli.bunch.ihep.su wait: cfm448508334: END: cpu 1.6s signal 0 exit 0 (00)
>
> And queued on remote end was dying with messages like:
>
> Aug 29 17:29:56 khvanchkara.bunch.ihep.su wait: cfm448507134: START (delayed 0 min): qinglong (qinglong)
> Aug 29 17:31:11 khvanchkara.bunch.ihep.su Trouble aborting vanished job cfm448507134
> Aug 29 17:31:11 khvanchkara.bunch.ihep.su wait: cfm448508334: START (delayed 0 min): qinglong (qinglong)
> Aug 29 17:33:11 khvanchkara.bunch.ihep.su Trouble aborting vanished job cfm448508334
> Aug 29 18:10:37 khvanchkara.bunch.ihep.su wait: enabled: maxexec=48 loadsched=36 loadstop=66 nice=1 cpu inf
>
> This looked like remote queued got confused by local queued control files
> in shared spool dir, so I decided that new `queue' do not use NFS-shared
> spool dir, in contrary to what the `queue' documentation states.
>
> To say the truth, first I tried to use 1.12.8. No success.
> Too many segfaults and other weirdness.
>
> So, what I am doing wrong? How should I create batch queue,
> which executes all jobs on remote end (local host should be `submit only').

Set the profile on the submit-only host to have a maxexec of zero.

>
>
> BTW, is it possible to limit pty emulation level for the queue?
> Say, I would like to insist on `--no-pty' (and `--batch') mode
> for all jobs in `wait' queue.
>
> I have also had to hack `queue's Makefile.am (heavily) and configure.in
> and profile.in to make the build/installation process relocatable
> (required for src.rpm and other source packages).

Build/installation should certainly be relocatable as written (via
./configure); send me the patches so that I can see what you felt necessary
to change.

>
> If you are interested, I would be happy to send you the patch and RPM spec.
>
> BR,
>
> QingLong.
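
When comparing supervisorlog excerpts like the ones quoted above across hosts, it can help to pair each job's START and END entries and flag jobs that never logged an END. The following is only a sketch: the line pattern is inferred from the excerpts in this message, not from the queue sources, and the names `summarize`, `LINE`, `EXIT`, and `SAMPLE` are made up for illustration.

```python
import re

# Pattern inferred from the supervisorlog excerpts quoted in this thread;
# the real log format may have more variants than these (assumption).
LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) (?P<queue>\S+): "
    r"(?P<job>cfm\d+): (?P<event>START|END)\b(?P<rest>.*)$"
)
EXIT = re.compile(r"exit (\d+)")


def summarize(lines):
    """Pair START and END entries per (host, job) and record the exit status."""
    jobs = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            # Lines such as "Trouble aborting vanished job ..." or
            # "enabled: ..." do not describe a START/END event; skip them.
            continue
        key = (m["host"], m["job"])
        rec = jobs.setdefault(key, {"start": None, "end": None, "exit": None})
        if m["event"] == "START":
            rec["start"] = m["stamp"]
        else:
            rec["end"] = m["stamp"]
            found = EXIT.search(m["rest"])
            rec["exit"] = int(found.group(1)) if found else None
    return jobs


# Lines copied from the excerpts quoted in this message.
SAMPLE = """\
Aug 30 01:44:23 khvanchkara.bunch.ihep.su wait: cfm448981877: START (delayed 0 min): qinglong (qinglong)
Aug 30 01:57:31 khvanchkara.bunch.ihep.su wait: cfm448981877: END: cpu 0.0s signal 0 exit 2 (02)
Aug 29 17:29:56 khvanchkara.bunch.ihep.su wait: cfm448507134: START (delayed 0 min): qinglong (qinglong)
Aug 29 17:31:11 khvanchkara.bunch.ihep.su Trouble aborting vanished job cfm448507134
"""

if __name__ == "__main__":
    for (host, job), rec in summarize(SAMPLE.splitlines()).items():
        state = "no END logged" if rec["end"] is None else f"exit {rec['exit']}"
        print(f"{host} {job}: started {rec['start']}, {state}")
```

Keying on the (host, job) pair rather than the job ID alone keeps entries from the local and remote queued separate when the same control file name shows up in both logs, as it does in the NFS-shared excerpts.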