Thread: [Queue-developers] Re: GNU queue 1.20.1 fails to start job on remote host
From: W. G. K. <wer...@ya...> - 2000-09-08 21:56:42
Sorry for the long delay; I've been out of town and away from email.

QingLong wrote:
> Hello!
>
> I have a problem which probably is due to lack of up to date docs
> for `queue', so I am trying to get help online.
> IMHO, making a bug mailing list (actually `tips' list certainly is such one)
> closed (`subscribers only') is really impolite.

This is unfortunately necessary to keep out robo-spam. It has actually kept
out quite a bit of spam that would otherwise have gone out to the list. It's
easy to get on and off, and of course there are also the discussion forums
for bugs on SourceForge. The new queue-support list (where these sorts of
messages will begin to go in the future) is not spam-proofed in this way,
but it is intended to have a lower signal-to-noise ratio anyway.

> I try to start a job on an other host using queue command like:
>
> queue --queue --spooldir wait --wait --no-pty -- echo X
>
> it just hangs forever. While
>
> queue --immediate --wait --no-pty -- echo X
>
> works fine (I have also tried running heavier jobs).
> When I start job in `wait' queue (configured in a way that makes
> jobs always start on remote end (using pfactor)) I get in supervisorlog
> on remote end:
>
> Aug 30 01:44:23 khvanchkara.bunch.ihep.su wait: cfm448981877: START (delayed 0 min): qinglong (qinglong)
> Aug 30 01:57:31 khvanchkara.bunch.ihep.su wait: cfm448981877: END: cpu 0.0s signal 0 exit 2 (02)
>
> but NOTHING happens (no output), and job (shell job) is still there
> and do not return control to shell.

What platform is this on? Try running queued -D &.

> I have also tried using NFS-shared queue spool dir, jobs seemed to be
> starting and processing just fine although not on the remote end,

1.20.1 doesn't use NFS, so the queue spool dir should not be NFS shared
(it will probably work fine if it is, but I don't recommend it).

> but on local host.
> Here are messages from supervisorlog on local host:
>
> Aug 29 17:30:54 alexandreuli.bunch.ihep.su wait: cfm448507134: START (delayed 1 min): qinglong (qinglong)
> Aug 29 17:30:55 alexandreuli.bunch.ihep.su wait: cfm448507134: END: cpu 1.6s signal 0 exit 0 (00)
> Aug 29 17:32:56 alexandreuli.bunch.ihep.su wait: cfm448508334: START (delayed 2 min): qinglong (qinglong)
> Aug 29 17:32:57 alexandreuli.bunch.ihep.su wait: cfm448508334: END: cpu 1.6s signal 0 exit 0 (00)
>
> And queued on remote end was dying with messages like:
>
> Aug 29 17:29:56 khvanchkara.bunch.ihep.su wait: cfm448507134: START (delayed 0 min): qinglong (qinglong)
> Aug 29 17:31:11 khvanchkara.bunch.ihep.su Trouble aborting vanished job cfm448507134
> Aug 29 17:31:11 khvanchkara.bunch.ihep.su wait: cfm448508334: START (delayed 0 min): qinglong (qinglong)
> Aug 29 17:33:11 khvanchkara.bunch.ihep.su Trouble aborting vanished job cfm448508334
> Aug 29 18:10:37 khvanchkara.bunch.ihep.su wait: enabled: maxexec=48 loadsched=36 loadstop=66 nice=1 cpu inf
>
> This looked like remote queued got confused by local queued control files
> in shared spool dir, so I decided that new `queue' do not use NFS-shared
> spool dir, in contrary to what the `queue' documentation states.
>
> To say the truth, first I tried to use 1.12.8. No success.
> Too many segfaults and other weirdness.
>
> So, what I am doing wrong? How should I create batch queue,
> which executes all jobs on remote end (local host should be `submit only').

Set the profile on the submit-only host to have a maxexec of zero.

> BTW, is it possible to limit pty emulation level for the queue?
> Say, I would like to insist on `--no-pty' (and `--batch') mode
> for all jobs in `wait' queue.
>
> I have also had to hack `queue's Makefile.am (heavily) and configure.in
> and profile.in to make the build/installation process relocatable
> (required for src.rpm and other source packages).

Build/installation should certainly be relocatable as written (via
./configure); send me the patches so that I can see what you felt
necessary to change.

> If you are interested, I would be happy to send you the patch and RPM spec.
>
> BR,
>
> QingLong.
From: QingLong <qin...@Bo...> - 2001-01-31 15:00:10
Attachments:
queue-1.30.1.QL-hack.tar.gz
Hello!

A while ago I've submitted you a set of patches representing changes
which I'd had to do to get queue-1.20.1 work reliably enough to be usable.
AFAICS you have not used them at all, so I try once again.
Please consider them, I hope you would find a few useful bits there.
Thank You.

I submit for your consideration a set of patches against queue-1.30.1,
most of them are adapted versions of patches I've already sent to you.
Please have a look at the attached queue-1.30.1.QL-hack.tar.gz file.
Some comments on individual files:

queue.spec
A spec file for RPM.

queue-1.30.1.pebkac-lart.diff
This patch fixes autoconf/automake and makefile stuff.
I have also had to hack `queue's Makefile.am (heavily) and configure.in
and profile.in to make the build/installation process relocatable
(required for src.rpm and other source packages).
To say the truth, the final destinations in original stuff
really are relocatable, but many package managing systems
need to perform `faked root installation' usually somewhere
in /tmp/ or /var/tmp/ to build binary packages.
The latter is impossible with current code.
I believe it's worth supporting prefixing installation paths by
$(DESTDIR) or $(DESTROOT) scheme. Although I haven't added them.

queue-1.30.1.extra-trace-messages.diff
This one adds trace messages about ongoing `connect()'s, `accept()'s
and alikes to help tracing network-related problems.

queue-1.30.1.ptty-support-code-borrowed-from-rxvt.diff
This patch almost entirely consists of pty code borrowed from RXVT
(I would also recommend you to have a look at xterm's pty/tty code).
I just have slightly modified it to fit it to the `queue' environment.

queue-1.30.1.reject-jobs-on-inactive-queue.diff
Makes `queued' reject jobs on inactive (`exec off' or `exec drain')
queues. Without this patch jobs can go to hosts which do not run
this queue, thus effectively hanging forever.
BTW, I consider method of signalling job rejection (returning magic
loadaverage value) as lame, this obviously is design flaw.

queue-1.30.1.verbose.diff
This patch adds `--verbose' (`-v' in GNU tradition) flag to `queue'
and a few trace messages. This (and its goal) is different from
the `-DDEBUG' cpp flag, as `-DDEBUG' enables lots of insecure
debug messages (like printing cookie values and so on) making
debug-enabled `queue' binary useless for production installation.
The `--verbose' flag is intended to print trace and debug info
useful for ordinary user without compromising system security.

The patch also renames current `-v' (`--version') to `-V'.
I believe most GNU programs have `-V' for `--version'
and `-v' for `--verbose'.

queue-1.30.1.const-char-2-char.diff
I've had to add explicit type conversion to make g++ happy.

queue-1.30.1.debug.diff
Adds --debug facility to queue (NOT queued).

queue-1.30.1.reliable-connect-fread.diff
This is a hack around hanging forever in fread()ing from a stream
opened on a network socket connected to a dead remote end.
If remote queued hangs (it's alive, but stalled) for some unknown reason
(this does happen rather often!), it still has network port opened
in `listen' state, i.e. it does accept connections
(as this stage of connection establishing is done by kernel)
but is silent. And fread()ing from this connection hangs forever,
it does not time out (at least, I've failed to get it time out).
I had to use select() on the underlying socket to make it work reliably.
BTW, I have managed to trace this problem out and fix it only due to
--debug and --verbose flags and trace messages
added by the above patches.

queue-1.30.1.skip-1e06-la.diff
If all queued's are rejecting jobs (e.g. if they all are dead or deaf),
the hosts list will contain only hosts with 1e08 (and alike)
loadaverages, designating that those queues are down,
but wakeup() will still try to connect those queued's...
So it's worth skipping all non-willing-to-serve hosts.

Besides that I would like to ask you to move `profile' config files
from spool directories to more appropriate place like,
e.g. /etc/queue/ or /usr/etc/. And please consider adding ``DESTDIR'' style
to ``local'' installation rules in Makefile.am.

Best regards.

QingLong.
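
The select() workaround described for queue-1.30.1.reliable-connect-fread.diff can be illustrated with a small sketch. This is not the patch itself (the patch is C and its contents are not shown in this thread); it is a Python illustration of the same idea, and the function name `read_with_timeout` is made up for the example. The point is to wait on the underlying socket with a timeout before reading, so that a peer that is connected but silent cannot block the reader forever.

```python
import select
import socket


def read_with_timeout(sock, nbytes, timeout):
    """Read up to nbytes from sock.

    Raises TimeoutError if no data becomes readable within `timeout`
    seconds, instead of blocking indefinitely on a silent peer.
    """
    # select() returns the subset of sockets that are ready for reading.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise TimeoutError("peer is connected but sent nothing")
    return sock.recv(nbytes)


if __name__ == "__main__":
    # Stand-in for a stalled daemon: a connected pair of sockets where
    # the far end stays open but never writes anything.
    near, far = socket.socketpair()
    try:
        read_with_timeout(near, 1024, timeout=0.5)
    except TimeoutError as exc:
        print("gave up:", exc)
    finally:
        near.close()
        far.close()
```

In C the equivalent would presumably call select() on the file descriptor behind the stream before each read; that detail is an assumption, since the thread only says select() was used on the underlying socket.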
From: W. G. K. <wer...@ya...> - 2001-01-31 20:03:17
This would be due to some sort of oversight. I would never completely
reject such a comprehensive set of patches.

What seems to have happened is that you sent the patches to me in
response to someone's queue-developers message (but to my email rather
than the list), and I didn't realize that there was a patch at the end
of the email. I suppose this is an argument in favor of the patch
manager on http://www.gnuqueue.org , which will make patches instantly
available to everyone.

I suppose 1.30.2 is way overdue at this point, but I've been very busy
these last few months with my real-world job trying to meet a very hard
and fast deadline. Hopefully, I'll have a chance to look through some
of the various patches that have been sent in and apply them towards
the new release.

In the meantime, if those of you with write access to the CVS repository
would care to help me out by testing and applying the various patches
to the repository, that would help me out a lot and would get 1.30.2
out much sooner.

QingLong wrote:
> Hello!
>
> A while ago I've submitted you a set of patches representing changes
> which I'd had to do to get queue-1.20.1 work reliably enough to be usable.
> AFAICS you have not used them at all, so I try once again.
> Please consider them, I hope you would find a few useful bits there.
> Thank You.
>
> I submit for your consideration a set of patches against queue-1.30.1,
> most of them are adapted versions of patches I've already sent to you.
> Please have a look at the attached queue-1.30.1.QL-hack.tar.gz file.
> Some comments on individual files:
>
> queue.spec
> A spec file for RPM.
>
> queue-1.30.1.pebkac-lart.diff
> This patch fixes autoconf/automake and makefile stuff.
> I have also had to hack `queue's Makefile.am (heavily) and configure.in
> and profile.in to make the build/installation process relocatable
> (required for src.rpm and other source packages).
> To say the truth, the final destinations in original stuff
> really are relocatable, but many package managing systems
> need to perform `faked root installation' usually somewhere
> in /tmp/ or /var/tmp/ to build binary packages.
> The latter is impossible with current code.
> I believe it's worth supporting prefixing installation paths by
> $(DESTDIR) or $(DESTROOT) scheme. Although I haven't added them.
>
> queue-1.30.1.extra-trace-messages.diff
> This one adds trace messages about ongoing `connect()'s, `accept()'s
> and alikes to help tracing network-related problems.
>
> queue-1.30.1.ptty-support-code-borrowed-from-rxvt.diff
> This patch almost entirely consists of pty code borrowed from RXVT
> (I would also recommend you to have a look at xterm's pty/tty code).
> I just have slightly modified it to fit it to the `queue' environment.
>
> queue-1.30.1.reject-jobs-on-inactive-queue.diff
> Makes `queued' reject jobs on inactive (`exec off' or `exec drain')
> queues. Without this patch jobs can go to hosts which do not run
> this queue, thus effectively hanging forever.
> BTW, I consider method of signalling job rejection (returning magic
> loadaverage value) as lame, this obviously is design flaw.
>
> queue-1.30.1.verbose.diff
> This patch adds `--verbose' (`-v' in GNU tradition) flag to `queue'
> and a few trace messages. This (and its goal) is different from
> the `-DDEBUG' cpp flag, as `-DDEBUG' enables lots of insecure
> debug messages (like printing cookie values and so on) making
> debug-enabled `queue' binary useless for production installation.
> The `--verbose' flag is intended to print trace and debug info
> useful for ordinary user without compromising system security.
>
> The patch also renames current `-v' (`--version') to `-V'.
> I believe most GNU programs have `-V' for `--version'
> and `-v' for `--verbose'.
>
> queue-1.30.1.const-char-2-char.diff
> I've had to add explicit type conversion to make g++ happy.
>
> queue-1.30.1.debug.diff
> Adds --debug facility to queue (NOT queued).
>
> queue-1.30.1.reliable-connect-fread.diff
> This is a hack around hanging forever in fread()ing from a stream
> opened on a network socket connected to a dead remote end.
> If remote queued hangs (it's alive, but stalled) for some unknown reason
> (this does happen rather often!), it still has network port opened
> in `listen' state, i.e. it does accept connections
> (as this stage of connection establishing is done by kernel)
> but is silent. And fread()ing from this connection hangs forever,
> it does not time out (at least, I've failed to get it time out).
> I had to use select() on the underlying socket to make it work reliably.
> BTW, I have managed to trace this problem out and fix it only due to
> --debug and --verbose flags and trace messages
> added by the above patches.
>
> queue-1.30.1.skip-1e06-la.diff
> If all queued's are rejecting jobs (e.g. if they all are dead or deaf),
> the hosts list will contain only hosts with 1e08 (and alike)
> loadaverages, designating that those queues are down,
> but wakeup() will still try to connect those queued's...
> So it's worth skipping all non-willing-to-serve hosts.
>
> Besides that I would like to ask you to move `profile' config files
> from spool directories to more appropriate place like,
> e.g. /etc/queue/ or /usr/etc/. And please consider adding ``DESTDIR'' style
> to ``local'' installation rules in Makefile.am.
>
> Best regards.
>
> QingLong.
From: QingLong <qin...@Bo...> - 2001-02-08 09:40:44
Attachments:
queue-1.30.1.reject-jobs-on-inactive-queue.diff
> I have submitted them to the queue-developers list, so they are available.
> (I also wrote a reply on queue-developers.)

I regret to say that I had sent you an incorrect version of the
`reject-jobs-on-inactive-queue' patch. That wasn't the final one.
I am sorry. I am sending you the correct version now (see attachment).

>> queue-1.30.1.reject-jobs-on-inactive-queue.diff
>> Makes `queued' reject jobs on inactive (`exec off' or `exec drain')
>> queues. Without this patch jobs can go to hosts which do not run
>> this queue, thus effectively hanging forever.
>> BTW, I consider method of signalling job rejection (returning magic
>> loadaverage value) as lame, this obviously is design flaw.

BR.
QingLong.
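
The "magic loadaverage value" convention criticised above, and the host-skipping idea from the earlier skip-1e06-la patch description, can be sketched as follows. This is a Python illustration only, not code from queue: the names `REJECT_THRESHOLD` and `willing_hosts` and the host names are hypothetical, and the threshold is an assumption (the thread mentions both 1e06, in the patch name, and 1e08).

```python
# Assumed cut-off: any reported load at or above this is treated as a
# "rejecting jobs" sentinel rather than a real load average.
REJECT_THRESHOLD = 1e06


def willing_hosts(loads):
    """Return (host, load) pairs that have not signalled rejection,
    ordered from least to most loaded."""
    usable = [(host, load) for host, load in loads.items()
              if load < REJECT_THRESHOLD]
    return sorted(usable, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical reports: hostB answers with a sentinel value.
    reported = {"hostA": 0.7, "hostB": 1e08, "hostC": 0.2}
    for host, load in willing_hosts(reported):
        print(host, load)
```

The sketch also shows why the convention is fragile: rejection and load share one numeric channel, so the client has to know the cut-off. An explicit accept/reject field in the reply would avoid that, which seems to be the design point being made.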