queue-developers Mailing List for GNU Queue (Page 8)
From: W. G. K. <wer...@ya...> - 2001-02-22 13:59:13
|
Quoting Gert Van den Eynde <gvd...@sc...>:

> Dear users/developers of Queue,
>
> The last few days I've been experimenting with Queue on a five node Linux
> cluster (SuSE 7.0 out of the box *without* installing the Queue from SuSE 7.0).
> I have encountered several problems. I've been browsing the maillist for more
> information, maybe I missed it...
>
> * Using the latest release 1.30.1 (after fixing the RLIMIT bug as mentioned in
> the bugtrack and compiling), all seemed to go well. But... when I submitted a
> large number of jobs to one queue (exceeding the total sum of different maxexec
> on the nodes, so a couple of the jobs had to wait for a 'free slot'), I
> observed that Queue fills up all queues on the machines very nicely and puts the
> other jobs on hold. However, when a job has finished, the waiting jobs keep on
> waiting. They do not get a free slot and when I query the queuestat files, it
> says in one that they are running, but nothing is happening on that host.
>
> * I also tried to get the latest CVS going (queue-development). Something
> strange is going on during configuration and compilation. After the usual
> ./configure --enable-root and make, all seems to be well. When I do make
> install, it starts reconfiguring (and effectively changing config.h),
> recompiling and then the compilation breaks due to a missing cleanutent. It
> seems that during the reconfigure the support for rxvt utmp was added, but the
> file logging.c is not in the sources list in the makefile. The reconfiguration
> also changed the install directories for queue. Before, Queue's queues were in
> /usr/local/var/queue, now they were to go in /usr/local/var/spool/queue. The
> directory for the qhostfile changed from /usr/local/share to
> /usr/local/share/queue.
>
> * I managed to fix the above to have it compile, no probs. When I start this
> CVS queued without debugging option (on one host, just for testing, the other
> nodes are down) and I submit jobs to the now queue (the classical hostname job
> from the manual), I get emails like this:
>
> ----
> Date: Thu, 22 Feb 2001 11:41:50 +0100
> From: The Queue Daemon <ro...@fe...>
> To: gvd...@fe...
> Subject: batch queue_b on fermi: queued queued.c sendmail(): SENDMAIL: From:
> "queued" SENDMAIL: To:
> "gvdeynde"
>
> queued queued.c sendmail():
> SENDMAIL: From: "queued"
> SENDMAIL: To: "gvdeynde"
> ----
>
> and using the verbose option from queue gives me this:
>
> ---
> Requesting load average for queue "now" on host "fermi"...
> The host "fermi"is not able to serve queue "now".
> Failed to submit job in queue "now" to host "fermi".
> ---
>
> However, if I start queued in debug mode (queued --debug), I get this from
> queue --verbose ....
>
> ---
> Requesting load average for queue "now" on host "fermi"...
> Host "fermi" appears to be able to serve queue "now".
> Ok, connecting to QueueD at it.
> Trying "fermi"...
> Going to submit job to queue "now" on host "fermi".
> queue.c: main(): tty(in/out/err): 1 1 1.
> queued handle.c handle(): going to try to run "hostname".
> queued handle.c handle(): assembled full path: "/bin/hostname".
> queued handle.c handle(): going to execve(/bin/hostname).
> fermi
> ---
>
> My questions:
>
> - Is the 1.30.1 release still relevant (is there a patch to fix the apparent
> hang of queued ?)

This patch has been rolled into the CVS release.

> - Is the developers version reliable (I know it's a developers version, but I
> am aware of projects where it is best to stick to the developers version than
> to the stable releases) ?

Unfortunately, our project is in such a phase right now. It is probably easiest to figure out what is wrong with the CVS version and get this working, as opposed to playing with 1.30.1. Once this works a little better, I hope to roll out 1.30.2 from the CVS version anyway, so that 1.30.1 should become obsolete soon.

(The other thing you could try would be the pre-1.20.1 releases.)

> Thank you for your time and a very promising tool. I'm really looking forward
> to using queue on our system...
>
> Gert Van den Eynde
> SCK-CEN
> Reactor Physics & Myrrha dept.
> Neutronics Calculation Section
> Belgium
>
> _______________________________________________
> Queue-developers mailing list Que...@li...
> To unsubscribe, subscribe, or set options:
> http://lists.sourceforge.net/lists/listinfo/queue-developers
From: Gert V. d. E. <gvd...@sc...> - 2001-02-22 11:16:42
|
Dear users/developers of Queue,

The last few days I've been experimenting with Queue on a five node Linux cluster (SuSE 7.0 out of the box *without* installing the Queue from SuSE 7.0). I have encountered several problems. I've been browsing the maillist for more information, maybe I missed it...

* Using the latest release 1.30.1 (after fixing the RLIMIT bug as mentioned in the bugtrack and compiling), all seemed to go well. But... when I submitted a large number of jobs to one queue (exceeding the total sum of different maxexec on the nodes, so a couple of the jobs had to wait for a 'free slot'), I observed that Queue fills up all queues on the machines very nicely and puts the other jobs on hold. However, when a job has finished, the waiting jobs keep on waiting. They do not get a free slot and when I query the queuestat files, it says in one that they are running, but nothing is happening on that host.

* I also tried to get the latest CVS going (queue-development). Something strange is going on during configuration and compilation. After the usual ./configure --enable-root and make, all seems to be well. When I do make install, it starts reconfiguring (and effectively changing config.h), recompiling and then the compilation breaks due to a missing cleanutent. It seems that during the reconfigure the support for rxvt utmp was added, but the file logging.c is not in the sources list in the makefile. The reconfiguration also changed the install directories for queue. Before, Queue's queues were in /usr/local/var/queue, now they were to go in /usr/local/var/spool/queue. The directory for the qhostfile changed from /usr/local/share to /usr/local/share/queue.

* I managed to fix the above to have it compile, no probs. When I start this CVS queued without debugging option (on one host, just for testing, the other nodes are down) and I submit jobs to the now queue (the classical hostname job from the manual), I get emails like this:

----
Date: Thu, 22 Feb 2001 11:41:50 +0100
From: The Queue Daemon <ro...@fe...>
To: gvd...@fe...
Subject: batch queue_b on fermi: queued queued.c sendmail(): SENDMAIL: From: "queued" SENDMAIL: To: "gvdeynde"

queued queued.c sendmail():
SENDMAIL: From: "queued"
SENDMAIL: To: "gvdeynde"
----

and using the verbose option from queue gives me this:

---
Requesting load average for queue "now" on host "fermi"...
The host "fermi"is not able to serve queue "now".
Failed to submit job in queue "now" to host "fermi".
---

However, if I start queued in debug mode (queued --debug), I get this from queue --verbose ....

---
Requesting load average for queue "now" on host "fermi"...
Host "fermi" appears to be able to serve queue "now".
Ok, connecting to QueueD at it.
Trying "fermi"...
Going to submit job to queue "now" on host "fermi".
queue.c: main(): tty(in/out/err): 1 1 1.
queued handle.c handle(): going to try to run "hostname".
queued handle.c handle(): assembled full path: "/bin/hostname".
queued handle.c handle(): going to execve(/bin/hostname).
fermi
---

My questions:

- Is the 1.30.1 release still relevant (is there a patch to fix the apparent hang of queued ?)
- Is the developers version reliable (I know it's a developers version, but I am aware of projects where it is best to stick to the developers version than to the stable releases) ?

Thank you for your time and a very promising tool. I'm really looking forward to using queue on our system...

Gert Van den Eynde
SCK-CEN
Reactor Physics & Myrrha dept.
Neutronics Calculation Section
Belgium
From: QingLong <Qin...@bo...> - 2001-02-20 17:27:48
|
Hello!

I've committed a few changes (3 bugfixes and 1 improvement/bugfix) to queue-development at sourceforge. ChangeLog notes follow:

Made queued (handle.c) correctly assemble the full path if the command is specified as a relative path (./xxx or ../xxx). Also fixed a bug I had recently introduced in this same piece of code. Added a few trace messages (activated by --debug or --verbose).

Fix: Made queue (queue.c) request the loadaverage from the remote host before trying to submit a job to it, if the host was explicitly specified on the command line (-h or -H). This is necessary, as that host can turn out not to be running the given queue; if this is the case the submitted job will hang or sleep forever.

Fix: (startjob() queued.c): a file for stdout redirection was being opened twice (thus creating two fd's), once via creat(), and another time via open(). Although I fail to understand why it is so necessary to do the job twice, I added close() after creat() to make it work in accordance with the comments (open fd 1 for /*stdout*/, and dup() it to fd 2 (rather than 3) for /*stderr*/).

Changed QueueD behaviour in case of receiving a job on an inactive queue. Made it just abort the connection silently (queued.c check_query()). The queue protocol should have some way to return an error status indicator (and a descriptive error message) to the client and to abort the connection gracefully if the client tries to do something unreasonable or prohibited.

Please give it a try.

Thank you.
QingLong.
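[Editorial sketch] The stdout/stderr redirection described above (one open file, fd 1 for stdout, duplicated onto fd 2 for stderr) might look roughly like the following. This is not the actual startjob() code: `redirect_output` is a hypothetical helper, and it swaps the creat()/close()/open()/dup() sequence from the message for a single open() followed by dup2(), which avoids depending on which descriptor numbers happen to be free.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from queued.c): send both stdout and
 * stderr of the current process to `path`, opening the file only once.
 * Returns 0 on success, -1 on failure.
 */
int redirect_output(const char *path)
{
    /* One open() both creates and opens the file, so no second fd leaks. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* fd 1 becomes the file; fd 2 is a duplicate of fd 1. */
    if (dup2(fd, STDOUT_FILENO) < 0 ||
        dup2(STDOUT_FILENO, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }

    /* Drop the original descriptor unless it already was fd 1 or fd 2. */
    if (fd != STDOUT_FILENO && fd != STDERR_FILENO)
        close(fd);
    return 0;
}
```

Because fd 2 is a duplicate of fd 1, both share one file offset, so output written to either stream is appended in order rather than overwriting the other.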
From: QingLong <qin...@Bo...> - 2001-02-19 16:42:22
|
Hello!

A new intermediate version of queue-development is available at sourceforge via CVS. I've integrated the rxvt-2.6.3 utmp-handling code into queue, fixed a few bugs and slightly cleaned up the code. This is from ChangeLog:

Imported utmp handling stuff from rxvt-2.6.3. (logging.*, rxvt.h)

Added a few trace messages (activated by --debug or --verbose).

Fixed a bug in the batch-mode job handling code: queue was trying to set tty line parameters (tcsetattr()) while in batch mode, where no tty is allocated. Now queue is aware of tty absence. (handle.c)

Changed queue behaviour in debug mode: now it does send mail. It only claims to send mail (but actually doesn't) when running in debug mode in the foreground (--foreground). (queued.c)

Fix: do not assume that every path beginning with a dot is a full path (current dir); check that the next char is a slash. (handle.c)

Fix: there was a strange condition, ``!ttyinput && ttyinput'', in queue.c; changed it to ``!ttyinput && ttyoutput''.

Restructured the ``wait'' code in waitforchild() (queued.c) to make it readable and to squeeze out duplicates. Added a loop checking whether wait() returned prematurely (EINTR) before actually looking for child exit.

Please have a look.

Thank you.
QingLong.
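[Editorial sketch] The EINTR check mentioned for waitforchild() follows a common POSIX pattern: if a signal interrupts the wait call, it returns -1 with errno set to EINTR, and the caller simply retries. A minimal illustration, assuming a hypothetical helper name `wait_for_child` (this is not the code from queued.c):

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: block until child `pid` changes state, retrying
 * when a signal interrupts waitpid(). Returns the pid on success or
 * (pid_t)-1 on a real error, with the exit information in *status.
 */
pid_t wait_for_child(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == (pid_t)-1 && errno == EINTR);  /* interrupted: retry */

    return r;
}
```

The caller can then inspect `*status` with WIFEXITED() and WEXITSTATUS() only after the loop has confirmed that the call really returned for the child.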
From: QingLong <qin...@Bo...> - 2001-02-14 17:33:32
|
Hello!

I have merged all the patches I had sent to the list into queue-development at sourceforge.net. I have also merged some patches from the ``patch manager''. However, I haven't merged grayraven's utmp stuff patch, as I am already trying to import more universal utmp/wtmp code from rxvt.

Please try this development version of Queue. This is from ChangeLog:

---
Merged patch #102850 by bmc (Ben Chad <bc...@uo...>) ``Return value bug for TRANSMIT_DEBUG in queued.c'' submitted to patch manager at SourceForge at 2000-Dec-14.

---
Merged patch #102237 ``Bug #121017 patch'' by wkrebs (W.G.Krebs <wer...@ya...>) submitted at 2000-Nov-02.

---
Moved ``running in foreground'' queued feature control from `--debug' to its own command line option `--foreground'. I think that having the ability to control run mode independently from `--debug' is very useful for Queue now, as I consider it not yet very stable and requiring continuous debugging even during ``production'' runs in background. Also renamed `-v' (--version) to `-V' in accordance with tradition; btw `-v' is traditionally used for `--verbose'. Also squeezed a couple of unnecessary repetitions and slightly ``beautified'' the `show usage' and `show version' code.

---
A ``small integer used to index the q_rlimit array'' upper limit and RLIM_NLIMITS were confused in a couple of places in the RLIMIT_* entities handling code. Fixed them. Also added a few new RLIMIT_*'s available in Linux (they all are properly wrapped in #ifdef's). This change overlaps with (actually covers) patch #102826 ``queued.c patch for Redhat 7.0 resource limits'' submitted by bmc (Ben Chad <bc...@uo...>) at 2000-Dec-13.

---
Skip hosts rejecting jobs while looking for the best one. If all queued's are rejecting jobs (e.g. if they all are dead/deaf), the hosts list will contain only hosts with 1e08 (and alike) loadaverages, designating that those queues are down, but wakeup() will still try to connect to those queued's, as they turn out to be the best ones in this awful case... So it's not at all worth taking such non-willing-to-serve hosts into consideration.

---
Hack around the problem of hanging forever fread()ing from a stream opened on a network socket connected to a dead/deaf remote end. If the remote queued hangs (it's alive, but stalled) for some unknown reason (this does happen rather often!), it still has its network port open in `listen' state, i.e. it does accept connections (as this stage of connection establishment is done by the kernel) but is silent. And fread()ing from this connection hangs forever; it does not time out (at least, I failed to get it to time out). I had to use select() on the underlying socket to make it work reliably.

---
Add ``--debug'' facility to queue (NOT queued).

---
Trivial: add explicit type conversions to make g++ happy.

---
Add `--verbose' (`-v' in GNU tradition) flag to `queue' and a few trace messages. This (and its goal) is different from the `-DDEBUG' cpp flag, as `-DDEBUG' enables lots of insecure debug messages (like printing cookie values and so on), making a debug-enabled `queue' binary useless for production installation. The `--verbose' flag is intended to print trace and debug info useful for an ordinary user without compromising system security. Also rename current `-v' (`--version') to `-V'. I believe most GNU programs have `-V' for `--version' and `-v' for `--verbose'.

---
Make `queued' reject jobs on inactive (`exec off' or `exec drain') queues. Without this change jobs can go to hosts which do not run this queue, thus effectively hanging forever.

---
This patch almost entirely consists of pty code borrowed from RXVT; I just have slightly modified it to fit into the `queue' environment. (I would also recommend having a look at xterm's pty/tty code.)

---
Added trace messages about ongoing `connect()'s, `accept()'s and the like to help trace network-related problems. Moved debug related printing macros to separate header files.

---
Fixed autoconf/automake and makefile stuff. I have also had to hack Makefile.am (heavily) and configure.in and profile.in to make the build/installation process relocatable (required for src.rpm and other source packages). To tell the truth, the final destinations already were relocatable, but many package managing systems need to perform a `fake root installation', usually somewhere in /tmp/ or /var/tmp/, to build binary packages. The latter is impossible with the current code. I believe it's worth supporting prefixing installation paths with a $(DESTDIR) or $(DESTROOT) scheme (as automake does in generated rules). However, I haven't added them; I left that for possible discussion.

BR,
QingLong.
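[Editorial sketch] The ``skip hosts rejecting jobs'' change in the ChangeLog above can be illustrated as a best-host search that ignores any host reporting the sentinel load average. Everything here is hypothetical except the 1e08 value quoted in the message: the `host_load` struct, the `LOAD_REJECTING` threshold name and `pick_best_host` are invented for the example and are not Queue's real data structures.

```c
#include <stddef.h>

/* Assumed threshold: the message says down queues report 1e08 "and alike". */
#define LOAD_REJECTING 1e8

/* Hypothetical record of one host's reply to a load-average request. */
struct host_load {
    const char *name;
    double load;
};

/*
 * Return the index of the least-loaded host that is willing to serve,
 * or -1 if every host reported a rejecting (sentinel) load average.
 */
int pick_best_host(const struct host_load *hosts, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (hosts[i].load >= LOAD_REJECTING)
            continue;                 /* host is down or refusing jobs */
        if (best < 0 || hosts[i].load < hosts[best].load)
            best = (int)i;
    }
    return best;
}
```

Returning -1 when no host qualifies lets the caller wait and retry instead of connecting to a queued that has already said it will not serve.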
From: QingLong <qin...@Bo...> - 2001-02-13 04:04:06
|
Hello!

I have just committed the automake/autoconf/make stuff patch (I had posted it here recently) to queue-development at SourceForge's CVS. This is an excerpt from the ChangeLog:

Have hacked Makefile.am, configure.in and profile.in to make the build/installation process relocatable. This is required for src.rpm and other source packages to perform a `faked root installation', usually somewhere in /tmp/ or /var/tmp/, to build a binary package from the source.

Please take a look. This is my first experience with CVS at SourceForge, and I suspect I may have missed something in the development cycle of this style. Point it out to me, please, so I can do it better next time.

Thank you.
QingLong.
From: QingLong <qin...@Bo...> - 2001-02-13 03:50:05
|
On Mon, Feb 12, 2001 at 06:13:39PM -0500, W. G. Krebs wrote:
>
> Are any of the other developers seeing this problem?
>
[...]
>
>> Date: Mon, 12 Feb 2001 13:24:48 +0300
>> From: QingLong <qin...@Bo...>
>> To: "W. G. Krebs" <wer...@ya...>
>> Subject: Re: Re: failed to commit changes
[...]
>> But I was just rejected from cvs.queue.sourceforge.net --- I've tried to
>> 'slogin -v ...' there to track possible problems/errors on my end.
>> Now I tend to think that the problem is at SourceForge.
>> It looks like I am not the only one who faces this problem,
>> there are similar reports (support requests) at the support page:
>> http://sourceforge.net/support/?group_id=1
>> ``SourceForge: Browse Support Requests By Status: Open''.
>>
>> I'll keep trying.
>>
>

The problem just disappeared. I've changed nothing. It looks like it has been fixed by SourceForge staff.

QingLong.
From: W. G. K. <wer...@ya...> - 2001-02-12 23:18:33
|
Are any of the other developers seeing this problem?

I recently committed changes to the CVS repository without any problem. However, it is possible that there could be a problem with mixed lowercase/uppercase in the sourceforge userid or something like that.

You don't use :pserver or cvs login or anything like that --- that's for an unsecured cleartext login, which SourceForge doesn't allow.

So the correct instructions should be something like:

    setenv CVS_RSH ssh1
    cvs -z3 -dq...@cv...:/cvsroot/queue co queue-development

if you haven't checked out; make changes, then "cvs ci" from the queue-development directory.

If it was checked out as anonymous (as the distribution is) and you've already made changes, then you include the "-z3" and "-d" stuff in the "cvs -z3 -d ... ci" command.

Is anyone else having problems with this? If so, we should submit a support request to SourceForge and have them work on this.

Thanks.
From: W. G. K. <wer...@ya...> - 2001-02-11 00:16:15
|
Quoting "Timothy H. Keitt" <Tim...@su...>:

> Just got queue up and running on a small cluster of linux boxen. Thanks
> for the nice software. I used the Debian packages in the "testing"
> distribution.
>
> A couple of things...
>
> 1) The documentation suggests that a shared NFS directory is required,
> but it's not there and everything seems to work. (Out of date?)

The documentation has been updated in the CVS repository. Basically, it's just a matter of getting some submitted patches rolled in and tested by the users, and then 1.30.2 will go out. (I'm reluctant to let 1.30.2 go out without rolling in the various patches.)

[Developers take note: you'll want to do a "cvs update" to grab this change.]

> 2) There appears to be no way to kill batched jobs (except by finding
> the host and killing it manually). There are two non-functional
> programs "task_manager" and "task_control" that I imagine are for that
> purpose. Scanning the CVS archive, it appears these were compiled in
> the Debian packages with #NO_TASK_MANAGER (or something) defined which
> just produces an empty(!) main function. Perhaps it might be nice to
> put 'printf("%s\n", "Not compiled with task_manager support")' in
> there? Anyway, it would be rather nice to be able to list the current jobs
> and kill them as needed.

This is a good idea. I have added a message to the *.cc programs so that, when queue_manager is not compiled in, it displays a message to that effect and how to re-compile with this support compiled in.

> 3) The home page link on the sourceforge development site is a dead link.

register.com, the registrar for gnuqueue.org, updated their software, and now gnuqueue.org is a dead link (including some pages on SourceForge that try to reference things through gnuqueue.org). Register.com says the problem will be fixed within six hours (by 2AM EST) ... we'll see.

> 4) I noticed some (possibly) odd behavior where some jobs when
> backgrounded would migrate back to the original host (perhaps a feature?).

Could also be a bug; you might want to try applying one of the various patches that have been floating around ....

> All for now. Thanks again.
>
> Cheers,
> Tim
>
> --
> Timothy H. Keitt
> Department of Ecology and Evolution
> State University of New York at Stony Brook
> Phone: 631-632-1101, FAX: 631-632-7626
> http://life.bio.sunysb.edu/ee/keitt/
>
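[Editorial sketch] The suggestion in point 2 (print a notice instead of compiling an empty main) could be arranged roughly as below. The macro name `NO_TASK_MANAGER` is taken from the message, which itself hedges it with "(or something)", so treat it as an assumption; `task_manager_main` and the wording of the notice are hypothetical, and the real programs are *.cc files rather than this C fragment.

```c
#include <stdio.h>

/* Assumed macro name; defined here only so the sketch shows the stub path. */
#define NO_TASK_MANAGER 1

/*
 * Hypothetical entry point that a real main() would call.
 * Returns a process exit status: non-zero when support is missing.
 */
int task_manager_main(FILE *out)
{
#ifdef NO_TASK_MANAGER
    /* Tell the user why nothing happens, rather than exiting silently. */
    fprintf(out, "task_manager: not compiled with task_manager support; "
                 "rebuild with this support enabled to list or kill jobs.\n");
    return 1;
#else
    (void)out;
    /* ... the real job listing / killing logic would go here ... */
    return 0;
#endif
}
```

A non-zero exit status also lets scripts detect the missing feature instead of mistaking a silent no-op for success.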
From: W. G. K. <wer...@ya...> - 2001-02-10 18:23:49
|
Quoting QingLong <qin...@Bo...>:

> > Obviously, there are a number of patches, so it would be good
> > if someone would look through all of these and iron out any conflicts.
> >
> I can try if you let me.
> I do have some experience with CVS.
> But I am afraid I'll not be able to test all the changes introduced
> by the patches waiting to be applied against a wide variety of system
> platforms. I only have Linux (I also have openbsd, ultrix, tru64 and irix6,
> but they aren't worth even trying on due to awful development environment).

That's OK. If you patch the CVS repository in a reasonable way (i.e., it compiles and runs on Linux) others will test the code for other systems once you announce the changes.

The point of CVS is that it lowers the barriers to getting the latest, patched source, so many more "eyeballs" can test each new version and find the bugs on their platforms, thus accelerating bugfixes and overall development.

> > Obviously, if you are uncomfortable with the CVS archive,
> >
> Although I am not very comfortable with CVS
> (that's due to CVS disabilities), I can stay with it.
>
> > Let me know what your Sourceforge username is (sourceforge is at
> > http://www.gnuqueue.org) and I'll enable you for write access to the CVS
> > repository so that you can apply your patches.
> >
> Ok, finally I've managed to reach it via https and register myself.
> My Sourceforge username is QingLong (surprised? :).
>
> QingLong.

Ok, I've listed you as a sourceforge developer, with write access to the CVS archives. (You also have some permissions over the patch manager should you care to exercise this to mark patches you merged into the repository as "closed.")

Contrary to your previous posting, this is a very select group! The others are all programmers who requested write access via the FSF (some at the request of RMS) or long-time Queue developers who were grandfathered in.

Note that it seems my registration service's DNS server is down right now, so the URL (temporarily) is http://queue.sourceforge.net rather than http://www.gnuqueue.org . (This must be affecting a lot of sites.)

So, to access CVS, you have to have ssh1 (secure shell version 1, get it from http://www.ssh.com . The source is free for Unix systems.)

For developers with write-access:

You do a "setenv CVS_RSH ssh1" to the path of ssh1 or whatever is appropriate for your system.

"cvs -z3 -d:pserver:ano...@cv...:/cvsroot/queue co queue-development"

Enter your SourceForge password when prompted for a password. (You can get rid of the constant need to enter your password by pasting an ssh certificate into a form at Sourceforge. But, this is for advanced ssh users.)

Make changes to the source in queue-development, including logging them in "ChangeLog".

When you are ready to commit, just type "cvs ci" from the queue-development directory, make a brief log entry in the editor it brings up, save, enter password.

Then post a message to queue-developers saying there is a new release on CVS, and that they can download it by just running "cvs update" from their GNU Queue directory.

Complete CVS docs are on http://sourceforge.net/cvs/?group_id=5605

----- End forwarded message -----
From: W. G. K. <wer...@ya...> - 2001-02-10 15:52:15
|
Quoting QingLong <qin...@Bo...>:

> On Fri, Feb 09, 2001 at 02:28:04PM -0500, W. G. Krebs wrote:
> >
> > The way things are currently going, 1.30.2 probably won't see light
> > before mid-March.
> >
> > One thing that would really help speed up this process is
> > if the various patches you and others have submitted to queue-developers,
> > and to the patch manager on www.gnuqueue.org,
> > were merged into the CVS archive.
> [...]
> > Basically, you just download the distribution from cvs,
> > apply the patches (using patch) and then do a "cvs ci"
> > to put your new version into the archive.
> > Then, you just post a message to "queue-developers"
> > saying you've merged in your latest patch,
> > and the patched files can be downloaded from the CVS repository.
> >
> This scheme only works for those who have write-enabled access
> to the CVS repository. And it's really dangerous to give such rights
> to everybody who wants them.

I think I can handle it. It's only a few people who have it at any one time (you're being invited to join a carefully selected, elite group :D ), and the changes made are heavily logged. It's easy for me to undo changes, and to see exactly what those changes were.

This is how Open Source development works on SourceForge. There's less trust in the CVS distribution than in the real, released files. But users who want the latest and greatest can immediately download the latest CVS source, and thus development is much faster.
From: QingLong <qin...@Bo...> - 2001-02-10 14:34:59
|
On Fri, Feb 09, 2001 at 02:28:04PM -0500, W. G. Krebs wrote:
>
> The way things are currently going, 1.30.2 probably won't see light
> before mid-March.
>
> One thing that would really help speed up this process is
> if the various patches you and others have submitted to queue-developers,
> and to the patch manager on www.gnuqueue.org,
> were merged into the CVS archive.
[...]
> Basically, you just download the distribution from cvs,
> apply the patches (using patch) and then do a "cvs ci"
> to put your new version into the archive.
> Then, you just post a message to "queue-developers"
> saying you've merged in your latest patch,
> and the patched files can be downloaded from the CVS repository.
>

This scheme only works for those who have write-enabled access to the CVS repository. And it's really dangerous to give such rights to everybody who wants them.

QingLong.
From: W. G. K. <wer...@ya...> - 2001-02-09 19:33:22
|
Yes, we're here.

The way things are currently going, 1.30.2 probably won't see light before mid-March.

One thing that would really help speed up this process is if the various patches you and others have submitted to queue-developers, and to the patch manager on www.gnuqueue.org, were merged into the CVS archive. That way, it would be easy for people to download the latest, patched distribution.

It's actually very easy to apply the patches to the CVS distribution; the instructions are on the web. Basically, you just download the distribution from cvs, apply the patches (using patch) and then do a "cvs ci" to put your new version into the archive. Then, you just post a message to "queue-developers" saying you've merged in your latest patch, and the patched files can be downloaded from the CVS repository.

Obviously, there are a number of patches, so it would be good if someone would look through all of these and iron out any conflicts. But even your patches alone, applied to the CVS archive, would be very helpful.

Let me know what your Sourceforge username is (sourceforge is at http://www.gnuqueue.org) and I'll enable you for write access to the CVS repository so that you can apply your patches.

Obviously, if you are uncomfortable with the CVS archive, I don't want to discourage you from continuing to submit your patches to queue-developers (this is always a good idea, even if you use CVS, so that people know what has changed). But, if you do agree to do this, it will really help all of us a lot by making it much quicker to get feedback on the patched versions, and get 1.30.2 out much sooner.

Thanks.

Quoting QingLong <qin...@Bo...>:

> Hello!
> Anybody alive here?
>
> I submit to your consideration two patches (see attachment):
>
> queue-1.30.1.rlimits-bug-fix.diff
> A ``small integer used to index the q_rlimit array'' upper limit
> and RLIM_NLIMITS were confused in a couple of places in RLIMIT_*
> entities handling code. This patch fixes this bug.
> It also adds a few new RLIMIT_*'s available in Linux
> (they all are properly wrapped in #ifdef's).
>
> queue-1.30.1.queued--foreground.diff
> This patch moves ``running in foreground'' feature control from `--debug'
> to its own command line option `--foreground'.
> I think that having the ability to control run mode independently
> from `--debug' is very useful for Queue now, as I consider it
> not yet very stable and requiring continuous debugging
> even during ``production'' runs in background.
> This patch also renames `-v' (--version) to `-V' in accordance
> with tradition; btw `-v' is traditionally used for `--verbose'.
> The patch also squeezes a couple of unnecessary repetitions
> and slightly ``beautifies'' `show usage' and `show version' code.
>
> Please consider merging these changes into Queue.
>
> Thank you.
>
> QingLong.
>
From: QingLong <qin...@Bo...> - 2001-02-09 16:16:29
|
Hello!
Anybody alive here?

I submit to your consideration two patches (see attachment):

queue-1.30.1.rlimits-bug-fix.diff
A ``small integer used to index the q_rlimit array'' upper limit and RLIM_NLIMITS were confused in a couple of places in RLIMIT_* entities handling code. This patch fixes this bug. It also adds a few new RLIMIT_*'s available in Linux (they all are properly wrapped in #ifdef's).

queue-1.30.1.queued--foreground.diff
This patch moves ``running in foreground'' feature control from `--debug' to its own command line option `--foreground'. I think that having the ability to control run mode independently from `--debug' is very useful for Queue now, as I consider it not yet very stable and requiring continuous debugging even during ``production'' runs in background. This patch also renames `-v' (--version) to `-V' in accordance with tradition; btw `-v' is traditionally used for `--verbose'. The patch also squeezes a couple of unnecessary repetitions and slightly ``beautifies'' `show usage' and `show version' code.

Please consider merging these changes into Queue.

Thank you.

QingLong.
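[Editorial sketch] The rlimits fix described above turns on keeping two bounds apart: the size of the program's own table of limits and the system constant RLIM_NLIMITS. One way to make that distinction hard to get wrong is to bound lookups by the table's own element count, with optional limits wrapped in #ifdef as the patch description mentions. The names `known_limits`, `N_KNOWN_LIMITS` and `limit_resource` are hypothetical and do not reflect how q_rlimit is actually declared in queued.c.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical table mapping a small index to an RLIMIT_* resource id. */
static const int known_limits[] = {
    RLIMIT_CPU,
    RLIMIT_FSIZE,
    RLIMIT_DATA,
    RLIMIT_STACK,
    RLIMIT_CORE,
#ifdef RLIMIT_NOFILE
    RLIMIT_NOFILE,
#endif
#ifdef RLIMIT_NPROC          /* not available on every system */
    RLIMIT_NPROC,
#endif
#ifdef RLIMIT_MEMLOCK        /* not available on every system */
    RLIMIT_MEMLOCK,
#endif
};

/* The table's own size, which need not equal RLIM_NLIMITS. */
#define N_KNOWN_LIMITS (sizeof known_limits / sizeof known_limits[0])

/* Return the RLIMIT_* id for a table index, or -1 if out of range. */
int limit_resource(size_t index)
{
    if (index >= N_KNOWN_LIMITS)
        return -1;
    return known_limits[index];
}
```

Deriving the bound from the array itself means that adding or removing an #ifdef'd entry cannot silently leave the range check out of step with the table.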
From: QingLong <qin...@Bo...> - 2001-02-09 15:22:28
|
Hello!

This is an excerpt from queued.c (``latest from sourceforge'' 1.30.1) in main() around lines 720--740:

|
|  /*
|   * Go to sleep for a while before flooding the system with
|   * jobs, in case it crashes again right away, or the
|   * system manager wants to prevent jobs from running.
|   * Send a SIGALRM to give it a kick-start.
|   */
|
|  if (!debug) {
|    alarm(sleeptime);
|
|    /* WGK: Rather than do a sigpause(), here, we do a check_query
|       here, which will cause us to wake up immediately if someone
|       submits a new job in the first few minutes. This could cause
|       the batchd to flood the system with new jobs in the event of an
|       immediate query, but is unlikely to cause any real problems.*/
|
|    check_query();
|
|    (void) alarm(0);
|  }
|

Well, I think I understand the author's intention and I consider the idea of having a sleep before working reasonable. But there is a problem with ``immediate job submission'': queued fails to get new jobs during this sleep, the server host closes accepted connections almost immediately after establishing them, and the client host fails to read the loadaverage from this server host. It looks like these alarm games confuse the incoming connection handling code. I tried a dozen times to run queued both with and without the above piece of code and found that the correlation with the described behaviour is 100%.

So I would like to ask: is this behaviour correct? If not, then why not replace that code with a proper select()?

Thank you.

Wasilx.
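[Editorial sketch] The ``proper select()'' proposed above could take roughly this shape: sleep for up to `sleeptime` seconds, but return early when the listening socket becomes readable, which for a listening socket means a connection is waiting to be accepted. No SIGALRM is involved, so nothing interrupts the connection handling. `wait_for_query` is a hypothetical name and this is not a patch against queued.c; how check_query() would be called afterwards is left open.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical replacement for the alarm()/check_query() start-up nap.
 * Returns 1 if `listen_fd` has a pending connection (or readable data),
 * 0 if `sleeptime` seconds passed quietly, -1 on error (see errno).
 */
int wait_for_query(int listen_fd, long sleeptime)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);

    tv.tv_sec = sleeptime;   /* upper bound on the nap */
    tv.tv_usec = 0;

    /* Wakes as soon as the descriptor is readable or the timeout expires. */
    return select(listen_fd + 1, &readfds, NULL, NULL, &tv);
}
```

A caller would treat a return of 1 as "a client is waiting, handle the query now" and a return of 0 as "the start-up delay has elapsed, begin running jobs".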
From: QingLong <qin...@Bo...> - 2001-02-08 09:40:44
|
>
> I have submitted them to the queue-developers list, so they are available.
> (I also wrote a reply on queue-developers.)
>

I regret to say that I sent you an incorrect version of the
`reject-jobs-on-inactive-queue' patch. That wasn't the final one.
I am sorry. I am sending you the correct version now (see attachment).

>
>>
>> queue-1.30.1.reject-jobs-on-inactive-queue.diff
>>     Makes `queued' reject jobs on inactive (`exec off' or `exec drain')
>>     queues. Without this patch jobs can go to hosts which do not run
>>     this queue, thus effectively hanging forever.
>>     BTW, I consider the method of signalling job rejection (returning a magic
>>     loadaverage value) lame; this is obviously a design flaw.
>>
>

BR.
QingLong.
|
From: W. G. K. <wer...@ya...> - 2001-02-04 20:48:23
|
Quoting Kai Harrekilde-Petersen <kh...@ex...>:

> W. G. Krebs wrote:
>
> > Here is another patch that was submitted --- haven't yet had
> > a chance to test this.
>
> Well, as the author of that patch I can say: Yes, it fixes the
> compilation problems on Linux 2.4, but it seems as if queue doesn't
> work (as advertised :) on 2.4

[snip]

> If desired, I can test patches. I have some ideas for improving
> the setup of the cluster & the queues, based on our (Exbit's) setup
> and requirements.

What I would need would be for someone to test the various patches that
have been submitted to queue-developers and the SourceForge patch manager
(http://www.gnuqueue.org), figure out which ones work, and then apply them
against the CVS repository (which requires being approved for write access
to the CVS repository, which isn't too hard to get if you're willing to
publicly volunteer.)

Otherwise, my real-world job comes first, of course (it is very difficult
trying to retire off of Open Source software!), so it will probably be a
month or two before I have time to test the various patches, figure out
which ones work, apply them, and then roll out 1.30.2.

> Regards,
>
>
> Kai
> --
> Kai Harrekilde-Petersen <kh...@ex...> Exbit Technology A/S
>
>
> _______________________________________________
> Queue-developers mailing list Que...@li...
> To unsubscribe, subscribe, or set options:
> http://lists.sourceforge.net/lists/listinfo/queue-developers
>
|
From: Kai Harrekilde-P. <kh...@ex...> - 2001-02-01 07:09:52
|
W. G. Krebs wrote:

> Here is another patch that was submitted --- haven't yet had
> a chance to test this.

Well, as the author of that patch I can say: Yes, it fixes the
compilation problems on Linux 2.4, but it seems as if queue doesn't
work (as advertised :) on 2.4

Basically, what I did was to install queue as non-root on four
identical machines, add "queue -i -w -n --" to all commands in a
Makefile, and fire off make to see what happened. Although queue did
try to process the commands spawned by make, they were not done
"interactively", and they certainly didn't get executed right
(possibly not at all).

After some further struggling with queue, I have pushed it aside as
not being mature enough for our (production) use, and gone in search of
alternatives. [I've tried LSF and it seems over-heavy (and expensive!)
for our use. Condor and GridWare are next on the list]

If desired, I can test patches. I have some ideas for improving
the setup of the cluster & the queues, based on our (Exbit's) setup
and requirements.

Regards,


Kai
--
Kai Harrekilde-Petersen <kh...@ex...> Exbit Technology A/S
|
From: W. G. K. <wer...@ya...> - 2001-01-31 20:05:26
|
Here is another patch that was submitted --- haven't yet had a chance to test this. I suppose we should add these to the patch manager on http://www.gnuqueue.org, then I or one of the other managers there will sort through these, add these to the CVS repository, and then roll out 1.30.2 |
From: W. G. K. <wer...@ya...> - 2001-01-31 20:03:17
|
This would be due to some sort of oversight. I would never completely
reject such a comprehensive set of patches. What seems to have happened is
that you sent the patches to me in response to someone's queue-developers
message (but to my email rather than the list), and I didn't realize that
there was a patch at the end of the email.

I suppose this is an argument in favor of the patch manager on
http://www.gnuqueue.org , which will make patches instantly available to
everyone.

I suppose 1.30.2 is way overdue at this point, but I've been very busy
these last few months with my real-world job trying to meet a very hard
and fast deadline. Hopefully, I'll have a chance to look through some of
the various patches that have been sent in and apply them towards the new
release.

In the meantime, if those of you with write access to the CVS repository
would care to help me out by testing and applying the various patches to
the repository, that would help me out a lot and would get 1.30.2 out much
sooner.

QingLong wrote:

> Hello!
>
> A while ago I submitted to you a set of patches representing changes
> which I had to make to get queue-1.20.1 to work reliably enough to be usable.
> AFAICS you have not used them at all, so I am trying once again.
> Please consider them; I hope you will find a few useful bits there.
> Thank you.
>
> I submit for your consideration a set of patches against queue-1.30.1;
> most of them are adapted versions of patches I've already sent to you.
> Please have a look at the attached queue-1.30.1.QL-hack.tar.gz file.
> Some comments on individual files:
>
> queue.spec
>     A spec file for RPM.
>
> queue-1.30.1.pebkac-lart.diff
>     This patch fixes autoconf/automake and makefile stuff.
>     I have also had to hack `queue's Makefile.am (heavily) and configure.in
>     and profile.in to make the build/installation process relocatable
>     (required for src.rpm and other source packages).
>     To tell the truth, the final destinations in the original stuff
>     really are relocatable, but many package managing systems
>     need to perform a `faked root installation', usually somewhere
>     in /tmp/ or /var/tmp/, to build binary packages.
>     The latter is impossible with the current code.
>     I believe it's worth supporting prefixing installation paths with a
>     $(DESTDIR) or $(DESTROOT) scheme, although I haven't added them.
>
> queue-1.30.1.extra-trace-messages.diff
>     This one adds trace messages about ongoing `connect()'s, `accept()'s
>     and the like to help trace network-related problems.
>
> queue-1.30.1.ptty-support-code-borrowed-from-rxvt.diff
>     This patch consists almost entirely of pty code borrowed from RXVT
>     (I would also recommend that you have a look at xterm's pty/tty code).
>     I have just slightly modified it to fit the `queue' environment.
>
> queue-1.30.1.reject-jobs-on-inactive-queue.diff
>     Makes `queued' reject jobs on inactive (`exec off' or `exec drain')
>     queues. Without this patch jobs can go to hosts which do not run
>     this queue, thus effectively hanging forever.
>     BTW, I consider the method of signalling job rejection (returning a magic
>     loadaverage value) lame; this is obviously a design flaw.
>
> queue-1.30.1.verbose.diff
>     This patch adds a `--verbose' (`-v' in GNU tradition) flag to `queue'
>     and a few trace messages. This (and its goal) is different from
>     the `-DDEBUG' cpp flag, as `-DDEBUG' enables lots of insecure
>     debug messages (like printing cookie values and so on), making a
>     debug-enabled `queue' binary useless for production installation.
>     The `--verbose' flag is intended to print trace and debug info
>     useful to an ordinary user without compromising system security.
>
>     The patch also renames the current `-v' (`--version') to `-V'.
>     I believe most GNU programs have `-V' for `--version'
>     and `-v' for `--verbose'.
>
> queue-1.30.1.const-char-2-char.diff
>     I've had to add explicit type conversions to make g++ happy.
>
> queue-1.30.1.debug.diff
>     Adds a --debug facility to queue (NOT queued).
>
> queue-1.30.1.reliable-connect-fread.diff
>     This is a hack around hanging forever in fread()ing from a stream
>     opened on a network socket connected to a dead remote end.
>     If the remote queued hangs (it's alive, but stalled) for some unknown reason
>     (this does happen rather often!), it still has its network port open
>     in the `listen' state, i.e. it does accept connections
>     (as this stage of connection establishment is done by the kernel)
>     but is silent. And fread()ing from this connection hangs forever;
>     it does not time out (at least, I've failed to get it to time out).
>     I had to use select() on the underlying socket to make it work reliably.
>     BTW, I managed to trace this problem and fix it only thanks to
>     the --debug and --verbose flags and the trace messages
>     added by the above patches.
>
> queue-1.30.1.skip-1e06-la.diff
>     If all queued's are rejecting jobs (e.g. if they all are dead or deaf),
>     the hosts list will contain only hosts with 1e08 (and the like)
>     loadaverages, designating that those queues are down,
>     but wakeup() will still try to connect to those queued's...
>     So it's worth skipping all non-willing-to-serve hosts.
>
> Besides that, I would like to ask you to move the `profile' config files
> from the spool directories to a more appropriate place such as
> /etc/queue/ or /usr/etc/. And please consider adding ``DESTDIR'' style
> to the ``local'' installation rules in Makefile.am.
>
> Best regards.
>
> QingLong.
>
>
|
From: Bill C. <bco...@po...> - 2001-01-31 16:29:19
|
I posted this to the sourceforge open discussion forum, but maybe I'll get
more of a response here.

I've installed queue-1.30.1 as follows:

./configure --enable-root
make
make install

Running queued and testing with the following worked:

queue -i -w -n -- hostname

On a 2nd machine (node2) that mounts /usr/local via NFS (no_root_squash),
queued was started as well and the hostname added to the qhostsfile.
Running the example above only returned the hostname for the 1st machine
(node1). I removed node1 from the qhostsfile and now I get "Broken pipe"
when running the queue example on either node1 or node2. Likewise, if I
leave both hostnames in the qhostsfile and specify node2 using -H I get
the broken pipe error.

I tried doing the installation steps on node2, but got the same result.
If queued is not running on node2, queue will complain, so it is
contacting node2. The machines are very similar in their configuration,
some differences perhaps in security settings, and node2 mounts /usr/local
from node1. queued -D is silent when the "Broken pipe" message appears.

Any ideas for what would cause the "Broken pipe" message? Or things to try
to get more information? My platform is Mandrake Linux 7.2, kernel
2.2.17-21.

thanks!
bill

--
Bill Comisky
bco...@po...
|
From: QingLong <qin...@Bo...> - 2001-01-31 15:00:10
|
Hello!

A while ago I submitted to you a set of patches representing changes
which I had to make to get queue-1.20.1 to work reliably enough to be usable.
AFAICS you have not used them at all, so I am trying once again.
Please consider them; I hope you will find a few useful bits there.
Thank you.

I submit for your consideration a set of patches against queue-1.30.1;
most of them are adapted versions of patches I've already sent to you.
Please have a look at the attached queue-1.30.1.QL-hack.tar.gz file.
Some comments on individual files:

queue.spec
    A spec file for RPM.

queue-1.30.1.pebkac-lart.diff
    This patch fixes autoconf/automake and makefile stuff.
    I have also had to hack `queue's Makefile.am (heavily) and configure.in
    and profile.in to make the build/installation process relocatable
    (required for src.rpm and other source packages).
    To tell the truth, the final destinations in the original stuff
    really are relocatable, but many package managing systems
    need to perform a `faked root installation', usually somewhere
    in /tmp/ or /var/tmp/, to build binary packages.
    The latter is impossible with the current code.
    I believe it's worth supporting prefixing installation paths with a
    $(DESTDIR) or $(DESTROOT) scheme, although I haven't added them.

queue-1.30.1.extra-trace-messages.diff
    This one adds trace messages about ongoing `connect()'s, `accept()'s
    and the like to help trace network-related problems.

queue-1.30.1.ptty-support-code-borrowed-from-rxvt.diff
    This patch consists almost entirely of pty code borrowed from RXVT
    (I would also recommend that you have a look at xterm's pty/tty code).
    I have just slightly modified it to fit the `queue' environment.

queue-1.30.1.reject-jobs-on-inactive-queue.diff
    Makes `queued' reject jobs on inactive (`exec off' or `exec drain')
    queues. Without this patch jobs can go to hosts which do not run
    this queue, thus effectively hanging forever.
    BTW, I consider the method of signalling job rejection (returning a magic
    loadaverage value) lame; this is obviously a design flaw.

queue-1.30.1.verbose.diff
    This patch adds a `--verbose' (`-v' in GNU tradition) flag to `queue'
    and a few trace messages. This (and its goal) is different from
    the `-DDEBUG' cpp flag, as `-DDEBUG' enables lots of insecure
    debug messages (like printing cookie values and so on), making a
    debug-enabled `queue' binary useless for production installation.
    The `--verbose' flag is intended to print trace and debug info
    useful to an ordinary user without compromising system security.

    The patch also renames the current `-v' (`--version') to `-V'.
    I believe most GNU programs have `-V' for `--version'
    and `-v' for `--verbose'.

queue-1.30.1.const-char-2-char.diff
    I've had to add explicit type conversions to make g++ happy.

queue-1.30.1.debug.diff
    Adds a --debug facility to queue (NOT queued).

queue-1.30.1.reliable-connect-fread.diff
    This is a hack around hanging forever in fread()ing from a stream
    opened on a network socket connected to a dead remote end.
    If the remote queued hangs (it's alive, but stalled) for some unknown reason
    (this does happen rather often!), it still has its network port open
    in the `listen' state, i.e. it does accept connections
    (as this stage of connection establishment is done by the kernel)
    but is silent. And fread()ing from this connection hangs forever;
    it does not time out (at least, I've failed to get it to time out).
    I had to use select() on the underlying socket to make it work reliably.
    BTW, I managed to trace this problem and fix it only thanks to
    the --debug and --verbose flags and the trace messages
    added by the above patches.

queue-1.30.1.skip-1e06-la.diff
    If all queued's are rejecting jobs (e.g. if they all are dead or deaf),
    the hosts list will contain only hosts with 1e08 (and the like)
    loadaverages, designating that those queues are down,
    but wakeup() will still try to connect to those queued's...
    So it's worth skipping all non-willing-to-serve hosts.

Besides that, I would like to ask you to move the `profile' config files
from the spool directories to a more appropriate place such as
/etc/queue/ or /usr/etc/. And please consider adding ``DESTDIR'' style
to the ``local'' installation rules in Makefile.am.

Best regards.

QingLong.
|
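The reliable-connect-fread entry above describes guarding a read from a silent peer with select() on the underlying socket. The following is only a minimal sketch of that general technique, not the contents of the patch; `read_with_timeout' and the -2 timeout code are hypothetical choices made for the example.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

/*
 * Read up to `len' bytes from `fd', waiting at most `seconds' for data
 * to arrive. Hypothetical helper, not taken from the Queue sources.
 *
 * Returns the number of bytes read (0 means end of file),
 * -1 on error (errno is set), or -2 if nothing arrived in time.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, unsigned int seconds)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    do {
        /* select() may modify both arguments, so rebuild them on retry. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;
        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;      /* select() itself failed */
    if (rc == 0)
        return -2;      /* peer accepted the connection but stayed silent */

    /* Data (or EOF) is available, so this read() will not block. */
    return read(fd, buf, len);
}
```

A caller that holds a stdio stream rather than a descriptor would obtain the descriptor with fileno(); mixing select() with buffered fread() needs care, because data already sitting in the stdio buffer is invisible to select().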
From: Bill C. <bco...@po...> - 2001-01-31 01:03:50
|
I posted this to the sourceforge open discussion forum, but maybe I'll get
more of a response here.

I've installed queue-1.30.1 as follows:

./configure --enable-root
make
make install

Running queued and testing with the following worked:

queue -i -w -n -- hostname

On a 2nd machine (node2) that mounts /usr/local via NFS (no_root_squash),
queued was started as well and the hostname added to the qhostsfile.
Running the example above only returned the hostname for the 1st machine
(node1). I removed node1 from the qhostsfile and now I get "Broken pipe"
when running the queue example on either node1 or node2. Likewise, if I
leave both hostnames in the qhostsfile and specify node2 using -H I get
the broken pipe error.

I tried doing the installation steps on node2, but got the same result.
If queued is not running on node2, queue will complain, so it is
contacting node2. The machines are very similar in their configuration,
some differences perhaps in security settings, and node2 mounts /usr/local
from node1. queued -D is silent when the "Broken pipe" message appears.

Any ideas for what would cause the "Broken pipe" message? Or things to try
to get more information? My platform is Mandrake Linux 7.2, kernel
2.2.17-21.

thanks!
bill

--
Bill Comisky
bco...@po...
|
From: Eric D. <eri...@co...> - 2001-01-19 15:13:00
|
Mark,

I started working on Solaris/Linux cross-platform Queue support several
months ago during some down time. As far as I know, this is still
partially broken. Queue now properly handles passing jobs between
big/little endian systems (the first problem), but the remaining problem
(that I'm aware of) is that the terminal settings are passed from "client"
to "server" via the job file in the native format of the submitting
machine. When this structure is read by the execution machine, it will
fail if it is not the same platform (the termios structure varies greatly
among the Unix platforms).

The general solution that I had in mind was to map, on the client, all
local termios settings to a common structure, or even a simple ASCII
string, and use this in the job structure. The server machine would then
take the information and populate its native termios structure with this
data when parsing the job file.

Hopefully I'll have some more time to spend on this over the next several
months.

Eric Deal
eri...@co...

>Monica -
>
>Perhaps you can help me. I had to take the queue manager and
>related components out so that I could configure Queue on
>Solaris 2.7. Otherwise, when I ran configure, it would
>just fail, complaining that I was running a cross-compiler.
>Has anyone else had this problem? Has anyone successfully
>built Queue on Solaris 2.7? I'd be very interested as well,
>if someone has been able to interoperate between Solaris
>and Linux with Queue.
>
>- Mark
>
>Monica Lau wrote:
>>
>> On Tue, 16 Jan 2001, Federico Ardanaz wrote:
>>
>> > 1) task_control seems to do nothing at all!?
>>
>> Are you running the task_manager program on each machine where the
>> queue daemons are running?
>>
>> I realize that some of the programs may not be working correctly. My
>> apologies, but progress is rather slow at the moment. I've just updated
>> W. Krebs with the new patches. Hopefully, they'll be up for people to
>> download soon.
>>
>> These are the necessary steps to run the programs:
>>
>> 1) There needs to be a "my_qdir" subdirectory within the standard queue
>> directory. The queue_manager uses the my_qdir directory to store certain
>> files. All of these files, except for one (the "licenses" file), get
>> created by the queue_manager.
>>
>> 2) The "licenses" file needs to be in the my_qdir subdirectory. It stores
>> the total number of licenses that users are allowed to use per license
>> feature (e.g., 10 matlab licenses). In the updated patches, there is a
>> default license called "dummylicense" so that users are not required to
>> specify a license(s) in order to run a job, i.e., if they just want to do
>> something like "queue -- ./a.out". However, note that the number of
>> dummylicenses would limit the total number of jobs that users can
>> run. Users can change this number if they want. In the current programs,
>> I believe that users do have to specify a license.
>>
>> 3) In order to view what jobs are running and where they are running,
>> users simply have to look at the "status" file within the my_qdir
>> directory. Just a simple "cat my_qdir/status" will do. In the updated
>> patches, the queue_manager updates this status file quite often.
>>
>> 4) Some of the variables in the queue_define.h file need to be updated
>> before compilation. QMANAGERHOST -- change the host name of this variable
>> to be the name of the server where the queue_manager will be running.
>> Also, be sure to update the new directory paths of the variables
>> QDIR, AVAILHOSTS, ..., TEMPFILE.
>>
>> 5) In order for the task_control program to work correctly, the
>> task_manager program must be running on each server where each queue
>> daemon is running.
>>
>> Please let me know if anything is unclear or if there are any problems. I
>> hope this helps!
>>
>> Regards,
>> Monica Lau
>>
>> > 2) How can I remove (qdel in NQS) batch jobs?
>> > 3) How can I know how many jobs are running and where?
>> >
>> > Federico Ardanaz
>> >
>(See attached file: markd.vcf)
|
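The idea at the top of the message above, mapping termios settings to a simple ASCII string on the client and rebuilding the native structure on the server, could look roughly like the sketch below. It is an illustration only and not Queue code: the function names, the text format and the choice of just three flags and two control characters are all assumptions.

```c
#include <stdio.h>
#include <termios.h>

/* Set or clear one bit in a termios flag word. */
static void set_flag(tcflag_t *flags, tcflag_t bit, int on)
{
    if (on)
        *flags |= bit;
    else
        *flags &= ~bit;
}

/*
 * Client side: describe a few settings by name instead of copying the
 * platform-specific struct. Returns 0 on success, -1 if `buf' is too small.
 */
int termios_to_text(const struct termios *t, char *buf, size_t len)
{
    int n = snprintf(buf, len, "icanon=%d echo=%d isig=%d intr=%u eof=%u",
                     (t->c_lflag & ICANON) != 0,
                     (t->c_lflag & ECHO) != 0,
                     (t->c_lflag & ISIG) != 0,
                     (unsigned int) t->c_cc[VINTR],
                     (unsigned int) t->c_cc[VEOF]);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/*
 * Server side: apply the named settings to the local, native struct.
 * Fields not mentioned in the text keep whatever value `t' already has.
 * Returns 0 on success, -1 if the text cannot be parsed.
 */
int text_to_termios(const char *buf, struct termios *t)
{
    int icanon, echo, isig;
    unsigned int intr, eof;

    if (sscanf(buf, "icanon=%d echo=%d isig=%d intr=%u eof=%u",
               &icanon, &echo, &isig, &intr, &eof) != 5)
        return -1;

    set_flag(&t->c_lflag, ICANON, icanon);
    set_flag(&t->c_lflag, ECHO, echo);
    set_flag(&t->c_lflag, ISIG, isig);
    t->c_cc[VINTR] = (cc_t) intr;
    t->c_cc[VEOF] = (cc_t) eof;
    return 0;
}
```

Because each side only uses its own ICANON, VINTR and so on, differences in bit values, array indices and byte order between Solaris and Linux never reach the job file. A complete version would have to cover the remaining flags and handle platform quirks, for example systems where VEOF and VMIN share one c_cc slot.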
From: W. G. K. <wer...@ya...> - 2001-01-19 14:53:47
|
This is a bug in the ./configure script, or, in reality, in the GNU
Autoconf program.

Queue_manager requires C++, so I had to add this to the configure.in
setup. Unfortunately, GNU Autoconf does not seem to work correctly with
C++/g++ on some platforms, concluding that the C++ compiler is a
cross-compiler.

I suppose we'll have to write to the folks at autoconf to ask them for
help with this problem, hence the CC line.

[Note that queue-developers is spam-proofed by requiring subscriptions, so
folks from bug-autoconf should just reply to the CC addresses. Thanks.]

Quoting Mark Denni <ma...@re...>:

> Monica -
>
> Perhaps you can help me. I had to take the queue manager and
> related components out so that I could configure Queue on
> Solaris 2.7. Otherwise, when I ran configure, it would
> just fail, complaining that I was running a cross-compiler.
> Has anyone else had this problem? Has anyone successfully
> built Queue on Solaris 2.7? I'd be very interested as well,
> if someone has been able to interoperate between Solaris
> and Linux with Queue.
>
> - Mark
>
> Monica Lau wrote:
> >
> > On Tue, 16 Jan 2001, Federico Ardanaz wrote:
> >
> > > 1) task_control seems to do nothing at all!?
> >
> > Are you running the task_manager program on each machine where the
> > queue daemons are running?
> >
> > I realize that some of the programs may not be working correctly. My
> > apologies, but progress is rather slow at the moment. I've just updated
> > W. Krebs with the new patches. Hopefully, they'll be up for people to
> > download soon.
> >
> > These are the necessary steps to run the programs:
> >
> > 1) There needs to be a "my_qdir" subdirectory within the standard queue
> > directory. The queue_manager uses the my_qdir directory to store certain
> > files. All of these files, except for one (the "licenses" file), get
> > created by the queue_manager.
> >
> > 2) The "licenses" file needs to be in the my_qdir subdirectory. It stores
> > the total number of licenses that users are allowed to use per license
> > feature (e.g., 10 matlab licenses). In the updated patches, there is a
> > default license called "dummylicense" so that users are not required to
> > specify a license(s) in order to run a job, i.e., if they just want to do
> > something like "queue -- ./a.out". However, note that the number of
> > dummylicenses would limit the total number of jobs that users can
> > run. Users can change this number if they want. In the current programs,
> > I believe that users do have to specify a license.
> >
> > 3) In order to view what jobs are running and where they are running,
> > users simply have to look at the "status" file within the my_qdir
> > directory. Just a simple "cat my_qdir/status" will do. In the updated
> > patches, the queue_manager updates this status file quite often.
> >
> > 4) Some of the variables in the queue_define.h file need to be updated
> > before compilation. QMANAGERHOST -- change the host name of this variable
> > to be the name of the server where the queue_manager will be running.
> > Also, be sure to update the new directory paths of the variables
> > QDIR, AVAILHOSTS, ..., TEMPFILE.
> >
> > 5) In order for the task_control program to work correctly, the
> > task_manager program must be running on each server where each queue
> > daemon is running.
> >
> > Please let me know if anything is unclear or if there are any problems. I
> > hope this helps!
> >
> > Regards,
> > Monica Lau
> >
> > > 2) How can I remove (qdel in NQS) batch jobs?
> > > 3) How can I know how many jobs are running and where?
> > >
> > > Federico Ardanaz
> > >
|