queue-developers Mailing List for GNU Queue (Page 4)
Brought to you by:
wkrebs
From: Mike C. <da...@ix...> - 2001-05-10 22:16:34
On Thu, May 10, 2001 at 09:52:56AM -0700, Cyril Bortolato wrote:
> Mike,
> this is discussed in the Support Requests on sourceforge, request ID 415221.

So I discovered. With a nice "*" next to it indicating it is a problem over 30 days old.

For over 30 days one hasn't been able to run queue without the -D option and it still hasn't been fixed? And release candidates are being generated?

Make sure all of those issues are addressed before even starting to look at releasing something. (Either solved, as a basic function like this one should be! Or comment on them at least.)

mrc
--
     Mike Castle          Life is like a clock:  You can work constantly
  da...@ix...             and be right all the time, or not work at all
www.netcom.com/~dalgoda/  and be right at least twice a day.  -- mrc
    We are all of us living in the shadow of Manhattan.  -- Watchmen
From: Cyril B. <cyr...@ya...> - 2001-05-10 16:53:21
Mike,
this is discussed in the Support Requests on sourceforge, request ID 415221.

Cyril

--- Mike Castle <da...@ix...> wrote:
> queued will not work without the -D option.
>
> It will run, but no submissions can be made:
From: Mike C. <da...@ix...> - 2001-05-10 07:39:15
On Wed, May 09, 2001 at 11:02:06PM -0700, Mike Castle wrote:
> queued will not work without the -D option.
>
> It will run, but no submissions can be made:

Ok. Reading the stuff on sourceforge, I found someone else making this same complaint, with more analysis.

The following change makes it possible to run queued without the -D option:

Index: queued.c
===================================================================
RCS file: /cvsroot/queue/queue-development/queued.c,v
retrieving revision 1.47
diff -u -r1.47 queued.c
--- queued.c	2001/05/09 22:35:51	1.47
+++ queued.c	2001/05/10 07:25:35
@@ -954,7 +954,7 @@
   systemflag = 0; /* Clear HP-UX "system" call flag */
-#if 1
+#if 0
   /*
    * Go to sleep for a while before flooding the system with
    * jobs, in case it crashes again right away, or the

However, now I'm getting all sorts of annoying emails to root:

Date: Thu, 10 May 2001 00:24:09 -0700
To: ro...@mr...
From: The Queue Daemon <ro...@mr...>
Subject: queued error on thune.mrc-home.org: 'now/efm799285650': fchown(1, 501, 100) failed: Bad file descriptor

'now/efm799285650': fchown(1, 501, 100) failed: Bad file descriptor

The following block of code seems abandoned, even as far back as 1.12.8:

from queued.c:

    if( fchown(1, pw->pw_uid, pw->pw_gid) == -1 ){
        mperror3("'%s': fchown(1, %d, %d) failed",
                 fname, pw->pw_uid, pw->pw_gid );
        /* no exit; just keep going */
    }

Bracketing it with #if 0 / #endif seems to have no immediate ill effects.

mrc
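"Bad file descriptor" is the message for errno EBADF, which fchown(2) reports when its first argument is not an open descriptor. One plausible reading is that descriptor 1 has already been closed by the time this call runs when queued is started without -D; that is an assumption, since the surrounding queued.c code is not shown in the message. Instead of compiling the call out with #if 0, a guard along these lines would skip it only when the descriptor really is closed. The helper name chown_if_open is hypothetical and is not a function in queued.c.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of queued.c.
 * Change the owner of `fd` only if `fd` refers to an open descriptor.
 * Returns 0 on success or when fd is closed (nothing to do),
 * and -1 with errno set on any other failure. */
int chown_if_open(int fd, uid_t uid, gid_t gid)
{
    assert(fd >= 0);

    /* F_GETFD fails with EBADF exactly when fd is not open. */
    if (fcntl(fd, F_GETFD) == -1 && errno == EBADF)
        return 0;

    return fchown(fd, uid, gid);
}
```

With such a guard the daemon would stop mailing root about a closed stdout while still reporting real ownership failures, for example EPERM.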
From: Mike C. <da...@ix...> - 2001-05-10 06:02:08
queued will not work without the -D option.

It will run, but no submissions can be made:

nexus@thune[9:01pm]src/queue/queue-1.40.1beta(545) queue -i -w -n -- hostname
Queue: Failed to submit job to queue "now".

Running with -D on two machines, using the default "now" profile, and the following command:

nexus@thune[9:04pm]src/zlib/zlib-1.1.3(546) cat ~/bin/mygcc
#!/bin/sh
queue -i -w -n -- mygcc2 "$@"
#sqs-submit gcc "$@"
nexus@thune[9:06pm]src/zlib/zlib-1.1.3(547) cat ~/bin/mygcc2
#!/bin/sh
hostname
gcc "$@"

I get the following output every time:

nexus@thune[9:06pm]src/zlib/zlib-1.1.3(549) make CC=mygcc -j2
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o example.o example.c
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o adler32.o adler32.c
thune
thune
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o compress.o compress.c
mars
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o crc32.o crc32.c
thune
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o gzio.o gzio.c
thune
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o uncompr.o uncompr.c
thune
mygcc -fPIC -O3 -DHAVE_UNISTD_H -DUSE_MMAP -c -o deflate.o deflate.c
thune
/home/nexus/bin/mygcc: line 2: 426 Terminated queue -i -w -n -- mygcc2 "$@"
make: *** [gzio.o] Error 143
make: *** Waiting for unfinished jobs....

(of course, which file it stops on is random).

I see the following email:

Date: Wed, 9 May 2001 21:04:21 -0700
To: ro...@mr...
From: The Queue Daemon <ro...@mr...>
Subject: queued error on mars.mrc-home.org: Can't unlink(now/CFDIR/cfm799093728): No such file or directory

Can't unlink(now/CFDIR/cfm799093728): No such file or directory

Also, since queued can only run in debug mode, I keep getting all the damned annoying email messages:

Date: Wed, 9 May 2001 21:07:13 -0700
To: ne...@mr...
From: The Queue Daemon <ro...@mr...>
Subject: batch queue_b on thune.mrc-home.org[28783]:

queued queued.c sendmail():
SENDMAIL: From: "queued"
SENDMAIL: To: "nexus"
queued queued.c sendmail():
SENDMAIL: From: "queued"
SENDMAIL: To: "nexus"

Which is very annoying.

mrc
From: Mike C. <da...@ix...> - 2001-05-10 01:06:27
Minor documentation nit:

In profile, the following is stated:

#Under /usr/spool/queue you may create several directories

This should probably be subject to substitution on @queuedir@, as a few lines below are. Just to avoid any possible confusion.

mrc
From: Mike C. <da...@ix...> - 2001-05-10 00:56:19
The info pages fail to install:

install-info --infodir=/usr/info /usr/info/queue.info
install-info: unrecognized option `--infodir=/usr/info'
Try `install-info --help' for a complete list of options.

Again, I had sent a previous message on this topic.

The option is --info-dir, NOT --infodir. Please note the extra hyphen.

The following patch is necessary:

--- Makefile.am.orig	Sun May 6 04:39:14 2001
+++ Makefile.am	Wed May 9 17:54:04 2001
@@ -105,7 +105,7 @@
 	$(INSTALL) -o $(Queue_OWNER) -m 600 profile $(DEST_queuedir)/wait/
 	-$(INSTALL) -m 755 -d $(DEST_infodir)
 	$(INSTALL) -m 644 doc/queue.info $(DEST_infodir)/
-	-install-info --infodir=$(infodir) $(DEST_infodir)/queue.info
+	-install-info --info-dir=$(infodir) $(DEST_infodir)/queue.info
 	-$(INSTALL) -m 755 -d $(DEST_mandir)
 	-$(INSTALL) -m 755 -d $(DEST_mandir)/man1
 	$(INSTALL) -m 444 doc/queue.man $(DEST_mandir)/man1/queue.1

mrc
From: Mike C. <da...@ix...> - 2001-05-10 00:39:03
On Wed, May 09, 2001 at 05:02:05PM -0700, Mike Castle wrote:
>
> NFS-shared Queue spool directory is "/usr/var/spool/queue"

Oops... I missed also commenting on the following:

Local queued process id file prefix is "/usr/var/run/queued.pid"

I kind of like this bit from the ssh configure.in:

PIDDIR="/var/run"
AC_MSG_CHECKING(where to put sshd.pid)
if test '!' -d $PIDDIR; then
        PIDDIR="$ETCDIR"
fi
AC_MSG_RESULT($PIDDIR)

Could I request something similar in queue? If /var/run exists, go with that, else use ${localstatedir}/run.

mrc
From: Mike C. <da...@ix...> - 2001-05-10 00:15:38
queued.c STILL will not compile out of the box. I sent email on this a few days back actually.

As I stated before, this section in define.h does not make sense:

#ifdef TM_WITH_SYS_TIME
#ifdef HAVE_TIME_H
#include <time.h>
#endif
#endif

If TM_WITH_SYS_TIME is true, then why are you trying to include time.h?

I think the following patch is appropriate:

--- define.h.orig	Wed Apr 11 08:29:29 2001
+++ define.h	Wed May 9 17:12:07 2001
@@ -134,7 +134,7 @@
 #include <syslog.h>
 #endif
 #include <termios.h>
-#ifdef TM_WITH_SYS_TIME
+#ifndef TM_WITH_SYS_TIME
 #ifdef HAVE_TIME_H
 #include <time.h>
 #endif

mrc
From: Mike C. <da...@ix...> - 2001-05-10 00:06:10
A second make still invokes cd . && autoheader.

Since this is approaching release candidate stage, this should not occur, regardless of whether the installer has automake installed or not (it may be an incompatible version).

mrc
From: Mike C. <da...@ix...> - 2001-05-10 00:02:08
NFS-shared Queue spool directory is "/usr/var/spool/queue"

/usr/var/spool is not an appropriate place for this. /usr/var should *NOT* be nfs shared. All var name spaces should be reserved for localstate information.

Instead, use /usr/com/queue.

For reference, the GNU Coding Standard:

`sharedstatedir'
     The directory for installing architecture-independent data files
     which the programs modify while they run. This should normally be
     `/usr/local/com', but write it as `$(prefix)/com'. (If you are
     using Autoconf, write it as `@sharedstatedir@'.)

Btw, I recommend that the profiles NOT go into sharedstatedir, as they are, in general, read only files. Instead, they should go into /usr/share/queue. (Actually I'd strongly suggest /usr/share/queue/profiles.)

Doing a make now...

mrc

PS: Oh yeah, the above was built with --prefix=/usr, but even accounting for s+/usr+/usr/local+, the same arguments still hold.
From: Werner G. K. <wer...@ya...> - 2001-05-09 23:18:51
Queue 1.40.1 beta is out on sourceforge, http://www.gnuqueue.org .

This is intended to be a major release and represents many months of work by a number of developers and contributors. (Most notably, QingLong and Monica Lau. I also made some contributions.)

The "beta" designation reflects a growing concern with internal quality control. After some feedback we will hopefully be able to drop the "beta" designation and go for a full release.

Hopefully, we'll be able to bring out CVS snapshots periodically as beta releases. This means the CVS repository version needs to work most of the time (I've probably been the sloppiest with this) and, most importantly, install well (I know this is important, so I've been pretty good with this; I've had to debug installation problems that have "appeared".)

These sorts of issues are important to us. For example, the installation stuff involving the queue_manager still needs a fair amount of work to make the code install in a portable manner. (This is a major reason why it is disabled by default.) The problem is most users won't give a package a second look if there are installation problems. So, we want to make sure we get these right.

Although "beta", this should be a much better release than 1.30.1, which I've never been especially happy with. (Although it was up there for a long time, unfortunately, because I've been distracted and wasn't sure the CVS version was completely ready.)

So I hope folks will have a look at it and report the bugs/installation problems before it turns into 1.40.1 and is announced on Freshmeat and in various other places.
From: Christian P. <cp...@el...> - 2001-05-08 07:52:30
Hi Mike !

> ... but there are so
> many other things broken with queue right now it's not even funny. (I've
> pretty much given up on queue for now and wrote a few cheesy shell scripts
> that work much better.)

My first approach for load balancing was a simple ruptime request for the load of several machines. The disadvantage was that the dead time of the calculated load was too long. If I waited 5 minutes before starting the next job, the balancing was ok.

How do you determine the load ?

Best regards,
Christian
From: Mike C. <da...@ix...> - 2001-05-07 23:26:50
On Thu, May 03, 2001 at 11:10:10PM -0700, Cyril Bortolato wrote:
> I came up with the patch to queued.c below. In startjob() I check if
> there's already an efmXXX file for the job passed to startjob(). If
> there is, it means that job is already running on some host and we'd
> better not start it again. So I set the job's pid accordingly and

There is still a race condition here, unfortunately. The efm file could still show up after you look for it but before you create it.

You've reduced the window, but not eliminated it.

One solution might be from the linux open(2) man page:

       O_EXCL  When used with O_CREAT, if the file already exists
               it is an error and the open will fail. O_EXCL is
               broken on NFS file systems, programs which rely on
               it for performing locking tasks will contain a race
               condition. The solution for performing atomic file
               locking using a lockfile is to create a unique file
               on the same fs (e.g., incorporating hostname and
               pid), use link(2) to make a link to the lockfile.
               If link() returns 0, the lock is successful.
               Otherwise, use stat(2) on the unique file to check
               if its link count has increased to 2, in which case
               the lock is also successful.

We can't necessarily rely on flock working (not everyone has a working lockd).

We could build in a lock protocol into queue, but there are so many other things broken with queue right now it's not even funny. (I've pretty much given up on queue for now and wrote a few cheesy shell scripts that work much better.)

> return ALREADY_LOCKED.

mrc
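The link(2) recipe quoted from the open(2) man page can be sketched in C roughly as follows. This is an illustration of the man page's steps under stated assumptions, not code from queued.c: acquire_lock and release_lock are hypothetical names, the caller is assumed to pass its own hostname, and the lock path is assumed to live on the shared spool filesystem.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of queued.c.
 * Try to take the lock file `lockpath` atomically.
 * Returns 0 if this process now holds the lock,
 * -1 if someone else holds it or an error occurred. */
int acquire_lock(const char *lockpath, const char *hostname)
{
    char unique[1024];
    struct stat st;
    int n, fd, got_it;

    assert(lockpath != NULL && hostname != NULL);

    /* Step 1: create a file whose name is unique to this host and
     * process, on the same filesystem as the lock file. */
    n = snprintf(unique, sizeof unique, "%s.%s.%ld",
                 lockpath, hostname, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof unique)
        return -1;
    fd = open(unique, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    /* Step 2: link() either creates lockpath or fails because it
     * already exists; there is no check-then-create window. */
    got_it = (link(unique, lockpath) == 0);

    /* Step 3: over NFS the reply to link() can be lost, so a link
     * count of 2 on the unique file also proves that we hold the lock. */
    if (!got_it && stat(unique, &st) == 0 && st.st_nlink == 2)
        got_it = 1;

    unlink(unique); /* the unique name is no longer needed */
    return got_it ? 0 : -1;
}

/* Release a lock taken with acquire_lock(). */
int release_lock(const char *lockpath)
{
    return unlink(lockpath);
}
```

In startjob() terms, the access() test from the patch would be replaced by a call like acquire_lock(), with ALREADY_LOCKED returned when it fails. A lock left behind by a crashed host is not handled by this sketch.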
From: Cyril B. <cyr...@ya...> - 2001-05-04 06:11:23
Hi,

I tried Queue with 2 machines, both running RH 6.2. I downloaded the source from CVS on 05/01. I set it up in non root mode.

When I launched a couple of jobs (with queue -i -w -n or qsh), I noticed some strange behavior. A job starts on machine1, runs for a while, but then machine2 tries to run it as well (same cfmXXX in supervisor.log). Machine2 then immediately stops running that job, but also removes the efmXXX and CFDIR/cfmXXX files in the "now" queue directory... The job also gets terminated on machine1 (signal 9). When machine1's queued daemon tries to remove the job's CFDIR/cfmXXX file, it's no longer there and I get errors like "Can't unlink cfmXXX".

I came up with the patch to queued.c below. In startjob() I check if there's already an efmXXX file for the job passed to startjob(). If there is, it means that job is already running on some host and we'd better not start it again. So I set the job's pid accordingly and return ALREADY_LOCKED.

Let me know if it's the right approach.

Regards,
Cyril
bo...@us...

Index: queued.c
===================================================================
RCS file: /cvsroot/queue/queue-development/queued.c,v
retrieving revision 1.46
diff -u -r1.46 queued.c
--- queued.c	2001/04/11 20:46:10	1.46
+++ queued.c	2001/05/04 02:02:45
@@ -3525,6 +3525,17 @@
   checkpoint = qp->q_checkpointmode;
   restart = NO_RESTART;
 
+  /* Check if there's already an "ef" file, meaning the job
+   * is already running on some host. borto 2001/05/03 */
+  sprintf(fname, "%s/e%s", qp->q_name, jp->j_cfname+1);
+  if(access(fname, F_OK)==0) {
+    mdebug1("queued queued.c startjob():\n"\
+            "\t%s is already running somewhere, skip it.\n",
+            jp->j_cfname);
+    jp->j_pid = ANOTHER_HOST;
+    return(ALREADY_LOCKED);
+  }
+
 #ifdef ENABLE_CHECKPOINT
   /*Migrator code. WGK 1999/3/6. If there's a corresponding mf file,
     only consider starting the job if we are allowed to restart jobs.*/
From: Christian P. <cp...@el...> - 2001-05-03 12:39:19
Hello !

Just a short bug report: The latest cvs-snapshot from queue-stable doesn't build on SuSE 7.1 GNU/Linux:

./configure --prefix=/opt/queue-1.30.1 --sharedstatedir=/common/run/data/queued/com --enable-root --datadir=/common/run/data/queued/share --localstatedir=/var/queue

cpa/queue-stable> make
gcc -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -c queue.c
gcc -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -c wakeup.c
wakeup.c: In function `wakeup':
wakeup.c:174: warning: passing arg 4 of `qsort' from incompatible pointer type
gcc -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -c ident.c
gcc -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -c qlib.c
gcc -g -O2 -o queue queue.o wakeup.o ident.o qlib.o -lutil -lelf -lcrypt -lfl -lnsl -lelf -lrpcsvc
gcc -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -c queued.c
queued.c:959: unterminated `#else' conditional
make: *** [queued.o] Error 1

Best regards,
Christian
From: Christian P. <cp...@el...> - 2001-05-02 20:01:18
Hello !

I used Queue 1.30.1 with the following configure options:

./configure --prefix=/opt/queue-1.30.1 --sharedstatedir=/common/run/data/queued --enable-root --datadir=/common/run/data/queued/share --localstatedir=/var/queue

And I applied the patch from Fabio (see HP at sourceforge).

Now the gmake install says:

./install-sh -c -d /var/queue || (mkdir /var/queue; exit 0)
./install-sh -c -o root -m 700 -d /var/queue/queue/ || (mkdir /var/queue/queue; chmod 700 /var/queue/queue; exit 0)
./install-sh -c -o root -m 700 -d /var/queue/queue/now || (mkdir /var/queue/queue/now; chmod 700 /var/queue/queue/now; exit 0)
./install-sh -c -o root -m 700 -d /var/queue/queue/wait || (mkdir /var/queue/queue/wait; chmod 700 /var/queue/queue/wait; exit 0)
./install-sh -c -o root -m 600 profile /var/queue/queue/now
./install-sh -c -o root -m 600 profile /var/queue/queue/wait

It looks as if localstatedir is used for profile, but in my opinion this file must be shared; otherwise I have to maintain several profile files, one on each machine. Please tell me if I am wrong.

Is it explained somewhere what exactly datadir, localstatedir and sharedstatedir are ? I haven't found it in the delivered documentation nor on the website.

Best Regards,
Christian
From: Christian P. <cp...@el...> - 2001-05-02 13:29:26
Hello !

Though I already wrote on the Sourceforge Bug-Reporting tool, nobody (but Tom) answered. So I will try it again on this ML.

I built queue on a Sparc with the following configure options:

./configure --prefix=/opt/queue-1.30.1 --sharedstatedir=/common/run/data/queued --enable-root --datadir=/common/run/data/queued/share --localstatedir=/var/queue

/common/run/data/queued is NFS-shared and root-writeable.

Every time I run a job I get the mail:

Can't unlink(layver/CFDIR/cfm788551552): No such file or directory

Why is that ?

I am also confused why queue complains about this:

Can't fopen(/opt/queue-1.30.1/var/queue/now/mail_log2, "a"); using stderr: No such file or directory

I want queue to write its local data to /var/queue.

Queue also wants a var directory in /common/run/data/queued.

I changed the

QUEUEDIR=\"`eval echo ${localstatedir}/queue`\"

line in Makefile by hand to

QUEUEDIR=\"`eval echo ${sharedstatedir}/queue`\"

as a maintainer told me. Looking at Fabio's patch (why is it not on the sourceforge site?) there is much more to change; this is my next try.

Best regards,
Christian
From: Mike C. <da...@ix...> - 2001-04-27 18:05:20
This is less of a queue issue than a development tool issue.

I'm wanting to build queue with -profile to try to identify some performance issues. However, I think I'm running against a bug in the GNU binutils package. Then again, it may be a local corruption problem. :->

I'm wondering if I could get anyone using queue to volunteer to do a build on a machine with a configuration similar to the one below and just let me know if it successfully builds. If it does, then I'll assume it's a local issue. If it doesn't, then I'm pretty sure it's in ld or, most likely, libbfd.

Linux 2.4.2 (more or less, though I don't think it matters).
Binutils 2.11, 2.11.90.0.5, or 010423 (had problems with all three)
glibc 2.2.2
gcc 2.95.3
queue 1.30.1 or the latest cvs source.

Just after doing ./configure, but before doing make, change your top level make file and add "-profile" to CXXFLAGS and CFLAGS.

On my machine, it hangs linking queue or queued.

If you're REALLY ambitious, you can use gdb to attach to ld (assuming you have source code laying around) and see that it's traversing a circularly linked list (that I don't think is supposed to be circular).

Anyway, if anyone does this, and is able to identify working and non-working combinations, I'd appreciate it. Mail me off list, and I'll correlate data and see if I can find a trend.

Thanks!
mrc
From: Mike C. <da...@ix...> - 2001-04-25 07:56:02
In comparing the performance of the latest CVS stuff vs 1.30.1, I've noticed a significant change in speed.

I'm using as a simple test case a compile of zlib-1.1.3.

I'm doing: time make CC=mygcc -j

For some base line numbers:

time make:              24.7 sec (ok, it's only a P5-233)
time make -j:           24.8 sec
time make CC=mygcc:     43 sec   (1.30.1)
time make CC=mygcc -j:  27.2 sec (1.30.1)
time make CC=mygcc:     44 sec   (1.40.1)
time make CC=mygcc -j:  35 sec   (1.40.1)

This is all with default settings in the now/profile. Only one machine in the system (I haven't started using nfs locking yet, so if I run 2 machines, they die pretty quickly with random file corruption, so I can't tell how well it scales across two machines).

I am going to start profiling to see if I can see any major issues. But I was wondering if anyone else has noticed similar performance changes. (geocrawler seems to be down at the moment).

mrc
From: Mike C. <da...@ix...> - 2001-04-25 07:12:54
Index: Makefile.am
===================================================================
RCS file: /cvsroot/queue/queue-development/Makefile.am,v
retrieving revision 1.12
diff -u -r1.12 Makefile.am
--- Makefile.am	2001/03/14 13:21:30	1.12
+++ Makefile.am	2001/04/25 06:27:06
@@ -105,7 +105,7 @@
 	$(INSTALL) -o $(Queue_OWNER) -m 600 profile $(DEST_queuedir)/wait/
 	-$(INSTALL) -m 755 -d $(DEST_infodir)
 	$(INSTALL) -m 644 doc/queue.info $(DEST_infodir)/
-	-install-info --infodir=$(infodir) $(DEST_infodir)/queue.info
+	-install-info --info-dir=$(infodir) $(DEST_infodir)/queue.info
 	-$(INSTALL) -m 755 -d $(DEST_mandir)
 	-$(INSTALL) -m 755 -d $(DEST_mandir)/man1
 	$(INSTALL) -m 444 doc/queue.man $(DEST_mandir)/man1/queue.1
From: Mike C. <da...@ix...> - 2001-04-25 05:19:40
I propose the following initial patch:

Index: queued.c
===================================================================
RCS file: /cvsroot/queue/queue-development/queued.c,v
retrieving revision 1.46
diff -u -r1.46 queued.c
--- queued.c	2001/04/11 20:46:10	1.46
+++ queued.c	2001/04/25 05:16:25
@@ -79,7 +79,7 @@
  * GNU/Linux differs here from other OSes in that
  * it has named the descriptor 'fd' rather than 'dd_fd'.
  */
-# define QUEUE_PROPER__DIR_FD(dir) (*((int*)dir))
+# define QUEUE_PROPER__DIR_FD(dir) (dirfd(dir))
 # else
 # define QUEUE_PROPER__DIR_FD(dir) (dir->dd_fd)
 #endif

Actually, doing anything "file descriptor" wise with DIR is a bit suspicious. However, I think that something along the lines of:

#ifdef HAVE_DIRFD
# define QUEUE_PROPER__DIR_FD(dir) (dirfd(dir))
#else
# define QUEUE_PROPER__DIR_FD(dir) (dir->dd_fd)
#endif

With the appropriate autoconf magic would be a better solution.

mrc
From: Mike C. <da...@ix...> - 2001-04-25 04:36:49
Ok... looking through the list archive, this has been a sticking point for people for a while.

Looking at define.h, we have several issues.

First, the following:

#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#endif

exists at line 1, and line 115. Not sure it's needed in both.

Second, we have the following:

#ifdef TM_WITH_SYS_TIME
#ifdef HAVE_TIME_H
#include <time.h>
#endif
#endif

Ok. *either* tm is in sys/time.h or it's not. If it's in sys/time.h, then we've already included it above, so we don't need #ifdef at all. If it's NOT in sys/time.h, then we should be including time.h.

In short, I propose the following patch:

Index: define.h
===================================================================
RCS file: /cvsroot/queue/queue-development/define.h,v
retrieving revision 1.3
diff -u -r1.3 define.h
--- define.h	2001/04/11 15:29:29	1.3
+++ define.h	2001/04/25 04:36:25
@@ -134,7 +134,7 @@
 #include <syslog.h>
 #endif
 #include <termios.h>
-#ifdef TM_WITH_SYS_TIME
+#ifndef TM_WITH_SYS_TIME
 #ifdef HAVE_TIME_H
 #include <time.h>
 #endif

Then it at least compiles.

mrc
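For comparison, the include idiom that the Autoconf manual documents for AC_HEADER_TIME is shown below. The macro name in define.h, TM_WITH_SYS_TIME, matches neither of the two standard Autoconf symbols known here (TIME_WITH_SYS_TIME from AC_HEADER_TIME, and TM_IN_SYS_TIME from AC_STRUCT_TM); whether queue's configure script defines it some other way is not visible from this thread, so treating it as a misspelling of TIME_WITH_SYS_TIME is an assumption. The helper year_of is hypothetical and exists only to show that struct tm is usable after the includes.

```c
#include <assert.h>
#include <stddef.h>

/* Autoconf's documented idiom: include both headers when that is safe,
 * otherwise prefer sys/time.h if present, else time.h. When none of the
 * macros is defined (as in a standalone build) this reduces to <time.h>. */
#ifdef TIME_WITH_SYS_TIME
# include <sys/time.h>
# include <time.h>
#else
# ifdef HAVE_SYS_TIME_H
#  include <sys/time.h>
# else
#  include <time.h>
# endif
#endif

/* Hypothetical helper: calendar year (UTC) of a time_t value. */
int year_of(time_t t)
{
    struct tm *tm = gmtime(&t);

    assert(tm != NULL);
    return tm->tm_year + 1900;
}
```

With this layout there is no case in which neither header is included, which is the failure the patch above works around.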
From: Hazelrig, C. C. (C. - Simtech) <Chr...@hw...> - 2001-04-18 14:48:20
Hi, Monica.

I left queue_manager, queued, and task_manager running overnight and now host1 and host2 are listed in the status file under DEFECTIVE_SERVERS. I guess I'll jump into the debugger and try to determine why the two don't seem to be communicating properly. It's weird, though, that queue_manager isn't detecting host1 since that's the node it's running on. I get the same results when attempting to submit a job to host1 as with host2 yesterday.

Chris

> -----Original Message-----
> Date: Tue, 17 Apr 2001 22:42:05 -0700 (PDT)
> From: Monica Lau <la...@cs...>
> To: que...@li...
> Subject: Re: [Queue-developers] no luck with latest development Queue
> Reply-To: que...@li...
>
> Hi Chris,
>
> When the queue_manager first starts up, the hosts are in the VALIDHOSTS
> list. The queued's must periodically send update messages to the
> queue_manager; the period between these update messages are defined in the
> queue_define.h file (MAX_MODULO and MIN_MODULO). When the queue_manager
> hears from these queued's, then it will move these hosts from the
> VALIDHOSTS list to the AVAILHOSTS list. So, just wait a bit before
> submitting your jobs.
>
> I hope this helps.
>
> Regards,
>
> Monica
>
> On Tue, 17 Apr 2001, Hazelrig, Chris C. (Contractor - Simtech) wrote:
>
> > Greetings,
> >
> > Having a few problems with latest Queue development version. I'm trying it
> > with just two nodes (host1, host2), each running RedHat Linux 6.2. On the
> > master (host1), I am running queued (queued --debug --foreground),
> > queue_manager, and task_manager. On the slave (host2), I am running queued
> > (queued --debug --foreground) and task_manager. From host1 I execute the
> > following:
> >
> > queue -D -i -w -n -a dummylicense -H host2 -- hostname
> >
> > I get the following error message:
> >
> > Queue.c Error: no |'s allowed in Queue software
> >
> > If I remove the -H option OR the -D option, no error is reported, but no
> > result is reported either. The command seems to go off into the weeds and
> > never comes back. The status file says host1 and host2 are VALIDHOSTS but
> > lists nothing as AVAILHOSTS, and the submitted job is listed under
> > HIGH_WAITING. It appears that queue_manager thinks both nodes are busy.
> > Upon submitting the job, queue_manager reports "After getting licenses" and
> > then "After getting user's environment" and then "Timed out". Attempting to
> > kill the job with ^C returns to the command line, but doesn't actually kill
> > the job, it is still listed in the status file. Using task_control -k
> > <JOB_ID> does kill it, and queue returns the following message:
> >
> > Queue.c Error: did not get an assigned host
> >
> > I'm stumped. Any thoughts?
> >
> > Thanks in advance,
> > Chris
From: Monica L. <la...@cs...> - 2001-04-18 05:42:08
Hi Chris,

When the queue_manager first starts up, the hosts are in the VALIDHOSTS list. The queued's must periodically send update messages to the queue_manager; the period between these update messages are defined in the queue_define.h file (MAX_MODULO and MIN_MODULO). When the queue_manager hears from these queued's, then it will move these hosts from the VALIDHOSTS list to the AVAILHOSTS list. So, just wait a bit before submitting your jobs.

I hope this helps.

Regards,

Monica

On Tue, 17 Apr 2001, Hazelrig, Chris C. (Contractor - Simtech) wrote:

> Greetings,
>
> Having a few problems with latest Queue development version. I'm trying it
> with just two nodes (host1, host2), each running RedHat Linux 6.2. On the
> master (host1), I am running queued (queued --debug --foreground),
> queue_manager, and task_manager. On the slave (host2), I am running queued
> (queued --debug --foreground) and task_manager. From host1 I execute the
> following:
>
> queue -D -i -w -n -a dummylicense -H host2 -- hostname
>
> I get the following error message:
>
> Queue.c Error: no |'s allowed in Queue software
>
> If I remove the -H option OR the -D option, no error is reported, but no
> result is reported either. The command seems to go off into the weeds and
> never comes back. The status file says host1 and host2 are VALIDHOSTS but
> lists nothing as AVAILHOSTS, and the submitted job is listed under
> HIGH_WAITING. It appears that queue_manager thinks both nodes are busy.
> Upon submitting the job, queue_manager reports "After getting licenses" and
> then "After getting user's environment" and then "Timed out". Attempting to
> kill the job with ^C returns to the command line, but doesn't actually kill
> the job, it is still listed in the status file. Using task_control -k
> <JOB_ID> does kill it, and queue returns the following message:
>
> Queue.c Error: did not get an assigned host
>
> I'm stumped. Any thoughts?
>
> Thanks in advance,
> Chris
From: Hazelrig, C. C. (C. - Simtech) <Chr...@hw...> - 2001-04-17 21:34:12
Greetings,

Having a few problems with latest Queue development version. I'm trying it with just two nodes (host1, host2), each running RedHat Linux 6.2. On the master (host1), I am running queued (queued --debug --foreground), queue_manager, and task_manager. On the slave (host2), I am running queued (queued --debug --foreground) and task_manager. From host1 I execute the following:

queue -D -i -w -n -a dummylicense -H host2 -- hostname

I get the following error message:

Queue.c Error: no |'s allowed in Queue software

If I remove the -H option OR the -D option, no error is reported, but no result is reported either. The command seems to go off into the weeds and never comes back. The status file says host1 and host2 are VALIDHOSTS but lists nothing as AVAILHOSTS, and the submitted job is listed under HIGH_WAITING. It appears that queue_manager thinks both nodes are busy. Upon submitting the job, queue_manager reports "After getting licenses" and then "After getting user's environment" and then "Timed out". Attempting to kill the job with ^C returns to the command line, but doesn't actually kill the job, it is still listed in the status file. Using task_control -k <JOB_ID> does kill it, and queue returns the following message:

Queue.c Error: did not get an assigned host

I'm stumped. Any thoughts?

Thanks in advance,
Chris
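The symptom in this report (hosts listed under VALIDHOSTS, nothing under AVAILHOSTS, job stuck in HIGH_WAITING) fits the rule stated in the reply in this thread: a host becomes eligible for jobs only after queue_manager has heard an update message from its queued. The toy model below illustrates that rule only. It is not taken from queue_manager's source; the struct, the function names and the selection policy are all hypothetical, and the DEFECTIVE_SERVERS list is deliberately left out because the thread does not say when a host is moved there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two host lists named in the status file. */
enum host_list { VALIDHOSTS, AVAILHOSTS };

struct host {
    const char *name;
    enum host_list list;
};

/* Called when an update message from the host's queued arrives:
 * the host moves from VALIDHOSTS to AVAILHOSTS. */
void on_update_message(struct host *h)
{
    assert(h != NULL);
    h->list = AVAILHOSTS;
}

/* Pick the first host that may receive a job, or NULL if none has
 * reported in yet (in which case the job would simply keep waiting). */
struct host *pick_host(struct host *hosts, size_t n)
{
    size_t i;

    assert(hosts != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (hosts[i].list == AVAILHOSTS)
            return &hosts[i];
    return NULL;
}
```

Under this model, a submission made before any update message arrives finds no host, which matches the "did not get an assigned host" error above.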