Re: [Queue-developers] some questions
From: W. G. K. <wer...@ya...> - 2001-02-22 13:59:13
Quoting Gert Van den Eynde <gvd...@sc...>:

> Dear users/developers of Queue,
>
> The last few days I've been experimenting with Queue on a five node Linux
> cluster (SuSE 7.0 out of the box *without* installing the Queue from SuSE 7.0).
> I have encountered several problems. I've been browsing the maillist for more
> information, maybe I missed it...
>
> * using the latest release 1.30.1 (after fixing the RLIMIT bug as mentioned in
> the bugtrack and compiling), all seemed to go well. But... when I submitted a
> large number of jobs to one queue (exceeding the total sum of different maxexec
> on the nodes, so a couple of the jobs had to wait for a 'free slot'), I
> observed that Queue fills up all queues on the machines very nice and puts the
> other jobs on hold. However, when a job has finished, the waiting jobs keep on
> waiting. They do not get a free slot and when I query the queuestat files, it
> says in one that they are running, but nothing is happening on that host.
>
> * I also tried to get the latest CVS going (queue-development). Something
> strange is going on during configuration and compilation. After the usual
> ./configure --enable-root and make, all seems to be well. When I do make
> install, it starts reconfiguring (and effectively changing config.h),
> recompiling and then the compilation breaks due to a missing cleanutent. It
> seems that during the reconfigure the support for rxvt utmp was added, but the
> file logging.c is not in the sources list in the makefile. The reconfiguration
> also changed the install directories for queue. Before, queue queue's were in
> /usr/local/var/queue, now they were to go in /usr/local/var/spool/queue. The
> directory for the qhostfile changed from /usr/local/share to
> /usr/local/share/queue.
>
> * I managed to fix the above to have it compile, no probs.
> When I start this CVS queued without debugging option (on one host, just for
> testing, the other nodes are down) and I submit jobs to the now queue (the
> classical hostname job from the manual), I get emails like this:
>
> ----
> Date: Thu, 22 Feb 2001 11:41:50 +0100
> From: The Queue Daemon <ro...@fe...>
> To: gvd...@fe...
> Subject: batch queue_b on fermi: queued queued.c sendmail(): SENDMAIL: From:
> "queued" SENDMAIL: To:
> "gvdeynde"
>
> queued queued.c sendmail():
> SENDMAIL: From: "queued"
> SENDMAIL: To: "gvdeynde"
> ----
>
> and using the verbose option from queue gives me this:
>
> ---
> Requesting load average for queue "now" on host "fermi"...
> The host "fermi"is not able to serve queue "now".
> Failed to submit job in queue "now" to host "fermi".
> ---
>
> However, if I start queued in debug mode (queued --debug), I get this from
> queue --verbose ....
>
> ---
> Requesting load average for queue "now" on host "fermi"...
> Host "fermi" appears to be able to serve queue "now".
> Ok, connecting to QueueD at it.
> Trying "fermi"...
> Going to submit job to queue "now" on host "fermi".
> queue.c: main(): tty(in/out/err): 1 1 1.
> queued handle.c handle(): going to try to run "hostname".
> queued handle.c handle(): assembled full path: "/bin/hostname".
> queued handle.c handle(): going to execve(/bin/hostname).
> fermi
> ---
>
>
> My questions:
>
> - Is the 1.30.1 release still relevant (is there a patch to fix the apparent
> hang of queued ?)

This patch has been rolled into the CVS release.

> - Is the developers version reliable (I know it's a developers version, but I
> am aware of projects where it is best to stick to the developers version than
> to the stable releases) ?

Unfortunately, our project is in such a phase right now. It is probably
easiest to figure out what is wrong with the CVS version and get this
working, as opposed to playing with 1.30.1.
Once this works a little better, I hope to roll out 1.30.2 from the CVS
version anyway, so that 1.30.1 should become obsolete soon. (The other thing
you could try would be the pre-1.20.1 releases.)

> Thank you for your time and a very promising tool. I'm really looking forward
> to using queue on our system...
>
> Gert Van den Eynde
> SCK-CEN
> Reactor Physics & Myrrha dept.
> Neutronics Calculation Section
> Belgium
>
>
> _______________________________________________
> Queue-developers mailing list Que...@li...
> To unsubscribe, subscribe, or set options:
> http://lists.sourceforge.net/lists/listinfo/queue-developers