[Queue-developers] some questions
From: Gert V. d. E. <gvd...@sc...> - 2001-02-22 11:16:42
Dear users/developers of Queue,

The last few days I've been experimenting with Queue on a five-node Linux cluster (SuSE 7.0 out of the box, *without* installing the Queue from SuSE 7.0). I have encountered several problems. I've been browsing the mailing list for more information, but maybe I missed it...

* Using the latest release, 1.30.1 (after fixing the RLIMIT bug as mentioned in the bug tracker and compiling), all seemed to go well. But when I submitted a large number of jobs to one queue (exceeding the sum of the maxexec values on the nodes, so a couple of the jobs had to wait for a 'free slot'), I observed that Queue fills up all queues on the machines very nicely and puts the other jobs on hold. However, when a job has finished, the waiting jobs keep on waiting. They do not get a free slot, and when I query the queuestat files, one of them says that they are running, but nothing is happening on that host.

* I also tried to get the latest CVS going (queue-development). Something strange is going on during configuration and compilation. After the usual ./configure --enable-root and make, all seems to be well. When I do make install, it starts reconfiguring (effectively changing config.h) and recompiling, and then the compilation breaks due to a missing cleanutent. It seems that during the reconfigure the support for rxvt utmp was added, but the file logging.c is not in the sources list in the makefile. The reconfiguration also changed the install directories for Queue. Before, Queue's queues were in /usr/local/var/queue; now they were to go in /usr/local/var/spool/queue. The directory for the qhostfile changed from /usr/local/share to /usr/local/share/queue.

* I managed to fix the above to have it compile, no problems.
When I start this CVS queued without the debugging option (on one host, just for testing; the other nodes are down) and I submit jobs to the "now" queue (the classical hostname job from the manual), I get emails like this:

----
Date: Thu, 22 Feb 2001 11:41:50 +0100
From: The Queue Daemon <ro...@fe...>
To: gvd...@fe...
Subject: batch queue_b on fermi:

queued queued.c sendmail(): SENDMAIL: From: "queued" SENDMAIL: To: "gvdeynde"
queued queued.c sendmail(): SENDMAIL: From: "queued" SENDMAIL: To: "gvdeynde"
----

and using the verbose option of queue gives me this:

---
Requesting load average for queue "now" on host "fermi"...
The host "fermi"is not able to serve queue "now".
Failed to submit job in queue "now" to host "fermi".
---

However, if I start queued in debug mode (queued --debug), I get this from queue --verbose ....

---
Requesting load average for queue "now" on host "fermi"...
Host "fermi" appears to be able to serve queue "now".
Ok, connecting to QueueD at it.
Trying "fermi"...
Going to submit job to queue "now" on host "fermi".
queue.c: main(): tty(in/out/err): 1 1 1.
queued handle.c handle(): going to try to run "hostname".
queued handle.c handle(): assembled full path: "/bin/hostname".
queued handle.c handle(): going to execve(/bin/hostname).
fermi
---

My questions:

- Is the 1.30.1 release still relevant (is there a patch to fix the apparent hang of queued)?
- Is the developers' version reliable? (I know it's a developers' version, but I am aware of projects where it is better to stick to the developers' version than to the stable releases.)

Thank you for your time and a very promising tool. I'm really looking forward to using Queue on our system...

Gert Van den Eynde
SCK-CEN
Reactor Physics & Myrrha dept.
Neutronics Calculation Section
Belgium