Re: Re: [Queue-developers] new intermediate development Queue version
From: Gert V. d. E. <gvd...@sc...> - 2001-02-28 08:06:10
Dear QingLong,

> > How do I work around this ?
> >
> It looks like you are using development versions of the tools.
> If so, then you are on your own here, I am not able to help you,
> so you would probably have to try yourself to find out if this is
> autoconf/automake or Queue bug. And if you find that it is Queue
> configure.in that should be fixed, please, teach us how should we
> modify it to meet autoconf requirements. Thank you.

I'm sorry, I'm a coward under time pressure to get a queueing system
working on our cluster and a non-expert on autoconf/automake, so I
grabbed and installed the versions you mentioned. All compilation went
well after that.

I've done some playing around, these are my observations:

- compiled queue and queued with --enable-root (no --enable-manager)

- queued started with --debug and -t 10 on 5 hosts, each maxexec 2
- submitted 12 jobs for the now queue (using -i -w -p)
  - 10 start immediately, as expected
  - 1 is on hold, as expected
  - last one gives me back the shell after some seconds with the
    announcement 'Alarm clock'

- queued started with --debug --foreground -t 10 on 5 hosts, each maxexec 1
- submitted 7 jobs for the now queue (using -v -i -w -p)
  - 5 start immediately, as expected
  - 2 are on hold
  - I get lots of messages from queued among which there are

    file now/CFDIR/cfm701151391 has 0 length: Bad file descriptor
    now/CFDIR/cfm701151391 has 0 length.
    SENDMAIL: To 'root' from 'root': Subject: queued error on bohr: file now/CFDIR/cfm701151391 has 0 length: Interrupted system call
    Requesting load average for queue "now" on host "pauli"...
    queue_persistent_connect(): connect()ing to 192.168.1.5:1423 ...
    queue_persistent_connect(): connect()ed to 192.168.1.5:1423 on socket 6.
    queue_reliable_fread(): select()ing on socket 6...
    queue_reliable_fread(): select(6) timed out.
    getrldavg(): failed to fread() from stream opened on socket 6.
    getrldavg(): close(6).
    getrldavg(): ### failed to get load from pauli ### returning 1.00e+08 as rejection designator.

Are these messages something you would expect from queued or are they
indicating that something is wrong?

If you need more specific debugging information (or if you have a
test-case I can run on our system), tell me what to look for....

Have a nice day,

Gert