[Queue-developers] using queued on 2 different subnets
From: Sam M. <sa...@ja...> - 2001-05-13 19:33:25
Hi all, first post. I'm trying to set up queued using 3 machines, 2 of which are on the same subnet. To compound the problem, the machines all have more than one DNS name by which they can be reached. For example:

machine IP addr.  hostname           other names
----------------  -----------------  ---------------------------------
66.66.73.190      sabres.dnsq.org    roc-66-66-73-190.rochester.rr.com
66.66.123.94      bubs.dnsq.org      roc-66-66-123-94.rochester.rr.com
192.168.1.3       skinner.bubba.net  skinner

I realize that the 3rd machine will not be visible from outside the 192.168.1.0 network; that should be OK from queue's standpoint, though, right?

I've successfully compiled 1.40.1beta on all three machines and have been able to submit jobs as follows:

try  queue client       queued server      submit jobs
---  -----------------  -----------------  -----------
1    bubs.dnsq.org      bubs.dnsq.org      Y
2    bubs.dnsq.org      skinner.bubba.net  Y
3    bubs.dnsq.org      sabres.dnsq.org    N
4    skinner.bubba.net  skinner.bubba.net  Y
5    skinner.bubba.net  bubs.dnsq.org      Y
6    skinner.bubba.net  sabres.dnsq.org    N/A
7    sabres.dnsq.org    sabres.dnsq.org    Y
8    sabres.dnsq.org    bubs.dnsq.org      N
9    sabres.dnsq.org    skinner.bubba.net  N/A

Anyway, my problem is really with the queued running on sabres.dnsq.org and bubs.dnsq.org. Since skinner.bubba.net doesn't have an IP address that's visible from the Internet, I don't expect that host to participate outside of the 192.168.1.0 network, but I do expect the other 2 hosts to be able to. When I tried to submit jobs in either direction (tries 3 and 8), the clients just hung.

I debugged the connection using netcat to see what was coming across. I could see the typical "VERSION0..VERSION1..now" coming from the queue clients, so I know the clients are at least connecting to the queued servers. Here is where I get a little lost, though. I tried to follow what was going on in queued.debug, but it is a little overwhelming. Here is some of what's in the queued.debug file; it's a little long, so I've cut some of the repetition out.
Please let me know if you would like to see the complete log. This is from a client connection (sabres.dnsq.org) -> server (bubs.dnsq.org).

Execution commands:
  server: queued -D
  client: queue -h bubs.dnsq.org -i -w -- hostname

<BEGIN LOG -- queued.debug>
.dnsq.org[29506]: timestamp: "Sun May 13 13:35:37"
bubs.dnsq.org[29506]: queued queued.c check_query(): accept()ing connection on QUERY port...
bubs.dnsq.org[29506]: queued queued.c check_query(): accept()ed connection on QUERY port from 66.66.73.190:1210 on socket 7.
bubs.dnsq.org[29506]: queued queued.c check_query(): going to fgets() from stream on fd 7...
bubs.dnsq.org[29506]: queued queued.c check_query(): got 6 chars from stream on fd 7: "QUERY "
bubs.dnsq.org[29506]: queued queued.c check_query(): got 9 chars from stream on fd 7: "VERSION0 "
bubs.dnsq.org[29506]: queued queued.c check_query(): got 9 chars from stream on fd 7: "VERSION1 "
bubs.dnsq.org[29506]: queued queued.c check_query(): going to fgets() from stream on fd 7...
bubs.dnsq.org[29506]: queued queued.c check_query(): got 4 chars from stream on fd 7: "now "
bubs.dnsq.org[29506]: queued queued.c check_query(): Got job query request for queue "now".
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_drain = 0.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_deleteq = 0.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_stopped = 0.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_status = 1.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_status1: "now: enabled: maxexec=2 loadsched=25 loadstop=50 nice=0 cpu 71582788 min "
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_oldstat = -1.
bubs.dnsq.org[29506]: queued queued.c check_query(): calculating load... avg = 1.130.
bubs.dnsq.org[29506]: queued queued.c check_query(): calculated load = 0.710.
bubs.dnsq.org[29506]: queued queued.c check_query(): Queue "now": load average query response: 0.71 (0x3f35c28f).
bubs.dnsq.org[29506]: qlib.c netfwrite(): going to fwrite(1, 1, stream on fd 7)...
bubs.dnsq.org[29506]: qlib.c netfwrite(): ok, done fwrite(1, 1, stream on fd 7)...
............REMOVED 3 occurrences of above 2 lines.....................
bubs.dnsq.org[29506]: queued queued.c check_query(): Load average 1.130000, vmaxexec 2, nexec 0, pfactor 1.
bubs.dnsq.org[29506]: queued queued.c check_query(): select()ing on sockets: 5 and 6...
bubs.dnsq.org[29506]: queued queued.c check_query(): accept()ing connection on WAKEUP port...
bubs.dnsq.org[29506]: queued queued.c check_query(): accept()ed connection on WAKEUP port from 66.66.73.190:1022 on socket 7.
bubs.dnsq.org[29506]: queued queued.c check_query(): going to fgets() from stream on fd 7...
bubs.dnsq.org[29506]: queued queued.c check_query(): Got job wakeup request for queue "now".
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_drain = 0.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_deleteq = 0.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_stopped = 0.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_status = 1.
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_status1: "now: enabled: maxexec=2 loadsched=25 loadstop=50 nice=0 cpu 71582788 min "
bubs.dnsq.org[29506]: queued queued.c check_query(): The "now" queue: q_oldstat = -1.
bubs.dnsq.org[29506]: qlib.c netfread(): going to fread(1, 1, stream on fd 7)...
bubs.dnsq.org[29506]: qlib.c netfread(): ok, done fread(1, 1, stream on fd 7)...
............REMOVED 3 occurrences of above 2 lines.....................
bubs.dnsq.org[29506]: queued queued.c main() child: looking around...
bubs.dnsq.org[29506]: queued queued.c main() child: load averages are 1, 1, 1.
bubs.dnsq.org[29506]: queued queued.c main() child: checking queue "wait".
bubs.dnsq.org[29506]: queued queued.c runqueue_b(): queue_b status has not changed since last time
bubs.dnsq.org[29506]: queued queued.c main() child: checking queue "now".
bubs.dnsq.org[29506]: queued queued.c main() child: new 'now/CFDIR' modify time; checking for newly queue_bd jobs
bubs.dnsq.org[29506]: qlib.c netfread(): going to fread(1, 1, stream on fd 9)...
bubs.dnsq.org[29506]: qlib.c netfread(): ok, done fread(1, 1, stream on fd 9)...
............REMOVED 7 occurrences of above 2 lines.....................
bubs.dnsq.org[29506]: queued queued.c jobinfo(): localuid: 0, userid: "root", mailuserid: "root", jobname: "cfm804019858", mailflag: 0, onlyhost: "roc-66-66-123-94.rochester.rr.com".
bubs.dnsq.org[29506]: queued queued.c requeue_b(): new job 'now/CFDIR/cfm804019858' (cfm804019858) user 'root' (root)
bubs.dnsq.org[29506]: queued queued.c runqueue_b(): queue has no jobs
bubs.dnsq.org[29506]: queued queued.c main() child: Checking for queries.
bubs.dnsq.org[29506]: queued queued.c check_query(): select()ing on sockets: 5 and 6...
13:28:26"
</END LOG>

One final comment: I noticed that onlyhost above has the machine name roc-66-66-123-94.rochester.rr.com, so I tried changing the hostname of bubs.dnsq.org to roc... No difference, so something else must be going on here. The qhostsfiles have all the above names in them too.

Thanx
--
  \|/
  @-@
------------ooO---(_)--Ooo----------------
| E-Mail: (H): slm...@bu...
|         (W): sam...@bi...
| web: http://bubs.dnsq.org/~sam/