Thread: RE: [Queue-developers] Error running jobs
From: Sam L. <sam...@an...> - 2001-04-11 23:00:51
Now that I've fixed the syntax errors in the profiles, I get:

wakeup.c getrldavg(): po.ananova.net returned load 5.72e+10.
Queue "now" at host "po.ananova.net" has rldavg of 5.72e+10.
The host "po.ananova.net" is not able to serve queue "now".
Failed to submit job in queue "now" to host "po.ananova.net".

That is an exceedingly high load.

/tmp/queue/queued.debug:
po.ananova.net[281]: queued queued.c check_query():
The "now" queue: q_status1:
"now: enabled: maxexec=2 loadsched=25 loadstop=50 nice=0 cpu 71582788 min
"
po.ananova.net[281]: queued queued.c check_query():
The "now" queue: q_oldstat = -1.
po.ananova.net[281]: queued queued.c check_query():
calculating load... avg = 0.160.
po.ananova.net[281]: queued queued.c check_query():
calculated load = 0.387.
po.ananova.net[281]: queued queued.c check_query():
Queue "now": load average query response: 0.39 (0x3ec60000).
po.ananova.net[281]: queued queued.c check_query():
Load average 0.160156, vmaxexec 2, nexec 0, pfactor 1.
po.ananova.net[281]: queued queued.c check_query():
select()ing on sockets: 6 and 7...

Which all looks about right.
So how did queue so badly misread the load?

Sam
From: Sam L. <sam...@an...> - 2001-04-12 08:21:19
Yeah, but queued says the load is less than 1, while queue says the load is 57,200,000,000.

I set my pfactor to 100 and maxexec to 20.

Sam

> -----Original Message-----
> From: W. G. Krebs [mailto:wer...@ya...]
> Sent: 12 April 2001 01:05
> To: que...@li...
> Subject: Re: [Queue-developers] Error running jobs
>
> This is correct behavior.
>
> The load average is divided by vmaxexec minus number of running jobs or
> one, whichever is greater, and then multiplied by p-factor.
>
> In this case, the load is divided by two.
>
> This helps to give unused machines a slight leg-up in the competition.
> (p-factor should be set to discourage slow machines from getting jobs.)
From: Sam L. <sam...@an...> - 2001-04-12 09:02:59
I got wakeup.c to print the bytes of the float (load) value as a long, like queued.c does, and found that the load bytes it reads aren't the same ones that were sent:

po.ananova.net[19212]: queued queued.c check_query():
Queue "now": load average query response: 0.00 (0x3a1b8cec).

wakeup.c getrldavg(): po.ananova.net returned load 5.72e+10. (0x51554552)

(Read as ASCII, those bytes say QUER.)

Sam

> -----Original Message-----
> From: Sam Liddicott [mailto:sam...@an...]
> Sent: 12 April 2001 09:17
> To: que...@li...
> Subject: RE: [Queue-developers] Error running jobs
>
> Yeah, but queued says the load is less than 1, but queue says the load is
> 57,200,000,000
>
> I set my pfactor to 100 and maxexec to 20
>
> Sam
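The observation above can be checked by hand. The following is a minimal sketch, assuming a 32-bit IEEE 754 float; the helper names bits_to_float and bits_to_ascii are made up for illustration and are not taken from wakeup.c or queued.c. It reinterprets a 32-bit pattern as a float and also spells out its four bytes as ASCII, most significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 single on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a 32-bit pattern as a float.
   memcpy avoids the aliasing problems of a pointer cast. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Write the four bytes of the word as text, most significant byte first.
   Non-printable bytes are shown as '.'. out must hold 5 chars. */
void bits_to_ascii(uint32_t bits, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(bits >> (24 - 8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

With these helpers, bits_to_ascii(0x51554552, buf) leaves "QUER" in buf, and bits_to_float(0x51554552) printed with %.2e gives 5.72e+10, the value wakeup.c reported. bits_to_float(0x3ec60000) is 0.38671875, which matches the 0.39 that queued logged in the first message. This suggests wakeup.c is treating four bytes of text as the float rather than the value queued sent; the thread does not establish why.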
From: W. G. K. <wer...@ya...> - 2001-04-12 00:06:45
This is correct behavior.

The load average is divided by vmaxexec minus number of running jobs or one, whichever is greater, and then multiplied by p-factor.

In this case, the load is divided by two.

This helps to give unused machines a slight leg-up in the competition. (p-factor should be set to discourage slow machines from getting jobs.)

Sam Liddicott wrote:

> Now I fixed the syntax errors in the profiles, I get:
>
> wakeup.c getrldavg(): po.ananova.net returned load 5.72e+10.
> Queue "now" at host "po.ananova.net" has rldavg of 5.72e+10.
> The host "po.ananova.net" is not able to serve queue "now".
> Failed to submit job in queue "now" to host "po.ananova.net".
>
> an exceedingly high load.
>
> /tmp/queue/queued.debug:
> po.ananova.net[281]: queued queued.c check_query():
> The "now" queue: q_status1:
> "now: enabled: maxexec=2 loadsched=25 loadstop=50 nice=0 cpu 71582788 min
> "
> po.ananova.net[281]: queued queued.c check_query():
> The "now" queue: q_oldstat = -1.
> po.ananova.net[281]: queued queued.c check_query():
> calculating load... avg = 0.160.
> po.ananova.net[281]: queued queued.c check_query():
> calculated load = 0.387.
> po.ananova.net[281]: queued queued.c check_query():
> Queue "now": load average query response: 0.39 (0x3ec60000).
> po.ananova.net[281]: queued queued.c check_query():
> Load average 0.160156, vmaxexec 2, nexec 0, pfactor 1.
> po.ananova.net[281]: queued queued.c check_query():
> select()ing on sockets: 6 and 7...
>
> Which all looks about right.
> So how did queue so badly misread the load?
>
> Sam
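The scaling rule W. G. Krebs describes in words in the message above can be written out as a short sketch. The function name scaled_load and its parameter names are hypothetical, and this is only the rule as stated in the message, not code from queued.c. It is not claimed to reproduce the "calculated load = 0.387" figure in the quoted log, since the real calculation may include terms the message does not mention.

```c
/* Sketch of the rule as described in the message: divide the load average
   by (vmaxexec - running jobs), or by one if that is smaller than one,
   then multiply by the p-factor. Names are hypothetical. */
double scaled_load(double loadavg, int vmaxexec, int nexec, double pfactor)
{
    int free_slots = vmaxexec - nexec;
    if (free_slots < 1)
        free_slots = 1;          /* "or one, whichever is greater" */
    return loadavg / free_slots * pfactor;
}
```

With vmaxexec 2 and nexec 0, as in the quoted log, the divisor is 2, which is the "divided by two" case mentioned in the message. A larger p-factor raises the reported load, which is how a slow machine would be made less attractive.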