I find that occasionally queued on machine A is fetching the load average of machine B at the same time that queued on machine B is fetching the load average of machine A, and the system deadlocks. Has anyone else observed this?
Around line 3021 in queued.c there is an #undef TRANSMIT_DEBUG.
If you comment this line out, or remove it, QueueD will fork off another process whenever it wants to spool a job out to another machine. This should end the deadlock situation.
However, one thing the call to wakeup() does is determine whether there are any other hosts available in this batch queue to send the job to. If the answer is no, it stops trying for the moment, until either 120 seconds pass or the next submission arrives in the queue. The fork() loses this information, so I'm not sure how it will behave. (This could be solved with some sort of IPC, probably by opening a pipe, but it would add a fair amount of code that does little except determine a return value.)
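For what it's worth, here is a minimal sketch (not taken from queued.c) of how that fork()-plus-pipe idea might look: the child does the transmit and writes a one-byte status back, and the parent keeps only the read end so its main select() loop can pick up the result later instead of blocking. try_spool_to_host() is a hypothetical placeholder for the real spooling code, and the daemon would still need to reap the child (SIGCHLD/waitpid).

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical placeholder for the real spooling code in queued.c:
       returns 1 if some host accepted the job, 0 if none was available. */
    static int try_spool_to_host(void)
    {
        return 0;
    }

    /* Fork the transmit so a blocking connect() can't hang the daemon,
       but return the read end of a pipe so the parent can still learn
       the result.  Returns the pipe fd (watch it from the main select()
       loop) or -1 on error. */
    int spool_in_child(void)
    {
        int fd[2];
        pid_t pid;

        if (pipe(fd) < 0)
            return -1;

        pid = fork();
        if (pid < 0) {
            close(fd[0]);
            close(fd[1]);
            return -1;
        }

        if (pid == 0) {                     /* child: do the transmit */
            char ok = (char) try_spool_to_host();
            close(fd[0]);
            write(fd[1], &ok, 1);           /* one-byte status for the parent */
            _exit(0);
        }

        close(fd[1]);                       /* parent: keep only the read end */
        fcntl(fd[0], F_SETFL, O_NONBLOCK);  /* don't block; poll from main loop */
        return fd[0];
    }

When the fd becomes readable the parent reads the byte; a 0 means no host was available, so it can stop trying until the 120-second timer or the next submission, which is exactly the information a plain fork() would lose.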
Hopefully, it will fix the deadlock problem, though.
Yep - I've been getting this on Red Hat 7.0. I've traced it down to the call to connect() in getrldavg(): start queued on machines A and B, then flood machine A with queue requests. The system deadlocks fairly quickly; kill queued on machine B and queue on machine A exits with the error "getrldavg: connect".
Should this be a non-blocking call to connect? I've left both machines on my test network running for a long time and it just hangs. I also tried removing the #undef TRANSMIT_DEBUG with no success.
Forgive me if I'm suggesting this out of ignorance, but perhaps a better solution for getting the load averages would be a separate loadavg daemon that serves clients serially. Even so, is there any way to ensure that the call to connect() returns within a maximum amount of time?
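On the timeout question: the usual trick is to put the socket into non-blocking mode, start the connect(), and then wait on it with select() for a bounded time. A rough sketch follows; it is not taken from queued.c, and getrldavg() would need its own socket setup and error handling around it.

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* connect() that gives up after `seconds`.  Returns 0 on success,
       -1 on error or timeout. */
    int connect_with_timeout(int sock, const struct sockaddr *addr,
                             socklen_t addrlen, int seconds)
    {
        int flags = fcntl(sock, F_GETFL, 0);
        int err = 0;
        socklen_t len = sizeof(err);
        fd_set wfds;
        struct timeval tv;

        fcntl(sock, F_SETFL, flags | O_NONBLOCK);

        if (connect(sock, addr, addrlen) < 0) {
            if (errno != EINPROGRESS)
                return -1;                  /* failed immediately */

            FD_ZERO(&wfds);
            FD_SET(sock, &wfds);
            tv.tv_sec = seconds;
            tv.tv_usec = 0;

            /* wait until the connect completes or the timer runs out */
            if (select(sock + 1, NULL, &wfds, NULL, &tv) <= 0)
                return -1;                  /* timed out or select() error */

            /* socket is writable; check whether the connect actually succeeded */
            if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
                return -1;
        }

        fcntl(sock, F_SETFL, flags);        /* restore blocking mode */
        return 0;
    }

If getrldavg() used something like this with a short timeout, a hung peer would just show up as a failed load-average fetch instead of deadlocking both daemons.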
Queue on Red Hat 7.0 also has a bug related to RLIMITs - I'll put a fix up soon.
Cheers,
Ben.