From: Sean D. <ag...@sc...> - 2002-04-26 19:50:29
We found a bug in bpsh: if you send a single command to many (hundreds of) nodes at once, the output from some nodes is lost. In this case, some of the remote commands finish running and exit before bpsh ever starts handling I/O, so the SIGCHLD handler decrements outstanding_connections. Then, when bpsh starts handling those connections (which are still pending), it decrements outstanding_connections again. This leads to a state where outstanding_connections is well below 0 while io_to_do and late_connections are both zero, so bpsh stops looping and exits even though there may still be connections it hasn't handled yet.

The attached patch keeps outstanding_connections from being decremented twice for a node that finishes early, which solves the problem.

Sean
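
For anyone who wants to see the double decrement in isolation, here is a minimal sketch. It is not the bpsh source and not the attached patch: struct node_conn, mark_done() and simulate() are hypothetical names made up for illustration, and the signal handler and the I/O loop are modelled as two plain sequential passes rather than real SIGCHLD delivery. It only shows why an unguarded counter goes negative and how a per-node "already counted" flag keeps it from doing so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64  /* arbitrary limit for this sketch */

/* Hypothetical per-node bookkeeping; not the real bpsh data structures. */
struct node_conn {
    bool counted_done;  /* true once this node has been subtracted */
};

/* Decrement *outstanding at most once per node.
 * Returns true if this call performed the decrement. */
static bool mark_done(struct node_conn *c, int *outstanding)
{
    assert(c != NULL && outstanding != NULL);
    if (c->counted_done)
        return false;       /* already accounted for by the other path */
    c->counted_done = true;
    (*outstanding)--;
    return true;
}

/* Simulate n nodes, of which the first `early` exit before I/O handling
 * starts. With guarded == false both paths decrement blindly (the bug);
 * with guarded == true each node is counted once. Returns the final
 * value of the outstanding-connections counter. */
static int simulate(size_t n, size_t early, bool guarded)
{
    struct node_conn nodes[MAX_NODES] = {0};
    int outstanding = (int)n;

    assert(n <= MAX_NODES && early <= n);

    /* Pass 1: stands in for the SIGCHLD handler seeing early exits. */
    for (size_t i = 0; i < early; i++) {
        if (guarded)
            mark_done(&nodes[i], &outstanding);
        else
            outstanding--;
    }

    /* Pass 2: stands in for the I/O loop handling every connection,
     * including the ones whose child has already exited. */
    for (size_t i = 0; i < n; i++) {
        if (guarded)
            mark_done(&nodes[i], &outstanding);
        else
            outstanding--;
    }

    return outstanding;
}
```

With 4 nodes and 3 early exits, simulate(4, 3, false) ends below zero while simulate(4, 3, true) ends at exactly 0, which is the condition the loop needs in order to exit only after every connection has been handled.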