From: Owen M. <owe...@bc...> - 2012-10-10 19:06:32
On 10 October 2012 20:08, Anthony Scopatz <sc...@gm...> wrote:
> So just to confirm this behavior, having run your sample on a couple of my
> machines, what you see is that the code looks like it gets all the way to
> the end, and then it stalls right before it is about to exit, leaving some
> small number of processes (here named python tables_test.py) in the OS. Is
> this correct?

More or less. What's really happening is that if your processor pool has N
processes, then each time one of the workers hangs, the pool has N-1
processes running thereafter. Eventually, when all the tasks have completed
(or all the workers are hung, something that has happened to me when
processing many tasks), the main process just blocks, waiting on the hung
processes.

If you're running Linux, then once the test is finished and the main process
is still waiting on the hung processes, you can simply kill the main
process. The orphaned processes that remain afterward are the ones of
interest.

> It seems to be the case that these failures do not happen when I set the
> processor pool size to be less than or equal to the number of processors
> (physical or hyperthreaded) that I have on the machine. I was testing this
> both on a 32 proc cluster and my dual core laptop. Is this also
> the behavior you have seen?

No, I've never noticed that to be the case. It appears that the greater the
true parallelism (i.e. physical cores on which workers are executing in
parallel), the greater the odds of a hang. I don't have any real proof of
this, though; as with most concurrency bugs, it's tough to be certain of
anything.

Regards,
Owen
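
P.S. In case it helps with reproducing this, below is a minimal sketch of
the shape of my test. It is illustrative only, not the actual
tables_test.py: the file name, array name, pool size, and task counts are
made up, and whether a given run actually hangs is timing-dependent.

import multiprocessing

import numpy
import tables

FILENAME = 'tables_test.h5'  # hypothetical name

def setup():
    # Create a small HDF5 file for the workers to read.
    h5file = tables.openFile(FILENAME, 'w')
    h5file.createArray(h5file.root, 'data', numpy.arange(1000000))
    h5file.close()

def worker(i):
    # Each task opens the file, reads a strided slice, and closes it.
    h5file = tables.openFile(FILENAME, 'r')
    total = h5file.root.data[i::100].sum()
    h5file.close()
    return int(total)

if __name__ == '__main__':
    setup()
    # Deliberately more workers than physical cores, to exercise
    # as much real parallelism as possible.
    pool = multiprocessing.Pool(16)
    # map() blocks until every task has returned, so a single hung
    # worker stalls the whole run right here, near the end.
    results = pool.map(worker, range(1000))
    pool.close()
    pool.join()
    print 'done:', sum(results)

When it does wedge, this is the point where the main process sits blocked;
kill it and the hung workers are the orphans left behind to inspect.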