From: Nicholas H. <he...@se...> - 2003-07-09 17:41:30
On Mon, 2003-07-07 at 15:20, er...@he... wrote:
> I have a hunch about what might be going on here. There's some
> potential for badness in exit_notify with BProc. kill_pg and
> is_orphaned_pgrp might end up setting the process state back to
> RUNNING instead of ZOMBIE. Then they could get hung up because the
> ghost is gone because it's already exited.
>
> I've attached a revised patch which I think should fix that. Can you
> try it and see if it helps?

It seems to have helped, but not solved the problem. It seems like more
of the processes are running, and not getting hung, but there were a few
that did hang.

I was able to do a 'top->bottom' kill -9 with a 'sleep 1' between, and
in one case it worked, but in another, I had to go to the node again and
kill -9 the process there.

Nic
-- 
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania