From: Nicholas H. <he...@se...> - 2003-04-09 22:51:56
On Wed, 9 Apr 2003 16:04:19 -0600
er...@he... wrote:

> Hrm.... The only plausible reason I can think of for the kill -9 to
> not work is that it's actually blocked in kernel space somewhere. It
> could be that the process is getting signaled while it's waiting for
> some remote request to complete. Most likely in a bpr_rsyscall. I
> think a message trace for all the pids involved would be very
> interesting here. If that's the case we need to figure out what that
> remote request is and (of course) why it's not completing in a
> reasonable amount of time.

Sure will do.

> I suspect there's a problem in the signal forwarding and the remote
> system call stuff that the slave side does. That code *looks* ok to
> me but maybe there's a problem. Seeing a message trace for the PIDs
> involved should shed some light on this.
>
> Also, process 5377 reparenting to bpslave is normal. bpslave is the
> "child reaper" (instead of init) for bproc managed processes on the
> nodes. This is necessary for ptrace to work properly. I think the
> parents exited and it didn't so that reparent is correct.

Ok -- I just thought it weird that it reparented, but didn't die from
kill -9. I will get those message traces to you as soon as I can (most
likely in the morning).

Thanks a ton Erik :)

Nic
--
Nicholas Henke
Linux Cluster Systems Programmer
Liniac Project - Univ. of Pennsylvania