Re: [BProc] pthreads & bproc, round 2

On Tue, 2003-07-01 at 18:39, er...@he... wrote:
> I think user land back-traces are probably useless since this is some
> kind of weird kernel-land problem - and judging by the message
> traces you've sent me before, the procs are getting caught somewhere
> in exit (i.e. signal received and *trying* to exit).

Ahhh.. that would make sense. 

> 
> It doesn't look like much changed to me between 2.4.18 and 2.4.19 but
> some of the process tree handling code in the exit code did.  The examples
> you sent me a while back all show several threads/processes being
> killed at once.  I have a sneaking suspicion that this is somehow a
> race related to many things exiting and getting re-parented at the
> same time.

Ew -- and that is my official opinion of that.

> 
> I have no idea how that's getting hung up but maybe we can determine
> if it's really such a race or not.  To make a long story short, can
> you try the following:

Sure -- I have attached a text file with the results -- slightly more
readable than limiting it to 80 chars in email.
> 
> Kill the threads one at a time and see if they still get hung up in that
> weird state.  Half a second in between kills should be more than
> enough.  Then maybe bottom up or top->bottom might be interesting.

Basically -- top->bottom : screwed. bottom->top+sleep: ok,
bottom->top+nosleep: screwed.
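
For anyone who wants to repeat the ordering test, here is a rough sketch of
the kind of kill loop involved. It is not the script used for the attached
results. It assumes each thread shows up as its own pid (as with
LinuxThreads on 2.4) and that the pid list is already ordered top (parent)
to bottom. The name kill_in_order and its parameters are made up for
illustration.

```python
import os
import signal
import time


def kill_in_order(pids, delay=0.5, reverse=False, sig=signal.SIGTERM):
    """Signal each pid in turn, sleeping `delay` seconds between kills.

    Assumption: `pids` is ordered top (parent) to bottom (children).
    reverse=True walks the list bottom->top instead.
    Returns the pids that were actually signalled, in signalling order.
    """
    order = list(reversed(pids)) if reverse else list(pids)
    signalled = []
    for i, pid in enumerate(order):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except ProcessLookupError:
            # The process is already gone, so there is nothing to signal.
            pass
        # Pause between kills, but not after the last one.
        if delay and i < len(order) - 1:
            time.sleep(delay)
    return signalled
```

With this, the three cases above would be kill_in_order(pids) for
top->bottom, kill_in_order(pids, reverse=True) for bottom->top+sleep, and
kill_in_order(pids, delay=0, reverse=True) for bottom->top+nosleep.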

> 
> I apologize if I've asked this before: When the threads are hung, does
> the system seem healthy otherwise?  Specifically, no problems creating
> or killing other processes?

Yes it does -- I have no problems ssh'ing or bpsh'ing in and running
anything.

> 
> P.S.  I've attached a quick port of the 3.2.3 patch to 2.4.20.  I
> think it should work.

Thanks! I will see what this produces as well.

Nic
-- 
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania