From: Nicholas H. <he...@se...> - 2003-04-10 16:48:23
On Wed, 9 Apr 2003 16:04:19 -0600 er...@he... wrote:

>
> I suspect there's a problem in the signal forwarding and the remote
> system call stuff that the slave side does. That code *looks* ok to
> me but maybe there's a problem. Seeing a message trace for the PIDs
> involved should shed some light on this.
>
> Also, process 5377 reparenting to bpslave is normal. bpslave is the
> "child reaper" (instead of init) for bproc managed processes on the
> nodes. This is necessary for ptrace to work properly. I think the
> parents exited and it didn't so that reparent is correct.

Ok. The traces are huge, and frankly I could not discern the
interesting parts, so I have placed them on our web server for your fun
and amusement. Here is the ps output before and after the kill -9:

  569 ?        S      0:02 /usr/sbin/bpslave -m /scratch/bpslave_new.strace -r 192.168.0.223 2223
  570 ?        S      0:00  \_ /usr/sbin/bpslave -m /scratch/bpslave_new.strace -r 192.168.0.223 2223
  624 ?        S      0:00      \_ mond -d
 3271 ?        S      0:00      \_ /bin/sh /proc/self/fd/3 /scratch/user/sfischer/slot_1/result /genomics/binf/scratch/dotsBuilds/nicTest/mus/similarity/f
 3272 ?        S      0:00          \_ /usr/bin/perl /home/sfischer/gushome/bin/blastSimilarity --blastBinDir /genomics/share/pkg/bio/wu-blast/current --d
 4923 ?        S      0:00              \_ sh -c /genomics/share/pkg/bio/wu-blast/current/blastx /scratch/user/sfischer/prodom.fsa seqTmp -wordmask=seg+xn
 4924 ?        S      0:00                  \_ /genomics/share/pkg/bio/wu-blast/current/blastx /scratch/user/sfischer/prodom.fsa seqTmp -wordmask seg+xnu
 4937 ?        S      0:00                      \_ /genomics/share/pkg/bio/wu-blast/current/blastx /scratch/user/sfischer/prodom.fsa seqTmp -wordmask seg+
 4938 ?        S      0:00                          \_ /genomics/share/pkg/bio/wu-blast/current/blastx /scratch/user/sfischer/prodom.fsa seqTmp -wordmask

kill -9 4924 4937 4938

  569 ?        S      0:02 /usr/sbin/bpslave -m /scratch/bpslave_new.strace -r 192.168.0.223 2223
  570 ?        S      0:00  \_ /usr/sbin/bpslave -m /scratch/bpslave_new.strace -r 192.168.0.223 2223
  624 ?        S      0:00      \_ mond -d
 4938 ?        S      0:00      \_ /genomics/share/pkg/bio/wu-blast/current/blastx /scratch/user/sfischer/prodom.fsa seqTmp -wordmask seg+xnu W 3 T 1000 B
 3271 ?        S      0:00      \_ /bin/sh /proc/self/fd/3 /scratch/user/sfischer/slot_1/result /genomics/binf/scratch/dotsBuilds/nicTest/mus/similarity/f
 3272 ?        S      0:00          \_ /usr/bin/perl /home/sfischer/gushome/bin/blastSimilarity --blastBinDir /genomics/share/pkg/bio/wu-blast/current --d
 4923 ?        Z      0:00              \_ [sh <defunct>]

The trace is located at
http://www.liniac.upenn.edu/~henken/bproc/node48bpslave.strace.
I will send another node's info just for comparison.

Nic
-- 
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania
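For readers following the ps output: the [sh <defunct>] entry (PID 4923, state Z) is a process that has exited but has not yet been collected with wait() by its parent, perl (3272). PID 4938 shows up directly under bpslave, presumably because its parent 4937 was killed and, as described in the quoted text, bpslave acts as the child reaper for bproc managed processes. The Python sketch below reproduces both effects on an ordinary Linux box. It is only an illustration under stated assumptions. It uses Linux's PR_SET_CHILD_SUBREAPER prctl (value 36, Linux 3.4 and later, long after this thread) as a modern stand-in for the child-reaper role. That is not how bproc implements it. The helper names (make_subreaper, proc_state, zombie_state_before_wait, new_parent_of_orphan) are hypothetical.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>; needs Linux >= 3.4


def make_subreaper() -> None:
    """Ask the kernel to reparent orphaned descendants to this process.

    A modern Linux stand-in for bpslave's "child reaper" role; this is NOT
    the mechanism bproc itself uses.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    one, zero = ctypes.c_ulong(1), ctypes.c_ulong(0)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, one, zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def proc_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat (S, R, Z, ...)."""
    with open(f"/proc/{pid}/stat") as f:
        # comm is parenthesised and may contain spaces; split after the last ')'.
        return f.read().rsplit(")", 1)[1].split()[0]


def zombie_state_before_wait(timeout: float = 5.0) -> tuple[str, int]:
    """Fork a child that exits at once; observe it as 'Z' until we reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    _, status = os.waitpid(pid, 0)  # reaping removes the <defunct> entry
    return state, os.WEXITSTATUS(status)


def new_parent_of_orphan() -> int:
    """Fork child -> grandchild, let the child exit first, and return the
    PID the orphaned grandchild was reparented to."""
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        me = os.getpid()
        if os.fork() == 0:
            os.close(r)
            while os.getppid() == me:  # wait until the middle process is gone
                time.sleep(0.01)
            os.write(w, f"{os.getppid()} {os.getpid()}".encode())
            os._exit(0)
        os._exit(0)  # middle process exits, orphaning the grandchild
    os.close(w)
    os.waitpid(middle, 0)
    new_ppid, grandchild = map(int, os.read(r, 64).split())
    os.close(r)
    os.waitpid(grandchild, 0)  # possible only if we became its parent
    return new_ppid
```

Calling make_subreaper() first and then new_parent_of_orphan() should return the caller's own PID. Without the prctl call the orphan goes to init (or the nearest ancestor subreaper), and the final waitpid() in new_parent_of_orphan() fails with ChildProcessError.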