From: Nicholas H. <he...@se...> - 2002-02-06 20:47:13
More info on last bug--

---------- Forwarded message ----------
Date: Wed, 6 Feb 2002 15:44:43 -0500
From: Daniel Widyono <wi...@ci...>
To: Daniel Widyono <wi...@ci...>
Cc: Nicholas Henke <he...@se...>
Subject: Re: bpsh bug

Nic, more info. I've selected different subsets, i.e.

[root@admin root]# bpsh 0-31 hostname|wc -l
32
[root@admin root]# bpsh 0-42 hostname|wc -l
43
[root@admin root]# bpsh 32-63 hostname|wc -l
32
[root@admin root]# bpsh 21-63 hostname|wc -l
43

Those above consistently worked fine. Things go awry when we get to the
44th node (it takes longer for bpsh to return; the hang is between the
return from the 43rd node and the return from the 44th node):

[root@admin root]# bpsh 0-43 hostname|wc -l
44
[root@admin root]# bpsh 0-43 hostname|wc -l
43
[root@admin root]# bpsh 20-63 hostname|wc -l
44
[root@admin root]# bpsh 20-63 hostname|wc -l
43

It looks like there's some bound being hit between using 43 nodes and using
44 nodes. I think it might be on the bproc or clubmask side (if bproc
communicates with clubmask somehow), since a simple sleep 5 && echo script
works fine (bpsh -a runs on all and returns all nodes); therefore it looks
like a timing issue on the server end. Of course, since I haven't read the
source code or actually debugged it, this is pure conjecture.

Dan W.

On Wed, Feb 06, 2002 at 03:31:25PM -0500, Daniel Widyono wrote:
> Just rebooted all clients, diagnose -n now reports all 64 idle. bpstat shows
> all 64 up. Here is some sample output:
>
> [root@admin root]# bpsh -a echo hi|wc -l
> 45
> [root@admin root]# bpsh -a echo hi|wc -l
> 64
> [root@admin root]# bpsh -a echo hi|wc -l
> 64
> [root@admin root]# bpsh -a echo hi|wc -l
> 44
>
> bpsh -a hostname yielded 44 replies once: I get node0 through node42 in
> order, then node63. By the way, bpsh -a hostname when showing all nodes,
> shows nodes 43 through 63 out of order. Wacky? Did those hosts get inserted
> into the database incorrectly, get their IP addresses in the wrong order, or
> something?
>
> Dan W.
>
> On Wed, Feb 06, 2002 at 02:34:32PM -0500, Nicholas Henke wrote:
> > Could you redirect the output to a file and send it to me so that I could
> > send it to Erik as a bug report.
>
> --
> -- Daniel Widyono http://www.cis.upenn.edu/~widyono
> -- Linux Cluster Group, CIS Dept., SEAS, University of Pennsylvania
> -- Mail: Rm 556, CIS Dept 200 S 33rd St Philadelphia, PA 19104

--
-- Daniel Widyono http://www.cis.upenn.edu/~widyono
-- Linux Cluster Group, CIS Dept., SEAS, University of Pennsylvania
-- Mail: Rm 556, CIS Dept 200 S 33rd St Philadelphia, PA 19104
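
Since the same bpsh invocation returns 43 lines on one run and 44 on the next, it may help to repeat each command many times and tally the line counts rather than eyeball single runs. The following is only a sketch: tally_counts is a hypothetical helper name, not part of bproc, and it assumes nothing about bpsh beyond the invocations shown in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of bproc): run a command RUNS times and
# report how often each output line count occurred.
# Usage: tally_counts RUNS COMMAND [ARGS...]
tally_counts() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Count the lines produced by one run; strip wc's padding.
        "$@" | wc -l | tr -d ' '
        i=$((i + 1))
    done | sort -n | uniq -c |
        awk '{ printf "%d run(s) returned %d line(s)\n", $1, $2 }'
}

# Example invocations, assuming bpsh is installed and the nodes are up:
#   tally_counts 20 bpsh 0-42 hostname
#   tally_counts 20 bpsh 0-43 hostname
```

Comparing the tallies for 0-42 and 0-43 over the same number of runs would show how often the short count actually occurs, which might be more useful in a bug report than a handful of samples.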