From: Erik A. H. <er...@he...> - 2002-08-09 20:07:24
On Fri, Aug 09, 2002 at 10:33:55AM -0400, Mike Snitzer wrote:
> On Mar 25, 2002 Erik Arjan Hendriks (er...@he...) said:
>
> > A new version of the clustermatic CD was placed on
> > www.clustermatic.org today. The diff looks like this:
> >
> > * Updated BProc (3.1.9): fewer bugs, remote exec hacks, etc.
> >
> > * Updated Beoboot (lanl 1.2): minor changes.
> <snip>
> > The MPICH hacks which I've mentioned are included on the CD in the
> > mpirun-0.1 tar ball in the tarballs directory. We used those hacks to
> > run a 396 process job on Chiba city at Argonne National Lab the other
> > day. This clustermatic also survived some boot time torture testing
> > on about 200 nodes there. (Thanks guys!) The load spike on the front
> > end is severe when they all boot at once but all the nodes we started
> > with came up. We've got a guy working on that problem here.
> > Hopefully beoboot lanl 1.3 will mostly eliminate that problem.
>
> Has there been any progress on beoboot lanl 1.3 development?

Yup. Lots. It's gotten a bit more complicated than I had in mind but
it's actually started working (as of a few days ago). I'm planning on
cleaning up a bit and throwing it up "soon". I expect next week.

It's looking good right now. I set up 100 nodes (with 40MB of
libraries) in 12 seconds yesterday. There's basically no more load
spike, though 12 seconds isn't really long enough to tell via the usual
mechanisms.

- Erik