Re: [SSI-users] 2-node OpenSSI-1.2.2 cluster hangs on "top" and "ps"
From: Roger T. <rog...@gm...> - 2005-12-09 17:24:23
Hi John,

We have had people compile the entire kernel on the cluster without a
glitch. I think you may be having network/hardware problems. If you dig
through the mailing list you will find that Maurice had a similar problem
with periodic hangs and such while migrating large processes (which can
be similar to rsync activity). We eventually nailed it down to a network
driver problem.

Roger

On 12/9/05, John David <jo...@na...> wrote:
> I am running FC2 + OpenSSI 1.2.2 set up successfully with a 2-node i686
> cluster in a testing environment. The installation is fresh and stock,
> with no patches or modifications. I have limited experience with
> OpenSSI, but it seems to be the solution we've been waiting for.
>
> I have been testing it lately before using it in production. During my
> tests, I tried to use "rsync" to save copies of important files to
> another machine. Rsync would, at times, hang hard -- something I have
> experienced with rsync before, so I did not attribute it to OpenSSI.
> During this time, I could (from another terminal or physically at the
> node) type "top" and the machine would hang. Additionally, trying "ps
> -aux" would list a few processes, then hang, list another, hang, etc.
> for a very long time -- I don't think I ever got a complete list of
> processes.
>
> Thinking this to all be rsync-specific, I recently decided to omit rsync
> and instead opt for tar. So I began to build the production cluster to
> be used as our server.
>
> However, in the process, as I was compiling certain programs I had the
> same exact symptoms. This really concerns me! Is OpenSSI 1.2.2 stable
> enough for production use? I would think that, regardless of how bad a
> crash is on a process, I would be able to "top" or "ps" to find out the
> status. I have stripped the cluster down to its most basic form, but am
> still running into this problem. Could this be hardware- or
> network-related? Would it be possible to locate the problem through
> logs?
>
> Any insight would be very helpful in deciding which route to take
> (i.e. whether to use OpenSSI!)
>
> TIA,
> John David