From: Daniel G. <dg...@ti...> - 2004-03-04 19:22:25
Thanks for the quick response, Erik.

On Thu, Mar 04, 2004 at 10:51:09AM -0700, er...@he... wrote:
> On Thu, Mar 04, 2004 at 11:54:36AM -0500, Daniel Gruner wrote:
> > Hi
> >
> > I have experienced a strange phenomenon on my alpha cluster. It is running
> > bproc 3.2.6, on alpha UX machines. For the most part the cluster behaves
> > quite normally, allowing me to run jobs, and all the normal stuff.
> >
> > However, I am testing a fairly short, highly cpu-intensive job, simply to
> > have a way of submitting many jobs using bjs and learning how it works,
> > and it appears that the job is so cpu-intensive that the node appears
> > dead to the master (i.e. it does not respond, or something like that),
> > and it dies before the job is completed. Well, actually the job manages
> > to complete, but the node is reset anyway. If I do "ps" on the master I
> > don't see the job (actually, it says it has not used up any time), nor
> > does it appear in "top". Is it possible that the node gets TOO busy with
> > the computation? I append two files: the program itself (waster.cpp)
> > and its output (junk). The command line to run the program was:
> >
> >     bpsh 1 -I /dev/null ./waster >& junk &
> >
> > The program ends with:
> >
> >     [1]  Exit 255    bpsh 1 -I /dev/null ./waster >& junk
> >
> > and the node is reset. From the /var/log/messages file I get:
> >
> >     Mar  4 11:29:57 racaille bpmaster: ping timeout on slave 1
> >
> > It looks like the node is too busy computing to even respond to pings...
>
> That shouldn't be possible, but maybe something is going wrong with
> priorities. The slave daemon is supposed to run with an elevated
> priority to avoid exactly these starvation issues. I saw this kind of
> behavior once, when somebody decided to start 1500 cpu-intensive
> processes on a slave node. In any case, sharing with one other process
> shouldn't be a problem.
> A possible problem could come up if the slave daemon failed to reset
> priorities for the child processes it created. I'm not seeing this
> problem on our systems here, so I suspect that that's not it.
>
> First, a few questions:
>
>   What kernel version?
>   Are you starting with a kernel.org kernel?
>   Any patches other than BProc?
>   How many cpus?

The kernel is 2.4.18-27.7hdbp.ux.0. It is basically a stock kernel that
has been patched for BProc and made to run on the UX (ruffian) board.
These are single-cpu machines, with an EV56 at 600 MHz. Here is the list
of packages installed:

    beoboot-cm.1.5-hddcs.2.alpha.rpm
    beoboot-modules-cm.1.5-22.4.18_27.7hdbp.ux.0.alpha.rpm
    beonss-1.0.12-lanl.2.1.alpha.rpm
    bjs-1.2-hd.3.alpha.rpm
    bproc-3.2.6-hddcs.2.alpha.rpm
    bproc-devel-3.2.6-hddcs.2.alpha.rpm
    bproc-libs-3.2.6-hddcs.2.alpha.rpm
    bproc-modules-3.2.6-2.k2.4.18_27.7hdbp.ux.0.alpha.rpm
    cmtools-1.1-1.alpha.rpm
    cmtools-devel-1.1-1.alpha.rpm
    kernel-2.4.18-27.7hdbp.ux.0.alpha.rpm
    kernel-beoboot-2.4.18-27.7hdbp.ux.0.alpha.rpm
    kernel-doc-2.4.18-27.7hdbp.ux.0.alpha.rpm
    kernel-source-2.4.18-27.7hdbp.ux.0.alpha.rpm
    supermon-1.4-hddcs.3.alpha.rpm
    supermon-modules-1.4-3.k2.4.18_27.7hdbp.ux.0.alpha.rpm

It was built by the good folks at HardData (Michal Jaegermann).

> Anyway, a few things you can do to try to figure out what's going
> on... (I tried this on our alphas real quick and it doesn't seem to
> be happening here.)
>
> - Comment out this stuff in the slave daemon:
>
>       /* bump our priority to RT to avoid getting hosed by errant
>        * stuff that gets run on our node */
>       p.sched_priority = 1;
>       if (sched_setscheduler(0, SCHED_FIFO, &p))
>           syslog(LOG_NOTICE, "Failed to set real-time scheduling for"
>                  " slave daemon.\n");
>
>   and rebuild the slave daemon. (Also reinstall libbpslave.a and
>   rebuild the phase 2 boot image if you're using the rest of
>   clustermatic.)

I will try, if I get some time...

> - bpsh other stuff (e.g. uptime) to the node while this job is
>   running but before the node dies. Is it responsive or is it just
>   completely dead? Is it really slow?

Nothing else runs. "bpsh 1 uptime" just hangs. The process table on the
master does not get updated, and the waster does not appear in "top" at
all. The node dies (is killed, actually, because it times out), although
the waster job manages to finish. I guess even bpctl does not get
through...

> - Run something else alongside the waster that does something like:
>
>       while (1) { printf("hi\n"); fflush(stdout); sleep(1); }
>
>   Does it get starved and stop printing?

In this case both jobs seem to run, but the node dies anyway, since at
least the network stuff does not respond. Even when the job that just
prints "hi" is running alone on the node, I don't see it updating the
process table. You'd think it would work, if only because it spends most
of its time sleep()ing...

> - Finally, if everything seems to be working, but slowly or something
>   like that, you can raise the ping timeout by adding a line like
>   this to your /etc/beowulf/config:
>
>       pingtimeout 120
>
>   120 is the timeout in seconds. The default is 30.

Not a solution yet... Could it be something to do with the network
driver? I am using eepro100 cards. Here is the list of loaded modules on
the nodes:

    racaille:dgruner{116}> bpsh 1 /sbin/lsmod
    Module                  Size  Used by    Not tainted
    vfat                   18816   0  (unused)
    fat                    52176   0  [vfat]
    ext3                   94152   0  (unused)
    jbd                    74840   0  [ext3]
    nfs                   140328   2
    lockd                  84488   0  [nfs]
    sunrpc                110736   1  [nfs lockd]
    sym53c8xx_2           101438   0  (unused)
    de4x5                  66452   0  (unused)
    eepro100               29688   1
    bproc                  99736   2
    vmadump                19320   0  [bproc]

Daniel

--
Dr. Daniel Gruner                        dg...@ti...
Dept. of Chemistry                       dan...@ut...
University of Toronto                    phone: (416)-978-8689
80 St. George Street                     fax:   (416)-978-5325
Toronto, ON M5S 3H6, Canada              finger for PGP public key