From: henken <he...@se...> - 2002-01-17 15:44:02
Hey,

I am running 2.4.16 with bproc-3.1.5. I am still using the infamous 'noop.sh' test script to try and break things. Stability is much better, but I have noticed two problems.

The first is that bpsh will seem to hang indefinitely, and if the bpsh is kill -9'd, the program it was executing goes into zombie state. I didn't see any errors in the logs on the nodes or the master for that.

The second did give me an error message, but there was no core dump from bpmaster. Here is the message:

Jan 29 11:19:29 master /usr/sbin/bpmaster: FATAL: assoc_find: invalid pid -11553

After that, bpmaster was dead and of course bpsh would give the usual errors.

I have tried using -m on bpmaster to get message traces, but I don't seem to have enough hard drive space to watch those, as this takes around 350K iterations of noop in parallel on 4 processors (1.2 mil procs total) to trigger.

Any ideas on how to give you more information on what is happening?

Nic
--
Nicholas Henke
Undergraduate - Engineering 2002
--
Senior Architect and Developer
Liniac Project - University of Pennsylvania
http://clubmask.sourceforge.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There's nothing like good food, good beer, and a bad girl.