From: Wilton W. <ww...@ha...> - 2002-10-10 01:38:26
Well, after working on this a bit more, it appears I may have a genuine bug in _bproc_vrfork_io(): I can only seem to fork off one process at a time. See the attached patch to beoboot../node_up/node_up.c to see what I mean.

Anyway, we have no ideas here, and really have no clue as to what is supposed to happen or what is not happening ;) This patch is a "we know what works, so let's just do that" type of patch; it is not for use in real production environments. All I know is that with this patch I am able to boot more than 1 node at the same time. Our local Linux hacker had this to say when he "fixed" this: "I don't know what is going on.. this patch is one big hack" ;)

I am currently running:

beoboot-lanl.1.3
bproc-3.2.1
linux-2.4.19+bproc patches

- Wilton

Original Message Follows:

> I am having a bit of difficulty booting more than one node at the same time
> (booting works if I stagger the booting). Something seems to hang when I
> reach the point in beoboot where it starts the node_up worker processes for
> more than 1 node.
>
> In /var/beowulf/node.2 .. node.3 .. etc., I see it hang here:
> <SNIP>
> ...
> nodeup : Plugin vmadlib returned status 0 (ok)
> nodeup : No premove function for nodeinfo
> nodeup : Starting 2 child processes.
> </SNIP>
>
> In /var/log/messages:
> <SNIP>
> Oct 8 17:43:52 srv001 beoserv: Starting node_up worker for 2 clients.
> </SNIP>
>
> The cluster boots fine if nodeup only starts 1 child process at a time.
>
> <SNIP>
> ...
> nodeup : Plugin vmadlib returned status 0 (ok)
> nodeup : No premove function for nodeinfo
> nodeup : Starting 1 child processes.
> nodeup : Running postmove functions
> nodeup : Calling postmove for kmod
> nodeup : Plugin kmod returned status 0 (ok)
> ...
> </SNIP>
>
> <SNIP>
> Oct 8 17:50:49 srv001 beoserv: Starting node_up worker for 1 clients.
> </SNIP>

----[ Wilton William Wong ]---------------------------------------------
11060-166 Avenue      Ph : 01-780-456-9771      High Performance UNIX
Edmonton, Alberta     FAX: 01-780-456-9772      and Linux Solutions
T5X 1Y3, Canada       URL: http://www.harddata.com
-------------------------------------------------------[ Hard Data Ltd. ]----