From: Florent C. <Flo...@un...> - 2003-09-12 11:28:44
Florent Calvayrac wrote:
> Nicholas Henke wrote:
>
>> On Fri, 2003-09-05 at 05:53, Florent Calvayrac wrote:
>>
>>> Hi
>>>
>>> I am in the process of joining an experimental cluster (8 nodes)
>>> to a larger one (40+ nodes) using Clustermatic; everything is
>>> working fine except for the last 4 nodes: I get an
>>>
>>> bpmaster: Connect from unrecognized node 192.168.32.55
>>>
>>> and so on, although the nodes got an IP assigned with RARP and
>>> reboot at the very end of stage 2...
>>>
>>> Looking at the source code, I wonder if this comes
>>> from a socket not accepted in "master". How do I increase
>>> the number of available sockets? Which configuration file
>>> could I have written badly?
>>
>> It sounds like you need to add your new nodes in /etc/beowulf/config --
>> or the config file you use for bpmaster.
>>
>> Nic
>
> Thanks again; but yes, I had increased the number of nodes,
> and I did a kill -HUP <beoserv> without it complaining.

I eventually found the problem: on a kill -HUP, beoserv just rereads the list of MAC addresses and does not allocate more nodes than it did when it was first launched. I had to restart beoserv, thus killing the jobs being processed on the older nodes.

This should perhaps be fixed, or at least mentioned in the documentation; failing that, one could declare more nodes than are present in the cluster when beoserv is first started...

-- 
Florent Calvayrac                          | Tel : 02 43 83 26 26
Laboratoire de Physique de l'Etat Condense | Fax : 02 43 83 35 18
UMR-CNRS 6087                              | http://www.univ-lemans.fr/~fcalvay
Universite du Maine-Faculte des Sciences   | 72085 Le Mans Cedex 9
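The reload behaviour reported in this thread can be illustrated with a small Python model. This is not beoserv's code: the class `NodeTable` and its methods are hypothetical, and the sketch simply assumes, as the post reports, that the node capacity is fixed when the daemon is launched and that a SIGHUP-style reload only rereads the MAC list. It shows why the extra nodes stay "unrecognized" after a reload, why a restart fixes it, and how over-declaring nodes at startup would leave room for later additions.

```python
class NodeTable:
    """Hypothetical model of a node daemon whose capacity is fixed at launch."""

    def __init__(self, declared_nodes, macs):
        # Capacity is decided once, from the node count declared at startup.
        self.capacity = declared_nodes
        self.macs = []
        self.reload(declared_nodes, macs)

    def reload(self, declared_nodes, macs):
        # Models a kill -HUP: the MAC list is reread, but the new declared
        # node count is ignored, so the table never grows past self.capacity.
        self.macs = list(macs)[: self.capacity]

    def lookup(self, mac):
        """Return the node number for a MAC, or None if it is unrecognized."""
        try:
            return self.macs.index(mac)
        except ValueError:
            return None


# 44 made-up MAC addresses: 40 existing nodes plus 4 new ones.
macs = [f"00:00:00:00:00:{i:02x}" for i in range(44)]

table = NodeTable(40, macs[:40])
table.reload(44, macs)             # config edited, then "kill -HUP"
print(table.lookup(macs[43]))      # None: still an unrecognized node

restarted = NodeTable(44, macs)    # a full restart picks up the new count
print(restarted.lookup(macs[43]))  # 43

spare = NodeTable(48, macs[:40])   # over-declared at first launch
spare.reload(48, macs)             # later reload fits within the spare room
print(spare.lookup(macs[43]))      # 43
```

The over-declaration case corresponds to the suggestion at the end of the post; whether the real beoserv tolerates declared-but-absent nodes this way is an assumption of the model.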