From: Moshe B. <mos...@mo...> - 2005-01-29 12:30:46
You almost certainly have a bad NIC (network interface card).

moshe

On Jan 28, 2005, at 8:01 AM, Maurice Libes wrote:

> hi to all
>
> some help please, i have 2 questions:
>
> my cluster is under Linux 2.4.22-openmosix-2 and i plan to upgrade soon.
>
> my problem is that very often my computing processes die with the error
> below. could you explain the possible reasons, and suggest what
> i can do? many thanks
>
> Process 4034(calbio3D.exe), uid=410, killed because it lost communication with the remote node where it was running
> Process 6984(calbio3D.exe), uid=410, killed because it lost communication with the remote node where it was running
> Process 28114(calbio3D.exe), uid=410, killed because it lost communication with the remote node where it was running
> Process 31771(symphonie), uid=531, killed because it lost communication with the remote node where it was running
> Process 1949(symphonie), uid=531, killed because it lost communication with the remote node where it was running
>
>
> II/ i plan to upgrade from this old version to a more recent one.
>
> what is your advice for a stable version to run: 2.4.24-2 or 2.4.26-1?
>
> in OM-2.4.26 i saw that the development of oMFS was stopped.
> is there a replacement for a cluster distributed file system?
> or do you advise computing with a single master node with a disk and
> diskless computing nodes?
>
> what if i share disks with NFS? big trouble? or nothing more?
>
> thanks for your advice and any links
>
> ML
>
>
> --
> Maurice Libes
> Tel : +33 (04) 91 82 93 25   Centre d'Oceanologie de Marseille
> Fax : +33 (04) 91 82 65 48   UMS2196 CNRS- Campus de Luminy, Case 901
> mailto:mau...@co...         F-13288 Marseille cedex 9
> Annuaire : http://annuaire.univ-aix.fr/showuser.php?uid=libes
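The reply does not say how to confirm a bad NIC. One rough check on Linux is to look at the per-interface error and drop counters in /proc/net/dev on each node. The sketch below is an illustration added here, not something from the thread; the function name nic_error_counts is hypothetical, and it assumes the usual /proc/net/dev layout (two header lines, then "iface: 8 receive fields 8 transmit fields").

```python
def nic_error_counts(proc_net_dev_text):
    """Hypothetical helper: return {iface: (rx_errs, rx_drop, tx_errs, tx_drop)}
    parsed from the text of /proc/net/dev."""
    counts = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the first colon only; older kernels may print "eth0:123456"
        # with no space between the name and the first counter.
        iface, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Receive:  bytes packets errs drop fifo frame compressed multicast (0-7)
        # Transmit: bytes packets errs drop fifo colls carrier compressed   (8-15)
        rx_errs, rx_drop = int(fields[2]), int(fields[3])
        tx_errs, tx_drop = int(fields[10]), int(fields[11])
        counts[iface.strip()] = (rx_errs, rx_drop, tx_errs, tx_drop)
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    for iface, (rx_errs, rx_drop, tx_errs, tx_drop) in sorted(nic_error_counts(text).items()):
        print(f"{iface}: rx_errs={rx_errs} rx_drop={rx_drop} "
              f"tx_errs={tx_errs} tx_drop={tx_drop}")
```

Running it a few times on each node while jobs migrate gives a rough picture: error counters that keep climbing on one node point toward that node's card, cable or switch port. A few drops under heavy load can be normal, so compare how the numbers change over time rather than reading absolute values.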