From: Maurice L. <Mau...@co...> - 2005-01-28 16:02:44
Hi all,

Some help please. I have two questions.

I/ My cluster runs Linux 2.4.22-openmosix-2, and I plan to upgrade soon. My problem is that my computing processes very often die with the errors below. Could you explain the possible reasons, and suggest what I can do? Many thanks.

Process 4034(calbio3D.exe), uid=410, killed because it lost communication with the remote node where it was running
Process 6984(calbio3D.exe), uid=410, killed because it lost communication with the remote node where it was running
Process 28114(calbio3D.exe), uid=410, killed because it lost communication with the remote node where it was running
Process 31771(symphonie), uid=531, killed because it lost communication with the remote node where it was running
Process 1949(symphonie), uid=531, killed because it lost communication with the remote node where it was running

II/ I plan to upgrade from this old version to a more recent one. Which would you advise as a stable version to run: 2.4.24-2 or 2.4.26-1?

In OM-2.4.26 I saw that development of oMFS has stopped. Is there a replacement cluster distributed file system? Or would you advise computing with a single master node that has the disk, plus diskless computing nodes? And what if I share the disk over NFS: big trouble, or nothing more?

Thanks for your advice and any links.

ML

-- 
Maurice Libes                              Tel : +33 (04) 91 82 93 25
Centre d'Oceanologie de Marseille          Fax : +33 (04) 91 82 65 48
UMS2196 CNRS- Campus de Luminy, Case 901   mailto:mau...@co...
F-13288 Marseille cedex 9                  Annuaire : http://annuaire.univ-aix.fr/showuser.php?uid=libes