From: Peter E. <Pet...@un...> - 2004-09-08 08:43:27
Hi,

we had a Scyld cluster running for years until a disk on the master died. Some MPI programs and other 'normal' programs were running at the time. After the disk crash the bproc system hung, and after a reboot none of the nodes came up. In the end we installed RH9 + Clustermatic 4, mainly because we had no good documentation about Scyld and the old sysadmin had left the institute. Setting up Clustermatic was quite easy.

The nodes had scratch disks with three partitions: the Scyld boot partition (actually an ext2 partition), swap, and /scratch. On about 10 of the 22 nodes, all three partitions were destroyed. I could not mount any of them, and even swap was not recognized as such, although fdisk still listed all the partitions.

I suspect something went wrong in the kernel part of bproc. Is this possible?

Best,
Peter

--
Astron. Institut Uni Basel, Venusstr. 7, CH-4102 Binningen, Switzerland
Phone: +41 (61) 2055-434, fax: -455, http://www.astro.unibas.ch/~ppe/