From: Peter C. <pe...@co...> - 2005-09-30 20:45:10
On Tue, Sep 27, 2005 at 03:13:32PM -0300, Peter Cordes wrote:
> I have been running 2.4.27-om20041102-tab (pe...@mo...)
> (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 SMP Wed Sep 7 on my dual Opteron
> cluster. The master node has 4GB of RAM, and the other 7 nodes have 2GB.
> All with gigabit ethernet. The master has an Intel e100 built in to the
> mobo too, which is used to connect to the outside world.
>
> My kernel config is
> CONFIG_MOSIX=y
> # CONFIG_MOSIX_TOPOLOGY is not set
> CONFIG_MOSIX_SECUREPORTS=y
> CONFIG_MOSIX_DISCLOSURE=3
> # CONFIG_MOSIX_FS is not set
> CONFIG_MOSIX_PIPE_EXCEPTIONS=y
> # CONFIG_MOSIX_NO_OOM is not set
> CONFIG_MOSIX_EXT_LOCALTIME=y
>
> My filesystems are JFS on RAID0 and RAID1 partitions of two SATA drives,
> connected to the SATA_SIL ports on the Tyan S2882 motherboard.

I was also using channel bonding (modprobe bonding mode=4) on my gigabit
ethernet interfaces.

> While doing a big file mirroring update with rsync (of BLAST dna sequence
> databases...), the kernel logged some errors. I don't think this has
> happened consistently during the bi-monthly updates, though.

Even after a reboot, doing the big rsync made weird things happen very
quickly. For example, the data in the disk cache (or pagecache? I don't
know) gets corrupted, so some programs start crashing when you run them
(e.g. dstat, which is written in Python). Using debsums to verify file md5
hashes showed that there were in fact "bad" files. I think they were bad
only in the cache, so if I hadn't done any apt-get install --reinstall, I
would have been fine. Since I did, I had to reinstall again after another
reboot. Anyway, no biggie, debsums can make sure everything is ok :).

After rebooting without channel bonding yesterday, I haven't seen the same
problems during some quick rsync tests on big files. But it's really too
early to tell: I haven't done the full rsync yet, because users need the
cluster for some big computations!
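
For anyone who wants to repeat the check without debsums, here is a rough
Python sketch of the same idea: recompute each file's MD5 and compare it with
a list of expected digests. It assumes the list uses the "hexdigest  path"
layout, with paths relative to a root directory (which, as far as I know, is
the layout of dpkg's per-package .md5sums files). The function names and the
root parameter are made up for illustration; this is not how debsums itself
is implemented.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Parse 'hexdigest  relative/path' lines into a {path: digest} dict."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, path = line.split(None, 1)
        expected[path.strip()] = digest.lower()
    return expected


def find_mismatches(expected, root="/"):
    """Return the sorted paths whose current MD5 differs from the expected
    digest. Files that cannot be read are reported as mismatches too."""
    bad = []
    for rel_path, digest in expected.items():
        full_path = os.path.join(root, rel_path)
        try:
            if md5_of_file(full_path) != digest:
                bad.append(rel_path)
        except OSError:
            bad.append(rel_path)
    return sorted(bad)
```

Note that ordinary reads like these go through the kernel's cache, so a
mismatch found this way cannot by itself tell a bad cached copy from bad data
on disk. Re-running the check after a reboot is one way to compare the two.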

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC