From: Boyko Y. <b.y...@ex...> - 2011-03-20 12:45:36
Hello!

I've been using MooseFS for a while. I have 3 metadata backup loggers running. I noticed that if I kill the mfsmaster process on the master node (simulating a power failure), mfsmetalogger crashes (segfault) on the metadata logger node. Here are the log entries:

Mar 20 11:45:35 server110 mfsmetalogger[6546]: metadata downloaded 72105B/0.009982s (7.224 MB/s)
Mar 20 11:45:35 server110 mfsmetalogger[6546]: changelog_0 downloaded 0B/0.000001s (0.000 MB/s)
Mar 20 11:45:35 server110 mfsmetalogger[6546]: changelog_1 downloaded 164193B/0.015491s (10.599 MB/s)
Mar 20 11:45:35 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001501s (2.032 MB/s)
Mar 20 11:46:03 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001497s (2.037 MB/s)
Mar 20 11:47:00 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001246s (2.448 MB/s)
Mar 20 11:48:48 server110 mfsmetalogger[6546]: sessions downloaded 3050B/0.001009s (3.023 MB/s)
Mar 20 11:48:48 server110 mfsmetalogger[6546]: connection was reset by Master
Mar 20 11:49:00 server110 mfsmetalogger[6546]: connecting ...
Mar 20 11:49:00 server110 mfsmetalogger[6546]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 11:49:05 server110 mfsmetalogger[6546]: connecting ...
Mar 20 11:49:05 server110 mfsmetalogger[6546]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 11:49:06 server110 kernel: mfsmetalogger[6546]: segfault at 0000000000000060 rip 000000318c26119d rsp 00007fff2f368170 error 4

From another metadata logger:

Mar 20 13:33:00 server102 mfsmetalogger[5088]: sessions downloaded 3388B/0.000993s (3.412 MB/s)
Mar 20 13:34:00 server102 mfsmetalogger[5088]: sessions downloaded 3388B/0.001000s (3.388 MB/s)
Mar 20 13:35:00 server102 mfsmetalogger[5088]: sessions downloaded 3388B/0.001000s (3.388 MB/s)
Mar 20 13:35:48 server102 mfsmetalogger[5088]: connection was reset by Master
Mar 20 13:35:50 server102 mfsmetalogger[5088]: connecting ...
Mar 20 13:35:50 server102 mfsmetalogger[5088]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:35:55 server102 mfsmetalogger[5088]: connecting ...
Mar 20 13:35:55 server102 mfsmetalogger[5088]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:35:56 server102 kernel: mfsmetalogger[5088]: segfault at 0000000000000060 rip 0000003c6386119d rsp 00007fff7d13a7d0 error 4
Mar 20 13:37:23 server102 mfsmetalogger[12676]: set gid to 502
Mar 20 13:37:23 server102 mfsmetalogger[12676]: set uid to 502
Mar 20 13:37:23 server102 mfsmetalogger[12676]: connecting ...
Mar 20 13:37:23 server102 mfsmetalogger[12676]: open files limit: 5000
Mar 20 13:37:23 server102 mfsmetalogger[12676]: connected to Master
Mar 20 13:37:23 server102 mfsmetalogger[12676]: metadata downloaded 72113B/0.013963s (5.165 MB/s)
Mar 20 13:37:23 server102 mfsmetalogger[12676]: changelog_0 downloaded 981876B/0.086934s (11.294 MB/s)
Mar 20 13:37:23 server102 mfsmetalogger[12676]: changelog_1 downloaded 164193B/0.015978s (10.276 MB/s)
Mar 20 13:37:23 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.001993s (1.700 MB/s)
Mar 20 13:39:00 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.002965s (1.143 MB/s)
Mar 20 13:40:00 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.001986s (1.706 MB/s)
Mar 20 13:41:00 server102 mfsmetalogger[12676]: sessions downloaded 3388B/0.000991s (3.419 MB/s)
Mar 20 13:41:23 server102 mfsmetalogger[12676]: connection was reset by Master
Mar 20 13:41:25 server102 mfsmetalogger[12676]: connecting ...
Mar 20 13:41:25 server102 mfsmetalogger[12676]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:41:30 server102 mfsmetalogger[12676]: connecting ...
Mar 20 13:41:30 server102 mfsmetalogger[12676]: connection failed, error: ECONNREFUSED (Connection refused)
Mar 20 13:41:31 server102 kernel: mfsmetalogger[12676]: segfault at 0000000000000060 rip 0000003c6386119d rsp 00007fff5207ee20 error 4

Both machines are running CentOS 5.5, x86_64, mfs-1.6.20-2; the master runs the same version.

Also, not sure if it is related, but while running tests (killing the mfsmaster process and trying to restore from a metadata logger), I am sometimes unable to create the metadata.mfs file and get the following message:

[root@server102 mfs]# mfsmetarestore -a -d /var/lib/mfs
file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
loading objects (files,directories,etc.) ... ok
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... ok
connecting files and chunks ... ok
hole in change files (entries from 791301 to 791305 are missing) - add more files

I'm wondering why these entries are missing. Since the mfsmetalogger process crashes after the mfsmaster process is killed, could the two be related? (By the way, I'm building the metadata.mfs file as suggested by Michal Borychowski in another email regarding a bug in MooseFS when using snapshots.)

I can't tell for sure, but I think that if I clear the /var/lib/mfs folder (delete all the logs/files) and then start mfsmetalogger clean, there are no issues when restoring metadata.mfs; all goes fine (at least for the 10 times I've tried so far). So the 'add more files' errors may be related to having old changelogs in /var/lib/mfs. Can anyone confirm this? Is anyone having similar issues?

Thanks a lot!

Boyko
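The "hole in change files" message could be cross-checked before running mfsmetarestore by scanning the changelog files for gaps in their entry numbers. The Python sketch below is only an illustration under assumptions not confirmed in this thread: that each changelog line begins with a decimal entry number followed by a colon, and that the files match `changelog*.mfs` in the data directory. The function names are hypothetical and not part of MooseFS.

```python
import glob
import os
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: a changelog line starts with "<entry number>:"; the rest is ignored.
_ENTRY = re.compile(r"^(\d+):")


def entry_versions(lines: Iterable[str]) -> List[int]:
    """Collect the leading entry numbers from changelog lines, skipping others."""
    versions = []
    for line in lines:
        match = _ENTRY.match(line)
        if match:
            versions.append(int(match.group(1)))
    return versions


def find_holes(versions: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between known entries."""
    ordered = sorted(set(versions))
    holes = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            holes.append((prev + 1, cur - 1))
    return holes


def holes_in_directory(path: str) -> List[Tuple[int, int]]:
    """Scan every file matching the ASSUMED pattern changelog*.mfs under path."""
    versions: List[int] = []
    for name in glob.glob(os.path.join(path, "changelog*.mfs")):
        with open(name, "r", errors="replace") as handle:
            versions.extend(entry_versions(handle))
    return find_holes(versions)


if __name__ == "__main__":
    for first, last in holes_in_directory("/var/lib/mfs"):
        print(f"entries from {first} to {last} are missing")
```

If the old changelogs left in /var/lib/mfs are the cause, a scan like this would be expected to report a gap between the stale files and the freshly downloaded ones, and to report nothing after the directory has been cleared.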