From: Michał B. <mic...@ge...> - 2010-06-10 08:46:00
From: kuer ku [mailto:ku...@gm...]
Sent: Wednesday, June 09, 2010 4:19 AM
To: Michał Borychowski
Subject: Re: [Moosefs-users] how to fix unavailabe chunk ??

Thanks, Michal.

1. The reboot took place only on one of the mfsmount boxes; none of the metaservers or chunkservers rebooted. So what caused the file to be lost? It is very interesting, and I cannot understand why and how.

[MB] Unfortunately we also do not know why it happened; we have too little detailed information on this matter. Interrupting the writing process on the client side could make the file shorter (lacking some data at the end) or could probably also cause a wrong version number of the chunk. But we can hardly imagine why the chunk would have disappeared. The client machine has nothing to do with creating or deleting chunks or with assigning chunks to files. These operations take place only in the communication between the master and the chunkservers. So if neither the chunkservers nor the master server was rebooted, this situation is really unlikely.

2. I tried mfsfilerepair on the file, and it worked: I can view the content of the file after the repair. But the web interface still shows "xxx file currently unavailable". How does it get this information? And can I find this information somewhere else?

[MB] The interface shows data collected with a one-hour lag, so you would see updated information only after one hour.

Regards
Michał Borychowski

thanks all.

2010/6/8 Michał Borychowski <mic...@ge...>

Hi!

The system says that the chunk numbered "D710" is not available (none of the 3 copies set in the goal). If all chunkservers and all the disks are connected, it means that this chunk simply does not exist. If a reboot took place while the file was being written, such a chunk can be lost. The important question is: was it a reboot of the master server, of the chunkservers, or of the whole system? An abrupt reboot of the whole system (e.g. a loss of power) could cause something like this.
Fsck on a chunkserver could unfortunately have deleted this chunk. It may be worth looking into "lost+found" on the disks attached to the chunkservers. You can also run "mfsfilerepair", but this will help only by creating zeros in the "damaged" place of the file. After that the system would not try to read the missing chunk (to be exact, the system does not hang: it makes many retries to read it, waiting for it to show up, and gives up after several minutes).

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

From: kuer ku [mailto:ku...@gm...]
Sent: Saturday, June 05, 2010 12:05 PM
To: moo...@li...
Subject: [Moosefs-users] how to fix unavailabe chunk ??

Hi, all,

I set up a MooseFS storage with 1 metaserver + 4 chunkservers. Today I found some error messages on the HTTP interface: some files are lost.

currently unavailable chunk 000000000000D710 (inode: 331 ; index: 0)
* currently unavailable file 331: sink/fifodata/00126/20100604/00126_20100604164805

On the box where mfsmount runs, the 'ls' command shows:

-rw-rw-rw- 1 sea sea 2778996 6 4 17:32 00126_20100604164805

A system reboot occurred on 06/04 at 17:32, which is the last time the file was written. At present I can list the file, but I cannot cat its content: when I cat the file, the command hangs.
I can find some error messages in /var/log/messages:

Jun 5 17:51:26 nbase07 mfsmount[6625]: file: 331, index: 0, chunk: 55056, version: 2 - there are no valid copies
Jun 5 17:51:26 nbase07 mfsmount[6625]: file: 331, index: 0 - can't connect to proper chunkserver (try counter: 15)
Jun 5 17:52:26 nbase07 mfsmount[6625]: file: 331, index: 0, chunk: 55056, version: 2 - there are no valid copies
Jun 5 17:52:26 nbase07 mfsmount[6625]: file: 331, index: 0 - can't connect to proper chunkserver (try counter: 22)
Jun 5 17:53:26 nbase07 mfsmount[6625]: file: 331, index: 0, chunk: 55056, version: 2 - there are no valid copies
Jun 5 17:53:26 nbase07 mfsmount[6625]: file: 331, index: 0 - can't connect to proper chunkserver (try counter: 29)

The goal of the file should be 3, because I set the goal of its parent directory to 3.

What is the problem? How can I fix it?

My environment:

metaserver : moosefs 1.6.13 build on CentOS 5.3 x86_64
chunkserver : moosefs 1.6.13 build on CentOS 5.3 x86_64
mfsmount : MFS version 1.6.15 (FUSE library version: 2.7.4) on FreeBSD 6.2

thanks,
- kuer
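
A note on reading the messages in this thread: the web interface prints the chunk ID in hexadecimal (000000000000D710), while the mfsmount log prints it in decimal (55056). They are the same chunk. Below is a minimal Python sketch, not part of MooseFS, that checks this and pulls the fields out of a "no valid copies" log line. The regular expression is inferred only from the log lines quoted here, and the names NO_COPIES, chunk_id_hex and parse_no_copies are made up for illustration.

```python
import re

# Pattern inferred only from the mfsmount lines quoted in this thread;
# other MooseFS versions may log in a different format (assumption).
NO_COPIES = re.compile(
    r"file: (?P<inode>\d+), index: (?P<index>\d+), "
    r"chunk: (?P<chunk>\d+), version: (?P<version>\d+) - there are no valid copies"
)


def chunk_id_hex(chunk_decimal):
    """Format a decimal chunk ID as 16 uppercase hex digits, as the web interface shows it."""
    return format(chunk_decimal, "016X")


def parse_no_copies(line):
    """Return the numeric fields of a 'no valid copies' line, or None if it does not match."""
    m = NO_COPIES.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


line = ("Jun 5 17:51:26 nbase07 mfsmount[6625]: file: 331, index: 0, "
        "chunk: 55056, version: 2 - there are no valid copies")
info = parse_no_copies(line)
print(info)                         # {'inode': 331, 'index': 0, 'chunk': 55056, 'version': 2}
print(chunk_id_hex(info["chunk"]))  # 000000000000D710
```

Feeding it the lines from /var/log/messages gives the hexadecimal ID that can be matched against the "currently unavailable chunk" entry on the web interface.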