From: wkmail <wk...@bn...> - 2011-12-12 23:55:57
On 12/12/2011 3:36 PM, René Pfeiffer wrote:
> Yes, it did, but I get the "D" state for processes accessing the mount,
> too. The logs show messages of the type "chunk xyz has only invalid copies
> (1) - please repair it manually", so I guess the metadata is still not
> correct (IP addresses and names of the chunk servers haven't changed).
>
> The biggest problem is that we cannot figure out what the RAID controller
> exactly did to the file system of the master server, and we haven't found
> any traces of a more recent metadata file. The metalogger system had no
> problem, but can it be that the metalogger was/is out of sync due to the
> silent file system corruption on the master system?

That is a question for the devs, but early in our MFS testing, on essentially throwaway kit, we had a master fail with a broken RAID.

In that case the underlying disk system had been essentially read-only for a few days, and there was no recent data in /usr/local/var/mfs.

However, the metalogger DID have accurate information. We simply recovered from the metalogger's data using the restore process and then copied the resulting metadata file over to the now-fixed master.

Except for the 'on the fly' files lost when the damn thing crashed, no other data was lost. That includes files that had been received and written to the chunkservers during the time the disk subsystem was out of order.

So my guess is that the metaloggers get their info from the master's memory, not from a file on the master. But that is something that should be confirmed by the devs.
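
For reference, the restore step we ran on the metalogger boils down to a single mfsmetarestore call. Here is a rough Python sketch that only builds that command line so you can check it before running anything. The file names (metadata_ml.mfs.back, changelog_ml.*.mfs) are what I would expect a 1.6.x metalogger to write; the helper name and the paths in the usage note are made up for illustration. Check everything against your own installation first.

```python
import glob
import os


def build_restore_command(metalogger_dir, output_path):
    """Return the argv list for an mfsmetarestore run over metalogger files.

    Assumes the metalogger naming scheme metadata_ml.mfs.back and
    changelog_ml.*.mfs; adjust if your version names them differently.
    Nothing is executed here, the caller decides whether to run it.
    """
    backup = os.path.join(metalogger_dir, "metadata_ml.mfs.back")
    if not os.path.isfile(backup):
        raise FileNotFoundError(backup)

    # Collect every metalogger changelog found next to the backup.
    pattern = os.path.join(metalogger_dir, "changelog_ml.*.mfs")
    changelogs = sorted(glob.glob(pattern))

    # -m: metadata backup to start from, -o: restored metadata file to write.
    return ["mfsmetarestore", "-m", backup, "-o", output_path] + changelogs
```

Something like build_restore_command("/usr/local/var/mfs", "/usr/local/var/mfs/metadata.mfs") gives you the list to pass to subprocess.run on the metalogger. After that, the restored metadata.mfs is what gets copied over to the repaired master before starting it again.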