From: Ken <ken...@gm...> - 2012-03-13 02:21:58
Hi Davies and MooseFS list,

I agree with Davies' opinion in the previous mail: undeleting metadata.mfs.bak is probably the best way. The recovery itself is quite simple.

After downloading the damaged metadata from ChenGang, I started digging into the code. I commented out a few "return -1" lines (filesystem.c) in fs_loadnodes, fs_loadedges and chunk_load (the procedures used in restore), and printed some counts, such as the fsnodes count and the edge count. From that we know the inode information is complete, half of the edges are lost, and the chunks are totally lost. The most important information is the inode ==> chunk id(s) mapping. It is complete, so the data can be retained.

Then I collected all the chunk ids and version info from the disks of the chunkservers and wrote them (using a Python script) to a single file (chunks.bin) in the same format as metadata.mfs. Here I made a mistake because of the duplicate chunks caused by goal 2.

I made some dirty changes to chunk_load (in chunks.c) so that it loads chunks.bin instead of "metadata.mfs".

Finally, I executed "mfsmetarestore -o metadata.mfs -m metadata.mfs.part" and we got the metadata.mfs. This took me almost 4 hours.

During the accident I noticed a few things:
a. The cause was a full disk on the master server, so why did the damage also occur on the metalogger? Take a close look at fs_storeall (filesystem.c): why unlink("metadata.mfs") after writing the metadata failed?
b. Keeping more copies of metadata.mfs on the metalogger and mfsmaster may be good.
c. Splitting the huge metadata.mfs into 3 files (inode, edge, chunk) may be beneficial for diagnosis and performance optimization, I guess.

Any suggestions?

Best Regards.
-Ken

On Tue, Mar 13, 2012 at 9:53 AM, Davies Liu <dav...@gm...> wrote:

> Hi,
>
> Congratulations!
>
> Can you show some details about how you recovered the data?
>
> Davies
>
>
> On Tue, Mar 13, 2012 at 4:24 AM, 陈钢 <yik...@gm...> wrote:
>
>> Hi all. I have some news to report.
>>
>> The situation I faced is that my metadata.mfs was broken after
>> "mfsmaster restart", including the metadata.mfs on mfsmetalogger.
>>
>> In fact, the metadata on mfsmetalogger is "more broken" than the metadata
>> on mfsmaster. That is weird.
>>
>> So, I got all my files back, but half of them lost their filenames.
>>
>> Luckily, ken...@gm... helped me, and I have a SQLite file which
>> stored file sizes and filenames. I have already restored 90 percent of
>> my important data now.
>>
>> I also wrote a script that copies the metadata.mfs and then rsyncs it to
>> another server every hour, just like the suggestion in
>> http://www.moosefs.org/moosefs-faq.html#metadata-backup.
>>
>> Metadata is really important; it is worth an incremental backup.
>>
>>
>> 2012/3/9 陈钢 <yik...@gm...>
>>
>>> Maybe 80% of the data can be found. I am still trying.
>>> I tried restoring every metadata.mfs I have; it just did not work.
>>>
>>>
>>> 2012/3/8 Olivier Thibault <Oli...@lm...>
>>>
>>>> Hi,
>>>>
>>>> Did you solve your problem?
>>>> A few days ago I had an mfsmaster crash when it ran out of memory.
>>>> When I tried to restart, it crashed saying that there was no
>>>> metadata.mfs file.
>>>> I tried "mfsmetarestore -a". It didn't work, saying that
>>>> metadata.mfs.back was corrupted.
>>>> There was a metadata.mfs.back file and a metadata.mfs.back.tmp file.
>>>> metadata.mfs.back was half the size it should be.
>>>> I restored the metadata.mfs file from a daily backup, then ran
>>>> 'mfsmetarestore -a' again, and this time it worked. I could then start
>>>> mfsmaster successfully.
>>>> Did you try that? I mean, just restore the latest working
>>>> metadata.mfs file?
>>>>
>>>> HTH.
>>>>
>>>> Best regards,
>>>>
>>>> Olivier
>>>>
>>>>
>>>>
>>>> On 07/03/12 04:20, 陈钢 wrote:
>>>>
>>>>> In the master's metadata.mfs, I saw that the nodes part is complete,
>>>>> part of the names, no free, no chunks..
>>>>>
>>>>> 2012/3/7 Davies Liu <dav...@gm... <mailto:dav...@gm...>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to recover it from metadata.mfs and metadata_ml.mfs.back,
>>>>> but failed.
>>>>>
>>>>> Because the disk was full, mfsmaster had not dumped all the metadata
>>>>> to disk; metadata.mfs had part of the nodes, no names, no edges, no chunks.
>>>>> There is no hope of recovering from the broken metadata.mfs; it is too short.
>>>>>
>>>>> The changelogs do not help either; they only cover two days.
>>>>>
>>>>> The only hope is to try to undelete the previous metadata_ml.mfs.back
>>>>> from the metalogger machine. Chenggang also failed at that, with
>>>>> extundelete and ext3grep; maybe some experts can achieve this.
>>>>>
>>>>> The final option is to GUESS the relation between files and chunks
>>>>> by chunk id and size of files, if the data loss cannot be afforded.
>>>>> Each chunk is made up of a CRC checksum and the real data; if we know
>>>>> the relation between files and chunkserver, then we can get the data
>>>>> back. Or we can construct the metadata according to the GUESS, then
>>>>> use mfsmaster to recover them.
>>>>>
>>>>> Davies
>>>>>
>>>>> On Wed, Mar 7, 2012 at 10:56 AM, Ken <ken...@gm...
>>>>> <mailto:ken...@gm...>> wrote:
>>>>>
>>>>> Hi, chengang
>>>>>
>>>>> I think you should try more, and post details here. Someone must be
>>>>> able to resolve it.
>>>>> Maybe you will lose some data from the last few minutes, but the 250T
>>>>> should be saved.
>>>>>
>>>>> First, BACKUP all files:
>>>>> /var/lib/mfs/* on master
>>>>> /var/lib/mfs/* on mfsmetalogger
>>>>>
>>>>> About the restore error:
>>>>>
>>>>> file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
>>>>> loading objects (files,directories,etc.) ... loading node: read error: ENOENT (No such file or directory)
>>>>> error
>>>>> can't read metadata from file: .//metadata_ml.mfs.back
>>>>>
>>>>> How did you run mfsmetarestore? Did you add the -d option?
>>>>> If stat(datapath + metadata_ml.mfs.back) fails, these errors will
>>>>> occur.
>>>>> Maybe using strace will show exactly why stat fails.
>>>>>
>>>>> ps: I am in Beijing now and I can provide more help.
>>>>>
>>>>> HTH
>>>>>
>>>>> -Ken
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 7, 2012 at 10:21 AM, 陈钢 <yik...@gm...
>>>>> <mailto:yik...@gm...>> wrote:
>>>>>
>>>>> I cannot start mfsmaster with the file "78962688 Mar 6 17:18
>>>>> metadata.mfs"..
>>>>> I tried that. :(
>>>>>
>>>>> 2012/3/7 Ricardo J. Barberis <ric...@da...
>>>>> <mailto:ric...@da...>>
>>>>>
>>>>>
>>>>> On Tuesday 06/03/2012, 陈钢 wrote:
>>>>> > on master
>>>>> [ ... ]
>>>>> > -rw-r----- 1 mfs mfs 78962688 Mar 6 17:18 metadata.mfs
>>>>> > -rw-r--r-- 1 root root 8 Jul 4 2011 metadata.mfs.empty
>>>>> > -rw-r----- 1 mfs mfs 5984 Mar 6 12:00 sessions.mfs
>>>>> > -rw-r----- 1 mfs mfs 0 Mar 6 16:46 sessions.mfs.tmp
>>>>> > -rw-r----- 1 mfs mfs 131072 Mar 6 17:18 stats.mfs
>>>>>
>>>>> You have /var/lib/mfs/metadata.mfs on the master, it might not
>>>>> be corrupt after all?
>>>>>
>>>>> I'd suggest:
>>>>>
>>>>> - backup /var/lib/mfs to another disk/server (for later recovery if needed)
>>>>> - make sure you have free space in your main disk
>>>>> - then simply try to start mfsmaster
>>>>> - check mfs.cgi (web interface) if it looks OK
>>>>>
>>>>>
>>>>> BUT: if you can, wait for confirmation from Michał Borychowski
>>>>> first, in case what I'm telling you is not safe.
>>>>>
>>>>>
>>>>> (BTW: You have Reply-To set to che...@cp...
>>>>> <mailto:che...@cp...>, I don't know if that's
>>>>> intentional on your part)
>>>>>
>>>>> Hope it helps,
>>>>> --
>>>>> Ricardo J. Barberis
>>>>> Senior SysAdmin / ITI
>>>>> Dattatec.com :: Soluciones de Web Hosting
>>>>> Tu Hosting hecho Simple!
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Virtualization & Cloud Management Using Capacity Planning
>>>>> Cloud computing makes use of virtualization - but cloud computing
>>>>> also focuses on allowing computing to be delivered as a service.
>>>>> http://www.accelacomm.com/jaw/sfnl/114/51521223/
>>>>>
>>>>> _______________________________________________
>>>>> moosefs-users mailing list
>>>>> moo...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/moosefs-users
>>>>>
>>>>>
>>>>> --
>>>>> - Davies
>>>>>
>>>>
>>>
>>
>
>
> --
> - Davies
>
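
For reference, the chunk-collection step described at the top of this thread (gathering chunk ids and versions from the chunkserver disks, and avoiding the duplicate-chunk mistake caused by goal 2) could look roughly like the sketch below. This is not Ken's script, which was not posted. It assumes chunk files are named chunk_<16 hex digits of id>_<8 hex digits of version>.mfs; collect_chunks is a hypothetical helper name, and serializing the result into the metadata.mfs chunk format is left out because that layout is not described in this thread.

```python
import os
import re
import sys

# Assumption: chunkservers store chunks as chunk_<id:16 hex>_<version:8 hex>.mfs
CHUNK_NAME = re.compile(r"^chunk_([0-9A-Fa-f]{16})_([0-9A-Fa-f]{8})\.mfs$")


def collect_chunks(roots):
    """Walk chunkserver data directories and return {chunk_id: version}.

    With goal 2 the same chunk id shows up on two chunkservers, so each id
    is recorded only once; if versions differ, the highest one is kept.
    """
    chunks = {}
    for root in roots:
        for _dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                match = CHUNK_NAME.match(name)
                if match is None:
                    continue  # skip anything that is not a chunk file
                chunk_id = int(match.group(1), 16)
                version = int(match.group(2), 16)
                if version > chunks.get(chunk_id, -1):
                    chunks[chunk_id] = version
    return chunks


if __name__ == "__main__":
    # Usage: python collect_chunks.py <chunkserver data dir> [...]
    for chunk_id, version in sorted(collect_chunks(sys.argv[1:]).items()):
        print(f"{chunk_id:016X} {version:08X}")
```

Run against copies of the chunkserver data directories, this prints one "id version" line per distinct chunk, which can then be fed to whatever writes chunks.bin.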