From: Flow J. <fl...@gm...> - 2011-03-01 16:12:39
This is nice, but it would be great if the one-week period were configurable (judging by the link below, it used to be 2 hours?). Alternatively, could we have a native command to release the reserved files and clear nonexistent chunks?

I'm asking because the last time we had this issue, all the chunks reported by cgiserver were healthy (goal = 2), but the master server still reported nonexistent chunks, so I had to find and delete them manually in the chunk server file system.

Thanks
Flow

On 03/01/2011 07:51 PM, Michal Borychowski wrote:
>
> Hi!
>
> In recent MooseFS versions sessions are cleaned up only after a week, so
> this is just a matter of time.
>
> Regards
>
> Michał
>
> *From:* Flow Jiang [mailto:fl...@gm...]
> *Sent:* Monday, February 28, 2011 12:15 PM
> *To:* Stas Oskin
> *Cc:* moosefs-users
> *Subject:* Re: [Moosefs-users] Non-existing chunks hang the mfsmaster
>
> Hi,
>
> We had a similar issue recently. The symptom was that mfsmaster took a
> long time to start (it eventually came up after about 5 minutes). Here
> is what I did to make mfsmaster happy after it started again:
>
> 1. Use the script provided at
> http://sourceforge.net/tracker/?func=detail&aid=3104619&group_id=228631&atid=1075722
> to release all reserved files. (Comment out the optional section in
> it to speed up the process.)
> 2. Delete all the nonexistent chunks on the chunk server.
>
> I'm not an mfs expert, but these steps did make our mfsmaster server
> happy: it now loads about 700M of metadata in 5 seconds, with no errors
> in the log file.
>
> I'm also curious about the *official solution* from Michal :)
>
> Thanks
> Flow
>
> On 02/28/2011 03:25 AM, Stas Oskin wrote:
>
> Hi.
>
> Something very strange happened on our test cluster.
>
> After a power crash, the mfsmaster syslog is full of the following errors:
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005313A2_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005393A2_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005493A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005293A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005513A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005313A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005113A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005093A3_00000001), so create it for future deletion
>
> These errors appear all the time and practically hang the mfsmaster.
> mfscgi stops working (hangs), and mounts abort with the following
> error:
> error receiving data from mfsmaster: ETIMEDOUT (Operation timed out)
> error receiving data from mfsmaster: ETIMEDOUT (Operation timed out)
>
> Upgrading to .20 didn't help.
>
> Any idea what this could be and how to resolve it?
>
> Thanks.
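
For anyone who ends up hunting for these chunks by hand as described above, here is a minimal Python sketch that pulls the unique chunk IDs out of the "chunkserver has nonexistent chunk" syslog messages quoted in this thread. It only reads log text and deletes nothing. The function names are made up for illustration, and the on-disk file name pattern (chunk_<id>_<version>.mfs) is an assumption that should be checked against your own chunk server data directories before acting on it.

```python
import re

# Matches the mfsmaster message quoted in this thread, e.g.
# "chunkserver has nonexistent chunk (00000000005313A2_00000001), so create it ..."
NONEXISTENT_RE = re.compile(
    r"chunkserver has nonexistent chunk \(([0-9A-Fa-f]{16})_([0-9A-Fa-f]{8})\)"
)


def nonexistent_chunks(lines):
    """Return unique (chunk_id, version) pairs in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        match = NONEXISTENT_RE.search(line)
        if match is None:
            continue  # unrelated log line
        key = (match.group(1).upper(), match.group(2).upper())
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def assumed_chunk_filename(chunk_id, version):
    # ASSUMPTION: chunk files are named chunk_<id>_<version>.mfs on the
    # chunk server. Verify this on your own installation first.
    return f"chunk_{chunk_id}_{version}.mfs"
```

Passing an open log file to nonexistent_chunks() (the log path depends on your syslog setup) gives a de-duplicated list of IDs that can then be compared with what is actually present on each chunk server.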