From: Flow J. <fl...@gm...> - 2011-02-28 11:15:32
Hi,

We had a similar issue recently. The symptom was that mfsmaster took a long time to start (it eventually came up and ran, after about 5 minutes).

Here is what I did to make mfsmaster happy once it had started again:

1. Use the script provided at
   http://sourceforge.net/tracker/?func=detail&aid=3104619&group_id=228631&atid=1075722
   to release all reserved files. (You should comment out the optional section in it to speed up the process.)
2. Delete all the nonexistent chunks on the chunkserver.

I'm not an mfs expert, but these steps did make our mfsmaster server happy: it now loads about 700M of metadata in 5 seconds, with no errors in the log file.

I'm also curious about the *official solution* from Michal :)

Thanks
Flow

On 02/28/2011 03:25 AM, Stas Oskin wrote:
> Hi.
>
> Something very strange happened on our test cluster.
>
> After a power crash, the mfsmaster syslog is full of the following errors:
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005313A2_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005393A2_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005493A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005293A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005513A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005313A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005113A3_00000001), so create it for future deletion
> Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005093A3_00000001), so create it for future deletion
>
> These errors appear all the time and practically hang the mfsmaster.
> mfscgi stops working (hangs), and mounts abort with the following error:
> error receiving data from mfsmaster: ETIMEDOUT (Operation timed out)
> error receiving data from mfsmaster: ETIMEDOUT (Operation timed out)
>
> Upgrading to .20 didn't help.
>
> Any idea what this could be and how to resolve it?
>
> Thanks.
>
> _______________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users
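
For anyone facing the same flood of messages, here is a rough sketch (not part of MooseFS, and not the script from the tracker link) that pulls the chunk IDs out of syslog lines like the ones quoted above, for example to see how many distinct chunks are affected before cleaning up. The helper name `extract_nonexistent_chunks` is made up for illustration, and the regex assumes the exact message wording shown in the quoted log, with each message on a single line; other MooseFS versions may word it differently.

```python
import re

# Pattern built from the log excerpt quoted above (assumption: the wording
# "chunkserver has nonexistent chunk (<id>_<version>)" is stable).
NONEXISTENT_CHUNK = re.compile(
    r"chunkserver has nonexistent chunk \(([0-9A-Fa-f]{16})_([0-9A-Fa-f]{8})\)"
)


def extract_nonexistent_chunks(lines):
    """Return a sorted list of distinct (chunk_id, version) string pairs.

    Hypothetical helper: it only reads log text and never touches
    the master or the chunkservers.
    """
    found = set()
    for line in lines:
        for chunk_id, version in NONEXISTENT_CHUNK.findall(line):
            found.add((chunk_id.upper(), version.upper()))
    return sorted(found)


# Two lines copied from the quoted syslog, plus one unrelated line.
sample = [
    "Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent "
    "chunk (00000000005313A2_00000001), so create it for future deletion",
    "Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent "
    "chunk (00000000005393A2_00000001), so create it for future deletion",
    "Feb 27 19:19:21 web1 mfsmaster[30654]: some unrelated message",
]

for chunk_id, version in extract_nonexistent_chunks(sample):
    print(f"{chunk_id}_{version}")
```

To run it against a real log, pass an open file object instead of `sample` (the syslog path depends on your distribution). The result is only a list of IDs to inspect; what to do with those chunks is still the manual step described above.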