From: Stas O. <sta...@gm...> - 2011-02-27 19:25:44
|
Hi.

Something very strange happened on our test cluster. After a power failure, the mfsmaster syslog is full of the following errors:

Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005313A2_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005393A2_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005493A3_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005293A3_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005513A3_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005313A3_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005113A3_00000001), so create it for future deletion
Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent chunk (00000000005093A3_00000001), so create it for future deletion

These errors appear all the time and practically hang mfsmaster. mfscgi stops working (hangs), and mounts abort with the following error:

error receiving data from mfsmaster: ETIMEDOUT (Operation timed out)
error receiving data from mfsmaster: ETIMEDOUT (Operation timed out)

Upgrading to .20 didn't help. Any idea what this could be and how to resolve it?

Thanks. |
From: 刀刀 <948...@qq...> - 2013-08-01 06:15:17
|
Hi,

I have a problem. My mfsmaster lost power on July 28th, but my only metadata backup is from July 26th, so I restored the July 26th metadata. Now mfsmaster starts OK, but clients cannot mount the file system:

# ./mfsmount -H 1.1.1.1 /mnt/mfs -L 0
error receiving data from mfsmaster

# cat /var/log/message
mfsmaster[20132]: chunkserver has nonexistent chunk (00000000039C77C5_00000001), so create it for future deletion
mfsmaster[20132]: chunkserver has nonexistent chunk (00000000039BF7C5_00000001), so create it for future deletion
mfsmaster[20132]: chunkserver has nonexistent chunk (00000000039A77C5_00000001), so create it for future deletion
mfsmaster[20132]: chunkserver has nonexistent chunk (000000000395F7C5_00000001), so create it for future deletion

How can I fix this? My MooseFS version is 1.16. Is upgrading to 1.26 the only way to solve this problem, or is there another solution? |
From: Michal B. <mic...@ge...> - 2011-02-28 07:18:29
|
Hi!

What does your metadata file look like? What do you have inside the "/usr/local/var/mfs" (PREFIX/var/mfs) directory?

Regards
Michał |
From: Flow J. <fl...@gm...> - 2011-02-28 11:15:32
|
Hi,

We had a similar issue recently. The symptom was that mfsmaster took a long time to start (it eventually came up and ran, after about 5 minutes). Here is what I did to make mfsmaster happy after it started again:

1. Use the script provided at http://sourceforge.net/tracker/?func=detail&aid=3104619&group_id=228631&atid=1075722 to release all reserved files. (The optional section in it should be commented out to speed up the process.)
2. Delete all the nonexistent chunks on the chunkserver.

I'm not an MFS expert, but these steps did make our mfsmaster happy: it now loads about 700M of metadata in 5 seconds, with no errors in the log file.

I'm also curious about the *official solution* from Michal :)

Thanks
Flow |
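Before carrying out step 2 above, it can help to turn the mfsmaster warnings into a list of candidate chunk files. The following is a minimal Python sketch, not part of MooseFS: parse_nonexistent_chunks and expected_chunk_path are hypothetical helper names, and the on-disk layout (a subdirectory named after the low byte of the chunk ID, holding files named chunk_<ID>_<VERSION>.mfs) is an assumption about 1.6-era chunkservers that should be checked against your own data directories. It only prints paths and deletes nothing.

```python
import re
from typing import Iterable, List, Tuple

# Matches the mfsmaster warning quoted in this thread, e.g.
# "chunkserver has nonexistent chunk (00000000005313A2_00000001), so create it for future deletion"
NONEXISTENT_RE = re.compile(
    r"chunkserver has nonexistent chunk \(([0-9A-F]{16})_([0-9A-F]{8})\)"
)


def parse_nonexistent_chunks(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return unique (chunk_id, version) pairs in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        # A flattened log line may contain several warnings, so use finditer.
        for m in NONEXISTENT_RE.finditer(line):
            key = (int(m.group(1), 16), int(m.group(2), 16))
            if key not in seen:
                seen.add(key)
                result.append(key)
    return result


def expected_chunk_path(chunk_id: int, version: int) -> str:
    """ASSUMPTION: 1.6-era layout <low byte of ID as 2 hex digits>/chunk_<ID>_<VERSION>.mfs,
    relative to a chunkserver data directory listed in mfshdd.cfg."""
    return "%02X/chunk_%016X_%08X.mfs" % (chunk_id & 0xFF, chunk_id, version)


if __name__ == "__main__":
    sample = [
        "Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent "
        "chunk (00000000005313A2_00000001), so create it for future deletion",
        "Feb 27 19:19:21 web1 mfsmaster[30654]: chunkserver has nonexistent "
        "chunk (00000000005393A2_00000001), so create it for future deletion",
    ]
    # Dry run only: print candidate relative paths; nothing is removed.
    for cid, ver in parse_nonexistent_chunks(sample):
        print(expected_chunk_path(cid, ver))
```

Deduplicating matters because mfsmaster repeats the same warning every time a chunkserver reports in; the sorted-out list is what you would review before touching any files.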
From: Michal B. <mic...@ge...> - 2011-03-01 11:52:12
|
Hi!

In recent MooseFS versions, sessions are cleaned up only after a week, so this is just a matter of time.

Regards
Michał |
From: Flow J. <fl...@gm...> - 2011-03-01 16:12:39
|
This is nice, but it would be great if the one-week period were configurable (judging by the tracker link in my previous message, it used to be 2 hours?). Or could we have a native command to release the reserved files and clear nonexistent chunks?

I'm asking because the last time we had this issue, all the chunks reported by cgiserver were healthy (goal = 2), but the master server still reported nonexistent chunks. So I had to find and delete them manually in the chunkserver file system.

Thanks
Flow |
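Flow describes locating the offending chunk files by hand on the chunkserver. A hedged Python sketch of that search follows; find_chunk_files is a hypothetical helper, and its only assumption is the chunk_<ID>_<VERSION>.mfs file-name pattern (no particular subdirectory layout is assumed, since the whole tree under each data directory is walked). It reports every version found for each requested chunk ID and removes nothing, so the result can be reviewed before deleting anything manually.

```python
import os
import re
import tempfile
from typing import Dict, Iterable, List

# ASSUMPTION: chunk files are named chunk_<16 hex ID>_<8 hex version>.mfs.
CHUNK_FILE_RE = re.compile(r"^chunk_([0-9A-F]{16})_([0-9A-F]{8})\.mfs$")


def find_chunk_files(data_dirs: Iterable[str], chunk_ids: Iterable[int]) -> Dict[int, List[str]]:
    """Map each wanted chunk ID to the sorted file paths found for it (any version)."""
    wanted = set(chunk_ids)
    found: Dict[int, List[str]] = {cid: [] for cid in wanted}
    for root_dir in data_dirs:
        # Walk the full tree rather than guessing the subdirectory scheme.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                m = CHUNK_FILE_RE.match(name)
                if m is None:
                    continue
                cid = int(m.group(1), 16)
                if cid in wanted:
                    found[cid].append(os.path.join(dirpath, name))
    for paths in found.values():
        paths.sort()
    return found


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory, not a real chunkserver.
    with tempfile.TemporaryDirectory() as demo_root:
        os.makedirs(os.path.join(demo_root, "A2"))
        open(os.path.join(demo_root, "A2", "chunk_00000000005313A2_00000001.mfs"), "w").close()
        for cid, paths in find_chunk_files([demo_root], [0x5313A2]).items():
            # Dry run: report only; review the list before removing anything.
            print("%016X" % cid, paths)
```

Matching on the ID alone, rather than ID plus version, is deliberate in this sketch: after a crash the version on disk may differ from the one in the log, and seeing all copies makes that visible.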