From: Steve W. <st...@pu...> - 2011-03-01 13:42:45
|
Hi,

For a couple of peak periods this morning, my logs were full of messages like the following:

Mar 1 07:49:49 stanley mfsmount[1253]: file: 278364, index: 0 - fs_writechunk returns status 11
Mar 1 07:49:49 stanley mfsmount[1253]: file: 278364, index: 1 - fs_writechunk returns status 11

They all reference the same file: 278364. Does anyone know what this means? And, if so, what I should do about it?

Thanks!

Steve

--
Steven M. Wilson, Systems and Network Manager
Markey Center for Structural Biology
Purdue University
(765) 496-1946 |
From: Brent A N. <br...@ph...> - 2012-03-27 17:53:50
|
I've been using MooseFS 1.6.20 on 64-bit Ubuntu 10.04 for a year or so, and it's been working rather well. However, I get a certain type of error that will fill my logging space from time to time:

Mar 23 07:02:01 somehost mfsmount[31484]: file: 4282810, index: 0 - fs_writechunk returns status 11
Mar 23 07:02:01 somehost mfsmount[31484]: file: 7214721, index: 0 - fs_writechunk returns status 11
Mar 23 07:02:01 somehost mfsmount[31484]: file: 7214723, index: 0 - fs_writechunk returns status 11
Mar 23 07:02:01 somehost mfsmount[31484]: file: 7214717, index: 0 - fs_writechunk returns status 11
Mar 23 07:02:01 somehost mfsmount[31484]: file: 7214722, index: 0 - fs_writechunk returns status 11

This will repeat in long bursts, and then be quiet for a while (hours or days). The machine on which this occurs is one where MooseFS is mounted for my home directory (Linux/Gnome environment). I have not noticed any corresponding errors in the server logs.

The files referenced above are all Google Chrome cache-related. Perhaps all such errors have been from Google Chrome files; most or all that I've looked at previously have been.

I'd like to move other users' home directories to MooseFS. It's been doing fine for non-home directories, but this error filling the logs (but not having any other obvious impact) looks to be a small obstacle for home directories. There is also the OpenOffice bug which causes it to break with MooseFS, but I can get all the rest of my machines over to LibreOffice without much trouble. We want to be on LibreOffice, anyway.

Does anyone know what causes the "fs_writechunk returns status 11" complaints? Is it already fixed in the new release? Is it specific to Google Chrome (and perhaps its fault), or is it just that Google Chrome is active more than anything else, at all times, and is therefore more prone to glitches?

Thanks,

Brent Nelson
Director of Computing
Dept. of Physics
University of Florida |
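For anyone wanting to check which files the numbers in these messages refer to, here is a minimal sketch. It assumes that the `file:` number logged by mfsmount is the inode number visible through the mount, which is worth verifying on your own version before relying on it. The function name `paths_for_inodes` and the mount point in the usage note are made up for illustration; nothing here is a MooseFS API.

```python
import os


def paths_for_inodes(root, inodes):
    """Walk `root` and return {inode: [paths]} for the requested inode numbers.

    Assumption: the "file:" number in the mfsmount log line is the inode
    number as seen through the mounted filesystem.
    """
    wanted = set(inodes)
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                inode = os.lstat(path).st_ino
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if inode in wanted:
                found.setdefault(inode, []).append(path)
    return found
```

A call such as `paths_for_inodes("/mnt/mfs/home/someuser", [4282810, 7214721])` (hypothetical mount point) returns the matching paths, if the files still exist. Short-lived cache files may already be gone by the time you look, and walking a large tree generates a lot of metadata traffic, so it is kinder to point it at one home directory than at the whole filesystem.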
From: Steve T. <sm...@cb...> - 2012-03-28 19:57:47
|
On Tue, 27 Mar 2012, Brent A Nelson wrote:

> Mar 23 07:02:01 somehost mfsmount[31484]: file: 4282810, index: 0 -
> fs_writechunk returns status 11

I've seen exactly one occurrence of this in the two months that I have been using MooseFS (CentOS 5.7, mfs 1.6.20). We don't use Google Chrome, so it wasn't that in my case. I don't know the cause.

Steve |
From: Brent A N. <br...@ph...> - 2012-03-28 20:46:37
|
By some near miraculous coincidence, my subject linked up with a thread from a year ago, which explained that the message is due to an attempt to write to a chunk that is already locked for writing by something else.

I'm guessing that Google Chrome, with its highly threaded nature, kind of trips over itself when it comes to the cache, sometimes generating huge bursts of warnings from MooseFS (it really can quickly fill /var). Otherwise, you'd probably run into this when running two programs that try to write to the same file, which should be rare and not really an issue.

It's a pity that Chrome doesn't have some file to edit so that you can set system-wide defaults, but the '--disk-cache-dir="/dev/null"' command-line option does look promising (and, as I mentioned in the Firefox thread, it seems to give Chrome a substantial speed boost when running on MooseFS). |
From: Giovanni T. <me...@gi...> - 2012-03-28 21:18:03
|
2012/3/28 Brent A Nelson <br...@ph...>:

> It's a pity that Chrome doesn't have some file to edit so that you can set
> system-wide defaults, but the '--disk-cache-dir="/dev/null"' command-line
> option does look promising (and, as I mentioned in the Firefox thread, it
> seems to give Chrome a substantial speed boost when running on MooseFS).

On Debian/Ubuntu there is:

$ cat /etc/chromium/default
# Default settings for chromium. This file is sourced by /bin/sh from
# /usr/bin/chromium

# Options to pass to chromium
CHROMIUM_FLAGS="--password-store=detect --disk-cache-dir=/dev/shm/$USER-chromium"

--
Giovanni Toraldo
http://gionn.net/about-me
http://it.linkedin.com/in/giovannitoraldo |
From: Michał B. <mic...@co...> - 2012-03-30 10:41:20
|
Hi!

Error 11 means "chunk locked". It appears when several processes at different computers try to write to the same file in parallel.

You should not be bothered by this message but it's wise to minimize occurrences of this situation.

Kind regards
Michał Borychowski
MooseFS Support Manager

-----Original Message-----
From: Brent A Nelson [mailto:br...@ph...]
Sent: Tuesday, March 27, 2012 7:54 PM
To: moo...@li...
Subject: [Moosefs-users] fs_writechunk returns status 11

_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users |
From: Brent A N. <br...@ph...> - 2012-03-30 16:15:32
|
On Fri, 30 Mar 2012, Michał Borychowski wrote:

> Hi!
>
> Error 11 means "chunk locked". It appears when several processes at
> different computers try to write to the same file in parallel.

Huh, I don't see a reason any other machines or even different mfsmounts on the same machine would have written to my google-chrome cache files at the same time. I'm only running google-chrome on one machine (indeed, it prevents me from starting another instance accidentally on a different machine at the same time, and I checked through all of our computers for stray processes).

I do have a cron job running on a different machine to purge old cache files, although I think that was added after my last errors. Would a file deletion even lock a chunk for writing, or would that be purely a metadata operation? How about an atime update (I would think that would have to be a metadata server operation and would not involve the chunks in any way)? Could the servers, in their internal maintenance schedule, be briefly locking the chunks for other reasons?

I still wonder if Chrome might be tickling some corner case every now and then and producing these messages with the involvement of just one mfsmount...

> You should not be bothered by this message but it's wise to minimize
> occurrences of this situation.

Indeed, I've observed no problems resulting from this warning, apart from it generating enough complaints to fill a 1.2GB /var after a while. When it happens, it happens in large bursts.

By the way, so far, I've seen no further warnings since disabling the google-chrome cache (which seems much faster, anyway). So far, so good.

Thanks,

Brent |
From: Steve W. <st...@pu...> - 2011-03-01 14:06:57
|
On 03/01/2011 08:42 AM, Steve Wilson wrote:

> Hi,
>
> For a couple peak periods of time this morning, my logs were full of
> messages like the following:
> Mar 1 07:49:49 stanley mfsmount[1253]: file: 278364, index: 0 -
> fs_writechunk returns status 11
> Mar 1 07:49:49 stanley mfsmount[1253]: file: 278364, index: 1 -
> fs_writechunk returns status 11
>
> They all reference the same file: 278364. Does anyone know what this
> means? And, if so, what I should do about it?
>
> Thanks!
>
> Steve

A further look showed up a handful of other files but 99% of the messages refer to that one file.

Additionally, I saw several messages like the following from last night's logs:

Mar 1 00:35:03 stanley mfsmount[1253]: file: 161779, index: 0, chunk: 310854, version: 1 - writeworker: connection with (80D2302F:9422) was timed out (unfinished writes: 5; try counter: 1)
Mar 1 00:35:05 stanley mfsmount[1253]: file: 161779, index: 0, chunk: 310854, version: 1 - writeworker: connection with (80D2302F:9422) was timed out (unfinished writes: 5; try counter: 1)

Thanks,

Steve |
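A quick way to get the kind of per-file breakdown described above is to tally the messages by file number. This is a minimal sketch, assuming syslog lines in exactly the format quoted in this thread; the function name `tally_status` is made up for illustration, and the log path in the usage note is an assumption that varies by distribution.

```python
import re
from collections import Counter

# Matches lines such as:
#   Mar  1 07:49:49 stanley mfsmount[1253]: file: 278364, index: 0 - fs_writechunk returns status 11
# The writeworker timeout lines have a different shape and are not matched.
STATUS_LINE = re.compile(
    r"mfsmount\[\d+\]: file: (?P<file>\d+), index: (?P<index>\d+) - "
    r"fs_writechunk returns status (?P<status>\d+)"
)


def tally_status(lines, status=11):
    """Count fs_writechunk messages with the given status, keyed by file number."""
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.search(line)
        if match and int(match.group("status")) == status:
            counts[int(match.group("file"))] += 1
    return counts
```

Passing an open log file, for example `tally_status(open("/var/log/syslog", errors="replace")).most_common(10)`, lists the ten file numbers that produce the most status 11 messages. If one file dominates, as it did here, that points at a single contended file rather than a filesystem-wide problem.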
From: Michal B. <mic...@ge...> - 2011-03-02 11:07:58
|
Hi Steve!

Regarding your first question this means "CHUNK_LOCKED" - it may happen when several clients try to write to the same file. The message doesn't mean anything bad but it is better to avoid such situations.

Regarding your second question please have a look here: http://www.moosefs.org/moosefs-faq.html#error-messages and the two following answers.

Best regards
Michal |
From: Steve W. <st...@pu...> - 2011-03-03 13:24:59
|
Hi Michal,

Thanks for the help! We found that the problematic file was a .history file used by tcsh. The user was running a large number of scripts which spawned shells to run jobs, with each one attempting to read/write the same .history file. He has now modified his startup script to only access the .history file for interactive shells, and this has eliminated almost all of these messages in the logs.

Steve

--
Steven M. Wilson, Systems and Network Manager
Markey Center for Structural Biology
Purdue University
(765) 496-1946 |