From: Michał B. <mic...@ge...> - 2010-11-05 11:03:00
Hi!

My answers are below.

From: Stas Oskin [mailto:sta...@gm...]
Sent: Sunday, October 31, 2010 3:08 PM
To: Michał Borychowski
Cc: moosefs-users
Subject: Re: [Moosefs-users] writeWorker time out

Hi.

Thanks for the explanation.

Does this error appear during replication between the MFS servers themselves, or when an application writes to the MFS mount (and, as a result, writes to a single MFS server)?

[MB] This message comes from mfsmount. It means that too much time has passed since the data was sent to a chunkserver. Unless the messages occur very often, there is nothing to worry about.

Also, if this happens when writing to the MFS mount, is the error reported to the application as a write error?

[MB] No. The application gets EIO (input/output error) only after several such failures, by default 30. You have only a few timeouts here, so the application would not notice them.

Also, if it happens on the MFS mount side, where is the data kept during these retries? In RAM? On disk?

[MB] In RAM. The MooseFS mount has a write cache of configurable size (check the man pages for details).

Kind regards
Michał

Thanks again!

2010/10/29 Michał Borychowski <mic...@ge...>

There is a short timeout set for the write operation (several seconds). A single write operation may occasionally take several seconds or more. If these messages refer to different servers, there is nothing to worry about. But if they refer mainly to one server (IP in hex C0A8020F = 192.168.2.15), you should investigate further.

In the CGI monitor, go to the Disks tab, click "hour" in the "I/O stats last min (switch to hour,day)" row, and sort by "write" in the "max time (switch to avg)" column. Then check whether any disks clearly stand out from the others. You can also look at the "fsync" column and sort the results. Maximum times should not exceed 2 seconds (2 million microseconds). Look for individual disks that may be a bottleneck for the system.
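To tell whether the timeouts come from many servers or mainly from one, the hex address in each log line can be decoded and the lines tallied per chunkserver. A minimal Python sketch, assuming every timeout line follows the single example format quoted in this thread and that the number after the colon (9422) is a decimal port; the function names are hypothetical.

```python
import ipaddress
import re
from collections import Counter

# Pattern built from the one mfsmount message quoted in this thread (assumption:
# other timeout lines use the same wording).
TIMEOUT_RE = re.compile(
    r"writeworker: connection with \(([0-9A-Fa-f]{8}):(\d+)\) was timed out"
)


def decode_hex_ip(hex_ip: str) -> str:
    """Turn an 8-digit hex IPv4 address (e.g. C0A8020F) into dotted-quad form."""
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))


def timeouts_per_server(lines) -> Counter:
    """Count writeworker timeouts per 'ip:port' endpoint; other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            ip = decode_hex_ip(m.group(1))
            counts[f"{ip}:{m.group(2)}"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "file: 28, index: 7, chunk: 992, version: 1 - writeworker: connection "
        "with (C0A8020F:9422) was timed out (unfinished writes: 5; try counter: 1)",
    ]
    print(decode_hex_ip("C0A8020F"))  # 192.168.2.15
    print(timeouts_per_server(sample).most_common())
```

Run over the collected mfsmount log lines, a count concentrated on one address points at the server worth investigating.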
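The disk check described above amounts to comparing each disk's maximum write or fsync time against 2,000,000 microseconds and looking for disks that stand out. A rough Python sketch, assuming the values are copied by hand from the Disks tab; the disk labels, sample numbers, and the "3 times the median" outlier rule are illustrative assumptions, not part of MooseFS.

```python
from statistics import median

# 2 seconds expressed in microseconds, the unit of the "max time" columns.
MAX_TIME_LIMIT_US = 2_000_000


def suspicious_disks(max_times_us, outlier_factor=3.0):
    """Return disks whose max time exceeds 2 s or stands out from the rest.

    max_times_us maps a disk label to its max write (or fsync) time in
    microseconds. "Stands out" is a hypothetical heuristic: more than
    outlier_factor times the median over all disks.
    """
    typical = median(max_times_us.values())
    flagged = {}
    for disk, t in max_times_us.items():
        reasons = []
        if t > MAX_TIME_LIMIT_US:
            reasons.append("over 2 s")
        if t > outlier_factor * typical:
            reasons.append(f"over {outlier_factor:g}x median")
        if reasons:
            flagged[disk] = reasons
    return flagged


if __name__ == "__main__":
    # Made-up example values, not measurements from this thread.
    observed = {
        "192.168.2.14:/mnt/hd1": 180_000,
        "192.168.2.15:/mnt/hd1": 2_600_000,
        "192.168.2.16:/mnt/hd1": 210_000,
    }
    print(suspicious_disks(observed))
```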
"try counter: 1" alone is not a problem: the number of attempts is set as an option to mfsmount (30 by default). Until mfsmount reaches this limit, write operations are retried and the application gets an OK status.

Regards
Michal

From: Stas Oskin [mailto:sta...@gm...]
Sent: Wednesday, October 20, 2010 1:04 PM
To: moosefs-users
Subject: [Moosefs-users] writeWorker time out

Hi.

We noticed the following message in the logs:

file: 28, index: 7, chunk: 992, version: 1 - writeworker: connection with (C0A8020F:9422) was timed out (unfinished writes: 5; try counter: 1)

MFS seems to be working normally. The message seems to be related to the write process timing out, but the connection is normal. Can it be caused by slow disks?

Also, what does "try counter: 1" mean, and where can it be changed?

Finally, what will the operating system return to the application if the write operation fails?

Thanks in advance!
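The try counter behaviour described in this thread (failed writes are retried silently, and the application sees EIO only once the limit, 30 by default, is exhausted) can be illustrated with a small retry loop. This is a hypothetical Python simulation, not mfsmount's code: ChunkserverTimeout, write_with_retries and the flaky sender are made-up names, and the real limit is set through an mfsmount option documented in its man page.

```python
import errno
import os


class ChunkserverTimeout(Exception):
    """Hypothetical stand-in for one timed-out write to a chunkserver."""


def write_with_retries(send_chunk, data, max_tries=30):
    """Retry send_chunk(data) until it succeeds or max_tries attempts fail.

    Intermediate timeouts are only logged with a growing try counter; the
    caller gets an error (EIO) only after every attempt has failed.
    Returns the number of attempts that were needed.
    """
    for try_counter in range(1, max_tries + 1):
        try:
            send_chunk(data)
            return try_counter
        except ChunkserverTimeout:
            print(f"writeworker: timed out (try counter: {try_counter})")
    raise OSError(errno.EIO, os.strerror(errno.EIO))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_send(data):
        # Fails on the first call only, like a single slow write.
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise ChunkserverTimeout()

    print(write_with_retries(flaky_send, b"payload"))  # 2
```

With the default limit, a single "try counter: 1" message corresponds to the case above: the retry succeeds and the application never sees an error.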