From: Casper L. <cas...@pr...> - 2016-06-25 14:35:55
Hi All,

I have created a script to monitor my MooseFS masters with Nagios, and I find that the follower master is quite often in desync state. With a check interval of 5 minutes, I see the desync state about a dozen times a day. The number of times this happens seems to correlate with the (write?) load on the file system. I'm guessing this is expected behaviour?

I'm reading the MATOCL_INFO (PROTO_BASE+511) data the same way mfs.cgi does in line 1836:

    workingstate,nextstate,stablestate,sync,leaderip,changetime,metaversion = struct.unpack(">BBBBLLQ",data[101:121])

*sync* seems to be 1 most of the time, but sometimes 0. *metaversion* is updated continuously, but *changetime* is always the same. On my system it is a timestamp set to the date and time my master started.

Should *changetime* be the date and time of the last *metaversion* change, i.e. updated continuously on every file system change? If that is the case, I can change my Nagios script to warn me only if the follower is more than a certain number of seconds behind.

Is there a better way to monitor follower master status?

Greetings,
Casper
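For reference, the check described above might look roughly like the sketch below. The offset and format string are taken from the quoted mfs.cgi line. Everything else is an assumption made for illustration: the names `parse_master_state` and `follower_status`, the `max_lag` threshold, and the idea that the leader's and follower's *metaversion* values can be compared directly. The exit codes are the standard Nagios plugin ones.

```python
import struct
from collections import namedtuple

# Field layout copied from the unpack call quoted from mfs.cgi.
MASTER_STATE_FORMAT = ">BBBBLLQ"
MASTER_STATE_OFFSET = 101
MASTER_STATE_SIZE = struct.calcsize(MASTER_STATE_FORMAT)  # 20 bytes

MasterState = namedtuple(
    "MasterState",
    "workingstate nextstate stablestate sync leaderip changetime metaversion",
)

# Standard Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL = 0, 1, 2


def parse_master_state(data):
    """Extract the state fields from a raw MATOCL_INFO reply payload."""
    end = MASTER_STATE_OFFSET + MASTER_STATE_SIZE
    if len(data) < end:
        raise ValueError("reply too short: %d bytes" % len(data))
    fields = struct.unpack(MASTER_STATE_FORMAT, data[MASTER_STATE_OFFSET:end])
    return MasterState(*fields)


def follower_status(follower, leader_metaversion, max_lag=1000):
    """Hypothetical policy: tolerate desync while the version gap is small.

    Assumes leader and follower metaversion values are directly comparable;
    max_lag is an arbitrary illustrative threshold, not a MooseFS setting.
    """
    if follower.sync == 1:
        return NAGIOS_OK
    lag = leader_metaversion - follower.metaversion
    return NAGIOS_WARNING if lag <= max_lag else NAGIOS_CRITICAL
```

Comparing version numbers instead of timestamps sidesteps the open question of what *changetime* actually means, at the cost of needing one extra query to the leader per check.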