From: Piotr R. K. <pio...@mo...> - 2018-02-15 23:23:40
Ok, so this is not what I thought. I suppose it could have been some network issue on the Master Server.

Peter

--
Piotr Robert Konopelko | mobile: +48 601 476 440
MooseFS Client Support Team | moosefs.com <http://moosefs.com/>
GitHub <https://github.com/moosefs/moosefs> | Twitter <https://twitter.com/moosefs> | Facebook <https://www.facebook.com/moosefs> | LinkedIn <https://www.linkedin.com/company/moosefs>

> On 14 Feb 2018, at 4:57 AM, Michael Tinsay <mic...@ho...> wrote:
>
> The words "foreground" and "background" were *not* found in yesterday's syslog.
>
> CGI says "Saved in background"
>
> From: Piotr Robert Konopelko <pio...@mo...>
> Sent: Wednesday, February 14, 2018 5:24:50 AM
> To: Michael Tinsay
> Cc: MooseFS-Users
> Subject: Re: [MooseFS-Users] Some glitch happened... how to troubleshoot the next time it happens?
>
> Hi Michael,
>
> could you please grep your syslog for the word "foreground" or "background"?
> Or, please check in the CGI, on the "INFO" tab, what the column named "last metadata save status" shows: background or foreground?
>
> Thanks,
> Peter
>
>> On 13 Feb 2018, at 10:22 AM, Michael Tinsay <mic...@ho... <mailto:mic...@ho...>> wrote:
>>
>> So one of my chunk servers is currently doing an internal rebalance as I had to replace a couple of disks. While monitoring its progress via the web interface, suddenly the cgi server could not see the master. After a few frantic refreshes of the web page, the master came back up.
>>
>> I took a look at the syslog and saw these:
>>
>> Feb 13 17:00:18 HO-MFSMaster01 mfsmaster[872]: child finished
>> Feb 13 17:00:18 HO-MFSMaster01 mfsmaster[872]: store process has finished - store time: 18.531
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.103 / port: 9422, usedspace: 14225786048512 (13248.80 GiB), totalspace: 18739671220224 (17452.68 GiB)
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.101 / port: 9422, usedspace: 14516117061632 (13519.19 GiB), totalspace: 18582972526592 (17306.74 GiB)
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.102 / port: 9422, usedspace: 15202350829568 (14158.29 GiB), totalspace: 18827063713792 (17534.07 GiB)
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.110) has been closed by peer
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.112) has been closed by peer
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.111) has been closed by peer
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.111) has been closed by peer
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.99) has been closed by peer
>> Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.112) has been closed by peer
>> Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: csdb: found cs using ip:port and csid (10.77.77.103:9422,3)
>> Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunkserver register begin (packet version: 6) - ip: 10.77.77.103 / port: 9422, usedspace: 14225795702784 (13248.80 GiB), totalspace: 18739671220224 (17452.68 GiB)
>> Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: csdb: found cs using ip:port and csid (10.77.77.102:9422,2)
>> Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunkserver register begin (packet version: 6) - ip: 10.77.77.102 / port: 9422, usedspace: 15202350829568 (14158.29 GiB), totalspace: 18827063713792 (17534.07 GiB)
>> Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunk 0000000001C0B151_000000DE: there are no copies
>> Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunk 0000000001C0B1B7_000000F8: there are no copies
>> Feb 13 17:00:31 HO-MFSMaster01 mfsmaster[872]: csdb: found cs using ip:port and csid (10.77.77.101:9422,1)
>> Feb 13 17:00:31 HO-MFSMaster01 mfsmaster[872]: chunkserver register begin (packet version: 6) - ip: 10.77.77.101 / port: 9422, usedspace: 14516117061632 (13519.19 GiB), totalspace: 18582972526592 (17306.74 GiB)
>> Feb 13 17:00:43 HO-MFSMaster01 mfsmaster[872]: server ip: 10.77.77.102 / port: 9422 has been fully removed from data structures
>> Feb 13 17:00:43 HO-MFSMaster01 mfsmaster[872]: server ip: 10.77.77.101 / port: 9422 has been fully removed from data structures
>> Feb 13 17:00:43 HO-MFSMaster01 mfsmaster[872]: server ip: 10.77.77.103 / port: 9422 has been fully removed from data structures
>> Feb 13 17:01:01 HO-MFSMaster01 mfsmaster[872]: chunkserver register end (packet version: 6) - ip: 10.77.77.103 / port: 9422
>> Feb 13 17:01:02 HO-MFSMaster01 mfsmaster[872]: chunkserver register end (packet version: 6) - ip: 10.77.77.102 / port: 9422
>> Feb 13 17:01:03 HO-MFSMaster01 mfsmaster[872]: chunkserver register end (packet version: 6) - ip: 10.77.77.101 / port: 9422
>>
>> No explicit error message (as far as I'm aware). Just a notification that all chunkservers and clients disconnected at the same time, and that the chunkservers and clients reconnected around 15 seconds later.
>>
>> Is there a way by which I can monitor if this happens a lot?
>>
>> By the way, the machines are all running version 3.0.100.
>>
>> --- mike t.
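
On the question of how to monitor whether this happens a lot: one possible approach is to scan the master's syslog for seconds in which several chunkservers were logged as disconnected at once, as happened at 17:00:28 in the excerpt above. The following is only a rough sketch and not a MooseFS tool. The function name mass_disconnects and the min_servers threshold are made up for illustration, and the regular expression assumes the log lines look exactly like the ones quoted in this thread (version 3.0.100); other versions may word them differently.

```python
import re
from collections import defaultdict

# Matches "chunkserver disconnected" lines in the format quoted in this thread.
# Assumption: traditional syslog timestamps ("Feb 13 17:00:28") and the
# mfsmaster[pid] tag, as in the excerpt above.
DISCONNECT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ mfsmaster\[\d+\]: "
    r"chunkserver disconnected - ip: (?P<ip>[\d.]+) / port: \d+"
)


def mass_disconnects(lines, min_servers=2):
    """Return {timestamp: sorted list of IPs} for every second in which
    at least min_servers distinct chunkservers were logged as disconnected."""
    by_second = defaultdict(set)
    for line in lines:
        match = DISCONNECT_RE.match(line)
        if match:
            by_second[match.group("ts")].add(match.group("ip"))
    return {
        ts: sorted(ips)
        for ts, ips in by_second.items()
        if len(ips) >= min_servers
    }
```

It accepts any iterable of log lines, so an open file object works, for example mass_disconnects(open("/var/log/syslog")), where the path is an assumption and depends on the distribution. Grouping by the exact second is deliberately crude; disconnects that straddle two seconds would be counted separately.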