From: Michael T. <mic...@ho...> - 2018-02-15 23:14:55
The words "foreground" and "background" were *not* found in yesterday's syslog. The CGI says "Saved in background".

________________________________
From: Piotr Robert Konopelko <pio...@mo...>
Sent: Wednesday, February 14, 2018 5:24:50 AM
To: Michael Tinsay
Cc: MooseFS-Users
Subject: Re: [MooseFS-Users] Some glitch happened... how to troubleshoot the next time it happens?

Hi Michael,

Could you please grep your syslog for the word "foreground" or "background"? Or please check in the CGI, in the "INFO" tab, what the column named "last metadata save status" shows: background or foreground?

Thanks,
Peter

--
Piotr Robert Konopelko | mobile: +48 601 476 440
MooseFS Client Support Team | moosefs.com<http://moosefs.com/>
GitHub<https://github.com/moosefs/moosefs> | Twitter<https://twitter.com/moosefs> | Facebook<https://www.facebook.com/moosefs> | LinkedIn<https://www.linkedin.com/company/moosefs>

On 13 Feb 2018, at 10:22 AM, Michael Tinsay <mic...@ho...<mailto:mic...@ho...>> wrote:

So one of my chunk servers is currently doing an internal rebalance, as I had to replace a couple of disks. While I was monitoring its progress via the web interface, the CGI server suddenly could not see the master. After a few frantic refreshes of the web page, the master came back up.
I took a look at the syslog and saw these:

Feb 13 17:00:18 HO-MFSMaster01 mfsmaster[872]: child finished
Feb 13 17:00:18 HO-MFSMaster01 mfsmaster[872]: store process has finished - store time: 18.531
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.103 / port: 9422, usedspace: 14225786048512 (13248.80 GiB), totalspace: 18739671220224 (17452.68 GiB)
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.101 / port: 9422, usedspace: 14516117061632 (13519.19 GiB), totalspace: 18582972526592 (17306.74 GiB)
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.102 / port: 9422, usedspace: 15202350829568 (14158.29 GiB), totalspace: 18827063713792 (17534.07 GiB)
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.110) has been closed by peer
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.112) has been closed by peer
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.111) has been closed by peer
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.111) has been closed by peer
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.99) has been closed by peer
Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: connection with client(ip:10.77.77.112) has been closed by peer
Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: csdb: found cs using ip:port and csid (10.77.77.103:9422,3)
Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunkserver register begin (packet version: 6) - ip: 10.77.77.103 / port: 9422, usedspace: 14225795702784 (13248.80 GiB), totalspace: 18739671220224 (17452.68 GiB)
Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: csdb: found cs using ip:port and csid (10.77.77.102:9422,2)
Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunkserver register begin (packet version: 6) - ip: 10.77.77.102 / port: 9422, usedspace: 15202350829568 (14158.29 GiB), totalspace: 18827063713792 (17534.07 GiB)
Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunk 0000000001C0B151_000000DE: there are no copies
Feb 13 17:00:30 HO-MFSMaster01 mfsmaster[872]: chunk 0000000001C0B1B7_000000F8: there are no copies
Feb 13 17:00:31 HO-MFSMaster01 mfsmaster[872]: csdb: found cs using ip:port and csid (10.77.77.101:9422,1)
Feb 13 17:00:31 HO-MFSMaster01 mfsmaster[872]: chunkserver register begin (packet version: 6) - ip: 10.77.77.101 / port: 9422, usedspace: 14516117061632 (13519.19 GiB), totalspace: 18582972526592 (17306.74 GiB)
Feb 13 17:00:43 HO-MFSMaster01 mfsmaster[872]: server ip: 10.77.77.102 / port: 9422 has been fully removed from data structures
Feb 13 17:00:43 HO-MFSMaster01 mfsmaster[872]: server ip: 10.77.77.101 / port: 9422 has been fully removed from data structures
Feb 13 17:00:43 HO-MFSMaster01 mfsmaster[872]: server ip: 10.77.77.103 / port: 9422 has been fully removed from data structures
Feb 13 17:01:01 HO-MFSMaster01 mfsmaster[872]: chunkserver register end (packet version: 6) - ip: 10.77.77.103 / port: 9422
Feb 13 17:01:02 HO-MFSMaster01 mfsmaster[872]: chunkserver register end (packet version: 6) - ip: 10.77.77.102 / port: 9422
Feb 13 17:01:03 HO-MFSMaster01 mfsmaster[872]: chunkserver register end (packet version: 6) - ip: 10.77.77.101 / port: 9422

No explicit error message (as far as I'm aware). Just notifications that all chunkservers and clients disconnected at the same time, and that the chunkservers and clients reconnected around 15 seconds later.

Is there a way I can monitor whether this happens a lot?

By the way, the machines are all running version 3.0.100.

--- mike t.
_______________________________________________
moosefs-users mailing list
moo...@li...<mailto:moo...@li...>
https://lists.sourceforge.net/lists/listinfo/moosefs-users
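
On the question above about monitoring how often this happens: one rough approach is to scan the master's syslog for the "chunkserver disconnected" lines and flag any minute in which several distinct chunkservers dropped at once. The following is only a minimal sketch in Python, assuming the log lines keep the format shown in the excerpt in this thread. The function name `find_disconnect_bursts` and the `min_servers` threshold are made up for illustration and are not part of MooseFS.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the syslog excerpt in this thread, e.g.:
# "Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - ip: 10.77.77.103 / port: 9422, ..."
DISCONNECT_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+mfsmaster\[\d+\]:\s+"
    r"chunkserver disconnected - ip: (?P<ip>[\d.]+) / port: \d+"
)


def find_disconnect_bursts(lines, min_servers=2):
    """Return {minute: [ips]} for minutes in which at least
    `min_servers` distinct chunkservers were logged as disconnected.

    `min_servers` is a hypothetical threshold, not a MooseFS setting.
    """
    by_minute = defaultdict(set)
    for line in lines:
        match = DISCONNECT_RE.match(line)
        if match:
            by_minute[match.group("minute")].add(match.group("ip"))
    return {
        minute: sorted(ips)
        for minute, ips in by_minute.items()
        if len(ips) >= min_servers
    }


# Small demo using two lines from the excerpt plus one unrelated line.
sample_lines = [
    "Feb 13 17:00:18 HO-MFSMaster01 mfsmaster[872]: child finished",
    "Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - "
    "ip: 10.77.77.103 / port: 9422, usedspace: 14225786048512 (13248.80 GiB), "
    "totalspace: 18739671220224 (17452.68 GiB)",
    "Feb 13 17:00:28 HO-MFSMaster01 mfsmaster[872]: chunkserver disconnected - "
    "ip: 10.77.77.101 / port: 9422, usedspace: 14516117061632 (13519.19 GiB), "
    "totalspace: 18582972526592 (17306.74 GiB)",
]
for minute, ips in find_disconnect_bursts(sample_lines).items():
    print(minute, "chunkservers disconnected:", ", ".join(ips))
```

To run it against a real log, pass an open file instead of the sample list, for example `find_disconnect_bursts(open(path))`, where `path` is wherever syslog is written on the master (this varies by distribution). The sketch only counts log lines; it says nothing about why the disconnects happened.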