From: 彭智希 <pen...@16...> - 2016-02-29 07:25:19
|
Dear all: I use version 2.0 and I found that the master can't serve any request on the hour, every hour. It recovers 3-4 minutes later. I followed the solution in https://sourceforge.net/p/moosefs/mailman/message/34310363/, but the result is not satisfactory. There is 64 GB of memory in total, so I think memory is sufficient. The file metadata.mfs.back is about 4.3 GB. Is it too big for the child process to dump the metadata to disk? I hope somebody can give me some hints! Thanks!! |
From: Aleksander W. <ale...@mo...> - 2016-03-01 07:55:20
|
Hi, I have another question for you. Is your 64 GB master server used only for the MooseFS master, or does it also run other MooseFS components, or other applications such as virtual machines? Best regards Aleksander Wieliczko Technical Support Engineer MooseFS.com |
From: 彭智希 <pen...@16...> - 2016-03-01 09:48:30
|
Hi, it is only used for mfsmaster, no other MooseFS components!

[root@mfs-CNC-GZSX-231 mfs]# free -l
              total        used        free      shared  buff/cache   available
Mem:       65774444    12521016    35433164     3361080    17820264    49476984
Low:       65774444    30341280    35433164
High:             0           0           0
Swap:      16777212           0    16777212 |
From: Aleksander W. <ale...@mo...> - 2016-03-01 11:07:26
|
Hi, thank you for this information. Would you be so kind as to check whether you have memory overcommit enabled in your system? This command should return 1: cat /proc/sys/vm/overcommit_memory Also, can you send us the output of: cat /etc/sysctl.conf Best regards Aleksander Wieliczko Technical Support Engineer MooseFS.com |
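[Editor's note: overcommit matters here because mfsmaster fork()s a child process to dump metadata; with copy-on-write, the child needs almost no extra physical memory, but without overcommit the kernel may still refuse the fork of a process holding gigabytes of metadata. A minimal sketch of interpreting the setting — the mode descriptions summarize the standard Linux semantics:]

```python
# Interpret the Linux vm.overcommit_memory setting, which controls whether
# fork() of a very large process (like mfsmaster) can succeed even though
# the child will share pages copy-on-write and use little real memory.
OVERCOMMIT_MODES = {
    "0": "heuristic overcommit (kernel may refuse a very large fork())",
    "1": "always overcommit (recommended for mfsmaster: large fork() succeeds)",
    "2": "strict accounting (fork() of a large process is likely to fail)",
}

def describe_overcommit(value: str) -> str:
    """Map the contents of /proc/sys/vm/overcommit_memory to a description."""
    return OVERCOMMIT_MODES.get(value.strip(), "unknown mode")

# Example: the value the support engineer asks for should be "1".
print(describe_overcommit("1"))
```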
From: 彭智希 <pen...@16...> - 2016-03-02 01:09:41
|
Hi,

[root@mfs-CNC-GZSX-231 ~]# cat /proc/sys/vm/overcommit_memory
1
[root@mfs-CNC-GZSX-231 ~]# cat /etc/sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.log_martians = 1
kernel.sysrq = 0
net.ipv4.neigh.default.gc_thresh1 = 8192
net.ipv4.neigh.default.gc_thresh2 = 4092
net.ipv4.neigh.default.gc_thresh3 = 8192
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
#####vm.max_map_count = 6553000
fs.file-max = 819200
##########
#####TCP sockets
net.ipv4.tcp_max_orphans = 400000
#######syn cookies
net.ipv4.tcp_max_syn_backlog = 400000
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
########
########TIME_WAIT
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_timestamps = 1
########
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.core.netdev_max_backlog = 400000
net.ipv4.ip_local_port_range = 1024 65535
########
net.core.rmem_max = 9174960
net.core.rmem_default = 9174960
net.core.wmem_max = 9174960
net.core.wmem_default = 9174960
net.core.optmem_max = 921600
net.ipv4.tcp_rmem = 28192 565248 819200
net.ipv4.tcp_wmem = 14096 141312 409600
###########
net.ipv4.ipfrag_high_thresh = 5242880
net.ipv4.tcp_slow_start_after_idle = 0
#net.ipv4.tcp_westwood = 1
#net.ipv4.tcp_bic = 1
net.ipv4.ipfrag_low_thresh = 2932160
#################
net.core.somaxconn = 400000
#########################
vm.swappiness = 0
###Version 2014.05.26 ######
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_slow_start_after_idle = 0
######core
kernel.core_pattern = /var/core/core.%e-%p-%t |
From: Aleksander W. <ale...@mo...> - 2016-03-02 08:09:28
|
Hi, thank you for this information. All results look reasonable. I would also like to add that you don't have a problem with the fork operation. The metadata dump process took 24.561 seconds:

Feb 29 10:00:24 mfs-CNC-GZSX-231 mfsmaster[20936]: store process has finished - store time: 24.561

The most alarming aspect of your system log is that all chunkservers disconnected at the same time, 34 seconds after the metadata save:

Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11537568169984 (10745.20 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11535864147968 (10743.61 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536655306752 (10744.35 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536570953728 (10744.27 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536672366592 (10744.36 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11537568169984 (10745.20 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536639004672 (10744.33 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536581537792 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536540909568 (10744.24 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536369881088 (10744.08 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536581566464 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536586833920 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536620646400 (10744.32 GiB), totalspace: 22425943621632 (20885.79 GiB)

Would you be so kind as to send us logs from one chunkserver and from the client machines?

We suspect that something is going on in your network. It looks like your master server is losing its network connection.

Please check your network configuration. We are waiting for your feedback.

Best regards Aleksander Wieliczko Technical Support Engineer MooseFS.com |
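[Editor's note: the key observation above — every chunkserver dropped in the same second — can be verified mechanically. A small sketch that groups "chunkserver disconnected" syslog lines by timestamp; the log text and hostname below are illustrative:]

```python
import re
from collections import Counter

# Sample syslog excerpt (illustrative IPs/hostname); in practice this
# would be read from /var/log/messages.
LOG = """\
Feb 29 10:00:58 mfs mfsmaster[20936]: chunkserver disconnected - ip: 115.238.0.1 / port: 9422
Feb 29 10:00:58 mfs mfsmaster[20936]: chunkserver disconnected - ip: 115.238.0.2 / port: 9422
Feb 29 10:00:58 mfs mfsmaster[20936]: chunkserver disconnected - ip: 115.238.0.3 / port: 9422
"""

def disconnects_by_second(log: str) -> Counter:
    """Count chunkserver-disconnect events per syslog timestamp.

    Many disconnects sharing one timestamp points at the master side
    (or its network path) rather than at individual chunkservers.
    """
    pat = re.compile(r"^(\w+ +\d+ [\d:]+) .*chunkserver disconnected", re.M)
    return Counter(pat.findall(log))

print(disconnects_by_second(LOG))  # all three events share one second
```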
From: 彭智希 <pen...@16...> - 2016-03-02 08:32:54
|
Hi, the log of mfschunkserver is below:

Feb 29 13:32:37 localhost mfschunkserver[9920]: (read) packet too long (1330533152/100000)
Feb 29 13:32:37 localhost mfschunkserver[9920]: (read) packet too long (67108864/100000)
Feb 29 13:32:37 localhost mfschunkserver[9920]: (read) packet too long (115343360/100000)
Feb 29 13:32:42 localhost mfschunkserver[9920]: (read) packet too long (788529152/100000)
Feb 29 13:32:42 localhost mfschunkserver[9920]: (read) packet too long (3472493488/100000)
Feb 29 13:32:42 localhost mfschunkserver[9920]: (read) packet too long (16777216/100000)
Feb 29 13:32:42 localhost mfschunkserver[9920]: got unknown message (type:196609)
Feb 29 13:32:42 localhost mfschunkserver[9920]: (read) packet too long (790644820/100000)
Feb 29 13:33:29 localhost mfschunkserver[9920]: (read) packet too long (790644820/100000)
Feb 29 13:33:29 localhost mfschunkserver[9920]: (read) packet too long (1330533152/100000)
Feb 29 13:33:29 localhost mfschunkserver[9920]: (read) packet too long (1330533152/100000)
Feb 29 13:33:29 localhost mfschunkserver[9920]: (read) packet too long (1929256211/100000)
Feb 29 13:33:29 localhost mfschunkserver[9920]: (read) packet too long (16777217/100000)
Feb 29 13:33:29 localhost mfschunkserver[9920]: (read) packet too long (268435456/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (1392574464/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (4283649346/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: got unknown message (type:1811942144)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (795765091/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (1635085428/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (23070466/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (1330533152/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (67108864/100000)
Feb 29 13:33:35 localhost mfschunkserver[9920]: (read) packet too long (115343360/100000)
Feb 29 13:33:40 localhost mfschunkserver[9920]: (read) packet too long (788529152/100000)
Feb 29 13:33:41 localhost mfschunkserver[9920]: (read) packet too long (3472493488/100000)
Feb 29 13:33:42 localhost mfschunkserver[9920]: (read) packet too long (16777216/100000)
Feb 29 13:33:42 localhost mfschunkserver[9920]: got unknown message (type:196609)
Feb 29 13:33:42 localhost mfschunkserver[9920]: (read) packet too long (790644820/100000)
Feb 29 14:00:43 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 14:00:46 localhost mfschunkserver[9920]: connecting ...
Feb 29 14:00:46 localhost mfschunkserver[9920]: connected to Master
Feb 29 14:00:56 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 14:00:56 localhost mfschunkserver[9920]: connecting ...
Feb 29 14:00:56 localhost mfschunkserver[9920]: connected to Master
Feb 29 15:00:34 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 15:00:36 localhost mfschunkserver[9920]: connecting ...
Feb 29 15:00:36 localhost mfschunkserver[9920]: connected to Master
Feb 29 15:00:46 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 15:00:46 localhost mfschunkserver[9920]: connecting ...
Feb 29 15:00:46 localhost mfschunkserver[9920]: connected to Master
Feb 29 16:00:20 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 16:00:21 localhost mfschunkserver[9920]: connecting ...
Feb 29 16:00:21 localhost mfschunkserver[9920]: connected to Master
Feb 29 17:00:41 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 17:00:46 localhost mfschunkserver[9920]: connecting ...
Feb 29 17:00:46 localhost mfschunkserver[9920]: connected to Master
Feb 29 18:00:45 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 18:00:46 localhost mfschunkserver[9920]: connecting ...
Feb 29 18:00:46 localhost mfschunkserver[9920]: connected to Master
Feb 29 18:00:56 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 18:00:56 localhost mfschunkserver[9920]: connecting ...
Feb 29 18:00:56 localhost mfschunkserver[9920]: connected to Master
Feb 29 19:00:31 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 19:00:36 localhost mfschunkserver[9920]: connecting ...
Feb 29 19:00:36 localhost mfschunkserver[9920]: connected to Master
Feb 29 19:16:58 localhost mfschunkserver[9920]: (read) packet too long (790644820/100000)
Feb 29 19:17:32 localhost mfschunkserver[9920]: (read) packet too long (790644820/100000)
Feb 29 21:00:46 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 21:00:51 localhost mfschunkserver[9920]: connecting ...
Feb 29 21:00:51 localhost mfschunkserver[9920]: connected to Master
Feb 29 22:00:31 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 22:00:36 localhost mfschunkserver[9920]: connecting ...
Feb 29 22:00:36 localhost mfschunkserver[9920]: connected to Master
Feb 29 23:00:45 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 23:00:46 localhost mfschunkserver[9920]: connecting ...
Feb 29 23:00:46 localhost mfschunkserver[9920]: connected to Master
Feb 29 23:00:56 localhost mfschunkserver[9920]: masterconn: connection timed out
Feb 29 23:00:56 localhost mfschunkserver[9920]: connecting ...
Feb 29 23:00:56 localhost mfschunkserver[9920]: connected to Master |
From: Aleksander W. <ale...@mo...> - 2016-03-02 09:44:57
|
Hi, this looks really bad. You are receiving wrong packet sizes from the master. You have some serious network problem. This is the first time we have seen such a log entry!

Feb 29 13:32:37 localhost mfschunkserver[9920]: (read) packet too long (1330533152/100000)

Please check all network components - even hardware (NIC cards and RAM). I can also suggest checking the MTU size on the network cards and the switch. We are waiting for your reply. Best regards Aleksander Wieliczko Technical Support Engineer MooseFS.com |
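[Editor's note: a sketch of why a corrupted stream produces "packet too long". MooseFS frames each packet with a 32-bit type and a 32-bit big-endian length; if the TCP stream is corrupted or desynchronized, whatever 4 bytes land in the length slot are treated as a size and rejected above the limit (100000 here). The exact header layout below is an assumption for illustration, not the verified wire format:]

```python
import struct

MAX_PACKET_SIZE = 100000  # the "/100000" in the log lines above

def parse_header(header: bytes):
    """Parse an assumed 8-byte (type, length) big-endian header;
    raise if the length field is absurd, as the chunkserver does."""
    ptype, plength = struct.unpack(">II", header)
    if plength > MAX_PACKET_SIZE:
        raise ValueError(f"(read) packet too long ({plength}/{MAX_PACKET_SIZE})")
    return ptype, plength

# A sane header: type 1, length 12 bytes of payload.
print(parse_header(struct.pack(">II", 1, 12)))

# Four bytes of ASCII text misread as a length give a huge number:
# b"ONS " interpreted big-endian is exactly 1330533152, one of the
# values from the log - suggesting plain text leaked into the stream.
garbage = struct.pack(">I", 0) + b"ONS "
try:
    parse_header(garbage)
except ValueError as e:
    print(e)
```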
From: 彭智希 <pen...@16...> - 2016-03-02 09:56:43
|
Hi, the output of ifconfig is below:

bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 115.238.*.*  netmask 255.255.255.192  broadcast 115.238.231.191
        inet6 fe80::72e2:84ff:fe10:be07  prefixlen 64  scopeid 0x20<link>
        ether 70:e2:84:10:be:07  txqueuelen 0  (Ethernet)
        RX packets 38702523073  bytes 5773786695954 (5.2 TiB)
        RX errors 0  dropped 1  overruns 0  frame 0
        TX packets 53013419940  bytes 27359436449810 (24.8 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

bond0:1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 101.69.*.*  netmask 255.255.255.192  broadcast 101.69.175.191
        ether 70:e2:84:10:be:07  txqueuelen 0  (Ethernet)

eth0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 70:e2:84:10:be:07  txqueuelen 1000  (Ethernet)
        RX packets 19234284897  bytes 2899736294954 (2.6 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 32277114680  bytes 16883118272719 (15.3 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xf7d20000-f7d3ffff

eth1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 1500
        ether 70:e2:84:10:be:07  txqueuelen 1000  (Ethernet)
        RX packets 19468238178  bytes 2874050401132 (2.6 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20736305277  bytes 10476318189468 (9.5 TiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0xf7d00000-f7d1ffff |
From: 彭智希 <pen...@16...> - 2016-03-02 10:07:33
|
Hi, the output of ethtool is below:

[root@mfs-CNC-GZSX-231 mfs]# ethtool bond0
Settings for bond0:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 2000Mb/s
        Duplex: Full
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Link detected: yes
[root@mfs-CNC-GZSX-231 mfs]# ethtool bond0:1
Settings for bond0:1:
        Supported ports: [ ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 2000Mb/s
        Duplex: Full
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: off
        Link detected: yes |
From: Aleksander W. <ale...@mo...> - 2016-03-03 08:13:38
|
Hello, I would like to propose another test, to learn a little bit more about the chunkserver disconnections. We have just released MooseFS version 2.0.88, with a new option for the metadata save frequency:

METADATA_SAVE_FREQ = 1

The new parameter allows you to set how often the master stores metadata. Until version 2.0.88, the metadata save frequency was fixed at 1 hour.

Our test proposal is to:
1. Update the MooseFS cluster to version 2.0.88.
2. Change the MooseFS master parameter to METADATA_SAVE_FREQ = 4 (this is only a suggestion).
3. Check whether chunkserver disconnections now occur only during metadata saves - every 4 hours.

We are waiting for your reply and details about the chunkserver disconnections.

By the way, which bonding mode are you using?

mode=0 (Balance Round Robin)
mode=1 (Active backup)
mode=2 (Balance XOR)
mode=3 (Broadcast)
mode=4 (802.3ad)
mode=5 (Balance TLB)
mode=6 (Balance ALB)

Best regards Aleksander Wieliczko Technical Support Engineer MooseFS.com |
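[Editor's note: for reference, the relevant fragment of the master configuration after the upgrade might look like this. The path shown is the typical mfsmaster.cfg location; the 4-hour value is only the suggestion from the message above, not a general recommendation:]

```
# /etc/mfs/mfsmaster.cfg (fragment)
# How often (in hours) the master dumps metadata to disk.
# Available since MooseFS 2.0.88; before that the dump ran every hour.
METADATA_SAVE_FREQ = 4
```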
From: 彭智希 <pen...@16...> - 2016-03-03 08:30:33
|
Hi, the bond mode is:

[root@mfs-CNC-GZSX-231 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (xor)
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 70:e2:84:10:be:07
Slave queue ID: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 70:e2:84:10:be:06
Slave queue ID: 0 |
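[Editor's note: the bond here is balance-xor with layer2 hashing, which means each peer MAC address is pinned to one slave NIC — so a single flaky slave or switch port would consistently affect a fixed subset of chunkservers. A small sketch of extracting the mode and per-slave status from the /proc/net/bonding text format; the sample text is illustrative:]

```python
import re

# Illustrative excerpt in the /proc/net/bonding/bond0 text format.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: load balancing (xor)
Transmit Hash Policy: layer2 (0)
Slave Interface: eth1
MII Status: up
Slave Interface: eth0
MII Status: down
"""

def parse_bond(text: str):
    """Return (bonding mode, {slave interface: MII status})."""
    mode = re.search(r"^Bonding Mode: (.+)$", text, re.M).group(1)
    slaves = re.findall(r"^Slave Interface: (\S+)\nMII Status: (\S+)", text, re.M)
    return mode, dict(slaves)

mode, slaves = parse_bond(SAMPLE)
print(mode, slaves)
```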
From: 彭智希 <pen...@16...> - 2016-03-03 09:55:53
|
Hi, how do I get version 2.0.88? Please give me a path to download it! Thanks very much!! |
From: Aleksander W. <ale...@mo...> - 2016-03-03 10:00:56
|
Hi, please use the instructions at: https://moosefs.com/download.html Best regards Aleksander Wieliczko Technical Support Engineer MooseFS.com |
From: Ricardo J. B. <ric...@do...> - 2016-02-29 17:03:04
|
On Monday 29/02/2016, 彭智希 wrote:
> [original question snipped]

Probably a slow disk? Check the status of your disk: latency, IOPS, etc. If possible, put metadata.mfs on an SSD, and make sure your partitions are correctly aligned, especially if your disk uses 4K sectors, e.g.:

# parted --script /dev/sdX align-check optimal N

where you replace sdX with your disk name and N with the number of the partition you want to check (iterate over every partition on your disk). If the partitions are correctly aligned, parted gives no output. Cheers, -- Ricardo J. Barberis Senior SysAdmin / IT Architect DonWeb La Actitud Es Todo www.DonWeb.com _____ |
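[Editor's note: a sketch of what parted's align-check verifies — a partition is optimally aligned when its start offset is a multiple of the device's optimal I/O size. The 512-byte sector and 1 MiB optimal-I/O values below are the common defaults, used here for illustration:]

```python
def is_optimally_aligned(start_sector: int,
                         logical_sector_size: int = 512,
                         optimal_io_size: int = 1048576) -> bool:
    """True if the partition's start byte offset is a multiple of
    the device's optimal I/O size (what `align-check optimal` tests)."""
    return (start_sector * logical_sector_size) % optimal_io_size == 0

# Modern tools start the first partition at sector 2048 (1 MiB): aligned.
print(is_optimally_aligned(2048))
# Legacy CHS layouts started at sector 63: misaligned on 4K-sector disks.
print(is_optimally_aligned(63))
```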
From: Piotr R. K. <pio...@mo...> - 2016-02-29 17:14:26
|
Just forwarding the e-mail, because the user is not subscribed to the MooseFS-Users list. Regards, Piotr Robert Konopelko > Begin forwarded message: > > From: "Ricardo J. Barberis" <ric...@do...> > Subject: Re: [MooseFS-Users] why master can't server any request at every o'clock? > Date: 29 February 2016 at 5:42:02 PM GMT+1 > To: moo...@li... > > [message quoted above] |
From: Piotr R. K. <pio...@mo...> - 2016-02-29 17:35:08
|
Hello 彭智希, could you please send us logs from the Master Server from when such a situation happens?

cat /var/log/syslog | grep mfsmaster
or
cat /var/log/messages | grep mfsmaster

Which operating system are you using exactly? Debian? Ubuntu? CentOS? Which version? Which exact version of the MooseFS Master are you running? 2.0.x, x=?

Best regards, -- Piotr Robert Konopelko MooseFS Technical Support Engineer e-mail : pio...@mo... www : https://moosefs.com

> On 29 Feb 2016, at 8:25 AM, 彭智希 <pen...@16...> wrote:
> [original question quoted above] |
From: 彭智希 <pen...@16...> - 2016-03-01 01:36:19
|
Hello all, the environment information is listed below:

OS: CentOS 7.1
MooseFS version: 2.0.68

I cut some log information, shown below:

Feb 29 09:58:29 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 00000000022688C8_00000001: there are no copies
Feb 29 09:59:56 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 0000000002B76EEC_00000001: there are no copies
Feb 29 10:00:00 mfs-CNC-GZSX-231 mfsmaster[20936]: no metaloggers connected !!!
Feb 29 10:00:02 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 0000000002004141_00000001: there are no copies
Feb 29 10:00:23 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 0000000000F1F567_00000001: there are no copies
Feb 29 10:00:24 mfs-CNC-GZSX-231 mfsmaster[20936]: child finished
Feb 29 10:00:24 mfs-CNC-GZSX-231 mfsmaster[20936]: store process has finished - store time: 24.561
Feb 29 10:00:25 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 0000000000EB84BA_00000001: there are no copies
Feb 29 10:00:26 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 0000000000F4E3A7_00000001: there are no copies
Feb 29 10:00:29 mfs-CNC-GZSX-231 mfsmaster[20936]: chunk 0000000000ED73C0_00000001: there are no copies
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: main master server module: (ip:115.238.*.*) write error: EPIPE (Broken pipe)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,5), but server is still connected
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: can't accept chunkserver (ip: 115.238.*.* / port: 9422)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11537568169984 (10745.20 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11535864147968 (10743.61 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536655306752 (10744.35 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536570953728 (10744.27 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536672366592 (10744.36 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11537568169984 (10745.20 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536639004672 (10744.33 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536581537792 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536540909568 (10744.24 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536369881088 (10744.08 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536581566464 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536586833920 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver disconnected - ip: 115.238.*.* / port: 9422, usedspace: 11536620646400 (10744.32 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,11)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536672366592 (10744.36 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,8)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536570953728 (10744.27 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,9)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536655306752 (10744.35 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,3)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11535864147968 (10743.61 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.231.144:9422,1)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536620646400 (10744.32 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,10)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536586833920 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB)
Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs 
using ip:port and csid (115.238.*.*:9422,12) Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536369881088 (10744.08 GiB), totalspace: 22425943621632 (20885.79 GiB) Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.231.151:9422,6) Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536581566464 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB) Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,4) Feb 29 10:00:58 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536540909568 (10744.24 GiB), totalspace: 22425943621632 (20885.79 GiB) Feb 29 10:00:59 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,2) Feb 29 10:00:59 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536581537792 (10744.28 GiB), totalspace: 22425943621632 (20885.79 GiB) Feb 29 10:00:59 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,7) Feb 29 10:00:59 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11536639004672 (10744.33 GiB), totalspace: 22425943621632 (20885.79 GiB) Feb 29 10:01:00 mfs-CNC-GZSX-231 mfsmaster[20936]: csdb: found cs using ip:port and csid (115.238.*.*:9422,5) Feb 29 10:01:00 mfs-CNC-GZSX-231 mfsmaster[20936]: chunkserver register begin (packet version: 6) - ip: 115.238.*.* / port: 9422, usedspace: 11537568169984 (10745.20 GiB), totalspace: 22425943621632 (20885.79 GiB) Feb 29 10:01:12 mfs-CNC-GZSX-231 mfsmaster[20936]: server ip: 115.238.*.* / port: 9422 has been fully removed from 
data structures At 2016-03-01 01:34:49, "Piotr Robert Konopelko" <pio...@mo...> wrote: Hello 彭智希, could you please send us logs from Master Server while such situation happens? cat /var/log/syslog | grep mfsmaster or cat /var/log/messages | grep mfsmaster Which operating system are you using exactly? Debian? Ubuntu? CentOS? Which version? Which exactly version of MooseFS Master are you ruuning? 2.0.x, x=? Best regards, -- Piotr Robert Konopelko MooseFS Technical Support Engineer e-mail : pio...@mo... www : https://moosefs.com This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received it by mistake, please let us know by e-mail reply and delete it from your system; you may not copy this message or disclose its contents to anyone. Finally, the recipient should check this email and any attachments for the presence of viruses. Core Technology accepts no liability for any damage caused by any virus transmitted by this email. On 29 Feb 2016, at 8:25 AM, 彭智希 <pen...@16...> wrote: dear all: I use the version of 2.0 and i found that the master can't server any request at every o'clock. It gets righter 3-4 minutes later. I do it according to the solution of https://sourceforge.net/p/moosefs/mailman/message/34310363/, but the result is not satisfied. There is 64G memory total. So i think the memory is enough. The file of metadata.mfs.back is about 4.3G. It is able to or not too big to image the metadate to disk for the child process? I hope any body could give me some hints! Thanks!! ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! 
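The excerpt above shows the hourly metadata dump finishing at 10:00:24 ("store time: 24.561") shortly before the chunkserver disconnects at 10:00:58, so tracking how long each dump takes across days can help correlate the stalls. A minimal sketch for pulling the store durations out of a log file; the `extract_store_times` helper and the sample file path are hypothetical, not part of MooseFS, and the real log path may be /var/log/syslog or /var/log/messages depending on the distribution:

```shell
# Hypothetical helper: extract the metadata store durations (in seconds)
# from mfsmaster syslog lines of the form
# "... mfsmaster[20936]: store process has finished - store time: 24.561"
extract_store_times() {
  grep -o 'store time: [0-9.]*' "$1" | awk '{ print $3 }'
}

# Demo on a sample line copied from the excerpt above:
sample=/tmp/mfs_store_sample.log
printf '%s\n' 'Feb 29 10:00:24 mfs-CNC-GZSX-231 mfsmaster[20936]: store process has finished - store time: 24.561' > "$sample"
extract_store_times "$sample"   # prints: 24.561
```

If the printed durations grow over time (or approach the chunkserver timeout), that points at the dump itself as the cause of the hourly outage.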
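As a side note on reading these logs: the usedspace/totalspace figures are raw byte counts followed by a binary-GiB rendering (GiB = 2^30 bytes). The conversion can be checked with plain awk, nothing MooseFS-specific; the value below is one usedspace figure from the excerpt above:

```shell
# Verify the byte-to-GiB conversion the master uses in its log lines:
# 11537568169984 bytes / 2^30 should match the logged "10745.20 GiB".
awk 'BEGIN { printf "%.2f GiB\n", 11537568169984 / (2 ^ 30) }'
# prints: 10745.20 GiB
```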