From: 叶俊 [Technology Center] <jo...@vi...> - 2015-10-08 15:06:31
Dear Piotr,

1. Thanks for your reply. Regarding the log message, it has something to do with the replication process. We do have MooseFS upgrade experience with another cluster (Cluster A, from 1.6.x to 2.0.x), and we will also roll the upgrade out to the affected Cluster B.

2. However, we have over 100 TB of data in the affected cluster, so restarting the master would take over two hours. As for the log errors, the replication process seems to be causing the problem. Is it possible to dig for further information - for example, whether the master cannot connect to the chunkservers' port 9422, or whether connections to port 9422 time out?

________________________________
From: Piotr Robert Konopelko <pio...@mo...>
Sent: 8 October 2015, 20:37
To: 叶俊 [Technology Center]
Cc: moo...@li...
Subject: Re: [MooseFS-Users] reply: reply: mooseFS_issue

Hi John,

the solution for the problems you encounter is to upgrade your MooseFS instance to version 2.0. In MooseFS 2.0 a lot of problems and issues have been fixed, and many algorithms (including the replication algorithms) have been improved. MooseFS 2.0 has been available for more than a year already and it is a really stable version. Frankly, MFS 1.6 is no longer supported and we strongly recommend doing the upgrade.

The upgrade is a simple process - mainly it is: 1. update the package version, 2. restart the service. But some crucial aspects (like changed configuration file paths and the order of the upgrade) are described in the manual. Please take a look at the MooseFS Upgrade Manual and the MooseFS Step by Step Installation Guide. You can find these documents here: https://moosefs.com/documentation/moosefs-2-0.html

Please remember that we support upgrades only from MooseFS version 1.6.27-5, so if any of your components (especially the Master, excepting mounts) is running a different (older) MFS version, you first of all need to upgrade it to 1.6.27-5, and then to the newest MFS 2.0 (2.0.77 at the time of writing this message). Before starting the upgrade process, please remember to make a backup of the metadata.mfs file.

In case of any further questions or problems, you can contact me directly.
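John's question about whether the master can reach the chunkservers' port 9422 can be answered with a plain TCP connect probe, independent of MooseFS itself. The sketch below is illustrative tooling, not part of MooseFS; the addresses are the chunkserver IPs taken from the syslog excerpt later in this thread.

```python
import socket

# Chunkserver addresses from the master's syslog excerpt; adjust as needed.
CHUNKSERVERS = ["10.201.70.171", "10.201.70.172", "10.201.70.173",
                "10.201.70.175", "10.201.70.180"]
CS_PORT = 9422  # MooseFS chunkserver listen port

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

if __name__ == "__main__":
    for host in CHUNKSERVERS:
        status = "OK" if can_connect(host, CS_PORT, timeout=1.0) else "UNREACHABLE or timed out"
        print(f"{host}:{CS_PORT} -> {status}")
```

Run this from the master host: a timeout (rather than an immediate refusal) would point at a firewall or network path problem between master and chunkserver rather than at the chunkserver process itself.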
Best regards,
--
Piotr Robert Konopelko
MooseFS Technical Support Engineer | moosefs.com<https://moosefs.com/>

On 08 Oct 2015, at 10:57 am, 叶俊 [Technology Center] <jo...@vi...<mailto:jo...@vi...>> wrote:

1. For more information, the MFS GUI info is as below.
2. We have 3 MFS clusters. Cluster A's rebalance number is 0, but Cluster B, which has the replication disconnect trouble, shows a replication number of 53432:

<image001.png>

From: 叶俊 [Technology Center]
Sent: 8 October 2015, 15:17
To: 'moo...@li...<mailto:moo...@li...>'
Subject: reply: mooseFS_issue

1. In addition, the file system is ext4.
2. The old cluster with the 8 old chunkservers runs well; it did not have any issues before.

John.ye

From: 叶俊 [Technology Center]
Sent: 8 October 2015, 15:13
To: 'moo...@li...<mailto:moo...@li...>'
Subject: mooseFS_issue

Dear support team,

This is John from vip.com<http://vip.com/>, system administrator team. Our MFS system is:

OS: CentOS 6.3
Master: 1
Metalogger: 1
Chunkservers: 8 (16 TB / server), version 1.6.27
Expanded with another 8 chunkservers (16 TB / server), version 1.6.27
Total chunkservers: 8 + 8 = 16

1. Why we expanded: the old 8 chunkservers were over 90% disk usage.
2. The issue we meet: after expanding with another 8 chunkservers, the old chunkservers try to replicate data to the new ones, but replication fails for some reason. /var/log/messages on the master shows:

Oct 8 14:29:35 GD6-MFS-MASTER-FLASHCACHE-001 mfsmaster[2168]: (10.201.70.175:9422) chunk: 000000000AF7D9DC replication status: Disconnected
Oct 8 14:29:35 GD6-MFS-MASTER-FLASHCACHE-001 mfsmaster[2168]: (10.201.70.171:9422) chunk: 000000000BB7D9DC replication status: Disconnected
Oct 8 14:29:36 GD6-MFS-MASTER-FLASHCACHE-001 mfsmaster[2168]: (10.201.70.180:9422) chunk: 000000000B68703F replication status: Disconnected
Oct 8 14:29:36 GD6-MFS-MASTER-FLASHCACHE-001 mfsmaster[2168]: (10.201.70.173:9422) chunk: 000000000B48703F replication status: Disconnected
Oct 8 14:29:36 GD6-MFS-MASTER-FLASHCACHE-001 mfsmaster[2168]: (10.201.70.175:9422) chunk: 000000000BB8703F replication status: Disconnected
Oct 8 14:29:36 GD6-MFS-MASTER-FLASHCACHE-001 mfsmaster[2168]: (10.201.70.172:9422) chunk: 000000000B88703F replication status: Disconnected

3. The errors above started once we set up the other 8 chunkservers, and they appear every day. Could you please give some advice on solving this replication failure?
a. Why does replication fail?
b. What can we do?
4. The master config is as below:

cat /usr/local/mfs/etc/mfs/mfsmaster.cfg
# WORKING_USER = mfs
# WORKING_GROUP = mfs
# SYSLOG_IDENT = mfsmaster
# LOCK_MEMORY = 0
# NICE_LEVEL = -19
# EXPORTS_FILENAME = /usr/local/mfs/etc/mfs/mfsexports.cfg
# TOPOLOGY_FILENAME = /usr/local/mfs/etc/mfs/mfstopology.cfg
# DATA_PATH = /usr/local/mfs/lib/mfs
# BACK_LOGS = 50
# BACK_META_KEEP_PREVIOUS = 1
# REPLICATIONS_DELAY_INIT = 300
# REPLICATIONS_DELAY_DISCONNECT = 3600
# MATOML_LISTEN_HOST = *
# MATOML_LISTEN_PORT = 9419
# MATOML_LOG_PRESERVE_SECONDS = 600
# MATOCS_LISTEN_HOST = *
# MATOCS_LISTEN_PORT = 9420
# MATOCL_LISTEN_HOST = *
# MATOCL_LISTEN_PORT = 9421
# CHUNKS_LOOP_MAX_CPS = 100000
# CHUNKS_LOOP_MIN_TIME = 300
# CHUNKS_SOFT_DEL_LIMIT = 10
# CHUNKS_HARD_DEL_LIMIT = 25
# CHUNKS_WRITE_REP_LIMIT = 2
# CHUNKS_READ_REP_LIMIT = 10
# ACCEPTABLE_DIFFERENCE = 0.1
# SESSION_SUSTAIN_TIME = 86400
# REJECT_OLD_CLIENTS = 0
# deprecated:
# CHUNKS_DEL_LIMIT - use CHUNKS_SOFT_DEL_LIMIT instead
# LOCK_FILE - lock system has been changed, and this option is used only to search for old lockfile

Best regards,
John.ye
System administrator
email: jo...@vi...<mailto:jo...@vi...>
VIP.com<http://vip.com/> | 唯品会

This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.

------------------------------------------------------------------------------
_________________________________________
moosefs-users mailing list
moo...@li...<mailto:moo...@li...>
https://lists.sourceforge.net/lists/listinfo/moosefs-users
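One practical note on the mfsmaster.cfg quoted above: every option is commented out, so the master runs entirely on defaults. Among those defaults, CHUNKS_WRITE_REP_LIMIT = 2 and CHUNKS_READ_REP_LIMIT = 10 throttle how many replications run in parallel per chunkserver, and REPLICATIONS_DELAY_DISCONNECT = 3600 pauses replication involving a chunkserver for an hour after it disconnects. The fragment below is only an illustrative sketch of values one might experiment with while rebalancing onto new chunkservers; the exact semantics depend on the MFS version, so verify against the mfsmaster.cfg man page for your release before applying anything.

```
# /usr/local/mfs/etc/mfs/mfsmaster.cfg -- illustrative overrides only
CHUNKS_WRITE_REP_LIMIT = 5           # more parallel write-replications per chunkserver
CHUNKS_READ_REP_LIMIT = 25           # more parallel read-replications per chunkserver
REPLICATIONS_DELAY_DISCONNECT = 300  # resume replication sooner after a disconnect
```

Raising these limits trades client I/O bandwidth for rebalance speed, so increase them gradually and watch the disconnect counters.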