From: Michał B. <mic...@ge...> - 2010-12-01 08:06:29
Hi!

You have a problem with starting the chunkserver here, but it is certainly not connected with the TIMEMODE_RUNONCE constant. That constant only says that after a change of the clock time something has to be run once and not 60 times.

We have also had some very rare cases in which the chunkserver hung while the master server was starting. This problem should be eliminated in 1.6.18.

Kind regards
Michal

From: kuer ku [mailto:ku...@gm...]
Sent: Monday, November 29, 2010 12:18 PM
To: moo...@li...
Subject: [Moosefs-users] how many times chunckserver will retry when disconnecting from metaserver ?

Hi, all,

I deployed mfs-1.6.15 in my environment, and today I found a problem. The symptom is that one of the mfsmount (FUSE) processes complained:

    Nov 29 18:26:11 storage04 mfsmount[32233]: file: 43, index: 0 - can't connect to proper chunkserver (try counter: 29)

I do not know which chunkserver caused this.

On the web interface, I found that storage01, one of the chunkservers, is not in the server list. On storage01 there are these entries in /var/log/messages:

    Nov 29 14:43:27 storage01 mfsmount[13155]: master: connection lost (1)
    Nov 29 14:43:27 storage01 mfsmount[13155]: registered to master
    Nov 29 14:44:12 storage01 mfschunkserver[11730]: Master connection lost
    ^^^^ mfschunkserver noticed the lost connection, but no log entry indicates that it tried to reconnect to the master
    Nov 29 15:07:44 storage01 smartd[4268]: System clock time adjusted to the past. Resetting next wakeup time.
    # the following entries appeared because I forcibly restarted the chunkserver
    Nov 29 18:29:59 storage01 h*U¥2[11730]: closing *:19722
    Nov 29 18:30:13 storage01 mfschunkserver[6764]: listen on *:19722
    Nov 29 18:30:13 storage01 mfschunkserver[6764]: connecting ...
    Nov 29 18:30:13 storage01 mfschunkserver[6764]: open files limit: 10000
    Nov 29 18:30:13 storage01 mfschunkserver[6764]: connected to Master

And in chunkserver/masterconn.c I found this code:

    main_eachloopregister(masterconn_check_hdd_reports);
    main_timeregister(TIMEMODE_RUNONCE,ReconnectionDelay,0,masterconn_reconnect);
    ^^^^ will it try to reconnect only once?
    main_destructregister(masterconn_term);
    main_pollregister(masterconn_desc,masterconn_serve);
    main_reloadregister(masterconn_reload);

I think the chunkserver should try to reconnect to the master again and again until it reaches the master, but I did not find that in the code.

P.S. I remember that I adjusted storage01's time with ntpdate dateserver. Does this affect the chunkserver so seriously?

Thanks

-- kuer
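
To illustrate the distinction described in the reply (run once after a clock change, not 60 times), here is a minimal sketch in C of a periodic timer with two catch-up policies. It is not the MooseFS implementation: every name in it (sketch_timemode, sketch_timer, sketch_timer_poll, sketch_count_call) is hypothetical, and it only assumes that a registered function is called every period and that the mode decides what happens when the clock jumps forward past several periods.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; this is NOT the MooseFS API. */
enum sketch_timemode {
    SKETCH_RUN_ONCE, /* after a clock jump: call the handler once, then resynchronise */
    SKETCH_RUN_ALL   /* after a clock jump: call the handler once per missed period */
};

struct sketch_timer {
    enum sketch_timemode mode;
    uint32_t period;       /* seconds between two runs */
    uint32_t next_due;     /* absolute time (seconds) of the next run */
    void (*handler)(void); /* periodic function, e.g. a reconnect attempt */
};

/* Counter used by the example handler below. */
unsigned sketch_calls = 0;

/* Example handler: only counts how often it was invoked. */
void sketch_count_call(void) {
    sketch_calls++;
}

/*
 * Called from the event loop with the current time.
 * Returns how many times the handler was invoked by this call.
 */
unsigned sketch_timer_poll(struct sketch_timer *t, uint32_t now) {
    unsigned fired = 0;

    assert(t != NULL && t->period > 0 && t->handler != NULL);
    if (now < t->next_due) {
        return 0; /* nothing is due yet */
    }
    if (t->mode == SKETCH_RUN_ALL) {
        /* Replay every period that was missed. */
        while (t->next_due <= now) {
            t->handler();
            fired++;
            t->next_due += t->period;
        }
    } else {
        /* Run a single time, then skip to the first slot after 'now'. */
        t->handler();
        fired = 1;
        t->next_due = now - ((now - t->next_due) % t->period) + t->period;
    }
    return fired;
}
```

In this sketch a timer with a one-second period whose clock jumps 60 seconds ahead calls its handler once in SKETCH_RUN_ONCE mode and 60 times in SKETCH_RUN_ALL mode. In both modes the timer stays periodic, so the handler is called again on the next period; under these assumptions, "run once" limits the catch-up after a clock change and does not limit the total number of attempts.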