From: Davies L. <dav...@gm...> - 2012-05-21 07:33:17
This bug was fixed in 1.6.26.

On Fri, Apr 20, 2012 at 1:58 PM, Ken <ken...@gm...> wrote:
> Is nobody interested in this?
>
> There is roughly a one-in-a-thousand chance that this causes file damage.
>
> The log may look like:
> mfsmaster[7192]: chunk 00000000000EC3A8 has only invalid copies (2) - please repair it manually
> mfsmaster[7192]: chunk 00000000000EC3A8_00000002 - invalid copy on (10.1.1.3 - ver:00000001)
>
> -Ken
>
> On Thu, Apr 19, 2012 at 9:01 AM, Ken <ken...@gm...> wrote:
>>
>> Hi, list
>>
>> We found some crashes in mfschunkserver (1.6.24) while it was stopping.
>> The test script may be weird:
>>
>> while true:
>>     select a ChunkServer
>>     stop_it
>>     start_it
>>     sleep 1 second
>>
>> About 20 MiB/s was being written to the system while the script was
>> running. A little crazy, perhaps.
>>
>> The crash stack:
>> #0 0x00000000004139e7 in masterconn_replicationfinished (status=0 '\0', packet=0x269b170) at masterconn.c:351
>> 351 if (eptr->mode==DATA || eptr->mode==HEADER) {
>>
>> #0 0x00000000004139e7 in masterconn_replicationfinished (status=0 '\0', packet=0x269b170) at masterconn.c:351
>> #1 0x0000000000403b6e in job_pool_check_jobs (jpool=0x7f39b43ddea0) at bgjobs.c:338
>> #2 0x0000000000403f17 in job_pool_delete (jpool=0x7f39b43ddea0) at bgjobs.c:365
>> #3 0x0000000000414b31 in masterconn_term () at masterconn.c:864
>> #4 0x0000000000419173 in destruct () at ../mfscommon/main.c:312
>> #5 0x000000000041b60f in main (argc=1, argv=0x7fffc810dda0) at ../mfscommon/main.c:1162
>>
>> # mfschunkserver -v
>> version: 1.6.24
>>
>> I think masterconn_term causes the crash:
>>
>> void masterconn_term(void) {
>>     packetstruct *pptr,*paptr;
>>     // syslog(LOG_INFO,"closing %s:%s",MasterHost,MasterPort);
>>     masterconn *eptr = masterconnsingleton;
>>
>>     if (eptr->mode!=FREE && eptr->mode!=CONNECTING) {
>>         tcpclose(eptr->sock);
>>
>>         if (eptr->inputpacket.packet) {
>>             free(eptr->inputpacket.packet);
>>         }
>>         pptr = eptr->outputhead;
>>         while (pptr) {
>>             if (pptr->packet) {
>>                 free(pptr->packet);
>>             }
>>             paptr = pptr;
>>             pptr = pptr->next;
>>             free(paptr);
>>         }
>>     }
>>
>>     free(eptr);
>>     masterconnsingleton = NULL;
>>     job_pool_delete(jpool); // this is too late
>>     free(MasterHost);
>>     free(MasterPort);
>>     free(BindHost);
>> }
>>
>> So we moved that line to the start. The patch is below:
>>
>> --- a/mfschunkserver/masterconn.c
>> +++ b/mfschunkserver/masterconn.c
>> @@ -842,6 +842,8 @@ void masterconn_term(void) {
>>      // syslog(LOG_INFO,"closing %s:%s",MasterHost,MasterPort);
>>      masterconn *eptr = masterconnsingleton;
>>
>> +    job_pool_delete(jpool);
>> +
>>      if (eptr->mode!=FREE && eptr->mode!=CONNECTING) {
>>          tcpclose(eptr->sock);
>>
>> @@ -861,7 +863,7 @@ void masterconn_term(void) {
>>
>>      free(eptr);
>>      masterconnsingleton = NULL;
>> -    job_pool_delete(jpool);
>> +
>>      free(MasterHost);
>>      free(MasterPort);
>>      free(BindHost);
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ patch end
>>
>> The crash did not happen again with the patch, and the test ran for
>> almost 12 hours.
>>
>> HTH
>>
>> -Ken

--
 - Davies
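
The teardown-ordering problem described in the quoted message can be illustrated with a small stand-alone C sketch. This is not MooseFS code: `conn`, `job_pool`, `pool_new`, `pool_delete`, `conn_start`, `term_pool_last` and `term_pool_first` are hypothetical stand-ins for the real structures. It also assumes that the completion callback reaches the connection through the global singleton, which the backtrace suggests but the quoted code does not show. The callback here checks for NULL and counts the event instead of dereferencing, so the bad ordering can be run without crashing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from MooseFS. */
typedef struct conn { int mode; } conn;
typedef struct job { struct job *next; } job;
typedef struct job_pool { job *head; } job_pool;

conn *conn_singleton = NULL;  /* plays the role of masterconnsingleton */
int seen_live = 0;            /* callbacks that found the connection alive */
int seen_null = 0;            /* callbacks that would have crashed */

/* Completion callback run for every job still pending at shutdown. */
void replication_finished(void) {
    conn *c = conn_singleton;
    if (c == NULL) {          /* guard exists only in this sketch */
        seen_null++;
        return;
    }
    c->mode = 1;              /* touching the connection is safe here */
    seen_live++;
}

/* Build a pool with `pending` unfinished jobs (sketch: malloc failure is fatal). */
job_pool *pool_new(int pending) {
    job_pool *p = malloc(sizeof *p);
    assert(p != NULL);
    p->head = NULL;
    for (int i = 0; i < pending; i++) {
        job *j = malloc(sizeof *j);
        assert(j != NULL);
        j->next = p->head;
        p->head = j;
    }
    return p;
}

/* Deleting the pool first flushes pending jobs through the callback. */
void pool_delete(job_pool *p) {
    job *j = p->head;
    while (j != NULL) {
        job *next = j->next;
        replication_finished();
        free(j);
        j = next;
    }
    free(p);
}

void conn_start(void) {
    conn_singleton = calloc(1, sizeof *conn_singleton);
    assert(conn_singleton != NULL);
}

/* Ordering as in 1.6.24: connection freed before the pool is drained. */
void term_pool_last(job_pool *p) {
    free(conn_singleton);
    conn_singleton = NULL;
    pool_delete(p);           /* callbacks now see a NULL singleton */
}

/* Ordering as in the patch: drain the pool while the connection is alive. */
void term_pool_first(job_pool *p) {
    pool_delete(p);
    free(conn_singleton);
    conn_singleton = NULL;
}
```

Calling `conn_start()` followed by `term_pool_first(pool_new(3))` leaves every callback counted in `seen_live`; the same sequence with `term_pool_last` moves them all to `seen_null`, which is where the unguarded real callback would fault.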