From: Ken <ken...@gm...> - 2012-04-20 05:58:32
Is nobody interested in this? There is roughly a one-in-a-thousand
chance that it causes file damage. The log may then look like:

mfsmaster[7192]: chunk 00000000000EC3A8 has only invalid copies (2) - please repair it manually
mfsmaster[7192]: chunk 00000000000EC3A8_00000002 - invalid copy on (10.1.1.3 - ver:00000001)

-Ken

On Thu, Apr 19, 2012 at 9:01 AM, Ken <ken...@gm...> wrote:
> hi, list
>
> We found some crashes in mfschunkserver (1.6.24) while it is stopping.
> The test script may be weird:
>
>     while true:
>         select a ChunkServer
>         stop_it
>         start_it
>         sleep 1 second
>
> Almost 20 MiB/s is being written to the system while the script runs.
> A little crazy, maybe.
>
>
> The crash stack:
> #0 0x00000000004139e7 in masterconn_replicationfinished (status=0 '\0',
>    packet=0x269b170) at masterconn.c:351
> 351 if (eptr->mode==DATA || eptr->mode==HEADER) {
>
> #0 0x00000000004139e7 in masterconn_replicationfinished (status=0 '\0',
>    packet=0x269b170) at masterconn.c:351
> #1 0x0000000000403b6e in job_pool_check_jobs (jpool=0x7f39b43ddea0) at
>    bgjobs.c:338
> #2 0x0000000000403f17 in job_pool_delete (jpool=0x7f39b43ddea0) at
>    bgjobs.c:365
> #3 0x0000000000414b31 in masterconn_term () at masterconn.c:864
> #4 0x0000000000419173 in destruct () at ../mfscommon/main.c:312
> #5 0x000000000041b60f in main (argc=1, argv=0x7fffc810dda0) at
>    ../mfscommon/main.c:1162
>
> # mfschunkserver -v
> version: 1.6.24
>
> I think masterconn_term causes the crash:
>
> void masterconn_term(void) {
>     packetstruct *pptr,*paptr;
> //  syslog(LOG_INFO,"closing %s:%s",MasterHost,MasterPort);
>     masterconn *eptr = masterconnsingleton;
>
>     if (eptr->mode!=FREE && eptr->mode!=CONNECTING) {
>         tcpclose(eptr->sock);
>
>         if (eptr->inputpacket.packet) {
>             free(eptr->inputpacket.packet);
>         }
>         pptr = eptr->outputhead;
>         while (pptr) {
>             if (pptr->packet) {
>                 free(pptr->packet);
>             }
>             paptr = pptr;
>             pptr = pptr->next;
>             free(paptr);
>         }
>     }
>
>     free(eptr);
>     masterconnsingleton = NULL;
>     job_pool_delete(jpool);  // this is too late
>     free(MasterHost);
>     free(MasterPort);
>     free(BindHost);
> }
>
> So we moved that line to the start of the function. The patch is below:
>
> --- a/mfschunkserver/masterconn.c
> +++ b/mfschunkserver/masterconn.c
> @@ -842,6 +842,8 @@ void masterconn_term(void) {
>  // syslog(LOG_INFO,"closing %s:%s",MasterHost,MasterPort);
>      masterconn *eptr = masterconnsingleton;
>
> +    job_pool_delete(jpool);
> +
>      if (eptr->mode!=FREE && eptr->mode!=CONNECTING) {
>          tcpclose(eptr->sock);
>
> @@ -861,7 +863,7 @@ void masterconn_term(void) {
>
>      free(eptr);
>      masterconnsingleton = NULL;
> -    job_pool_delete(jpool);
> +
>      free(MasterHost);
>      free(MasterPort);
>      free(BindHost);
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ patch end
>
> The crash did not happen again with the patch, and the test has run
> for almost 12 hours.
>
>
> HTH
>
> -Ken
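
For anyone following the thread, the ordering problem can be sketched in isolation. This is only an illustration in standalone C, not MooseFS code. It assumes, as the backtrace suggests, that deleting the job pool still runs completion callbacks for pending jobs, and that those callbacks reach the connection through a global pointer. All names here (conn, pool, on_job_finished, pool_delete, demo_teardown, demo_dropped) are hypothetical stand-ins. The NULL check in the callback is there only so the wrong ordering can be observed without crashing.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_JOBS 8

/* Hypothetical stand-ins for the connection state and the job pool. */
typedef struct conn { int finished; } conn;
typedef void (*job_cb)(int status);
typedef struct pool { job_cb pending[MAX_JOBS]; int count; } pool;

static conn *conn_singleton = NULL;  /* global that callbacks read */
static int delivered = 0;            /* callbacks that saw a live connection */
static int dropped = 0;              /* callbacks that found it already gone */

/* Completion callback: needs the connection to still exist. */
void on_job_finished(int status) {
    conn *c = conn_singleton;
    (void)status;
    if (c == NULL) {   /* guard for the demo; without it this is a NULL dereference */
        dropped++;
        return;
    }
    c->finished++;
    delivered++;
}

/* Deleting the pool first completes every pending job (assumed behavior). */
void pool_delete(pool *p) {
    for (int i = 0; i < p->count; i++) {
        p->pending[i](0);
    }
    free(p);
}

int demo_dropped(void) { return dropped; }

/* Tears down in one of two orders; returns how many callbacks were delivered,
 * or -1 if allocation failed. */
int demo_teardown(int drain_first, int njobs) {
    assert(njobs >= 0 && njobs <= MAX_JOBS);
    pool *p = calloc(1, sizeof *p);
    conn_singleton = calloc(1, sizeof *conn_singleton);
    if (p == NULL || conn_singleton == NULL) {
        free(p);
        free(conn_singleton);
        conn_singleton = NULL;
        return -1;
    }
    delivered = 0;
    dropped = 0;
    for (int i = 0; i < njobs; i++) {
        p->pending[p->count++] = on_job_finished;
    }
    if (drain_first) {
        pool_delete(p);            /* callbacks run while the connection is valid */
        free(conn_singleton);
        conn_singleton = NULL;
    } else {
        free(conn_singleton);      /* connection released first ... */
        conn_singleton = NULL;
        pool_delete(p);            /* ... so every callback arrives too late */
    }
    return delivered;
}
```

Calling demo_teardown(1, 3) drains the pool before the connection is released, so all three callbacks are delivered. Calling demo_teardown(0, 3) mirrors the ordering in the unpatched function, and all three callbacks find the pointer already cleared.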