From: Ken <ken...@gm...> - 2012-04-19 01:01:44
Hi list,

We found some crashes in mfschunkserver (1.6.24) while it was stopping.
The test script may be a bit weird:

    while true:
        select a ChunkServer
        stop_it
        start_it
        sleep 1 second

About 20 MiB/s was being written to the system while the script was running.
A little crazy, perhaps.

The crash stack:

    #0  0x00000000004139e7 in masterconn_replicationfinished (status=0 '\0', packet=0x269b170) at masterconn.c:351
    351             if (eptr->mode==DATA || eptr->mode==HEADER) {
    #0  0x00000000004139e7 in masterconn_replicationfinished (status=0 '\0', packet=0x269b170) at masterconn.c:351
    #1  0x0000000000403b6e in job_pool_check_jobs (jpool=0x7f39b43ddea0) at bgjobs.c:338
    #2  0x0000000000403f17 in job_pool_delete (jpool=0x7f39b43ddea0) at bgjobs.c:365
    #3  0x0000000000414b31 in masterconn_term () at masterconn.c:864
    #4  0x0000000000419173 in destruct () at ../mfscommon/main.c:312
    #5  0x000000000041b60f in main (argc=1, argv=0x7fffc810dda0) at ../mfscommon/main.c:1162

    # mfschunkserver -v
    version: 1.6.24

I think masterconn_term causes the crash. It calls job_pool_delete only
after the connection struct has been freed and masterconnsingleton set to
NULL, but as the stack shows, job_pool_delete still runs
masterconn_replicationfinished, which reads eptr->mode:

    void masterconn_term(void) {
        packetstruct *pptr,*paptr;
    //  syslog(LOG_INFO,"closing %s:%s",MasterHost,MasterPort);
        masterconn *eptr = masterconnsingleton;

        if (eptr->mode!=FREE && eptr->mode!=CONNECTING) {
            tcpclose(eptr->sock);

            if (eptr->inputpacket.packet) {
                free(eptr->inputpacket.packet);
            }
            pptr = eptr->outputhead;
            while (pptr) {
                if (pptr->packet) {
                    free(pptr->packet);
                }
                paptr = pptr;
                pptr = pptr->next;
                free(paptr);
            }
        }

        free(eptr);
        masterconnsingleton = NULL;
        job_pool_delete(jpool);    // this is too late
        free(MasterHost);
        free(MasterPort);
        free(BindHost);
    }

So we moved that call to the start of the function.
The patch is below:

--- a/mfschunkserver/masterconn.c
+++ b/mfschunkserver/masterconn.c
@@ -842,6 +842,8 @@ void masterconn_term(void) {
 //     syslog(LOG_INFO,"closing %s:%s",MasterHost,MasterPort);
        masterconn *eptr = masterconnsingleton;
 
+       job_pool_delete(jpool);
+
        if (eptr->mode!=FREE && eptr->mode!=CONNECTING) {
                tcpclose(eptr->sock);
 
@@ -861,7 +863,7 @@ void masterconn_term(void) {
 
        free(eptr);
        masterconnsingleton = NULL;
-       job_pool_delete(jpool);
+
        free(MasterHost);
        free(MasterPort);
        free(BindHost);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ patch end

The crash has not happened again with the patch, and the test has now run
for almost 12 hours.

HTH
-Ken
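
P.S. For anyone who wants to see the ordering problem outside MooseFS, here
is a minimal sketch. Everything in it is hypothetical (conn, job_pool,
pool_delete, teardown and so on are stand-ins, not MooseFS code). It only
models the idea that deleting the pool still delivers pending completion
callbacks, and that those callbacks read the connection singleton. The
callback has a NULL guard so the sketch can count missed callbacks instead
of crashing; the real code has no such guard at masterconn.c:351.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from MooseFS. */
typedef struct conn {
    int mode;                 /* 1 = connected, in this sketch */
} conn;

typedef void (*job_cb)(int status);

#define MAX_JOBS 8

typedef struct job_pool {
    job_cb pending[MAX_JOBS]; /* completion callbacks not yet delivered */
    int count;
} job_pool;

static conn *conn_singleton = NULL;
static int callbacks_seen = 0;

/* Completion callback: like the crashing frame, it reads the connection. */
static void on_job_finished(int status) {
    conn *c = conn_singleton;
    (void)status;
    if (c == NULL) {
        return;               /* guard exists only in this sketch */
    }
    if (c->mode == 1) {
        callbacks_seen++;
    }
}

static job_pool *pool_new(void) {
    job_pool *p = calloc(1, sizeof(*p));
    assert(p != NULL);
    return p;
}

static void pool_add(job_pool *p, job_cb cb) {
    if (p->count < MAX_JOBS) {
        p->pending[p->count++] = cb;
    }
}

/* Deliver every pending callback, then free the pool. */
static void pool_delete(job_pool *p) {
    for (int i = 0; i < p->count; i++) {
        p->pending[i](0);
    }
    free(p);
}

/* Tear down with either ordering; return how many callbacks
 * still found a live connection. */
static int teardown(int jobs, int drain_pool_first) {
    job_pool *p = pool_new();
    conn_singleton = malloc(sizeof(*conn_singleton));
    assert(conn_singleton != NULL);
    conn_singleton->mode = 1;
    callbacks_seen = 0;

    for (int i = 0; i < jobs; i++) {
        pool_add(p, on_job_finished);
    }

    if (drain_pool_first) {
        pool_delete(p);       /* patched order: connection still alive */
    }
    free(conn_singleton);
    conn_singleton = NULL;
    if (!drain_pool_first) {
        pool_delete(p);       /* original order: callbacks see NULL */
    }
    return callbacks_seen;
}
```

Calling teardown(3, 1) models the patched order, where all three callbacks
find the connection; teardown(3, 0) models the original order, where none
do.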