From: Michal B. <mic...@ge...> - 2011-04-07 09:45:18
Thanks for the information. We are now looking into it.

Regards
Michal

-----Original Message-----
From: 姜智华 [mailto:fl...@gm...]
Sent: Wednesday, April 06, 2011 2:12 PM
To: Michal Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

Hi, Michal

I spent some time on MFS today and noticed that both the core dump and the
Valgrind trace you gave were generated from "read_data_term", but the patch
you gave was for "read_data_end". So I applied the same strategy in both
"read_data_end" AND "read_data_term". Here's the patch I made:

@@ -178,6 +178,7 @@
     }
     if (rrec->rbuff!=NULL) {
         free(rrec->rbuff);
+        rrec->rbuff=NULL;
     }

     pthread_mutex_lock(&glock);
@@ -221,6 +222,7 @@
     }
     if (rr->rbuff) {
         free(rr->rbuff);
+        rr->rbuff = NULL;
     }
     pthread_cond_destroy(&(rr->cond));
     free(rr);

Now no more core files!!!

So could you please confirm whether the change in "read_data_term" is also
expected (or whether the patch you gave for "read_data_end" should actually
go in "read_data_term")?

If you don't think my patch is reasonable, or you still want to recreate the
issue, I'll send my AutoFS configuration later.

Thanks
Flow

On 4/5/11, Michal Borychowski <mic...@ge...> wrote:
> Hi!
>
> We ran tests on several different operating systems and are not able to
> reproduce your error. Valgrind also doesn't find anything bad.
>
> Without the patch we have:
> root@ubuntu10-64b:~/mfs-1.6.21/mfsmount# valgrind ./mfsmount -f
> ==1869== Memcheck, a memory error detector
> ==1869== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==1869== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
> ==1869== Command: ./mfsmount -f
> ==1869==
> mfsmaster accepted connection with parameters: read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
> ==1869== Invalid free() / delete / delete[]
> ==1869==    at 0x4C270BD: free (vg_replace_malloc.c:366)
> ==1869==    by 0x41366C: read_data_term (readdata.c:224)
> ==1869==    by 0x418CF4: mainloop (main.c:698)
> ==1869==    by 0x41906B: main (main.c:941)
> ==1869==  Address 0x67f56e0 is not stack'd, malloc'd or (recently) free'd
> ==1869==
> mfsmount[1869]: master: connection lost (1)
> (...)
>
>
> But after applying the patch we have:
> ==1947== Memcheck, a memory error detector
> ==1947== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==1947== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
> ==1947== Command: ./mfsmount -f
> ==1947==
> mfsmaster accepted connection with parameters: read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
> mfsmount[1947]: master: connection lost (1)
> (...)
>
>
> If your problem occurs only when mounting with autofs, please send your
> autofs configuration and we'll try to recreate the problem again. You may
> also want to check again that you applied our patch correctly.
>
> BTW, we strongly recommend against using the "mfscachefiles" option. It
> forces files to stay in the cache forever, which is not good. From 1.6.21
> its usage will be marked as deprecated, and we will probably remove it
> completely from 1.7.
>
>
> Regards
> Michal
>
>
> -----Original Message-----
> From: 姜智华 [mailto:fl...@gm...]
> Sent: Wednesday, March 16, 2011 4:32 AM
> To: Michal Borychowski
> Cc: moo...@li...
> Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs
>
> Hi,
>
> I tried the patch but the core file still gets dumped (with mfs 1.6.20).
>
> Core was generated by `mfsmount /home/fwjiang -o rw,mfscachefiles,mfsentrycacheto=30,mfsattrcacheto=30'.
> Program terminated with signal 6, Aborted.
> #0  0x00000031c16327f5 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install filesystem-2.4.30-2.fc12.x86_64 fuse-libs-2.8.5-2.fc12.x86_64 glibc-2.11.2-3.x86_64 libgcc-4.4.4-10.fc12.x86_64
> (gdb) bt
> #0  0x00000031c16327f5 in raise () from /lib64/libc.so.6
> #1  0x00000031c1633fd5 in abort () from /lib64/libc.so.6
> #2  0x00000031c166fa1b in __libc_message () from /lib64/libc.so.6
> #3  0x00000031c1675336 in malloc_printerr () from /lib64/libc.so.6
> #4  0x000000000040e4ad in read_data_term () at readdata.c:224
> #5  0x00000000004131b5 in mainloop (args=0x7fffc23bf030, mp=0xacc290 "/home/fwjiang", mt=1, fg=0) at main.c:600
> #6  0x00000000004134f8 in main (argc=<value optimized out>, argv=0x7fffc23bf158) at main.c:819
>
> Any clues?
>
> Thanks
> Flow
>
> On 3/15/11, Michal Borychowski <mic...@ge...> wrote:
>> It'll be fixed in the next release. For the moment you may try this
>> "patch":
>>
>>
>> @@ -178,6 +178,7 @@ void read_data_end(void* rr) {
>>      }
>>      if (rrec->rbuff!=NULL) {
>>          free(rrec->rbuff);
>> +        rrec->rbuff=NULL;
>>      }
>>
>>      pthread_mutex_lock(&glock);
>>
>>
>> Kind regards
>> Michal
>>
>> -----Original Message-----
>> From: Flow Jiang [mailto:fl...@gm...]
>> Sent: Friday, March 04, 2011 4:21 PM
>> To: Michal Borychowski
>> Cc: moo...@li...
>> Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs
>>
>> I tried to re-compile mfsmount with the "free(freecblockshead)" line
>> commented out. Now our servers (which keep running 24x7) are happy, no
>> more core files. However, core files still get generated on our
>> workstations when they reboot.
>> The core is generated from the
>> "read_data_term" line right after the "write_data_term" line mentioned
>> previously.
>>
>> Hopefully this will also get fixed in the next release, and it would be
>> even better if I could have a quick solution / patch for the issue.
>>
>> Thanks
>> Flow
>>
>> On 03/01/2011 11:37 PM, Flow Jiang wrote:
>>> Michal,
>>>
>>> Glad to know that this error can be solved simply by commenting out
>>> that line. I will try it tomorrow to see if it fixes the issue.
>>>
>>> It is annoying, since each core file takes about 170M and I tried to
>>> disable the core dump but failed. So hopefully we can have a better
>>> solution in the next release.
>>>
>>> Thanks
>>> Flow
>>>
>>> On 03/01/2011 09:00 PM, Michal Borychowski wrote:
>>>> Hi!
>>>>
>>>> This error is not a serious one. It may happen only on exit. If these
>>>> errors are annoying, a quick solution is to comment out the
>>>> "free(freecblockshead)" line, recompile mfsmount and run it again. We'll
>>>> prepare a better solution in the next release.
>>>>
>>>>
>>>> Kind regards
>>>> Michał
>>
>> _______________________________________________
>> moosefs-users mailing list
>> moo...@li...
>> https://lists.sourceforge.net/lists/listinfo/moosefs-users
>>
>>
>
> _______________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users
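
Note on the fix discussed in this thread: both patches apply the same idiom, clearing the pointer right after free() so that a later cleanup path sees NULL and skips the buffer. A minimal, self-contained sketch of that idiom follows. It is not the MooseFS source: the struct readrec and the functions readrec_new, readrec_release_buffer and readrec_destroy are hypothetical names made up for illustration, and only the field name rbuff is taken from the patches above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a read record. Only the field name "rbuff"
 * comes from the thread; everything else is an assumption. */
typedef struct readrec {
    uint8_t *rbuff;   /* read buffer, NULL once released */
    size_t   rbsize;  /* size of rbuff in bytes, 0 once released */
} readrec;

/* Allocate a record with a buffer of `size` bytes; returns NULL on failure. */
readrec *readrec_new(size_t size) {
    readrec *rr = malloc(sizeof *rr);
    if (rr == NULL) {
        return NULL;
    }
    rr->rbuff = malloc(size);
    if (rr->rbuff == NULL) {
        free(rr);
        return NULL;
    }
    rr->rbsize = size;
    return rr;
}

/* Release only the buffer. Clearing the pointer after free() turns any
 * later call on the same record into a no-op instead of a second free(). */
void readrec_release_buffer(readrec *rr) {
    assert(rr != NULL);
    if (rr->rbuff != NULL) {
        free(rr->rbuff);
        rr->rbuff = NULL;
        rr->rbsize = 0;
    }
}

/* Final teardown: safe whether or not the buffer was already released. */
void readrec_destroy(readrec *rr) {
    if (rr == NULL) {
        return;
    }
    readrec_release_buffer(rr);
    free(rr);
}
```

With this layout, calling readrec_release_buffer() twice and then readrec_destroy() frees each allocation exactly once. Whether this matches the actual cause of the invalid free reported in read_data_term is not established in the thread; the sketch only illustrates the free-then-NULL pattern the patches use.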