From: Flow J. <fl...@gm...> - 2011-02-26 08:44:57
|
Hi,

About 2 weeks after merging MooseFS into our production environment, I have found a lot of core files dumped from mfsmount and left in the "/" directory. All the backtraces look similar: the process dies at the free(freecblockshead) call in write_data_term(), when mainloop() ends.

Fedora 12 x64 (mfsmount is compiled from source):

Core was generated by `mfsmount /home/fwjiang -o rw,mfssubfolder=UserHome/fwjiang'.
Program terminated with signal 6, Aborted.
#0 0x00000039ab4327f5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install fuse-libs-2.8.5-2.fc12.x86_64 glibc-2.11.2-3.x86_64 libgcc-4.4.4-10.fc12.x86_64
(gdb) bt
#0 0x00000039ab4327f5 in raise () from /lib64/libc.so.6
#1 0x00000039ab433fd5 in abort () from /lib64/libc.so.6
#2 0x00000039ab46fa1b in __libc_message () from /lib64/libc.so.6
#3 0x00000039ab475336 in malloc_printerr () from /lib64/libc.so.6
#4 0x000000000040eb12 in write_data_term () at writedata.c:906
#5 0x000000000041282d in mainloop (args=0x7fff49f484d0, mp=0x1bb72e0 "/tmp/autotAfzu1", mt=1, fg=0) at main.c:599
#6 0x0000000000412c48 in main (argc=<value optimized out>, argv=0x7fff49f485f8) at main.c:819

CentOS 5.5 x86 (mfsmount is from DAG repository):

Core was generated by `mfsmount /project/ui -o rw,mfssubfolder=ProjectData/project/ui'.
Program terminated with signal 6, Aborted.
#0 0x00417410 in __kernel_vsyscall ()
(gdb) bt
#0 0x00417410 in __kernel_vsyscall ()
#1 0x00a8ddf0 in raise () from /lib/libc.so.6
#2 0x00a8f701 in abort () from /lib/libc.so.6
#3 0x00ac628b in __libc_message () from /lib/libc.so.6
#4 0x00ace5a5 in _int_free () from /lib/libc.so.6
#5 0x00ace9e9 in free () from /lib/libc.so.6
#6 0x08056cc3 in write_data_term ()
#7 0x0805a768 in mainloop ()
#8 0x0805ab37 in main ()

The auto.home file used to automount user home directories on the Fedora 12 boxes looks like this:

* -fstype=fuse,mfssubfolder=UserHome/& :mfsmount

All servers and clients run mfs 1.6.19, and all core files are dumped from mounts with read/write access. From the timestamps of the core dumps listed above, I found that they are dumped when autofs times out (the default timeout is 300s on CentOS 5.5).

So I manually copied a file (about 80MB) to a user folder which had not been automounted yet, then waited 300s until the folder was automatically unmounted. The core was dumped as expected.

Does anyone have the same issue? Am I doing the right thing to automount with MooseFS?

Thanks
Flow
|
From: Flow J. <fl...@gm...> - 2011-02-26 11:18:57
|
I found an easier way to reproduce the issue.

Just keep the auto.home as:

* -fstype=fuse,mfssubfolder=UserHome/& :mfsmount

Then copy a file to an unmounted user folder, and run:

service autofs stop

The core file will be dumped.

Flow
|
From: Michal B. <mic...@ge...> - 2011-03-01 13:00:58
|
Hi!

This error is not a serious one. It can only happen on exit. If these errors are annoying, a quick solution is to comment out the "free(freecblockshead)" line, recompile mfsmount and run it again. We'll prepare a better solution in the next release.

Kind regards
Michał
|
From: Flow J. <fl...@gm...> - 2011-03-01 15:37:30
|
Michal,

Glad to know that this error can be solved simply by commenting out that line. I will try it tomorrow to see if it fixes the issue.

It is annoying, since each core file takes about 170M, and I tried to disable the core dump but failed. So hopefully we can have a better solution in the next release.

Thanks
Flow
|
From: Flow J. <fl...@gm...> - 2011-03-04 15:20:55
|
I recompiled mfsmount with the "free(freecblockshead)" line commented out. Now our servers (which keep running 24x7) are happy, no more core files. However, core files still get generated on our workstations when they reboot. The core is generated from the "read_data_term" line right after the "write_data_term" line mentioned previously.

Hopefully this will also be fixed in the next release, and it would be even better if I could have a quick solution / patch for the issue.

Thanks
Flow
|
From: Michal B. <mic...@ge...> - 2011-03-15 13:05:36
|
It'll be fixed in the next release. For the moment you may try this "patch":

@@ -178,6 +178,7 @@ void read_data_end(void* rr) {
 	}
 	if (rrec->rbuff!=NULL) {
 		free(rrec->rbuff);
+		rrec->rbuff=NULL;
 	}

 	pthread_mutex_lock(&glock);

Kind regards
Michal
|
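A note on the idiom used in the patch above: clearing the pointer right after free() means that any later cleanup pass which tests the pointer for NULL will skip it instead of calling free() on it a second time. The following is a minimal sketch of that idiom, not the actual readdata.c code. The struct and function names (buf_rec, release_buffer) are hypothetical; only the rbuff field name comes from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the read record; only the rbuff field
 * name is taken from the patch in this thread. */
typedef struct buf_rec {
    unsigned char *rbuff;
} buf_rec;

/* Release the buffer and clear the pointer, so that a second call
 * (or a later cleanup pass) sees NULL and does nothing.
 * Returns 1 if a buffer was released, 0 if there was nothing to do. */
int release_buffer(buf_rec *rec) {
    assert(rec != NULL);
    if (rec->rbuff != NULL) {
        free(rec->rbuff);
        rec->rbuff = NULL;
        return 1;
    }
    return 0;
}
```

Calling release_buffer() twice on the same record is harmless in this sketch: the second call finds rbuff set to NULL and returns 0 without touching the allocator.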
From: 姜智华 <fl...@gm...> - 2011-03-16 03:31:58
|
Hi,

I tried the patch but the core file still gets dumped (with mfs 1.6.20):

Core was generated by `mfsmount /home/fwjiang -o rw,mfscachefiles,mfsentrycacheto=30,mfsattrcacheto=30'.
Program terminated with signal 6, Aborted.
#0 0x00000031c16327f5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install filesystem-2.4.30-2.fc12.x86_64 fuse-libs-2.8.5-2.fc12.x86_64 glibc-2.11.2-3.x86_64 libgcc-4.4.4-10.fc12.x86_64
(gdb) bt
#0 0x00000031c16327f5 in raise () from /lib64/libc.so.6
#1 0x00000031c1633fd5 in abort () from /lib64/libc.so.6
#2 0x00000031c166fa1b in __libc_message () from /lib64/libc.so.6
#3 0x00000031c1675336 in malloc_printerr () from /lib64/libc.so.6
#4 0x000000000040e4ad in read_data_term () at readdata.c:224
#5 0x00000000004131b5 in mainloop (args=0x7fffc23bf030, mp=0xacc290 "/home/fwjiang", mt=1, fg=0) at main.c:600
#6 0x00000000004134f8 in main (argc=<value optimized out>, argv=0x7fffc23bf158) at main.c:819

Any clues?

Thanks
Flow
|
From: Michal B. <mic...@ge...> - 2011-04-05 08:46:45
|
Hi!

We ran tests on several different operating systems and were not able to reproduce your error. Valgrind also doesn't find anything bad once the patch is applied.

Without the patch we have:

root@ubuntu10-64b:~/mfs-1.6.21/mfsmount# valgrind ./mfsmount -f
==1869== Memcheck, a memory error detector
==1869== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1869== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==1869== Command: ./mfsmount -f
==1869==
mfsmaster accepted connection with parameters: read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
==1869== Invalid free() / delete / delete[]
==1869==    at 0x4C270BD: free (vg_replace_malloc.c:366)
==1869==    by 0x41366C: read_data_term (readdata.c:224)
==1869==    by 0x418CF4: mainloop (main.c:698)
==1869==    by 0x41906B: main (main.c:941)
==1869==  Address 0x67f56e0 is not stack'd, malloc'd or (recently) free'd
==1869==
mfsmount[1869]: master: connection lost (1)
(...)

But after applying the patch we have:

==1947== Memcheck, a memory error detector
==1947== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1947== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==1947== Command: ./mfsmount -f
==1947==
mfsmaster accepted connection with parameters: read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
mfsmount[1947]: master: connection lost (1)
(...)

If your problem exists only when mounting through autofs, please send your autofs configuration and we'll try to recreate the problem again. You may also want to check again that you applied our patch correctly.

BTW, we strongly recommend not using the "mfscachefiles" option. It forces files to stay in the cache forever, which is not good. From 1.6.21 its usage will be marked as deprecated, and we will probably remove it completely in 1.7.

Regards
Michal
|
From: 姜智华 <fl...@gm...> - 2011-04-06 12:12:08
|
Hi, Michal

I spent some time on MFS today and noticed that both the core dump and the Valgrind trace you gave were generated from "read_data_term", but the patch you gave was for "read_data_end". So I tried applying the same strategy in both "read_data_end" AND "read_data_term". Here's the patch I made:

@@ -178,6 +178,7 @@
 	}
 	if (rrec->rbuff!=NULL) {
 		free(rrec->rbuff);
+		rrec->rbuff=NULL;
 	}

 	pthread_mutex_lock(&glock);
@@ -221,6 +222,7 @@
 	}
 	if (rr->rbuff) {
 		free(rr->rbuff);
+		rr->rbuff = NULL;
 	}
 	pthread_cond_destroy(&(rr->cond));
 	free(rr);

Now no more core files!!!

So could you please confirm whether the change in "read_data_term" is also expected (or whether the patch you gave for "read_data_end" should actually go into "read_data_term")?

If you don't think the patch I made is reasonable, or you still want to recreate the issue, I'll send my AutoFS configuration later.

Thanks
Flow
|
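To illustrate why clearing the pointer can matter at shutdown, here is a minimal sketch. It assumes that a record whose buffer was already released by a per-request cleanup stays reachable from a list that a final cleanup later walks. That assumption, the list layout and all names (list_rec, rec_new, rec_end, rec_term) are hypothetical and are not taken from readdata.c; only the rbuff field and the free-then-NULL pattern come from the patch above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record kept on a singly linked list. */
typedef struct list_rec {
    unsigned char *rbuff;
    struct list_rec *next;
} list_rec;

/* Allocate a record with a buffer and push it on the front of the list. */
list_rec *rec_new(list_rec *head, size_t size) {
    list_rec *rec = malloc(sizeof(*rec));
    assert(rec != NULL);
    rec->rbuff = malloc(size);
    assert(rec->rbuff != NULL);
    rec->next = head;
    return rec;
}

/* Per-request cleanup: release the buffer but leave the record linked.
 * Clearing the pointer is what keeps rec_term() from freeing it again. */
void rec_end(list_rec *rec) {
    if (rec->rbuff != NULL) {
        free(rec->rbuff);
        rec->rbuff = NULL;
    }
}

/* Final cleanup: walk the list, release any buffer still held, then
 * release the record itself. Returns how many buffers were released here. */
size_t rec_term(list_rec *head) {
    size_t released = 0;
    while (head != NULL) {
        list_rec *next = head->next;
        if (head->rbuff != NULL) {
            free(head->rbuff);
            head->rbuff = NULL;
            released++;
        }
        free(head);
        head = next;
    }
    return released;
}
```

In this sketch, if rec_end() left a stale pointer in rbuff, rec_term() would pass already freed memory to free() a second time, which is the kind of error glibc reports by aborting the process. Whether the real code follows this exact path is not confirmed in this thread.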
From: Michal B. <mic...@ge...> - 2011-04-07 09:45:18
|
Thanks for the information. We are now looking into it.

Regards
Michal

-----Original Message-----
From: 姜智华 [mailto:fl...@gm...]
Sent: Wednesday, April 06, 2011 2:12 PM
To: Michal Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

Hi, Michal

I spent some time on MFS today and noticed that both the core dump and the
Valgrind trace you gave were generated from "read_data_term", but the patch
you gave was in "read_data_end". So I applied the same strategy in both
"read_data_end" AND "read_data_term". Here's the patch I made:

@@ -178,6 +178,7 @@
     }
     if (rrec->rbuff!=NULL) {
         free(rrec->rbuff);
+        rrec->rbuff=NULL;
     }

     pthread_mutex_lock(&glock);
@@ -221,6 +222,7 @@
     }
     if (rr->rbuff) {
         free(rr->rbuff);
+        rr->rbuff = NULL;
     }
     pthread_cond_destroy(&(rr->cond));
     free(rr);

Now no more core files!!!

So could you please confirm whether the change in "read_data_term" is also
expected (or whether the patch you gave for "read_data_end" should actually
go in "read_data_term")?

If you don't think the patch I made is reasonable, or you still want to
recreate the issue, I'll send my AutoFS configuration later.

Thanks
Flow

On 4/5/11, Michal Borychowski <mic...@ge...> wrote:
> Hi!
>
> We ran tests on several different operating systems and were not able to
> reproduce your error. Valgrind also doesn't find anything bad.
>
> Without the patch we have:
> root@ubuntu10-64b:~/mfs-1.6.21/mfsmount# valgrind ./mfsmount -f
> ==1869== Memcheck, a memory error detector
> ==1869== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==1869== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
> ==1869== Command: ./mfsmount -f
> ==1869==
> mfsmaster accepted connection with parameters:
> read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
> ==1869== Invalid free() / delete / delete[]
> ==1869== at 0x4C270BD: free (vg_replace_malloc.c:366)
> ==1869== by 0x41366C: read_data_term (readdata.c:224)
> ==1869== by 0x418CF4: mainloop (main.c:698)
> ==1869== by 0x41906B: main (main.c:941)
> ==1869== Address 0x67f56e0 is not stack'd, malloc'd or (recently) free'd
> ==1869==
> mfsmount[1869]: master: connection lost (1)
> (...)
>
>
> But after applying the patch we have:
> ==1947== Memcheck, a memory error detector
> ==1947== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==1947== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
> ==1947== Command: ./mfsmount -f
> ==1947==
> mfsmaster accepted connection with parameters:
> read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
> mfsmount[1947]: master: connection lost (1)
> (...)
>
>
> If your problem exists only when mounting with autofs, please send your
> autofs configuration and we'll try to recreate the problem again. You may
> also want to check again that you applied our patch correctly.
>
> BTW, we strongly recommend against using the "mfscachefiles" option. It
> forces files to stay in the cache forever, which is not good. From 1.6.21
> its usage will be marked as deprecated, and probably from 1.7 we will
> remove it completely.
>
>
> Regards
> Michal
>
>
> -----Original Message-----
> From: 姜智华 [mailto:fl...@gm...]
> Sent: Wednesday, March 16, 2011 4:32 AM
> To: Michal Borychowski
> Cc: moo...@li...
> Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs
>
> Hi,
>
> I tried the patch but the core file still gets dumped (with mfs 1.6.20)
>
> Core was generated by `mfsmount /home/fwjiang -o
> rw,mfscachefiles,mfsentrycacheto=30,mfsattrcacheto=30'.
> Program terminated with signal 6, Aborted.
> #0 0x00000031c16327f5 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> filesystem-2.4.30-2.fc12.x86_64 fuse-libs-2.8.5-2.fc12.x86_64
> glibc-2.11.2-3.x86_64 libgcc-4.4.4-10.fc12.x86_64
> (gdb) bt
> #0 0x00000031c16327f5 in raise () from /lib64/libc.so.6
> #1 0x00000031c1633fd5 in abort () from /lib64/libc.so.6
> #2 0x00000031c166fa1b in __libc_message () from /lib64/libc.so.6
> #3 0x00000031c1675336 in malloc_printerr () from /lib64/libc.so.6
> #4 0x000000000040e4ad in read_data_term () at readdata.c:224
> #5 0x00000000004131b5 in mainloop (args=0x7fffc23bf030,
> mp=0xacc290 "/home/fwjiang", mt=1, fg=0) at main.c:600
> #6 0x00000000004134f8 in main (argc=<value optimized out>,
> argv=0x7fffc23bf158) at main.c:819
>
> Any clues?
>
> Thanks
> Flow
>
> On 3/15/11, Michal Borychowski <mic...@ge...> wrote:
>> It'll be fixed in the next release. For the moment you may try this
>> "patch":
>>
>>
>> @@ -178,6 +178,7 @@ void read_data_end(void* rr) {
>>      }
>>      if (rrec->rbuff!=NULL) {
>>          free(rrec->rbuff);
>> +        rrec->rbuff=NULL;
>>      }
>>
>>      pthread_mutex_lock(&glock);
>>
>>
>> Kind regards
>> Michal
>>
>> -----Original Message-----
>> From: Flow Jiang [mailto:fl...@gm...]
>> Sent: Friday, March 04, 2011 4:21 PM
>> To: Michal Borychowski
>> Cc: moo...@li...
>> Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs
>>
>> I tried to re-compile mfsmount with the "free(freecblockshead)" line
>> commented out. Now our servers (which keep running 7x24) are happy, no
>> more core files. However, core files still get generated on our
>> workstations when they reboot.
|
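For readers following the patches in this thread, the idiom they apply (clear a pointer right after freeing it, and guard the free with a NULL check) can be sketched in isolation as below. This is a minimal illustration, not the mfsmount source: `demo_rec` and the `demo_rec_*` functions are hypothetical names, and the thread does not establish the root cause of the original abort.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a read record; only the buffer field matters. */
typedef struct demo_rec {
    unsigned char *rbuff;
} demo_rec;

/* Allocate a record that owns a buffer of `size` bytes; NULL on failure. */
demo_rec *demo_rec_new(size_t size) {
    demo_rec *r = malloc(sizeof *r);
    if (r == NULL) {
        return NULL;
    }
    r->rbuff = malloc(size);
    if (r->rbuff == NULL) {
        free(r);
        return NULL;
    }
    return r;
}

/*
 * Free the buffer and clear the pointer. Because the pointer is reset to
 * NULL, a second call sees NULL and does nothing instead of freeing the
 * same block twice. Returns 1 if a buffer was freed, 0 if there was none.
 */
int demo_rec_release_buffer(demo_rec *r) {
    assert(r != NULL);
    if (r->rbuff != NULL) {
        free(r->rbuff);
        r->rbuff = NULL;
        return 1;
    }
    return 0;
}

/* Final teardown: release the buffer (if still held), then the record. */
void demo_rec_destroy(demo_rec *r) {
    if (r == NULL) {
        return;
    }
    demo_rec_release_buffer(r);
    free(r);
}
```

With this shape, a per-request "end" path can call `demo_rec_release_buffer` and a later "term" path can call `demo_rec_destroy` on the same record without the buffer being passed to `free` twice. It does not protect against a stale copy of the record pointer itself being used after `demo_rec_destroy`.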