Thread: [Moosefs-users] Core Dumped from mfsmount with Autofs

[Moosefs-users] Core Dumped from mfsmount with Autofs

From: Flow J. <fl...@gm...> - 2011-02-26 08:44:57

Hi,

About two weeks after deploying MooseFS in our production environment, 
I found a lot of core files dumped by mfsmount and left in the "/" 
directory. All the backtraces look similar: the process dies at the 
free(freecblockshead) call in write_data_term(), when mainloop() ends.

Fedora 12 x64 (mfsmount is compiled from source):

Core was generated by `mfsmount /home/fwjiang -o 
rw,mfssubfolder=UserHome/fwjiang'.
Program terminated with signal 6, Aborted.
#0  0x00000039ab4327f5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
fuse-libs-2.8.5-2.fc12.x86_64 glibc-2.11.2-3.x86_64 
libgcc-4.4.4-10.fc12.x86_64
(gdb) bt
#0  0x00000039ab4327f5 in raise () from /lib64/libc.so.6
#1  0x00000039ab433fd5 in abort () from /lib64/libc.so.6
#2  0x00000039ab46fa1b in __libc_message () from /lib64/libc.so.6
#3  0x00000039ab475336 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000000040eb12 in write_data_term () at writedata.c:906
#5  0x000000000041282d in mainloop (args=0x7fff49f484d0, mp=0x1bb72e0 
"/tmp/autotAfzu1", mt=1, fg=0) at main.c:599
#6  0x0000000000412c48 in main (argc=<value optimized out>, 
argv=0x7fff49f485f8) at main.c:819

Centos 5.5 x86 (mfsmount is from DAG repository):

Core was generated by `mfsmount /project/ui -o 
rw,mfssubfolder=ProjectData/project/ui'.
Program terminated with signal 6, Aborted.
#0  0x00417410 in __kernel_vsyscall ()
(gdb) bt
#0  0x00417410 in __kernel_vsyscall ()
#1  0x00a8ddf0 in raise () from /lib/libc.so.6
#2  0x00a8f701 in abort () from /lib/libc.so.6
#3  0x00ac628b in __libc_message () from /lib/libc.so.6
#4  0x00ace5a5 in _int_free () from /lib/libc.so.6
#5  0x00ace9e9 in free () from /lib/libc.so.6
#6  0x08056cc3 in write_data_term ()
#7  0x0805a768 in mainloop ()
#8  0x0805ab37 in main ()

The auto.home file used to automount user home directories on the Fedora 12 boxes looks like:

* -fstype=fuse,mfssubfolder=UserHome/& :mfsmount

All servers and clients run mfs 1.6.19, and all the core files come 
from mounts with read/write access. Judging by the timestamps of the 
core dumps listed above, they are written when the autofs timeout 
expires (the default timeout is 300s on CentOS 5.5).

So I manually copied a file (about 80MB) to a user folder that had not 
yet been automounted, then waited 300s until the folder was 
automatically unmounted. The core was dumped as expected.

Does anyone have the same issue? Am I doing the right thing to automount 
with MooseFS?

Thanks
Flow

Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Flow J. <fl...@gm...> - 2011-02-26 11:18:57

I found an easier way to reproduce the issue.

Just keep the auto.home as:

* -fstype=fuse,mfssubfolder=UserHome/& :mfsmount

Then copy a file to an unmounted user folder, and run:

service autofs stop

The core file will be dumped.

Flow


Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Michal B. <mic...@ge...> - 2011-03-01 13:00:58

Hi!

This error is not serious; it can only occur when mfsmount exits. If these
errors are annoying, a quick workaround is to comment out the
"free(freecblockshead)" line, recompile mfsmount and run it again. We'll
prepare a better solution for the next release.


Kind regards
Michał 

_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users

Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Flow J. <fl...@gm...> - 2011-03-01 15:37:30

Michal,

Glad to know that this error can be worked around simply by commenting 
out that line. I will try it tomorrow to see if it fixes the issue.

It is annoying, since each core file takes about 170M and I tried to 
disable core dumps but failed. So hopefully we can have a better 
solution in the next release.

Thanks
Flow
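
On disabling core dumps: one option a program can use on itself is the POSIX setrlimit() call with RLIMIT_CORE. The sketch below is only an illustration under that assumption. The helper names are made up, and nothing in this thread says that mfsmount does this or that it would have worked in the autofs setup described above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft and hard core-file size limits to zero
 * for the calling process. Returns 0 on success and -1 on failure, exactly
 * as setrlimit() does. Lowering the hard limit cannot be undone by an
 * unprivileged process. */
int disable_core_dumps(void)
{
    struct rlimit lim;

    lim.rlim_cur = 0;
    lim.rlim_max = 0;
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Hypothetical helper: returns 1 if the soft core limit is currently zero,
 * 0 if it is non-zero, and -1 if getrlimit() fails. */
int core_dumps_disabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return -1;
    }
    return lim.rlim_cur == 0 ? 1 : 0;
}
```

The limit applies to the calling process and to children it starts afterwards; the shell equivalent is "ulimit -c 0" in the environment that launches the program.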


Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Flow J. <fl...@gm...> - 2011-03-04 15:20:55

I recompiled mfsmount with the "free(freecblockshead)" line commented 
out. Now our servers (which run 24x7) are happy: no more core files. 
However, core files still get generated on our workstations when they 
reboot. The core now comes from the "read_data_term" line, right after 
the "write_data_term" line mentioned previously.

Hopefully this will also be fixed in the next release, and it would be 
even better to have a quick solution / patch for the issue in the 
meantime.

Thanks
Flow


Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Michal B. <mic...@ge...> - 2011-03-15 13:05:36

It'll be fixed in the next release. For the moment you may try this "patch":


@@ -178,6 +178,7 @@ void read_data_end(void* rr) {
        }
        if (rrec->rbuff!=NULL) {
                free(rrec->rbuff);
+               rrec->rbuff=NULL;
        }
 
        pthread_mutex_lock(&glock);


Kind regards
Michal 
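
The idea behind the one-line patch above (clear a pointer right after freeing it, so a later cleanup pass that tests it against NULL skips it) can be sketched in isolation. This is a minimal illustration, not code from readdata.c: the struct and function names are hypothetical, and only the "rbuff" field name is borrowed from the diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a read record; only the buffer pointer matters
 * here. The real record in readdata.c has more fields. */
struct read_rec {
    unsigned char *rbuff;
};

/* Allocate a buffer for the record. Returns 0 on success, -1 on failure. */
int rec_alloc_buffer(struct read_rec *rec, size_t size)
{
    assert(rec != NULL);
    rec->rbuff = malloc(size);
    return rec->rbuff != NULL ? 0 : -1;
}

/* Release the buffer and clear the pointer. Because the pointer is reset to
 * NULL, calling this a second time on the same record does nothing instead
 * of passing a stale pointer to free(). */
void rec_release_buffer(struct read_rec *rec)
{
    assert(rec != NULL);
    if (rec->rbuff != NULL) {
        free(rec->rbuff);
        rec->rbuff = NULL;
    }
}
```

With this pattern a record can go through an "end" path and later a "term" path without its buffer being freed twice, provided both paths check the pointer before freeing.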


Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: 姜智华 <fl...@gm...> - 2011-03-16 03:31:58

Hi,

I tried the patch but the core file still gets dumped (with mfs 1.6.20)

Core was generated by `mfsmount /home/fwjiang -o
rw,mfscachefiles,mfsentrycacheto=30,mfsattrcacheto=30'.
Program terminated with signal 6, Aborted.
#0  0x00000031c16327f5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
filesystem-2.4.30-2.fc12.x86_64 fuse-libs-2.8.5-2.fc12.x86_64
glibc-2.11.2-3.x86_64 libgcc-4.4.4-10.fc12.x86_64
(gdb) bt
#0  0x00000031c16327f5 in raise () from /lib64/libc.so.6
#1  0x00000031c1633fd5 in abort () from /lib64/libc.so.6
#2  0x00000031c166fa1b in __libc_message () from /lib64/libc.so.6
#3  0x00000031c1675336 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000000040e4ad in read_data_term () at readdata.c:224
#5  0x00000000004131b5 in mainloop (args=0x7fffc23bf030,
    mp=0xacc290 "/home/fwjiang", mt=1, fg=0) at main.c:600
#6  0x00000000004134f8 in main (argc=<value optimized out>,
    argv=0x7fffc23bf158) at main.c:819

Any clues?

Thanks
Flow


Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Michal B. <mic...@ge...> - 2011-04-05 08:46:45

Hi!

We ran tests on several different operating systems and were not able to reproduce your error. Valgrind also doesn't find anything wrong once the patch is applied.

Without the patch we have:
root@ubuntu10-64b:~/mfs-1.6.21/mfsmount# valgrind ./mfsmount -f
==1869== Memcheck, a memory error detector
==1869== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1869== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==1869== Command: ./mfsmount -f
==1869==
mfsmaster accepted connection with parameters: read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
==1869== Invalid free() / delete / delete[]
==1869==    at 0x4C270BD: free (vg_replace_malloc.c:366)
==1869==    by 0x41366C: read_data_term (readdata.c:224)
==1869==    by 0x418CF4: mainloop (main.c:698)
==1869==    by 0x41906B: main (main.c:941)
==1869==  Address 0x67f56e0 is not stack'd, malloc'd or (recently) free'd
==1869==
mfsmount[1869]: master: connection lost (1)
(...)

But after applying the patch we have:
==1947== Memcheck, a memory error detector
==1947== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==1947== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==1947== Command: ./mfsmount -f
==1947==
mfsmaster accepted connection with parameters: read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to root:root
mfsmount[1947]: master: connection lost (1)
(...)

If your problem occurs only when mounting through autofs, please send your autofs configuration and we'll try to recreate the problem again. Please also double-check that you applied our patch correctly.

BTW, we strongly recommend against using the "mfscachefiles" option. It forces files to stay in the cache forever, which is not good. From 1.6.21 its use will be marked as deprecated, and we will probably remove it completely in 1.7.

Regards
Michal


Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: 姜智华 <fl...@gm...> - 2011-04-06 12:12:08

Hi, Michal

I spent some time on MFS today and noticed that both the core dump
and the Valgrind trace you gave were generated from "read_data_term",
but the patch you gave was for "read_data_end". So I applied the
same strategy in both "read_data_end" AND "read_data_term". Here's the
patch I made:

@@ -178,6 +178,7 @@
        }
        if (rrec->rbuff!=NULL) {
                free(rrec->rbuff);
+               rrec->rbuff=NULL;
        }

        pthread_mutex_lock(&glock);
@@ -221,6 +222,7 @@
                        }
                        if (rr->rbuff) {
                                free(rr->rbuff);
+                               rr->rbuff = NULL;
                        }
                        pthread_cond_destroy(&(rr->cond));
                        free(rr);

Now no more core files!!!

So could you please confirm if the change in "read_data_term" is also
expected (or the patch you gave in "read_data_end" should actually
happen in "read_data_term"?)

If you don't think the patch I made is reasonable or still want to
recreate the issue, I'll send my AutoFS configuration later.

Thanks
Flow
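
For readers following the second hunk of the patch above, here is a rough sketch of a termination routine that walks a list of records and frees each buffer and each record exactly once. It assumes a simple singly linked list; the type and function names are hypothetical and the real read_data_term() in readdata.c also destroys a condition variable and may differ in other ways. The sketch does not explain why the extra assignment made the reported abort go away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type: a buffer that may already have been released
 * (NULL) and a link to the next record. */
struct rec {
    unsigned char *rbuff;
    struct rec *next;
};

/* Push a new record onto the list. A size of 0 creates a record without a
 * buffer. Returns 0 on success, -1 on allocation failure. */
int rec_push(struct rec **head, size_t size)
{
    struct rec *r;

    assert(head != NULL);
    r = malloc(sizeof *r);
    if (r == NULL) {
        return -1;
    }
    r->rbuff = NULL;
    if (size > 0) {
        r->rbuff = malloc(size);
        if (r->rbuff == NULL) {
            free(r);
            return -1;
        }
    }
    r->next = *head;
    *head = r;
    return 0;
}

/* Free every record and any buffer it still owns. Returns the number of
 * buffers freed and leaves *head set to NULL, so a second call is a no-op. */
size_t rec_list_term(struct rec **head)
{
    size_t freed = 0;
    struct rec *r;

    assert(head != NULL);
    r = *head;
    while (r != NULL) {
        struct rec *next = r->next;  /* read the link before freeing r */
        if (r->rbuff != NULL) {
            free(r->rbuff);
            r->rbuff = NULL;
            freed++;
        }
        free(r);
        r = next;
    }
    *head = NULL;
    return freed;
}
```

Resetting the list head at the end is the list-level counterpart of clearing rbuff: whatever runs later sees an empty list rather than pointers to memory that has already been released.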

On 4/5/11, Michal Borychowski <mic...@ge...> wrote:
> Hi!
>
> We made tests on several different operating system and are not able to
> reproduce your error. Valgrind also doesn't fined anything bad.
>
> Without the patch we have:
> root@ubuntu10-64b:~/mfs-1.6.21/mfsmount# valgrind ./mfsmount -f ==1869==
> Memcheck, a memory error detector ==1869== Copyright (C) 2002-2009, and GNU
> GPL'd, by Julian Seward et al.
> ==1869== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
> copyright info ==1869== Command: ./mfsmount -f ==1869== mfsmaster accepted
> connection with parameters:
> read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to
> root:root ==1869== Invalid free() / delete / delete[]
> ==1869==    at 0x4C270BD: free (vg_replace_malloc.c:366)
> ==1869==    by 0x41366C: read_data_term (readdata.c:224)
> ==1869==    by 0x418CF4: mainloop (main.c:698)
> ==1869==    by 0x41906B: main (main.c:941)
> ==1869==  Address 0x67f56e0 is not stack'd, malloc'd or (recently) free'd
> ==1869==
> mfsmount[1869]: master: connection lost (1)
> (...)
>
>
> But after applying the patch we have:
> ==1947== Memcheck, a memory error detector ==1947== Copyright (C) 2002-2009,
> and GNU GPL'd, by Julian Seward et al.
> ==1947== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
> copyright info ==1947== Command: ./mfsmount -f ==1947== mfsmaster accepted
> connection with parameters:
> read-write,restricted_ip,ignore_gid,can_change_quota ; root mapped to
> root:root
> mfsmount[1947]: master: connection lost (1)
> (...)
>
>
> If your problem exists only while using mount with autofs, please send your
> autofs configuration - we'll try to recreate the problem again. You may
> again check if you applied our patch correctly.
>
> BTW. We strongly not recommend to use the "mfscachefiles" option. It forces
> to leave files in the cache forever. It's not good. From 1.6.21 its usage
> will be marked as deprecated and probably from 1.7 we will remove it
> completely.
>
>
> Regards
> Michal
>
>
> -----Original Message-----
> From: 姜智华 [mailto:fl...@gm...]
> Sent: Wednesday, March 16, 2011 4:32 AM
> To: Michal Borychowski
> Cc: moo...@li...
> Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs
>
> Hi,
>
> I tried the patch but the core file still gets dumped (with mfs 1.6.20)
>
> Core was generated by `mfsmount /home/fwjiang -o
> rw,mfscachefiles,mfsentrycacheto=30,mfsattrcacheto=30'.
> Program terminated with signal 6, Aborted.
> #0  0x00000031c16327f5 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> filesystem-2.4.30-2.fc12.x86_64 fuse-libs-2.8.5-2.fc12.x86_64
> glibc-2.11.2-3.x86_64 libgcc-4.4.4-10.fc12.x86_64
> (gdb) bt
> #0  0x00000031c16327f5 in raise () from /lib64/libc.so.6
> #1  0x00000031c1633fd5 in abort () from /lib64/libc.so.6
> #2  0x00000031c166fa1b in __libc_message () from /lib64/libc.so.6
> #3  0x00000031c1675336 in malloc_printerr () from /lib64/libc.so.6
> #4  0x000000000040e4ad in read_data_term () at readdata.c:224
> #5  0x00000000004131b5 in mainloop (args=0x7fffc23bf030,
>     mp=0xacc290 "/home/fwjiang", mt=1, fg=0) at main.c:600
> #6  0x00000000004134f8 in main (argc=<value optimized out>,
>     argv=0x7fffc23bf158) at main.c:819
>
> Any clues?
>
> Thanks
> Flow
>
> On 3/15/11, Michal Borychowski <mic...@ge...> wrote:
>> It'll be fixed in the next release. For the moment you may try this
>> "patch":
>>
>>
>> @@ -178,6 +178,7 @@ void read_data_end(void* rr) {
>>         }
>>         if (rrec->rbuff!=NULL) {
>>                 free(rrec->rbuff);
>> +               rrec->rbuff=NULL;
>>         }
>>
>>         pthread_mutex_lock(&glock);
>>
>>
>> Kind regards
>> Michal
>>
>> -----Original Message-----
>> From: Flow Jiang [mailto:fl...@gm...]
>> Sent: Friday, March 04, 2011 4:21 PM
>> To: Michal Borychowski
>> Cc: moo...@li...
>> Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs
>>
>> I recompiled mfsmount with the "free(freecblockshead)" line
>> commented out. Now our servers (which run 24x7) are happy, with no
>> more core files. However, core files still get generated on our
>> workstations when they reboot. The core comes from the
>> "read_data_term" call right after the "write_data_term" call mentioned
>> previously.
>>
>> Hopefully this will also get fixed in the next release, and it would be
>> even better if I could have a quick solution / patch for the issue.
>>
>> Thanks
>> Flow
>>
>> On 03/01/2011 11:37 PM, Flow Jiang wrote:
>>> Michal,
>>>
>>> Glad to know that this error can be worked around simply by commenting
>>> out that line. I will try it tomorrow to see if it fixes the issue.
>>>
>>> It is annoying, since each core file takes about 170M and I tried to
>>> disable core dumps but failed. So hopefully we can have a better
>>> solution in the next release.
>>>
>>> Thanks
>>> Flow
>>>
>>> On 03/01/2011 09:00 PM, Michal Borychowski wrote:
>>>> Hi!
>>>>
>>>> This error is not a serious one. It can happen only on exit. If these
>>>> errors are annoying, a quick solution is to comment out the
>>>> "free(freecblockshead)" line, recompile mfsmount and run it again. We'll
>>>> prepare a better solution in the next release.
>>>>
>>>>
>>>> Kind regards
>>>> Michał

Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

From: Michal B. <mic...@ge...> - 2011-04-07 09:45:18

Thanks for the information. We are now looking into it.


Regards
Michal

-----Original Message-----
From: 姜智华 [mailto:fl...@gm...] 
Sent: Wednesday, April 06, 2011 2:12 PM
To: Michal Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] Core Dumped from mfsmount with Autofs

Hi, Michal

I spent some time on MFS today and noticed that both the core dump
and the Valgrind trace you gave were generated from "read_data_term",
but the patch you gave was for "read_data_end". So I applied the
same strategy in both "read_data_end" AND "read_data_term". Here's the
patch I made:

@@ -178,6 +178,7 @@
        }
        if (rrec->rbuff!=NULL) {
                free(rrec->rbuff);
+               rrec->rbuff=NULL;
        }

        pthread_mutex_lock(&glock);
@@ -221,6 +222,7 @@
                        }
                        if (rr->rbuff) {
                                free(rr->rbuff);
+                               rr->rbuff = NULL;
                        }
                        pthread_cond_destroy(&(rr->cond));
                        free(rr);

Now there are no more core files!

So could you please confirm whether the change in "read_data_term" is also
expected, or whether the patch you gave for "read_data_end" should actually
have been applied to "read_data_term"?
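
For readers following the patch, here is a minimal, self-contained sketch of
the idiom both hunks use: set a pointer to NULL right after free() so that a
later cleanup pass skips it instead of freeing it a second time. The names
readrec_sketch, rec_new, rec_end and rec_term are hypothetical stand-ins, not
the actual mfsmount code, and the sketch does not establish the root cause of
the crash. It only shows why the guard makes repeated cleanup harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a read record; only the buffer matters here. */
typedef struct readrec_sketch {
    unsigned char *rbuff;   /* read buffer, NULL when nothing is allocated */
} readrec_sketch;

/* Allocate a record with a buffer of the given size; returns NULL on failure. */
readrec_sketch *rec_new(size_t size) {
    readrec_sketch *rr = malloc(sizeof *rr);
    if (rr == NULL) {
        return NULL;
    }
    rr->rbuff = malloc(size);
    if (rr->rbuff == NULL) {
        free(rr);
        return NULL;
    }
    return rr;
}

/* Assumed analogue of an "end" call: drop the buffer but keep the record. */
void rec_end(readrec_sketch *rr) {
    assert(rr != NULL);
    if (rr->rbuff != NULL) {
        free(rr->rbuff);
        rr->rbuff = NULL;   /* a later cleanup now sees NULL and skips free() */
    }
}

/* Assumed analogue of a "term" call: final teardown of the whole record. */
void rec_term(readrec_sketch *rr) {
    assert(rr != NULL);
    if (rr->rbuff != NULL) {
        free(rr->rbuff);
        rr->rbuff = NULL;
    }
    free(rr);
}
```

With the guard in place, calling rec_end() twice, or rec_end() followed by
rec_term(), frees the buffer exactly once. Without the NULL assignment in
rec_end(), the second call would pass a stale pointer to free(), which glibc
typically reports by aborting the process.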

If you don't think the patch I made is reasonable or still want to
recreate the issue, I'll send my AutoFS configuration later.

Thanks
Flow


_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users