#18 curlftpfs hangs when making differential backups using DAR

closed-fixed
None
5
2007-01-22
2007-01-14
Anonymous
No

Hello,

Let me first thank you for this cute file system! I use it every day on a production site, and it works quite well as long as we deal with small files (for big ones, see http://sourceforge.net/tracker/index.php?func=detail&aid=1500232&group_id=160565&atid=816357).

I make differential backups with DAR (http://dar.linux.free.fr) and store them on a curlftpfs mount. The FTP server is on a 100 Mbit LAN.

Full backups work fine.
When I run the diff backup the next day, curlftpfs hangs while reading the master backup from the FTP server.
Every process that then accesses the curlftpfs mount point hangs forever as well. Only a `killall -9 curlftpfs` followed by `fusermount -u /mountpoint` helps.

Experimenting with the following options didn't help:
-o sync_read
-o intr
-o direct_io
-s
-f
-o debug
-o ftpfs_debug
-v
-o max_readahead=16
-o noauto_cache

A detailed debug log is attached.

Please let me know if I can do anything to help you debugging this! Patching and recompiling the source is not a problem.

Discussion

  • Nobody/Anonymous

    Detailed debug log with comments

     
  • Robson Braga Araujo

    • assigned_to: nobody --> braga
     
  • Robson Braga Araujo

    Logged In: YES
    user_id=307089
    Originator: NO

    I can't figure out what happened based on this log. Can you attach a debugger to the process and see what's going on? If not, can you help me reproduce this in a controlled environment? As a last resort, can I have access to the FTP server to debug it?

     
  • flixfe

    flixfe - 2007-01-16

    Logged In: YES
    user_id=1693442
    Originator: NO

    Thanks for looking into this, I'm the not-logged-in reporter from above.

    I reproduced the problem under strace. You can get the strace output here: http://xilef.gotdns.org/misc/curlftpfs_strace.bz2

    As for reproducing the problem:
    I can trigger it reliably by making differential backups with DAR. The master backup (against which the diff is made) must be on the curlftpfs mount.
    In my strace output, the master backup is called dar_vmail-20070113-full.<counter>.dar.
    I'm also storing the backup being created on curlftpfs, but writing to the share does not seem to be the problem; I conclude that from the fact that full backups work fine.

    So all you need to reproduce it should be DAR from http://dar.linux.free.fr and curlftpfs. After making a full backup of something, try to make a diff against it using dar's "-A <full backup>" option.
    Here are the complete DAR commands I used:
    full backup:
    dar -c /mnt/ftpbackup/asterix/dar_www-20070115-full -R /mnt/data/www -y9 -s 50M -m200 -Plost+found \
      -Z*.[gG][zZ] -Z*.[bB][zZ]2 -Z*.[zZ][iI][pP] -Z*.[pP][nN][gG] -Z*.[jJ][pP][gG] \
      -Z*.[aA][vV][iI] -Z*.[mM][pP][gG] -Z*.[mM][pP]3 -Z*.[oO][gG][gG] -Z*.[fF][lL][aA][cC]

    diff backup:
    dar -c /mnt/ftpbackup/asterix/dar_www-20070116-diff -R /mnt/data/www -y9 -s 50M -m200 -Plost+found \
      -Z*.[gG][zZ] -Z*.[bB][zZ]2 -Z*.[zZ][iI][pP] -Z*.[pP][nN][gG] -Z*.[jJ][pP][gG] \
      -Z*.[aA][vV][iI] -Z*.[mM][pP][gG] -Z*.[mM][pP]3 -Z*.[oO][gG][gG] -Z*.[fF][lL][aA][cC] \
      -A /mnt/ftpbackup/asterix/dar_www-20070115-full

    I tried the following software versions; all of them show the same problem:
    CPU: AMD Athlon 64
    Linux kernel: 2.6.17 - 2.6.19.2
    DAR: 2.2.6 and 2.3.2
    FTP Server: my ISP's ProFTPD 1.2.10
    Curlftpfs: 0.9
    curl: 7.15.5

    Note that all software runs exclusively in 64-bit mode; 32-bit compatibility has been turned off in the kernel.

    I'm sorry, but I won't be able to give you SSH access to the server.
    Please let me know if you need anything else.

     
  • Robson Braga Araujo

    Logged In: YES
    user_id=307089
    Originator: NO

    I just tried using dar to do the same thing you did. It worked for both the full backup and the diff backup. What version of FUSE are you using? The only other difference I see in your case is that you're using 64-bit binaries, but I don't have a computer to test this on.

    Can you attach gdb to the running process when it blocks and try to debug it?

     
  • flixfe

    flixfe - 2007-01-20

    Logged In: YES
    user_id=1693442
    Originator: NO

    Thanks again. I have FUSE built into the kernel (2.6.19.2), and I tried fuse-2.6.0 and fuse-2.6.1, both from the Gentoo repository.
    I will attach gdb to the process when I get a chance in a few days.
    I'll be using gdb for the first time. Here's what I have in mind; please let me know if I should do something different:
    1) Recompile curlftpfs with gcc -g and without stripping it
    2) #gdb
    3) file /usr/bin/curlftpfs
    4) run <all arguments here>
    5) trigger the freeze

     
  • flixfe

    flixfe - 2007-01-21

    Logged In: YES
    user_id=1693442
    Originator: NO

    Attaching gdb in the way I described in my previous comment didn't yield any output.
    But I found something else:

    I started curlftpfs with debug output and fuse debug output and then logged the output of:
    1) a "cat /file > /dev/null" which works as expected
    2) a differential dar which hangs

    Diffing those logs shows that both runs proceed exactly the same way until the middle of reading the file.
    Then cat simply continues reading the file, while dar issues a GETATTR and soon afterwards hangs before handing the data over to FUSE.
    Please consult those files for the details:
    1) http://xilef.gotdns.org/misc/curlftpfs-cat.log.bz2
    2) http://xilef.gotdns.org/misc/curlftpfs-dar.log.bz2
    3) diff between 1) and 2) with comment: http://xilef.gotdns.org/misc/curlftpfs-cat-dar.diff.bz2

    Also note that the archive against which the diff is made must be bigger than a certain size to trigger the hang.
    With sizes from a few bytes up to a few kilobytes, everything works fine. The log above was made with a 100 KB archive, which is enough to trigger the problem.

    Does this help you find the problem?
    It doesn't seem to be a DAR-only problem, as I ran exactly the same backups against a Samba share for a few years and never had any trouble.

    Thanks!

     
  • flixfe

    flixfe - 2007-01-21

    Logged In: YES
    user_id=1693442
    Originator: NO

    FYI, compiled and tested with FUSE 2.5.3 but the result is the same.

     
  • Robson Braga Araujo

    Logged In: YES
    user_id=307089
    Originator: NO

    Thanks for putting so much effort into this.

    I still can't figure out what happened. What gave me a clue was the strace output you sent me a while back, which showed curlftpfs looping around the select call. You're the second person to report this loop, but I can't reproduce it. I did the same steps as you and everything worked fine: dar starts reading the file, does a GETATTR to find out the file size, then reads the final bytes of the file and continues working. I'm starting to think that this is related to the 64-bit architecture.
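The access pattern described above (sequential reads, then a size query, then a read of the final bytes) can probably be reproduced without DAR. The following is a minimal C sketch under that assumption; the helper name `read_head_stat_tail` is hypothetical and is not part of curlftpfs or DAR, and whether `fstat()` actually reaches curlftpfs as a GETATTR depends on FUSE attribute caching.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reproducer helper (not from curlftpfs or DAR):
 * 1. read the first `head` bytes sequentially, as cat would,
 * 2. fstat() the open file (may reach the FUSE daemon as a GETATTR),
 * 3. pread() the last `tail` bytes of the file.
 * Returns the number of tail bytes read, or -1 on error. */
ssize_t read_head_stat_tail(const char *path, size_t head,
                            char *tail_buf, size_t tail)
{
    char chunk[4096];
    size_t done = 0;
    struct stat st;
    off_t off;
    ssize_t got;
    int fd;

    assert(path != NULL && tail_buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Step 1: sequential reads from the start of the file. */
    while (done < head) {
        size_t want = head - done;
        if (want > sizeof chunk)
            want = sizeof chunk;
        ssize_t n = read(fd, chunk, want);
        if (n <= 0)
            break; /* EOF or read error ends the sequential phase */
        done += (size_t)n;
    }

    /* Step 2: ask for the file size. */
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* Step 3: jump near the end and read the final bytes. */
    off = st.st_size > (off_t)tail ? st.st_size - (off_t)tail : 0;
    got = pread(fd, tail_buf, tail, off);
    close(fd);
    return got;
}
```

Pointed at a file of at least about 100 KB on the curlftpfs mount, with `head` set to roughly half its size, one would expect this call to block on an affected libcurl version and return normally otherwise; that expectation has not been tested here.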

    What you can do to help me is the following:

    - make curlftpfs hang again; it will probably be looping around the select call
    - find out the PID of the process
    - attach gdb to it: gdb curlftpfs <PID>
    - see if it is inside the ftpfs_read_chunk function. You can check that with the "bt" command in gdb
    - if it is, try to print the values of some variables:
        - print running_handles
        - print fh->buf
        - print size
        - print offset

     
  • flixfe

    flixfe - 2007-01-22

    Logged In: YES
    user_id=1693442
    Originator: NO

    Somehow I didn't succeed in including debugging symbols for the gdb run. Maybe you can point out my mistake?

    # export CFLAGS="-g -ggdb"; export CXXFLAGS="$CFLAGS"
    # make clean; ./configure --prefix=/usr
    # make
    # make install
    (same for fuse)
    (make curlftpfs hang)
    # gdb /usr/bin/curlftpfs `pidof curlftpfs`
    GNU gdb 6.6
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu"...
    Using host libthread_db library "/lib/libthread_db.so.1".
    Attaching to program: /usr/bin/curlftpfs, process 26374
    0x000030beeadf52a5 in _start () from /lib64/ld-linux-x86-64.so.2
    (gdb) bt
    #0 0x000030beeadf52a5 in _start () from /lib64/ld-linux-x86-64.so.2
    #1 0x000007d9833c4ce2 in ?? ()
    #2 0x000007d9834fc534 in ?? ()
    #3 0x000007d9834fc4d0 in ?? ()
    #4 0x000007d9834fc534 in ?? ()
    #5 0x0000000000000000 in ?? ()
    (gdb) print running_handles
    No symbol "running_handles" in current context.

    Any idea what I am missing?
    It's not curlftpfs-specific, as I didn't succeed with a 10-line test.c program either. Googling around didn't help.
    Alternatively, if you have a rough idea where the problem is, I could add debug printf()s to the code.

     
  • Robson Braga Araujo

    Logged In: YES
    user_id=307089
    Originator: NO

    I think that "make install" strips the binary of its debug symbols. You can run the curlftpfs binary from the directory where curlftpfs was compiled or copy it manually.

    The problem seems to be in the ftpfs_read_chunk function. There is a select call there inside a loop, and for some reason the exit condition seems never to be reached. Another user reported that he saw the running_handles variable with a value of 5, but it should always be 0 or 1. You can check whether you see the same thing.
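The failure mode described here, a select loop whose exit condition is never met, could be guarded against with a bounded loop that also checks the invariant on `running_handles`. This is a minimal sketch, not the actual `ftpfs_read_chunk` code: `step_fn`, `drive_transfer` and the two toy step functions are hypothetical stand-ins. Real code would call libcurl's `curl_multi_perform()`, which reports the number of running handles through an `int *` argument, and `select()` on the transfer's sockets.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one curl_multi_perform()-style step: advance the
 * transfer and report how many handles are still running. 0 means success. */
typedef int (*step_fn)(void *ctx, int *running_handles);

enum drive_result {
    DRIVE_DONE = 0,
    DRIVE_STEP_ERROR = -1,
    DRIVE_BAD_HANDLES = -2,
    DRIVE_TOO_MANY_ITERATIONS = -3
};

/* Step until no handle is running, with two guards against spinning forever:
 * running_handles must stay within 0..1 for a single download, and the loop
 * gives up after max_iterations steps. */
enum drive_result drive_transfer(step_fn step, void *ctx, long max_iterations)
{
    assert(step != NULL);
    for (long i = 0; i < max_iterations; i++) {
        int running = 0;
        if (step(ctx, &running) != 0)
            return DRIVE_STEP_ERROR;
        if (running < 0 || running > 1) {
            fprintf(stderr, "unexpected running_handles=%d\n", running);
            return DRIVE_BAD_HANDLES;
        }
        if (running == 0)
            return DRIVE_DONE;
        /* Real code would wait in select() on the transfer's sockets here. */
    }
    return DRIVE_TOO_MANY_ITERATIONS;
}

/* Toy step: the transfer finishes after *(int *)ctx further steps. */
int countdown_step(void *ctx, int *running_handles)
{
    int *left = ctx;
    if (*left > 0)
        (*left)--;
    *running_handles = (*left > 0) ? 1 : 0;
    return 0;
}

/* Toy step mimicking the reported bad state: 5 handles that never finish. */
int stuck_step(void *ctx, int *running_handles)
{
    (void)ctx;
    *running_handles = 5;
    return 0;
}
```

The iteration budget trades a possible false failure on a very slow transfer for never spinning forever; real code would more likely use a time-based limit.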

     
  • Robson Braga Araujo

    Logged In: YES
    user_id=307089
    Originator: NO

    Ok, I could reproduce it only with curl-7.15.5. I tried both 7.15.4 and 7.16.1-20070122 and it worked fine. Can you test one of these? Remember that 7.16.0 also has a bug that makes it enter an infinite loop.

     
  • flixfe

    flixfe - 2007-01-22

    Logged In: YES
    user_id=1693442
    Originator: NO

    Hey braga THAT'S IT! :-)
    Downgraded to curl-7.15.4 and completed a 500MB diff backup against a 10GB full! Fast and without any hang.
    I _think_ I tried with curl 7.15.1 before reporting this bug and had the same problem.

    If you want me to do any more tests I'd help happily, otherwise I just owe you a beer.

    Thanks for fixing this (kinda).

     
  • flixfe

    flixfe - 2007-01-22

    Logged In: YES
    user_id=1693442
    Originator: NO

    I'm also removing the debug files linked in previous comments. If anyone needs them for fixing the curl bug, send me a PM.

     
  • Robson Braga Araujo

    • status: open --> closed-fixed
     
  • Robson Braga Araujo

    Logged In: YES
    user_id=307089
    Originator: NO

    I'm glad that it works now. Since version 0.8, curlftpfs depends on curl 7.15.2 or later. Versions 7.15.2 and 7.15.3 have a bug that makes curlftpfs reconnect more often than necessary to the server. Versions 7.15.5 and 7.16.0 have bugs that make curlftpfs enter an infinite loop. So right now the recommended version of libcurl is either 7.15.4 or the development branch.
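The version advice above can be encoded as a simple check. This is a sketch, assuming libcurl's 0xXXYYZZ version encoding (the format of `LIBCURL_VERSION_NUM` and of the `version_num` field returned by `curl_version_info()`); the macro, enum and function names are hypothetical and not part of curlftpfs.

```c
#include <assert.h>

/* Build a libcurl-style version number 0xXXYYZZ (major, minor, patch). */
#define MK_CURL_VERSION(maj, min, pat) \
    (((unsigned)(maj) << 16) | ((unsigned)(min) << 8) | (unsigned)(pat))

/* Hypothetical verdicts, following the maintainer's comment in this thread. */
enum curl_verdict {
    VERDICT_TOO_OLD,       /* < 7.15.2: curlftpfs >= 0.8 requires 7.15.2+ */
    VERDICT_RECONNECTS,    /* 7.15.2-7.15.3: reconnects more often than necessary */
    VERDICT_RECOMMENDED,   /* 7.15.4 */
    VERDICT_INFINITE_LOOP, /* 7.15.5-7.16.0: curlftpfs can loop forever */
    VERDICT_NEWER          /* later; only a 7.16.1 dev snapshot was reported OK */
};

enum curl_verdict classify_curl(unsigned version_num)
{
    assert(version_num <= 0xFFFFFFu); /* only 24 bits are used */
    if (version_num < MK_CURL_VERSION(7, 15, 2))
        return VERDICT_TOO_OLD;
    if (version_num <= MK_CURL_VERSION(7, 15, 3))
        return VERDICT_RECONNECTS;
    if (version_num == MK_CURL_VERSION(7, 15, 4))
        return VERDICT_RECOMMENDED;
    if (version_num <= MK_CURL_VERSION(7, 16, 0))
        return VERDICT_INFINITE_LOOP;
    return VERDICT_NEWER;
}
```

At runtime the value would come from `curl_version_info(CURLVERSION_NOW)->version_num`; anything newer than 7.16.0 is kept in its own bucket because this thread only reports a 7.16.1 development snapshot working.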

    Thanks for testing this. I'm closing the bug now.

     
