#23 kfs_fuse causes segfault for large files

open
nobody
5
2008-12-28
2008-12-28
No

kfs_fuse Segfault for large files
---------------------------------
When processing large files using the kfs_fuse module, the code dereferences a null pointer after approximately 5 minutes of processing:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x409ab950 (LWP 9351)]
0x000000000044f665 in KFS::KfsClientImpl::Sync (this=0x884680, fd=5) at ~/kfs/trunk/src/cc/libkfsClient/KfsClient.cc:1298
1298 if (mFileTable[fd]->buffer.dirty) {

Steps to Reproduce:
1. Place a large file (I used a 12GB text file) on the file system
2. Open several command shells and run the following commands in sequence:
a. ./trunk/scripts/gdb --args kfs_fuse kfs-fuse -f
b. ./trunk/scripts/kfs-fuse/md5sum large-file.txt

Version tested:
SVN Rev 230, Last Changed Date: 2008-12-26 16:56:40 -0700 (Fri, 26 Dec 2008)

System info:
Ubuntu 8.10, Kernel 2.6.27-9-generic, amd64

Build settings:
g++ 4.3.2
cmake -Wno-dev -DJAVA_INCLUDE_PATH=/usr/lib/jvm/java-6-sun-1.6.0.10/include -DJAVA_INCLUDE_PATH2=/usr/lib/jvm/java-6-sun-1.6.0.10/include/linux -DUSE_STATIC_LIB_LINKAGE=off -DCMAKE_BUILD_TYPE=Debug -DDEBUG=1 ../

Starting program: ~/kfs/trunk/build/bin/kfs_fuse kfs-fuse -f
[Thread debugging using libthread_db enabled]
[New Thread 0x7f3e678ae6f0 (LWP 9345)]
[New Thread 0x409ab950 (LWP 9351)]
[New Thread 0x41874950 (LWP 9352)]
[New Thread 0x42075950 (LWP 9358)]
[New Thread 0x42876950 (LWP 19889)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x409ab950 (LWP 9351)]
0x000000000044f665 in KFS::KfsClientImpl::Sync (this=0x884680, fd=5) at ~/kfs/trunk/src/cc/libkfsClient/KfsClient.cc:1298
1298 if (mFileTable[fd]->buffer.dirty) {
(gdb) backtrace
#0 0x000000000044f665 in KFS::KfsClientImpl::Sync (this=0x884680, fd=5) at ~/kfs/trunk/src/cc/libkfsClient/KfsClient.cc:1298
#1 0x000000000044f71c in KFS::KfsClient::Sync (this=0x84cc60, fd=5) at ~/kfs/trunk/src/cc/libkfsClient/KfsClient.cc:330
#2 0x000000000044764b in fuse_flush (path=0x7f3e60021730 "/dtn_daily2.tsv", finfo=0x409ab0a0) at ~/kfs/trunk/src/cc/fuse/kfs_fuse_main.cc:156
#3 0x00007f3e6749d8cc in ?? () from /lib/libfuse.so.2
#4 0x00007f3e6749da81 in ?? () from /lib/libfuse.so.2
#5 0x00007f3e674a4ca6 in ?? () from /lib/libfuse.so.2
#6 0x00007f3e674a283f in ?? () from /lib/libfuse.so.2
#7 0x00007f3e6703e3ea in start_thread () from /lib/libpthread.so.0
#8 0x00007f3e661e0c6d in clone () from /lib/libc.so.6
#9 0x0000000000000000 in ?? ()
(gdb) q

Discussion

  • Eric Holmberg

    Eric Holmberg - 2008-12-28

    I have a work-around patch for this which I will upload shortly.

     
  • Eric Holmberg

    Eric Holmberg - 2008-12-28

    I added some additional logging and found that the file was being closed by KfsRead / KfsClient and THEN kfs_fuse was attempting to do a final sync on the already closed file (which resulted in dereferencing a null pointer). Here is the additional trace showing the failure:

    12-28-2008 12:34:38.836 DEBUG - (KfsRead.cc:117) Current pointer (12550397724) is past EOF (12550397724) ...so, done
    12-28-2008 12:34:38.837 DEBUG - (KfsClient.cc:2399) Closing filetable entry: 5, openmode = 0, path = /dtn_daily2.tsv
    12-28-2008 12:34:38.838 DEBUG - (KfsClient.cc:2503) file-id for dir: / (file = dtn_daily2.tsv) is 2
    12-28-2008 12:34:38.838 DEBUG - (KfsClient.cc:2317) Entry for <2, dtn_daily2.tsv> is likely stale; forcing revalidation
    12-28-2008 12:34:38.838 DEBUG - (KfsClient.cc:2399) Closing filetable entry: 1, openmode = 0, path = /dtn_daily2.tsv
    12-28-2008 12:34:38.852 DEBUG - (kfs_fuse_main.cc:160) flusing file '/dtn_daily2.tsv'; fd=00000005
    12-28-2008 12:34:38.852 DEBUG - (kfs_fuse_main.cc:161) fuse_flush called
    12-28-2008 12:34:38.852 DEBUG - (KfsClient.cc:1292) Syncing fd: 5
    12-28-2008 12:34:38.852 DEBUG - (KfsClient.cc:1300) Filetable[fd] is null fd: 5

    Note that the "fuse_flush called" message is logged after the file has been closed. I've added code that verifies that the Filetable[fd] entry is not null and logs a message when it is (that's the last line of the trace above).
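
    The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: FileTableSketch, FileTableEntry, Buffer, Open, Close and MarkDirty are hypothetical stand-ins, and returning -EBADF for a closed descriptor is an assumption about how the error might be reported.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the client's per-file state.
// These are illustrative only, not the actual KFS definitions.
struct Buffer {
    bool dirty = false;
};

struct FileTableEntry {
    Buffer buffer;
};

class FileTableSketch {
public:
    // Allocate a new entry and return its descriptor.
    int Open() {
        mFileTable.push_back(std::unique_ptr<FileTableEntry>(new FileTableEntry()));
        return static_cast<int>(mFileTable.size()) - 1;
    }

    // Release the entry but keep the slot, so the fd now maps to null.
    void Close(int fd) {
        if (IsValid(fd)) {
            mFileTable[fd].reset();
        }
    }

    void MarkDirty(int fd) {
        if (IsValid(fd)) {
            mFileTable[fd]->buffer.dirty = true;
        }
    }

    // Returns 0 on success, or -EBADF (an assumed error convention)
    // when fd is out of range or has already been closed.
    int Sync(int fd) {
        if (!IsValid(fd)) {
            return -EBADF;  // previously this path dereferenced a null entry
        }
        if (mFileTable[fd]->buffer.dirty) {
            // A real client would flush the buffer to the servers here.
            mFileTable[fd]->buffer.dirty = false;
        }
        return 0;
    }

private:
    bool IsValid(int fd) const {
        return fd >= 0 &&
               static_cast<std::size_t>(fd) < mFileTable.size() &&
               mFileTable[fd] != nullptr;
    }

    std::vector<std::unique_ptr<FileTableEntry>> mFileTable;
};
```

    With a guard like this, a flush that arrives after the close gets an error code back instead of crashing the process.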

    The only other problem I'm seeing is a failure if the servers have just been brought up and they are in recovery mode. That's listed as a separate bug report.

    Changes in the attached patch:
    1. Added null-pointer checking to the logging since it was being called before the client initialized it in kfs_fuse
    2. Added debug flag pass-through to the fuse CMakeLists.txt when a debug build has been set
    3. Added logging to kfs_fuse
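
    As an illustration of item 1, a logging call can be made safe to use before initialization by checking the logger pointer first. The sketch below is an assumption about the general shape of such a check; LoggerSketch, Init and Debug are hypothetical names and do not reflect the KFS logging API.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical logger wrapper; not the KFS logging API.
class LoggerSketch {
public:
    // Called once by the client when it sets up logging.
    static void Init(std::ostream* sink) { sSink = sink; }

    // Writes the message and returns true, or returns false without
    // touching the sink when logging has not been initialized yet.
    static bool Debug(const std::string& msg) {
        if (sSink == nullptr) {
            return false;
        }
        *sSink << "DEBUG - " << msg << '\n';
        return true;
    }

private:
    static std::ostream* sSink;
};

std::ostream* LoggerSketch::sSink = nullptr;
```

    Calls made before Init are dropped instead of dereferencing a null sink.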

    Let me know if you have any questions.

    -Eric Holmberg

    File Added: kfs_fuse_crash.patch

     
  • Eric Holmberg

    Eric Holmberg - 2008-12-28

    Patch to fix crash when using fuse with large files

     
  • sriramsrao

    sriramsrao - 2009-01-14

    In 0.2.3/trunk, a check for filetable[fd] == NULL in sync has been added. This'll prevent the crash.

     
