From: Dimitrios A. <ji...@gm...> - 2007-04-06 11:47:25
|
Hello list,

sshfs is repeatedly crashing on me with segmentation fault, after about a day of intense read-only operations. This happens even after applying the latest cvs patches (revisions 1.79 and 1.81 of sshfs.c).

The main problem is that I can't run sshfs under gdb (it crashes immediately for some reason) and I can't get a core dump, so I can't help with debugging. Finally I figured out that sshfs can't write a core dump because it chdir()'s to / where it has no write permissions. Is there a particular reason for doing that? How can I get a useful backtrace?

Thanks in advance,
Dimitris

P.S. Please CC replies to me as I'm not subscribed |
From: Miklos S. <mi...@sz...> - 2007-04-10 21:54:20
|
> sshfs is repeatedly crashing on me with segmentation fault, after
> about a day of intense read-only operations.

Which fuse version?

> This happens even after applying the latest cvs patches (revisions
> 1.79 and 1.81 of sshfs.c).
>
> The main problem is that I can't run sshfs under gdb (it crashes
> immediately for some reason) and I can't get a core dump, so I can't
> help with debugging. Finally I figured out that sshfs can't write a
> core dump because it chdir()'s to / where it has no write
> permissions. Is there a particular reason for doing that?

Yes, daemons usually change their working directory to / so that they
let the current directory be unmounted.

You can enable corefiles by doing something like this:

  echo "/var/tmp/core.%p" > /proc/sys/kernel/core_pattern

Thanks,
Miklos |
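A note on the suggestion above: when core_pattern is a relative name (the traditional default is "core"), the kernel writes the dump into the crashing process's current working directory, which is why a daemon that has chdir()'d to / and cannot write there leaves no core file. An absolute pattern sidesteps that. The following is only a minimal sketch; the function name core_target_dir is hypothetical and is not part of sshfs or FUSE. It just illustrates where a dump would land for a given pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of sshfs or FUSE): print the directory in
# which the kernel would try to write a core file, given a core_pattern
# value and the working directory of the crashing process.
core_target_dir() {
    pattern=$1
    cwd=$2
    case $pattern in
        '|'*) echo "(piped to a program)" ;;   # pattern starts with '|'
        /*)   dirname "$pattern" ;;            # absolute: the cwd is irrelevant
        */*)  printf '%s/%s\n' "${cwd%/}" "$(dirname "$pattern")" ;;  # relative with subdirectory
        *)    printf '%s\n' "$cwd" ;;          # bare name: lands in the cwd
    esac
}
```

For example, core_target_dir "$(cat /proc/sys/kernel/core_pattern)" / shows where a daemon running in / would dump. The core size limit of the shell that starts sshfs also has to be non-zero (for instance via "ulimit -c unlimited"), otherwise no core file is written regardless of the pattern.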
From: Dimitrios A. <ji...@gm...> - 2007-04-11 10:43:23
|
On Tue, 10 Apr 2007 23:54:03 +0200
Miklos Szeredi <mi...@sz...> wrote:

> > sshfs is repeatedly crashing on me with segmentation fault, after
> > about a day of intense read-only operations.
>
> Which fuse version?

fuse 2.6.3

> > This happens even after applying the latest cvs patches (revisions
> > 1.79 and 1.81 of sshfs.c).
> >
> > The main problem is that I can't run sshfs under gdb (it crashes
> > immediately for some reason) and I can't get a core dump, so I can't
> > help with debugging. Finally I figured out that sshfs can't write a
> > core dump because it chdir()'s to / where it has no write
> > permissions. Is there a particular reason for doing that?
>
> Yes, daemons usually change their working directory to / so that they
> let the current directory be unmounted.
>
> You can enable corefiles by doing something like this:
>
> echo "/var/tmp/core.%p" > /proc/sys/kernel/core_pattern
>
> Thanks,
> Miklos

Thanks Miklos, I have already done that (although I still don't agree with this behaviour because if I hadn't root access I wouldn't be able to do it) and got 2 core dumps, which are identically cryptic to me. First of all here is how I call sshfs:

$ /var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs -f -o workaround=all -o ro -o allow_other -o MACs=hmac-md5-96 -o Ciphers=arcfour user@host:/remote/dir /local/dir/
user@host's password:
(after about 1 day of 5MB/s read-only traffic...)
Segmentation fault (core dumped)

The first core file is 216 MB and the other is 31 MB. I don't know why they differ so much in size; the only thing I changed in the second case was doubling the stack size from 8MB to 16MB. I should note that the executables and the libraries are not stripped (at least the 'file' command reports so).
Anyway here is what I did, please tell me if I can do more:

$ ls -l /tmp/core.*
-rw------- 1 jimis users 225640448 2007-04-10 10:37 /tmp/core.15212
-rw------- 1 jimis users 31981568 2007-04-11 05:53 /tmp/core.3482

$ gdb -c /tmp/core.15212
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(no debugging symbols found)
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `/var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs -f -o workaround=all -o ro -o all'.
Program terminated with signal 11, Segmentation fault.
#0 0xb7e9bd87 in ?? ()
(gdb) info thread
  12 process 15212 0xb7f11410 in __kernel_vsyscall ()
  11 process 15218 0xb7f11410 in __kernel_vsyscall ()
  10 process 26948 0xb7f11410 in __kernel_vsyscall ()
  9 process 26952 0xb7f11410 in __kernel_vsyscall ()
  8 process 26955 0xb7f11410 in __kernel_vsyscall ()
  7 process 26963 0xb7f11410 in __kernel_vsyscall ()
  6 process 26972 0xb7f11410 in __kernel_vsyscall ()
  5 process 26980 0xb7f11410 in __kernel_vsyscall ()
  4 process 26981 0xb7f11410 in __kernel_vsyscall ()
  3 process 26982 0xb7f11410 in __kernel_vsyscall ()
  2 process 27003 0xb7f11410 in __kernel_vsyscall ()
* 1 process 26967 0xb7e9bd87 in ?? ()
(gdb) thread apply all bt

Thread 12 (process 15212):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4ee31 in ?? ()

Thread 11 (process 15218):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4e3fb in ?? ()

Thread 10 (process 26948):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4e3fb in ?? ()

Thread 9 (process 26952):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4e3fb in ?? ()

Thread 8 (process 26955):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4d87e in ?? ()

Thread 7 (process 26963):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4d87e in ?? ()

Thread 6 (process 26972):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4e3fb in ?? ()

Thread 5 (process 26980):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4d87e in ?? ()

Thread 4 (process 26981):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4e3fb in ?? ()

Thread 3 (process 26982):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4e3fb in ?? ()

Thread 2 (process 27003):
#0 0xb7f11410 in __kernel_vsyscall ()
#1 0xb7e4d87e in ?? ()
---Type <return> to continue, or q <return> to quit---

Thread 1 (process 26967):
#0 0xb7e9bd87 in ?? ()
(gdb) info registers
eax 0x8054888 134563976
ecx 0xda6cc2e4 -630406428
edx 0x8054898 134563992
ebx 0xb7eeb3e0 -1209093152
esp 0xb52fe080 0xb52fe080
ebp 0xb52fe0b8 0xb52fe0b8
esi 0xa2627b0 170272688
edi 0x1 1
eip 0xb7e9bd87 0xb7e9bd87
eflags 0x10286 [ PF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0xffff007b -65413
es 0xc010007b -1072693125
fs 0x0 0
gs 0xc0100033 -1072693197
(gdb) quit

$ gdb -c /tmp/core.3482
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
Failed to read a valid object file image from memory.
Core was generated by `/var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs -f -o workaround=all -o ro -o all'.
Program terminated with signal 11, Segmentation fault.
#0 0xb7e96ff8 in ?? ()
(gdb) info thread
  12 process 3482 0xb7f0d410 in ?? ()
  11 process 3488 0xb7f0d410 in ?? ()
  10 process 20925 0xb7f0d410 in ?? ()
  9 process 21064 0xb7f0d410 in ?? ()
  8 process 21067 0xb7f0d410 in ?? ()
  7 process 21068 0xb7f0d410 in ?? ()
  6 process 21072 0xb7f0d410 in ?? ()
  5 process 21073 0xb7f0d410 in ?? ()
  4 process 21075 0xb7f0d410 in ?? ()
  3 process 21076 0xb7f0d410 in ?? ()
  2 process 21077 0xb7f0d410 in ?? ()
* 1 process 21069 0xb7e96ff8 in ?? ()
(gdb) thread apply all bt

Thread 12 (process 3482):
#0 0xb7f0d410 in ?? ()

Thread 11 (process 3488):
#0 0xb7f0d410 in ?? ()

Thread 10 (process 20925):
#0 0xb7f0d410 in ?? ()

Thread 9 (process 21064):
#0 0xb7f0d410 in ?? ()

Thread 8 (process 21067):
#0 0xb7f0d410 in ?? ()

Thread 7 (process 21068):
#0 0xb7f0d410 in ?? ()

Thread 6 (process 21072):
#0 0xb7f0d410 in ?? ()

Thread 5 (process 21073):
#0 0xb7f0d410 in ?? ()

Thread 4 (process 21075):
#0 0xb7f0d410 in ?? ()

Thread 3 (process 21076):
#0 0xb7f0d410 in ?? ()

Thread 2 (process 21077):
#0 0xb7f0d410 in ?? ()

Thread 1 (process 21069):
#0 0xb7e96ff8 in ?? ()
(gdb) info registers
eax 0x0 0
ecx 0xaf8c57d0 -1349756976
edx 0xb217cc30 -1307063248
ebx 0xb7ee73e0 -1209109536
esp 0xb4cc8f60 0xb4cc8f60
ebp 0xb4cc8fb8 0xb4cc8fb8
esi 0xb23f3a10 -1304479216
edi 0xb2142f20 -1307300064
eip 0xb7e96ff8 0xb7e96ff8
eflags 0x10292 [ AF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0xc0100033 -1072693197

Dimitris |
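One likely reason the backtraces above show only "??" is that gdb was started with the core file alone (gdb -c core): without the executable it has no symbol tables from which to map addresses to function names. A minimal sketch of loading both, assuming a gdb that supports the -batch and -ex options; the wrapper name core_backtrace is hypothetical and not part of sshfs or gdb:

```shell
#!/bin/sh
# Hypothetical wrapper: print a backtrace of every thread in a core file.
# gdb is given the executable as well as the core so it can resolve symbols.
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -x "$exe" ]; then
        echo "core_backtrace: not an executable: $exe" >&2
        return 1
    fi
    if [ ! -r "$core" ]; then
        echo "core_backtrace: cannot read core file: $core" >&2
        return 1
    fi
    # -batch exits after running the commands given with -ex.
    gdb -batch -ex "thread apply all bt" "$exe" "$core"
}
```

With the paths from this message, the call would be: core_backtrace /var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs /tmp/core.15212. Whether the result is readable still depends on the binary and its libraries carrying symbols.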
From: Miklos S. <mi...@sz...> - 2007-04-15 19:57:19
|
> Thanks Miklos, I have already done that (although I still don't
> agree with this behaviour because if I hadn't root access I wouldn't
> be able to do it)

Hmm, yeah. Maybe changing to /tmp would be better.

> and got 2 core dumps, which are identically cryptic to me. First of
> all here is how I call sshfs:

Yes, those are not helpful.

> $ /var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs -f -o workaround=all -o ro -o allow_other -o MACs=hmac-md5-96 -o Ciphers=arcfour user@host:/remote/dir /local/dir/
> user@host's password:
> (after about 1 day of 5MB/s read-only traffic...)
> Segmentation fault (core dumped)
>

Can you try running sshfs _in_ gdb? I don't think it'll make a
difference, but maybe...

Thanks,
Miklos |
From: Dimitrios A. <ji...@gm...> - 2007-04-16 11:12:46
|
On Sunday 15 April 2007 21:12:56 Miklos Szeredi wrote:
> > Thanks Miklos, I have already done that (although I still don't
> > agree with this behaviour because if I hadn't root access I wouldn't
> > be able to do it)
>
> Hmm, yeah. Maybe changing to /tmp would be better.

Or perhaps changing to $HOME

> Can you try running sshfs _in_ gdb?

I tried that, and the behavior is completely different. Instead of crashing about 12 hours later it crashes immediately when heavy traffic starts. Here is exactly what happened:

(gdb) run
Starting program: /var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs -f -o workaround=all -o ro -o allow_other -o MACs=hmac-md5-96 -o Ciphers=arcfour user@host:/remote/dir /local/dir/
[Thread debugging using libthread_db enabled]
[New Thread -1210800448 (LWP 30347)]
jimis@lxplus's password:
[New Thread -1210942576 (LWP 30353)]
[New Thread -1219474544 (LWP 30354)]
[New Thread -1227867248 (LWP 30355)]
[New Thread -1236399216 (LWP 30358)]
---- (here I start the heavy I/O) ----
[New Thread -1244931184 (LWP 30365)]
protocol error
protocol error
fuse: writing device: Invalid argument
sshfs: pthread_mutex_lock.c:82: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread -1210942576 (LWP 30353)]
0xb7f46410 in __kernel_vsyscall ()
(gdb)
(gdb)
(gdb) bt
#0 0xb7f46410 in __kernel_vsyscall ()
#1 0xb7d74721 in raise () from /lib/libc.so.6
#2 0xb7d75ef8 in abort () from /lib/libc.so.6
#3 0xb7d6df3c in __assert_fail () from /lib/libc.so.6
#4 0xb7e7ea47 in pthread_mutex_lock () from /lib/libpthread.so.0
#5 0xb7f3510c in free_req () from /usr/lib/libfuse.so.2
#6 0xb7f352c8 in send_reply () from /usr/lib/libfuse.so.2
#7 0xb7f360c2 in send_reply_ok () from /usr/lib/libfuse.so.2
#8 0xb7f339d7 in fuse_read () from /usr/lib/libfuse.so.2
#9 0xb7f35b7c in do_read () from /usr/lib/libfuse.so.2
#10 0xb7f36aed in fuse_ll_process () from /usr/lib/libfuse.so.2
#11 0xb7f38326 in fuse_session_process () from /usr/lib/libfuse.so.2
#12 0xb7f3471b in fuse_do_work () from /usr/lib/libfuse.so.2
#13 0xb7e7c4a2 in start_thread () from /lib/libpthread.so.0
#14 0xb7e0b52e in clone () from /lib/libc.so.6

Although this probably is a different situation, I hope it is more helpful this time.

Thanks in advance,
Dimitris |
From: Miklos S. <mi...@sz...> - 2007-04-16 11:40:35
|
[snip]
> Although this probably is a different situation, I hope it is more helpful
> this time.

Yes, that helps.

Can you start sshfs like this:

  (gdb) run -odebug,sshfs_debug other_options... > /tmp/sshfs.log 2>&1

So the debug output is captured in a log file. Then please post the log file from about 100 lines above the first protocol error till the end. Or if it's not too large (<10k) then you can post the whole logfile.

Thanks,
Miklos |
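The excerpt requested above (from about 100 lines before the first "protocol error" to the end of the log) can be cut out mechanically. This is only a sketch under the assumption that the message appears literally as "protocol error" in the log; the function name log_excerpt is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print a log file starting a given number of lines
# (default 100) before the first line containing "protocol error" and
# continuing to the end of the file.
log_excerpt() {
    log=$1
    context=${2:-100}
    # Line number of the first match; empty if the message never occurs.
    first=$(grep -n 'protocol error' "$log" | head -n 1 | cut -d: -f1)
    if [ -z "$first" ]; then
        echo "log_excerpt: no 'protocol error' in $log" >&2
        return 1
    fi
    start=$((first - context))
    if [ "$start" -lt 1 ]; then
        start=1                      # match is near the top: print everything
    fi
    tail -n "+$start" "$log"
}
```

Typical use would be log_excerpt /tmp/sshfs.log > /tmp/sshfs-excerpt.log, followed by a look through the result for anything private before posting it.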
From: Dimitrios A. <ji...@gm...> - 2007-04-16 12:29:11
|
OK, the crash was identical, but because the log file is big and contains personal info, I sent it to you personally. Here is the gdb output, almost the same as before:

This GDB was configured as "i686-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -f -odebug,sshfs_debug -o workaround=all -o ro -o allow_other -o MACs=hmac-md5-96 -o Ciphers=arcfour user@host:/remote/dir /local/dir/ >/tmp/sshfs.log 2>&1
Starting program: /var/abs/local/sshfs/src/sshfs-fuse-1.7/sshfs -f -odebug,sshfs_debug -o workaround=all -o ro -o allow_other -o MACs=hmac-md5-96 -o Ciphers=arcfour user@host:/remote/dir /local/dir/ >/tmp/sshfs.log 2>&1
[Thread debugging using libthread_db enabled]
[New Thread -1210263872 (LWP 31783)]
user@host's password:
[New Thread -1210406000 (LWP 31789)]
[New Thread -1218937968 (LWP 31790)]
[New Thread -1227330672 (LWP 31791)]
[New Thread -1237320816 (LWP 31835)]
[New Thread -1245713520 (LWP 31836)]
[New Thread -1254106224 (LWP 31837)]

Program received signal SIGABRT, Aborted.
[Switching to Thread -1237320816 (LWP 31835)]
0xb7fc9410 in __kernel_vsyscall ()
(gdb)
(gdb) bt
#0 0xb7fc9410 in __kernel_vsyscall ()
#1 0xb7df7721 in raise () from /lib/libc.so.6
#2 0xb7df8ef8 in abort () from /lib/libc.so.6
#3 0xb7df0f3c in __assert_fail () from /lib/libc.so.6
#4 0xb7f01a47 in pthread_mutex_lock () from /lib/libpthread.so.0
#5 0xb7fb810c in free_req () from /usr/lib/libfuse.so.2
#6 0xb7fb82c8 in send_reply () from /usr/lib/libfuse.so.2
#7 0xb7fb90c2 in send_reply_ok () from /usr/lib/libfuse.so.2
#8 0xb7fb69d7 in fuse_read () from /usr/lib/libfuse.so.2
#9 0xb7fb8b7c in do_read () from /usr/lib/libfuse.so.2
#10 0xb7fb9aed in fuse_ll_process () from /usr/lib/libfuse.so.2
#11 0xb7fbb326 in fuse_session_process () from /usr/lib/libfuse.so.2
#12 0xb7fb771b in fuse_do_work () from /usr/lib/libfuse.so.2
#13 0xb7eff4a2 in start_thread () from /lib/libpthread.so.0
#14 0xb7e8e52e in clone () from /lib/libc.so.6

Dimitris |