From: Harald R. <ha...@in...> - 2007-08-02 22:19:23
|
Hey Miklos and all,

I have noticed that while ssh/scp always works, sshfs cannot handle large filesets.

1. Dragging 7 GiB of files using Nautilus 2.16.2 from an sshfs mount onto the local hard drive hung a CentOS5 machine before 5 GiB completed. No Ctrl-Alt-F1, no Ctrl-Alt-Delete, and since I had removed reset, power off was required. My first ever hang on that machine.

From Ubuntu 7.04, SSHFS version 1.6, FUSE library version: 2.6.3, fusermount version: 2.6.3, using FUSE kernel interface version 7.8

To CentOS5, SSHFS version 1.8, FUSE library version: 2.6.5, fusermount version: 2.6.5, using FUSE kernel interface version 7.8

The other machine, Ubuntu7, got issues with its gnome panels: the system monitor panel hung and could not subsequently be restarted.

2. I have also noticed, where /dev/sda1 is a 30 GiB ntfs partition, that

    ntfsclone -s -o ansshfsfilesystem /dev/sda1

always loses its sshfs mount connection and returns a write error, while

    ntfsclone -s -o - /dev/sda1 | gzip -c | ssh user@host 'cat >file'

works flawlessly. That is, ssh works, sshfs fails. This is between Ubuntu 7.04 machines.

3. My third observation is that copying a large fileset in a CentOS command window (partition images, fewer than 10 files, about 20 GiB total) from the Ubuntu machine to the CentOS machine above hangs CentOS networking every time. The CentOS machine can no longer reach the network and has to be revived with ifconfig eth0 down, ifconfig eth0 up, and route add default gw...

So, large files using sshfs corrupt the kernel. Any suggestions for fixing/troubleshooting?

Regards,

Harald Rudell
|
From: Miklos S. <mi...@sz...> - 2007-08-03 08:30:24
|
> I have noticed that while ssh/scp always works, sshfs can not handle
> large filesets.
>
> 1. Dragging 7 GiB of files using Nautilus 2.16.2 from an sshfs mount
> onto the local hard drive hung a CentOS5 machine before 5 GiB completed.
> No Ctrl-Alt-F1, Ctrl-Alt-Delete, and since I had removed reset power off
> was required. My first ever hang on that machine.

Which kernel version? Is it an SMP machine?

We had a very rare SMP hang in fuse a long time ago. It was fixed in 2.6.19 and backported to 2.6.18.2 and 2.6.16.38. So unless it's a very old kernel, it's unlikely to be the reason for the hang.

> From Ubuntu 7.04, SSHFS version 1.6, FUSE library version: 2.6.3,
> fusermount version: 2.6.3, using FUSE kernel interface version 7.8

The machine to which you connect with sshfs is basically irrelevant.

> To CentOS5,

This sounds old. Maybe this _is_ that SMP hang.

> SSHFS version 1.8, FUSE library version: 2.6.5, fusermount
> version: 2.6.5, using FUSE kernel interface version 7.8
>
> The other machine, Ubuntu7, got issues with its gnome panels, the system
> monitor panel hung and could not subsequently be restarted.
>
> 2. I have also noticed, where /dev/sda1 is a 30 GiB ntfs partition
>
> ntfsclone -s -o ansshfsfilesystem /dev/sda1
>
> Always looses its sshfs mount connection and returns a write error,

Can you enable debugging in sshfs (-odebug,sshfs_debug,loglevel=debug) and send the output?

If you redirect to a file you need to redirect both stdout and stderr to the same file (this has been fixed in fuse-2.7.0).

> while
>
> ntfsclone -s -o - /dev/sda1 | gzip -c | ssh user@host 'cat >file'
>
> works flawlessly. That is ssh works, sshfs fails. This is between Ubuntu
> 7.04 machines.
>
> 3. My third observation is that copying a large fileset in a CentOS
> command window (partition images, less than 10 files, like 20 GiB total)
> from the Ubuntu machine to the CentOS machine above hangs CentOS
> networking every time. The CentOS machine can no longer reach the
> network and has to be revived with ifconfig eth0 down, ifconfig eth0 up,
> and route add default gw...
>
> So, large files using sshfs corrupts the kernel. Any suggestions for
> fix/trouble shooting?

If it is an old kernel, you could try compiling the fuse kernel module from the source package, to see if it fixes the issue.

Thanks,
Miklos
|
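The kernel versions named in the reply above (fixed in 2.6.19, backported to 2.6.18.2 and 2.6.16.38) can be checked with a small script. This is only a rough sketch: the function names are made up for illustration, it relies on `sort -V` (GNU coreutils), and it looks at the upstream version number alone, so a distribution kernel that carries its own backports may be misjudged.

```shell
# Hypothetical helpers, not part of fuse or sshfs.

version_ge() {
    # True if version $1 >= version $2 (GNU sort -V ordering).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

has_fuse_smp_fix() {
    # $1 is a kernel release string such as the output of `uname -r`.
    # Strip any distribution suffix, e.g. "-8.1.8.el5" or "-16-generic".
    v=${1%%-*}
    case "$v" in
        2.6.16|2.6.16.*) version_ge "$v" 2.6.16.38 ;;  # backport to 2.6.16.y
        2.6.18|2.6.18.*) version_ge "$v" 2.6.18.2 ;;   # backport to 2.6.18.y
        *)               version_ge "$v" 2.6.19 ;;     # fixed in mainline
    esac
}
```

Typical use would be `has_fuse_smp_fix "$(uname -r)" || echo "kernel may predate the fuse SMP fix"`.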
From: Harald R. <ha...@in...> - 2007-08-13 06:26:54
|
Hi Miklos,

Great news! I think I nailed this one. As Andrew Morton noted opening at Linuxworld, Linux bugs do rarely reproduce...

The debug output can be found here: http://q.gotdns.com/shout2.bz2
That is a 3.7 MiB archive.

I logrotated everything; there does not seem to be any relevant output in /var/log.

Test Setup ===============

The client is running sshfs with four command windows and a single sshfs mount. The server has one 9.6 GiB file and five 2.1 GiB files on the sshfs mount.

Window 1: reading 9.6 GiB, then re-reading the same file again

    foxyboy@foxyboy-laptop:~$ cp /x/data/vistau Desktop/
    foxyboy@foxyboy-laptop:~$ cp /x/data/vistau Desktop/
    cp: reading `/x/data/vistau': Transport endpoint is not connected
    cp: closing `/x/data/vistau': Transport endpoint is not connected

Window 2: sshd logging

    ...
    sshdebug script, single line:
    sshfs foxyboy@1.0.0.8:/home/foxyboy/Desktop /x/data -o allow_other,debug,sshfs_debug,loglevel=debug
    ...
    root@foxyboy-laptop:~# ./sshdebug 2>&1 | cat >shout2
    foxyboy@1.0.0.8's password:
    root@foxyboy-laptop:~#
    (when copy failed, this window back to prompt)

Window 3: reading a sequence of 2.1 GiB files

    root@foxyboy-laptop:~# cp /x/data/Vista_EN_Ultimat* Desktop/
    cp: reading `/x/data/Vista_EN_Ultimateab': Input/output error
    cp: cannot stat `/x/data/Vista_EN_Ultimateac': Input/output error
    cp: cannot stat `/x/data/Vista_EN_Ultimatead': Input/output error
    cp: cannot stat `/x/data/Vista_EN_Ultimateae': Input/output error
    root@foxyboy-laptop:~#

Window 4: writing an already-read 2.1 GiB file back to the server folder under a new name

    foxyboy@foxyboy-laptop:~$ cp Desktop/Vista_EN_Ultimateaa /x/data/cpcp
    cp: writing `/x/data/cpcp': Software caused connection abort
    cp: closing `/x/data/cpcp': Transport endpoint is not connected
    foxyboy@foxyboy-laptop:~$

The timeline, as commands given to the client:

Window 2 mounts sshfs to /x/data
Phase a. Window 1 reads 9.6 GiB, concluded successfully
Phase b-1. Window 1 repeats the read of 9.6 GiB, in and out files the same as the first time
Phase b-2. Window 3 reads a sequence of 2.1 GiB files simultaneously with b-1, between the same folders; fails when on the second file
Phase b-3. Simultaneously with b-1 and b-2, window 4: once the first 2.1 GiB file has been read, that file is written back in the other direction to the server under another name
Event c. At this point, with two reads and one write running concurrently, sshfs fails

More data ==========

Problem 1: On all machines (below), when sequentially reading or writing a file of 3.7 GiB or larger, an i/o error occurs

Problem 2: On Ubuntu 7.04, when the i/o error occurs, all sshfs mounts are lost and you may get "endpoint disconnected"

Problem 3: On Ubuntu, if three parallel reads and writes are initiated, sshfs may hang. Any sshfs access (such as filename completion) from a shell then hangs that shell. Exit: umount -f of all sshfs mounts by root.

Note: In all cases, it appears sshfs fully recovers once all sshfs mounts have been unmounted.

Issues encountered on three operating systems:

Configuration 1 (a client not used above): Mac Intel Core Duo, Darwin kernel 8.10.1, mount_sshfs 2.5.0, OS X 10.4.10

Configuration 2 (the server, a desktop): Pentium-M, 2.6.20-16-generic #2 SMP i686, sshfs 1.6, fuse 2.6.3, fusermount 2.6.3, kernel interface 7.8, Ubuntu 7.04

Configuration 3 (the client, a laptop): Pentium-M, 2.6.20-16-generic #2 SMP i686, 1.6/2.6.3/2.6.3/7.8, Ubuntu 7.04

Additional note (a client not used above): Pentium-4 HT, CentOS5, 2.6.18-8.1.8.el5 #1 SMP: sshfs 1.8/2.6.5/2.6.5/7.8 corrupts the kernel, causing a system crash (likely known issue)

Off you go...

Regards,

Harald Rudell
|
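The concurrent-copy part of the test setup described above could be scripted roughly as follows. This is a simplified sketch and not the script actually used in the thread: `parallel_copy` is a made-up helper that starts one background `cp` per file and counts failures, and the directory arguments are placeholders (on the client they might be the sshfs mount point and a local folder).

```shell
# Hypothetical helper: copy every regular file in directory $1 to
# directory $2 in parallel and report how many copies failed.
parallel_copy() {
    src=$1
    dst=$2
    pids=""
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        cp -- "$f" "$dst"/ &        # one background copy per file
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that particular cp
        wait "$pid" || failed=$((failed + 1))
    done
    echo "failed copies: $failed"
    [ "$failed" -eq 0 ]
}
```

Running it once in each direction at the same time (mount to local disk, and local disk to mount) would approximate the mix of two reads and one write from the timeline.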
From: Miklos S. <mi...@sz...> - 2007-08-13 08:32:10
|
> Great news! I think I nailed this one. As Andrew Morton noted opening at
> Linuxworld, Linux bugs do rarely reproduce...
>
> The debug output can be found here http://q.gotdns.com/shout2.bz2
> That is a 3.7 MiB archive

Thanks for the report.

> Test Setup ===============
> The client running sshfs with four command windows and a single sshfs
> mount. The server has one 9.6 GiB file and five 2.1 GiB files on the
> sshfs mount.
>
>
> window 1: reading 9.6 GiB, then re-reading same file again
> foxyboy@foxyboy-laptop:~$ cp /x/data/vistau Desktop/
> foxyboy@foxyboy-laptop:~$ cp /x/data/vistau Desktop/
> cp: reading `/x/data/vistau': Transport endpoint is not connected
> cp: closing `/x/data/vistau': Transport endpoint is not connected
>
>
> Window 2: sshd logging
> ...
> sshdebug script, single line:
> sshfs foxyboy@1.0.0.8:/home/foxyboy/Desktop /x/data -o
> allow_other,debug,sshfs_debug,loglevel=debug
> ...
> root@foxyboy-laptop:~# ./sshdebug 2>&1 | cat >shout2
> foxyboy@1.0.0.8's password:
> root@foxyboy-laptop:~#
> (when copy failed, this window back to prompt)
>
>
> Windows 3: reading a sequence of 2.1 GiB files
> root@foxyboy-laptop:~# cp /x/data/Vista_EN_Ultimat* Desktop/
> cp: reading `/x/data/Vista_EN_Ultimateab': Input/output error
> cp: cannot stat `/x/data/Vista_EN_Ultimateac': Input/output error
> cp: cannot stat `/x/data/Vista_EN_Ultimatead': Input/output error
> cp: cannot stat `/x/data/Vista_EN_Ultimateae': Input/output error
> root@foxyboy-laptop:~#
>
>
> window 4: writing a read 2.1 GiB file back to the server folder under a
> new name
> foxyboy@foxyboy-laptop:~$ cp Desktop/Vista_EN_Ultimateaa /x/data/cpcp
> cp: writing `/x/data/cpcp': Software caused connection abort
> cp: closing `/x/data/cpcp': Transport endpoint is not connected
> foxyboy@foxyboy-laptop:~$

This looks like a known bug in the sftp-server that has been worked around in sshfs-1.8. Here's the relevant changelog entry:

  * OpenSSH sftp-server can read requests faster, than it processes
    them, when it's buffer is full it aborts. This can happen on a
    large upload to a slow server. Work around this by limiting the
    total size of outstanding reqests. Debian bug #365541. Tracked
    down by Thue Janus Kristensen

Can you please try repeating the test with sshfs-1.8?

> The timeline as command given to the client:
> Window2 mounts sshfs to /x/data
> Phase a. window 1 reads 9.6 GiB, concluded successfully
> Phase b-1. window 1 repeats read of 9.6 GiB, in and outfiles the same as
> first time
> Phase b-2. window 3 reads sequence of 2.1 GiB files simultaneous with
> b-1, between same folders, fails when on second file
> Phase b-3. simultaneously with b-1 and b-2, window 4: when first 2.1 GiB
> file read, that file is written back in the other direction to the
> server under another name
> Event c. at this point, with concurrently two reads and one write, sshfs
> fails
>
>
> More data ==========
> Problem 1: all machines (below), when sequentially reading or writing a
> file of 3.7 GiB or larger, an i/o error occurs
>
> Problem 2: On ubuntu 7.04, when the i/o error occurs, all sshfs mounts
> are lost and you may get "endpoint disconnected"

Both of these may be due to the sftp-server problem.

>
> Problem 3: On Ubuntu, if three parallel reads and writes are initiated,
> sshfs may hang. Any sshfs access (such as filename completion) from a
> shell then hangs that shell. exit: unmount -f of all sshfs mounts by
> root.

This doesn't ring a bell. Is it reproducible with sshfs-1.8?

> Note: In all cases, it appears sshfs fully recovers once all sshfs
> mounts have been unmounted.

That's good. It means that there's no problem with the kernel part of fuse, only in userspace.

> Additional note (a client not used above): Pentium-4 HT, CentOS5,
> 2.6.18-8.1.8.el5 #1 SMP: sshfs 1.8/2.6.5/2.6.5/7.8 corrupts the kernel
> causing system crash (likely known issue)

Looks like it. Can you try compiling the fuse kernel module from the fuse-2.6.5.tar.gz package:

    tar xfz fuse-2.6.5.tar.gz
    cd fuse-2.6.5
    ./configure --enable-kernel-module
    make

as root:

    make install
    rmmod fuse
    modprobe fuse
    dmesg | tail

The dmesg should give you something like:

    fuse init (API version 7.8)
    fuse distribution version: 2.6.5

Thanks,
Miklos
|
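To confirm which fuse module ended up loaded after the rebuild, the version line in the kernel log can be extracted with a small filter. A minimal sketch, assuming the log line has the "fuse distribution version: X.Y.Z" form quoted in the message above; `fuse_module_version` is a made-up name for illustration.

```shell
# Hypothetical helper: read kernel log text on stdin and print the most
# recently reported "fuse distribution version", if any.
fuse_module_version() {
    sed -n 's/.*fuse distribution version: *\([0-9][0-9.]*\).*/\1/p' | tail -n 1
}
```

It would be used as `dmesg | fuse_module_version`. If nothing is printed, the loaded module did not log a distribution version line, or the line has already scrolled out of the kernel log buffer.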
From: Harald R. <ha...@in...> - 2007-08-14 13:13:34
|
Alrighty,

So there were three issues on Linux: ntfsclone, concurrent copying, and CentOS5 crashing. I came up with some nasty shell scripts that I could run on various machines for repro. Here are the solutions:

Ubuntu 7.04 comes with sshfs 1.6, which is deficient, while sshfs 1.7 in gutsy 7.10 works fine. Do this:

    $ sshfs -V
    SSHFS version 1.6
    --now, that's bad

1. Browse to http://packages.ubuntu.com/gutsy/allpackages
2. Click around and download three packages
   2a. libc6 (filename libc6_2.6.1-0ubuntu1_i386.deb)
   2b. libglib2.0-0 (filename libglib2.0-0_2.13.7-1ubuntu4_i386.deb)
   2c. sshfs (filename sshfs_1.7-2.1_i386.deb)
3. Install all three with dpkg -i filename

    $ sshfs -V
    SSHFS version 1.7
    -- problem gone

For the crashing Red Hat, I compiled the kernel module as outlined by Miklos. sshfs -V and modinfo fuse list the EXACT same information before/after, but the problem seems to be gone. So either somebody is not compiling right, or there is a version change not reflected?

(the mac is still broken, but I guess that's not you guys...)

Thanks for all help,

Harald Rudell
|
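The before/after `sshfs -V` check in the message above can be scripted. This is a sketch under assumptions: the helper names are invented, the output is assumed to contain a line of the form "SSHFS version X.Y" as shown in the thread, and the comparison uses `sort -V` from GNU coreutils. The 1.7 threshold reflects what worked in this thread; the sftp-server workaround itself is attributed to sshfs-1.8 earlier in the discussion.

```shell
# Hypothetical helpers, not part of sshfs.

sshfs_version() {
    # Extract "X.Y" from text such as "SSHFS version 1.6" on stdin.
    sed -n 's/^SSHFS version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

sshfs_at_least() {
    # True if version $1 >= minimum $2 (GNU sort -V ordering).
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example: `v=$(sshfs -V 2>&1 | sshfs_version); sshfs_at_least "$v" 1.7 || echo "sshfs $v may be affected"`.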
From: Miklos S. <mi...@sz...> - 2007-08-27 13:18:21
|
> So there were three issues on Linux, ntfsclone, concurrent copying, and
> CentOS5 crashing. I came up with some nasty shell scripts that I could
> run on various machines for repro. Here are solutions:
>
> Ubuntu 7.04 comes with sshfs 1.6 that is deficient, while sshfs 1.7 in
> gutsy 7.10 works fine. Do this:
> $ sshfs -V
> SSHFS version 1.6
> --now, that's bad
> 1. Browse to http://packages.ubuntu.com/gutsy/allpackages
> 2. Click around and download three packages
> 2a. Libc6 (filename libc6_2.6.1-0ubuntu1_i386.deb)
> 2b. Libglib2.0-0 (filename libglib2.0-0_2.13.7-1ubuntu4_i386.deb)
> 2c. Sshfs (filename sshfs_1.7-2.1_i386.deb)
> 3. Install all three with dpkg -i filename
> $ sshfs -V
> SSHFS version 1.7
> -- problem gone
>
> For the crashing Red Hat, I compiled the kernel module as outlined by
> Miklos. Sshfs -V and modinfo fuse lists the EXACT same information
> before/after, but the problem seems to be gone. So either somebody is
> not compiling right, or there is a version change not reflected?

The version change is only reflected in 'dmesg | grep fuse' output.

But the fact that using the module from the source package cured it for you makes it very likely that it was indeed the suspected problem.

Thanks for confirming that everything works OK.

Miklos
|