Thread: [JSch-users] Enormous traffic when uploading file to non-empty folder
From: Milano N. <dra...@gm...> - 2016-08-23 10:24:57
|
Hi, I'm currently experiencing an issue with my company's file-transfer OSGi bundle, which uses the JSch library. Normally it should only upload some log and archive files once every few minutes. But as the number of files in the target folder grew, the data consumption on my device grew proportionally. E.g., with only 40 files in the upload folder everything seems to be OK, so nobody noticed any problem. But once there are 7000 files or so, the *incoming* data consumption gets really high. The log files amount to 20-35 MB per day, but my downloads reach about 2.5 GB of data per day, with no other networking application active. Of course there may be an error in my code (which I'm trying to find right now), but I wonder if anyone has experienced a similar problem. Thanks for any tips. |
From: Milano N. <dra...@gm...> - 2016-08-24 07:34:41
|
After further investigation I tested two scenarios of uploading a file to the SFTP server. In scenario one I only had a handful of files in the upload directory; in scenario two there were 6400 additional tiny files.

*Scenario 1:* lstat takes about 1-2 sec to execute. Tcpdump shows 907020 bytes of traffic for an 800 KB file...
2016-08-23 13:19:28,655 [TRACE] SFTP Operation - Get file attributes (lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 13:19:30,844 [TRACE] SFTP Operation - Getting file attributes finished.
2016-08-23 13:19:30,877 [TRACE] SFTP Operation - Get file attributes (lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 13:19:32,044 [TRACE] SFTP Operation - Getting file attributes finished.

*Scenario 2:* lstat takes about 110 sec to execute. Tcpdump shows 4130593 bytes of traffic for an 800 KB file...
2016-08-23 14:05:10,222 [TRACE] SFTP Operation - Get file attributes (lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 14:06:58,868 [TRACE] SFTP Operation - Getting file attributes finished.
2016-08-23 14:06:59,336 [TRACE] SFTP Operation - Get file attributes (lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 14:08:45,189 [TRACE] SFTP Operation - Getting file attributes finished.

So the upload itself took about the same time in the end, but it was the lstat command that used a lot of data. Now I have to find out how to avoid such behaviour... |
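To put the two tcpdump figures side by side, here is a small arithmetic sketch in Java. It uses only the numbers quoted in the message; the class and method names are made up for illustration, and reading "800 KB" as 800 * 1024 bytes is an assumption.

```java
public class LstatTrafficOverhead {
    // Figures quoted from the tcpdump measurements in the message.
    static final long SCENARIO_1_BYTES = 907_020L;   // handful of files in the directory
    static final long SCENARIO_2_BYTES = 4_130_593L; // 6400 additional tiny files
    static final long EXTRA_FILES = 6_400L;

    // Assumption: "800 KB" is read as 800 * 1024 bytes.
    static final long PAYLOAD_BYTES = 800L * 1024L;

    /** Bytes on the wire beyond the uploaded file itself. */
    static long overhead(long totalBytes) {
        return totalBytes - PAYLOAD_BYTES;
    }

    /** Difference in total traffic between the two scenarios. */
    static long extraTraffic() {
        return SCENARIO_2_BYTES - SCENARIO_1_BYTES;
    }

    /** Extra traffic per additional directory entry, for one whole upload. */
    static double extraBytesPerFile() {
        return (double) extraTraffic() / EXTRA_FILES;
    }

    public static void main(String[] args) {
        System.out.printf("Scenario 1 overhead: %d bytes%n", overhead(SCENARIO_1_BYTES));
        System.out.printf("Scenario 2 overhead: %d bytes%n", overhead(SCENARIO_2_BYTES));
        System.out.printf("Extra traffic:       %d bytes%n", extraTraffic());
        System.out.printf("Per extra file:      %.1f bytes%n", extraBytesPerFile());
    }
}
```

With these figures the difference between the scenarios is roughly 3.2 million bytes, or about 500 bytes per extra file per upload. The log shows two lstat operations per upload, so if the extra traffic is split evenly between them (an assumption), each one costs about 250 bytes per file in the directory.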
From: Milano N. <dra...@gm...> - 2016-08-24 09:57:12
|
Another symptom of the issue is that listing the directory content via the sftp.ls(sftpAbsolutePath) call takes approx. 100 seconds; tcpdump says it transfers about 1555 KB server -> client and 58 KB client -> server. If I connect to the server via SSH (using PuTTY), run the ll command in the console and redirect the output to a text file, the resulting file has 768 KB and is produced instantly. I understand that there is always a difference in the computing time, and part of it depends on whether the computing is done on the server or the client side (even though the difference is huge). But where is the additional data traffic coming from? Approximately twice as much data is transferred. Any ideas? P.S. I wouldn't expect any issues with encoding, as both machines run a Unix-derived system. But who knows. Is such a problem possible? |
From: Lothar K. <jo...@ki...> - 2016-08-24 10:33:23
|
On 24.08.2016 at 11:57, Milano Nicolum wrote:
> Another symptom of the issue is that listing the directory content via
> sftp.ls(sftpAbsolutePath) call takes approx. 100 seconds, tcpdump says it
> transfers about 1555 KB server -> client and 58KB client -> server.
>
> If I connect to the server via SSH (using putty), run the ll command in console
> and redirect the command output to a textfile, the resulting file has 768 KB
> instantly.

SFTP is a different thing from an SSH shell, so the two are not easy to compare. What was the exact command for listing the files in the shell?

Since you ran tcpdump, where exactly were the 100 seconds spent? Waiting for data from the server, or in a slow transfer of bytes?

> I understand that there is always difference in the computing time, part of
> it depends on the fact computing is done on server/client side. (Even though
> the difference is huge). But where is the additional data traffic coming from?

One reason might be that you are comparing apples and oranges. If you entered ls subdir in the shell and you are listing the directory as /home/user/subdir in SFTP, the SFTP server will most likely return the filenames including their paths, while the shell only returns the names. This would easily explain the increased amount of data. Also, there is additional data per file containing the attributes (see next paragraph). With short filenames this easily becomes the majority of the transferred data.

If the time was lost while waiting for the server, the reason might be that the file listing in SFTP provides more information than the file listing in a shell. For every file, the file attributes are retrieved from the file system, containing things like last access time, "extended attributes", etc. Some of them are part of the directory entry itself and can be accessed quickly. Others are not that easy to retrieve (or at least need an extra file-system access per file) and might be the reason why your listing takes so much longer for directories with many files. This is not an SFTP-specific problem; you can run into the same problem when using the find command with search criteria for last access time, etc.

Cheers, Lothar |
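The distinction between reading directory entries and fetching per-file attributes can be illustrated on a local file system with plain java.nio. This is only a sketch of the general idea, not of what an SFTP server or JSch does internally; the class and method names are hypothetical, and how much the attribute lookup actually costs depends on the platform and file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingCost {

    /** Cheap variant: collects only the entry names of the directory. */
    static List<String> namesOnly(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return names;
    }

    /** Costlier variant: one additional attribute lookup per entry. */
    static Map<String, FileTime> withLastAccessTime(Path dir) throws IOException {
        Map<String, FileTime> result = new LinkedHashMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                BasicFileAttributes attrs =
                        Files.readAttributes(entry, BasicFileAttributes.class);
                result.put(entry.getFileName().toString(), attrs.lastAccessTime());
            }
        }
        return result;
    }

    /** Creates a temporary directory with fileCount files; returns the entry count seen by each variant. */
    static int[] demo(int fileCount) {
        try {
            Path dir = Files.createTempDirectory("listing-demo");
            for (int i = 0; i < fileCount; i++) {
                Files.createFile(dir.resolve("file" + i + ".log"));
            }
            return new int[] { namesOnly(dir).size(), withLastAccessTime(dir).size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = demo(5);
        System.out.println("names only: " + counts[0] + ", with attributes: " + counts[1]);
    }
}
```

Both variants see the same entries; the second one simply does extra work for each of them, which is why its cost grows with the number of files in the directory.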
From: Milano N. <dra...@gm...> - 2016-08-25 05:53:00
|
OK, now I feel embarrassed and stupid for not noticing earlier that there is a *ChannelSftp.ls* call hidden in my code. No idea why. It's been there since the beginning of the app. It is called once when a new file abstraction is requested and once for any other operation (e.g. requesting an OutputStream from the abstraction). *Let's do the math now:* If you have about 6400 files in the upload directory, listing it means downloading about 1.5 MB of data. And it can take up to two minutes (120 seconds) to download that data, depending on your connection. So if you want to *upload* one *4 KB* file to the SFTP server using streams, the listing runs twice: you are going to *download 3 MB* of useless data first, *wait* up to about *four minutes*, and only then is your file uploaded. Not to mention that the operation is going to fail if your connection is slow, since you simply cannot download that much data on a bad connection. Thanks to Lothar for pointing me in the right direction! |
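The math in this message can be written out as a short Java sketch. It uses only the figures given in the message (about 1.5 MB and up to 120 seconds per listing, two listings per upload, a 4 KB payload); the class and method names are invented for illustration, and 1 MB is taken as 1024 KB.

```java
public class RedundantListingCost {
    // Figures from the message; 1.5 MB expressed in KB, assuming 1 MB = 1024 KB.
    static final long LISTING_KB = 1536;
    static final long LISTING_SECONDS = 120;   // upper bound quoted in the message
    static final long LISTINGS_PER_UPLOAD = 2; // once for the abstraction, once for the stream
    static final long PAYLOAD_KB = 4;

    /** Data downloaded by directory listings before a single upload can start. */
    static long downloadedKb() {
        return LISTING_KB * LISTINGS_PER_UPLOAD;
    }

    /** Worst-case time spent on directory listings per upload. */
    static long waitSeconds() {
        return LISTING_SECONDS * LISTINGS_PER_UPLOAD;
    }

    /** How many times larger the listing traffic is than the uploaded file. */
    static long amplification() {
        return downloadedKb() / PAYLOAD_KB;
    }

    public static void main(String[] args) {
        System.out.println("Downloaded per upload: " + downloadedKb() + " KB");
        System.out.println("Worst-case wait:       " + waitSeconds() + " s");
        System.out.println("Amplification factor:  " + amplification() + "x");
    }
}
```

Under these assumptions each 4 KB upload is preceded by 3072 KB of listing traffic and up to 240 seconds of waiting, so the listing data is 768 times the size of the file being uploaded.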