
#37 Loses connection after long transfers

open
nobody
None
5
2014-09-07
2007-12-12
phlampe
No

Hi !

I'm using wput to transfer backup files over the Internet from one site to another, geographically separate one. I have a small script that launches wput with the following command line:

%REP_WPUT%\wput --basename=\UnBackedUp\SauvegardesB\ %LOG_OPTIONS% %REP_FICHIERS%\ ftp://%FTP_USER%:%FTP_PASS%@%FTP_ADR%:%FTP_PORT%/Backups/

(LOG_OPTIONS is set at -v)

Version info: wput 0.6-w32 , running on Windows 2003 SBS server SP2

My setup is secure in the sense that the distant ftp server listens on a non standard port, requires authentication and uses PASV and SSL/TLS.

As long as the files are small (under a few MB), everything is fine and the file comparison works well. Only with bigger files (I don't know the exact limit) does something go wrong: wput transfers the big file (several hundred MB, 601 MB at most) and is then unable to continue with the rest of the file list: it stops with a timeout error. Here's the log file; I'm leaving in a file that was checked and not transferred, then the big one, and the error:

------------------------------------------------------

--09:56:10-- `\UnBackedUp\SauvegardesB\Retrospect/Iris B/1-Iris B\AA000005.rdb'
=> ftp://uuu:xxxxx@12.34.56.78:3333/Backups/Retrospect/Iris B/1-Iris B/
AA000005.rdb
==> SIZE AA000005.rdb ... done (629153792 bytes)
Skipping this file due to resume/upload/skip rules.
-- Skipping file: \UnBackedUp\SauvegardesB\Retrospect/Iris B/1-Iris B\AA000005.rdb
--09:56:10-- `\UnBackedUp\SauvegardesB\Retrospect/Iris B/1-Iris B\AA000006.rdb'
=> ftp://uuu:xxxxx@12.34.56.78:3333/Backups/Retrospect/Iris B/1-Iris B/
AA000006.rdb
==> SIZE AA000006.rdb ... done (254795776 bytes)
==> TYPE I ... done.
Setting data protection level to private ... done.
==> PASV ... done.
==> REST 254795776 ... done.
==> STOR AA000006.rdb ... done.
Length: 628,011,008 [373,215,232 to go]
100%[++++++++++++++====================] 628,011,008 44.73K/s
Error: recv() timed out. No data received
Receive-Warning: read() timed out. Read '' so far.
11:51:58 (AA000006.rdb) - `52.47K/s' [628011008]

Waiting 10 seconds... Error: recv() timed out. No data received
Receive-Warning: read() timed out. Read '' so far.
--11:52:18-- `\UnBackedUp\SauvegardesB\Retrospect/Iris B/1-Iris B\AA000007.rdb'
=> ftp://uuu:xxxxx@12.34.56.78:3333/Backups/Retrospect/Iris B/1-Iris B/
AA000007.rdb

------------------------------------------------------

And I get an error log in the Windows event manager:

Event Type: Information
Event Source: Application Error
Event Category: (100)
Event ID: 1004
Date: 29/10/2007
Time: 18:28:46
User: N/A
Computer: XXXXXX
Description:
Reporting queued error: faulting application wput.exe, version 0.0.0.0, faulting module wput.exe, version 0.0.0.0, fault address 0x0000edba.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 41 70 70 6c 69 63 61 74 Applicat
0008: 69 6f 6e 20 46 61 69 6c ion Fail
0010: 75 72 65 20 20 77 70 75 ure wpu
0018: 74 2e 65 78 65 20 30 2e t.exe 0.
0020: 30 2e 30 2e 30 20 69 6e 0.0.0 in
0028: 20 77 70 75 74 2e 65 78 wput.ex
0030: 65 20 30 2e 30 2e 30 2e e 0.0.0.
0038: 30 20 61 74 20 6f 66 66 0 at off
0040: 73 65 74 20 30 30 30 30 set 0000
0048: 65 64 62 61 edba

It worked fine when I was testing everything on my local network. So I don't know whether the problem comes from a timeout somewhere or from a setup problem between the local and the distant servers (there are several firewalls and routers to go through, so maybe wput needs some way to keep the main connection alive during the transfer).

I haven't made any special changes to wputrc, except that I tried changing the timeout and wait_retry parameters, without success.

I have reverted to sending manually my files with Filezilla, which works well in the same conditions, since it reconnects completely whenever the connection drops, but I haven't been able to fully automate it with FZ.

I hope you can help me, since wput is really great and I'd like to use it daily :)
Paul-Henri

Discussion

  • Rumpeltux

    Rumpeltux - 2007-12-12

    Logged In: YES
    user_id=989758
    Originator: NO

    I have no idea what that event-log thing is about. Can you try running the current version? Can you provide debug output? Does wput crash or just wait forever?
    It looks to me as if the transfer completes.

     
  • phlampe

    phlampe - 2007-12-12

    Logged In: YES
    user_id=1892147
    Originator: YES

    Wput crashes (I know this from the event log entry I get afterwards), but it does so silently during the script's execution. I don't get any special error, only the timeout in recv().

    The (big) file being transferred at that moment is transferred correctly (file AA000006.rdb in my log). The problem is that the next files in the list (beginning with AA000007.rdb, and I have several hundred of them to check) aren't transferred at all because wput has exited.

    I'll try the latest version, and I'll report back.
    Paul-Henri

     
  • Rumpeltux

    Rumpeltux - 2007-12-12

    Logged In: YES
    user_id=989758
    Originator: NO

    If it crashes, I need a stack backtrace (to know exactly where the crash happened). If you have access to a C development environment, you could run and debug wput inside it to see where the issue is located.

     
  • phlampe

    phlampe - 2007-12-12

    Logged In: YES
    user_id=1892147
    Originator: YES

    Hi again !
    I am currently downloading a debugger for Windows and its symbol files from Microsoft. That's because I haven't got a development environment. I'll have to use wput 0.60 since I cannot compile version 0.6.1 for W32.
    I'll keep you posted, and hope I can send you a dump file.
    Paul-Henri

     
  • phlampe

    phlampe - 2007-12-12

    Logged In: YES
    user_id=1892147
    Originator: YES

    Well, I haven't got much (no stack trace), since I ran the debugger without source files because I don't have a dev environment on my server (unless you know a way to get more information with Windbg). I got it from here: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx .

    Here's what Windbg got when wput crashed:

    Microsoft (R) Windows Debugger Version 6.8.0004.0 X86
    Copyright (c) Microsoft Corporation. All rights reserved.

    *** wait with pending attach
    Symbol search path is: C:\WINDOWS\Symbols
    Executable search path is:
    ModLoad: 00400000 0041e000 D:\Tools\wput\wput.exe
    ModLoad: 7c920000 7c9e6000 C:\WINDOWS\system32\ntdll.dll
    ModLoad: 7c800000 7c912000 C:\WINDOWS\system32\kernel32.dll
    ModLoad: 71a80000 71a8a000 C:\WINDOWS\system32\WSOCK32.dll
    ModLoad: 71ad0000 71ae7000 C:\WINDOWS\system32\WS2_32.dll
    ModLoad: 77b70000 77bca000 C:\WINDOWS\system32\msvcrt.dll
    ModLoad: 71ac0000 71ac8000 C:\WINDOWS\system32\WS2HELP.dll
    ModLoad: 77d70000 77e1d000 C:\WINDOWS\system32\ADVAPI32.dll
    ModLoad: 77c20000 77cbf000 C:\WINDOWS\system32\RPCRT4.dll
    ModLoad: 76f00000 76f13000 C:\WINDOWS\system32\Secur32.dll
    ModLoad: 6b080000 6b0ab000 D:\Tools\wput\SSLEAY32.DLL
    ModLoad: 61d80000 61e2d000 D:\Tools\wput\libeay32.dll
    ModLoad: 77bd0000 77c18000 C:\WINDOWS\system32\GDI32.dll
    ModLoad: 77f30000 77fc1000 C:\WINDOWS\system32\USER32.dll
    ModLoad: 719f0000 71a32000 C:\WINDOWS\system32\mswsock.dll
    ModLoad: 5d3d0000 5d42b000 C:\WINDOWS\system32\hnetcfg.dll
    ModLoad: 719b0000 719b8000 C:\WINDOWS\System32\wshtcpip.dll
    ModLoad: 68000000 68035000 C:\WINDOWS\system32\rsaenh.dll
    ModLoad: 76b20000 76b2b000 C:\WINDOWS\system32\PSAPI.DLL
    (1100.1634): Access violation - code c0000005 (!!! second chance !!!)
    eax=00380798 ebx=00000040 ecx=00443e9c edx=00000010 esi=00000004 edi=00000003
    eip=0040ee85 esp=0012fd70 ebp=0012fd90 iopl=0 nv up ei ng nz ac pe cy
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010297
    *** WARNING: Unable to verify checksum for D:\Tools\wput\wput.exe
    *** ERROR: Module load completed but symbols could not be loaded for D:\Tools\wput\wput.exe
    wput+0xee85:
    0040ee85 894a08 mov dword ptr [edx+8],ecx ds:0023:00000018=????????
    0:000> q
    quit:
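
    The register dump above can be read with a little arithmetic. The sketch below only reuses values copied from the Windbg output (module base, eip, edx); the variable names are illustrative. It shows that the faulting instruction sits at offset 0xee85 inside wput.exe and that the store targets address 0x18, which lies in the low address range Windows never maps for a process. That is consistent with a write through an invalid, near-null pointer, but it does not by itself identify the faulty code.

```python
# Values copied from the Windbg output above.
module_base = 0x00400000   # ModLoad: 00400000 0041e000 D:\Tools\wput\wput.exe
eip = 0x0040EE85           # instruction pointer at the access violation
edx = 0x00000010           # base register of the faulting store

# Offset of the faulting instruction inside wput.exe (Windbg prints wput+0xee85).
offset_in_module = eip - module_base

# Address written by `mov dword ptr [edx+8],ecx`.
target_address = edx + 8

print(f"wput+{offset_in_module:#x}")
print(f"write to {target_address:#010x}")
```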

    Paul-Henri

     
  • phlampe

    phlampe - 2007-12-13

    Logged In: YES
    user_id=1892147
    Originator: YES

    Hi again !

    Here's a log file from wput with the debug level on. It's quite big (32 MB) when decompressed. There are thousands of lines in it like this: ".Checking whether 1872 is writable... 1 (0:No error)". Does 1 mean an error, or is it good? Is 1872 a port number, or a physical or network file descriptor id?

    I only changed my connection information in the log file (user, pass, ip and port). I hope it helps.
    Can you compile a binary version of wput 0.6.1 for Windows ?

    My current explanation of what goes wrong is that wput opens a first session (the control connection) to the ftp server, negotiates and checks which files to transfer over that session, then starts the transfer in PASV mode over a second session. Since that transfer is very long (more than an hour), wput loses the control session it needs once the transfer is finished. Hence the recv() timeout with no data received after the end of the transfer.
    What makes me think so is observing Filezilla in the same situation: FZ keeps sending I, A or PWD commands over the control session to keep it alive at all times, even while it is transferring one of my big files in another session.
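
    The keep-alive idea described above can be sketched as a throttled callback. This is a minimal illustration in Python, not wput's code: make_keepalive_callback, send_keepalive, interval_s and clock are hypothetical names, and the 30-second interval is an arbitrary assumption. The callback is meant to be called once per uploaded block and fires the keep-alive action at most once per interval.

```python
import time
from typing import Callable


def make_keepalive_callback(
    send_keepalive: Callable[[], None],
    interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[bytes], None]:
    """Return a per-block callback that fires send_keepalive at most once per interval_s."""
    last_sent = clock()

    def on_block(_block: bytes) -> None:
        nonlocal last_sent
        now = clock()
        # Only touch the control connection when it has been idle long enough.
        if now - last_sent >= interval_s:
            send_keepalive()
            last_sent = now

    return on_block
```

    With Python's ftplib, such a callback could be passed as the callback argument of FTP.storbinary, with send_keepalive writing a cheap command such as NOOP to the control connection. The replies to those extra commands would still have to be read once the transfer ends, which this sketch leaves out.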

    Thanks for your help !
    Paul-Henri
    File Added: wput_debug.zip

     
  • phlampe

    phlampe - 2007-12-13

    wput debug output (32Mb uncompressed)

     
  • Rumpeltux

    Rumpeltux - 2008-02-27
    • status: open --> pending
     
  • Rumpeltux

    Rumpeltux - 2008-02-27

    Logged In: YES
    user_id=989758
    Originator: NO

    Sorry for the long silence; I didn't get a notification about the update here.

    The line "Checking whether 1872 is writable... 1 (0:No error)" is just fine; 1872 is the socket file descriptor number.
    However, I can't compile a Windows version: I don't have a running Windows system or the software to do so right now.
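
    For readers unfamiliar with this kind of check, here is a small Python sketch of asking the operating system whether a socket is ready for writing. It assumes the debug line reports the result of a select()-style readiness test, which is a guess; wput's source is not quoted in this thread. The function name is_writable is illustrative.

```python
import select
import socket


def is_writable(sock: socket.socket, timeout_s: float = 0.0) -> bool:
    """Return True if select() reports the socket as ready for writing."""
    _, writable, _ = select.select([], [sock], [], timeout_s)
    return sock in writable


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        # Prints the descriptor number and whether it can currently be written to.
        print(a.fileno(), is_writable(a))
    finally:
        a.close()
        b.close()
```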

    If someone could reproduce this problem, maybe with a simpler setup (only a couple of files rather than thousands) on a Linux machine, this could be the key to tracking it down.
    If wput crashes at the end, it should be compiled with memory debugging enabled to find the buffer that gets corrupted. Otherwise I don't see how to find out what's wrong here.

     
  • phlampe

    phlampe - 2008-02-27
    • status: pending --> open
     
  • phlampe

    phlampe - 2008-02-27

    Logged In: YES
    user_id=1892147
    Originator: YES

    Hi !
    Nice to hear from you :)

    I have kept using wput since December, and I still get a crash on every big file. I had to write a workaround in my script to restart it after each crash, which is why I could live with it (even if getting after-crash dialogs at each login for *each* file was becoming an annoyance).
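
    A restart workaround of this kind can be sketched as follows. This is not the actual script, which is not shown in this ticket: run_with_restarts, max_restarts and run are hypothetical names, and the sketch assumes that the command exits with a non-zero code when it crashes and that rerunning it skips the files already uploaded (as the "Skipping this file" lines in the log suggest).

```python
import subprocess
from typing import Callable, Sequence


def run_with_restarts(
    cmd: Sequence[str],
    max_restarts: int = 5,
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
) -> int:
    """Run cmd, restarting after a non-zero exit; return the number of runs used."""
    # One initial run plus up to max_restarts restarts.
    for attempt in range(1, max_restarts + 2):
        if run(cmd) == 0:
            return attempt
    raise RuntimeError(f"{cmd[0]} still failing after {max_restarts} restarts")
```

    The run parameter exists only so the loop can be exercised without launching a real process; by default the command is started with subprocess.run.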

    Lately I was able to set up rsync on my NAS and rewrote my script using rsync instead of wput. I should finish my tests at the end of this week, and will switch over to rsync (it has several advantages over ftp, mainly complete synchronisation including deletes, file transfer optimisation, respecting file creation date and time on the destination server, and no crashes).

    And I still can't recompile the last version for Windows.

    So I'll leave it there, hoping that someone can recompile version 0.6.1 for Windows. I'll be happy to test it in my environment in order to root out this bug.

    Paul-Henri

     
  • Lori Sulik

    Lori Sulik - 2009-01-21

    I suspect this is related to bugs 2527080 (using free ftp connection causes sigsegv) and 2524759 (uninitialized memory reads cause sigsegvs), which I recently reported along with code fixes.

     
