Menu

Problems over high packet loss WAN

Markw
2018-05-16
2018-05-30
  • Markw

    Markw - 2018-05-16

    I've been working with the app for a few days now and reading the posts here to try and work through these issues. I was having major problems at first due to packet loss on our WAN, after adjusting the RTT, Buffer, and Announce params it seemed to help a lot, until now.

    I have 2 issues that may or may not be related...

    Last night I thought I had everything working well. I had 2 uftp servers (on the same host) sending to 1 client-daemon (remote host), both were happening fast and running well when I went to bed last night. When I got up this morning and checked it, the large file copy I had been waiting on (160 GB ~7 hours with -R 48000) had timed out right at the end. I increased the buffer and re-ran this copy on it's own, with no other uftp processes and '-R 140000' to speed it up. Same result. After about 2.5 hours, right at what should have been the end of the copy, it timed out. The file appeared to be complete in a directory listing, but there were still Sections/NAK's being transfered and ultimately timed out / aborted.

    Second issue... Not only had the large copy timed out this morning, but the other uftp process I had copying smaller files in bulk had come to a complete crawl, almost no output from the process and only about 1-2mbps on the remote end, despite being in the middle of copying a 78 MB file. I killed the process and started it again, and was back to full speed. It is worth noting the Server side command passed to uftp for this copy included a list of probably 270 folders to copy (each with about 700 files within).

    Here are my server side commands:

    uftp.exe -R 140000 -M "server" -s 50 -r 1:0.2:100 -B 20097152 -m 5 -D \\remoteplace\stuff\morespecificstuff -z -X "C:\temp\exclude.txt" \\sourceplace\stuff\morespecificstuff\folders
    

    Client Daemon running this:

    uftpd.exe -D \\remoteplace\stuff  -L .\log.txt -x 2 -B 20097152
    

    Here is an example from the end of the Client Daemon side log when the large 160 GB file timed out.

    2018/05/16 17:23:06.247829: [92DC306E/00:0004]: Sent 33 NAKs for section 11545
    2018/05/16 17:23:06.247829: [92DC306E/00:0004]: Sent 703 NAKs for section 11558
    2018/05/16 17:23:07.529080: [92DC306E/00:0004]: Sent 2 NAKs for section 11584
    2018/05/16 17:23:07.529080: [92DC306E/00:0004]: Sent 1034 NAKs for section 11640
    2018/05/16 17:23:09.607211: [92DC306E/00:0004]: Got DONE message for section 11833
    2018/05/16 17:23:10.232213: [92DC306E/00:0004]: Got DONE message for section 11833
    2018/05/16 17:23:16.700974: [92DC306E/00:0004]: Sent 3 NAKs for section 669
    2018/05/16 17:23:23.997865: [92DC306E/00:0004]: Sent 17 NAKs for section 1131
    2018/05/16 17:23:25.935370: [92DC306E/00:0004]: Sent 31 NAKs for section 1444
    2018/05/16 17:23:25.935370: [92DC306E/00:0004]: Sent 54 NAKs for section 1465
    2018/05/16 17:23:27.654124: [92DC306E/00:0004]: Sent 225 NAKs for section 1829
    2018/05/16 17:23:34.544763: [92DC306E/00:0004]: Sent 864 NAKs for section 2394
    2018/05/16 17:23:39.857273: [92DC306E/00:0004]: Sent 210 NAKs for section 2888
    2018/05/16 17:23:44.904158: [92DC306E/00:0004]: Sent 368 NAKs for section 3364
    2018/05/16 17:23:48.622919: [92DC306E/00:0004]: Sent 167 NAKs for section 3724
    2018/05/16 17:23:48.622919: [92DC306E/00:0004]: Sent 133 NAKs for section 3725
    2018/05/16 17:23:49.263543: [92DC306E/00:0004]: Sent 191 NAKs for section 3770
    2018/05/16 17:23:51.326046: [92DC306E/00:0004]: Sent 1 NAKs for section 3837
    2018/05/16 17:23:57.201060: [92DC306E/00:0004]: Sent 377 NAKs for section 4404
    2018/05/16 17:24:00.544816: [92DC306E/00:0004]: Sent 412 NAKs for section 4744
    2018/05/16 17:24:01.544819: [92DC306E/00:0004]: Sent 113 NAKs for section 5001
    2018/05/16 17:24:03.701072: [92DC306E/00:0004]: Sent 112 NAKs for section 5217
    2018/05/16 17:24:04.591698: [92DC306E/00:0004]: Sent 401 NAKs for section 5239
    2018/05/16 17:24:07.060454: [92DC306E/00:0004]: Sent 125 NAKs for section 5567
    2018/05/16 17:24:13.466717: [92DC306E/00:0004]: Sent 232 NAKs for section 6111
    2018/05/16 17:24:14.372971: [92DC306E/00:0004]: Sent 88 NAKs for section 6291
    2018/05/16 17:24:15.451096: [92DC306E/00:0004]: Sent 24 NAKs for section 6324
    2018/05/16 17:24:15.451096: [92DC306E/00:0004]: Sent 36 NAKs for section 6402
    2018/05/16 17:24:15.451096: [92DC306E/00:0004]: Sent 129 NAKs for section 6406
    2018/05/16 17:24:19.388605: [92DC306E/00:0004]: Sent 372 NAKs for section 6824
    2018/05/16 17:24:19.982357: [92DC306E/00:0004]: Sent 139 NAKs for section 6870
    2018/05/16 17:24:21.216733: [92DC306E/00:0004]: Sent 622 NAKs for section 6980
    2018/05/16 17:24:21.216733: [92DC306E/00:0004]: Sent 680 NAKs for section 6985
    2018/05/16 17:24:23.966740: [92DC306E/00:0004]: Sent 550 NAKs for section 7145
    2018/05/16 17:24:24.591739: [92DC306E/00:0004]: Sent 977 NAKs for section 7280
    2018/05/16 17:24:34.966762: [92DC306E/00:0004]: Sent 44 NAKs for section 8232
    2018/05/16 17:24:35.904265: [92DC306E/00:0004]: Sent 180 NAKs for section 8394
    2018/05/16 17:24:35.904265: [92DC306E/00:0004]: Sent 434 NAKs for section 8395
    2018/05/16 17:24:39.732396: [92DC306E/00:0004]: Sent 162 NAKs for section 8674
    2018/05/16 17:24:45.388659: [92DC306E/00:0004]: Sent 325 NAKs for section 9080
    2018/05/16 17:24:54.091803: [92DC306E/00:0004]: Sent 58 NAKs for section 9894
    2018/05/16 17:24:54.091803: [92DC306E/00:0004]: Sent 138 NAKs for section 9895
    2018/05/16 17:24:54.091803: [92DC306E/00:0004]: Sent 1022 NAKs for section 9900
    2018/05/16 17:24:54.091803: [92DC306E/00:0004]: Sent 207 NAKs for section 9901
    2018/05/16 17:24:54.091803: [92DC306E/00:0004]: Sent 309 NAKs for section 9927
    2018/05/16 17:24:54.716803: [92DC306E/00:0004]: Sent 1109 NAKs for section 10151
    2018/05/16 17:24:59.998064: [92DC306E/00:0004]: Sent 77 NAKs for section 10569
    2018/05/16 17:25:00.998066: [92DC306E/00:0004]: Sent 467 NAKs for section 10728
    2018/05/16 17:25:04.576201: [92DC306E/00:0004]: Sent 97 NAKs for section 10942
    2018/05/16 17:25:04.576201: [92DC306E/00:0004]: Sent 207 NAKs for section 10943
    2018/05/16 17:25:04.576201: [92DC306E/00:0004]: Sent 136 NAKs for section 11064
    2018/05/16 17:25:10.060585: [92DC306E/00:0004]: Sent 294 NAKs for section 11480
    2018/05/16 17:25:12.826216: [92DC306E/00:0004]: Got DONE message for section 11833
    2018/05/16 17:25:13.044967: [92DC306E/00:0004]: Sent 424 NAKs for section 11833
    2018/05/16 17:25:13.419970: [92DC306E/00:0004]: Got DONE message for section 11833
    2018/05/16 17:25:26.794994: [92DC306E/00:0004]: Sent 1 NAKs for section 1131
    2018/05/16 17:25:47.373164: [92DC306E/00:0004]: Sent 157 NAKs for section 3769
    2018/05/16 17:25:47.982538: [92DC306E/00:0004]: Sent 659 NAKs for section 3774
    2018/05/16 17:26:07.998204: [92DC306E/00:0004]: Sent 43 NAKs for section 5996
    2018/05/16 17:26:09.607582: [92DC306E/00:0004]: Sent 73 NAKs for section 6286
    2018/05/16 17:26:14.388843: [92DC306E/00:0004]: Sent 38 NAKs for section 6820
    2018/05/16 17:26:14.732593: [92DC306E/00:0004]: Sent 243 NAKs for section 6824
    2018/05/16 17:26:27.185744: [92DC306E/00:0004]: Sent 740 NAKs for section 7572
    2018/05/16 17:26:34.717011: [92DC306E/00:0004]: Sent 757 NAKs for section 9078
    2018/05/16 17:26:44.701407: [92DC306E/00:0004]: Sent 445 NAKs for section 10148
    2018/05/16 17:26:59.467061: [92DC306E/00:0004]: Got DONE message for section 11833
    2018/05/16 17:26:59.998313: [92DC306E/00:0004]: Got DONE message for section 11833
    2018/05/16 17:27:20.560855: [92DC306E/00:0004]: Transfer timed out
    2018/05/16 17:27:20.623355: [92DC306E/00:0004]: starting file close
    2018/05/16 17:27:20.638983: [92DC306E/00:0004]: done file close
    

    And of the Server side

    Sending section 11425
    Sending section 11480
    Sending section 11539
    Sending section 11545
    Sending section 11558
    Got 294 NAKs for section 11480 from client 0x0ACA0A3F
    Sending section 11584
    Sending section 11640
    Sending DONE 1.1
    Got 424 NAKs for section 11833 from client 0x0ACA0A3F
    Sending DONE 2.1
    Starting pass 7
    Sending section 669
    Sending section 1131
    Sending section 1444
    Sending section 1465
    Got 1 NAKs for section 1131 from client 0x0ACA0A3F
    Sending section 1829
    Sending section 2394
    Sending section 2888
    Sending section 3364
    Sending section 3724
    Sending section 3725
    Sending section 3770
    Got 157 NAKs for section 3769 from client 0x0ACA0A3F
    Sending section 3837
    Got 659 NAKs for section 3774 from client 0x0ACA0A3F
    Sending section 4404
    Sending section 4744
    Sending section 5001
    Sending section 5217
    Sending section 5239
    Sending section 5567
    Sending section 6111
    Got 43 NAKs for section 5996 from client 0x0ACA0A3F
    Sending section 6291
    Got 73 NAKs for section 6286 from client 0x0ACA0A3F
    Sending section 6324
    Sending section 6402
    Sending section 6406
    Sending section 6824
    Got 38 NAKs for section 6820 from client 0x0ACA0A3F
    Sending section 6870
    Got 243 NAKs for section 6824 from client 0x0ACA0A3F
    Sending section 6980
    Sending section 6985
    Sending section 7145
    Sending section 7280
    Sending section 8232
    Got 740 NAKs for section 7572 from client 0x0ACA0A3F
    Sending section 8394
    Sending section 8395
    Sending section 8674
    Sending section 9080
    Got 757 NAKs for section 9078 from client 0x0ACA0A3F
    Sending section 9894
    Sending section 9895
    Sending section 9900
    Sending section 9901
    Sending section 9927
    Sending section 10151
    Got 445 NAKs for section 10148 from client 0x0ACA0A3F
    Sending section 10569
    Sending section 10728
    Sending section 10942
    Sending section 10943
    Sending section 11064
    Sending section 11480
    Sending section 11833
    Sending DONE 1.1
    Sending DONE 2.1
    Starting pass 8
    Sending section 1131
    Transfer aborted by 0x0ACA0A3F: Transfer timed out
    Transfer status:
    Host: 0x0ACA0A3F       Status: Aborted
    Total elapsed time: 0.000 seconds
    Overall throughput: 0.00 KB/s
    uftp: Finishing at Wed May 16 17:27:21 2018
    

    Thank you!
    Mark

     

    Last edit: Markw 2018-05-16
  • Markw

    Markw - 2018-05-17

    I tried again over night, with max buffer and cache settings on both sides, almost limited to 50mbps... same problem :/ It gets through the whole file, then aborts at some point during replay of NAK's I believe

     
  • Dennis Bush

    Dennis Bush - 2018-05-17

    You may want to try using restart mode. You'll need to pass -f to the server to enable restart mode and also pass -T to the client to use a temp directory. Then if the session failed you can restart it by using the -F option and passing it the name of the server's restart file.

    This means you can't use a absolute pathname for the server's -D option, however since you only have one destination directory on the client you can remove the leading path elements up to the name of the destionation. So instead of -D \\remoteplace\stuff\morespecificstuff you would just pass -D morespecificstuff.

     
  • Markw

    Markw - 2018-05-24

    Still not finishing in success. I've run the restart a few times, and each time the result is similar:

    PS C:\uftp> .\uftp.exe -R 150000 -M Remoteclient -D FolderName -s 50 -r 1:0.2:100 -B 200971

    UFTP version 4.9.6 Copyright (C) 2001-2018 Dennis A. Bush
    Starting at Thu May 24 14:37:46 2018
    Transfer rate: 150000 Kbps (18750 KB/s)
    Wait between packets: 70 us
    Using private multicast address 1.2.2.1 Group ID: 9F7BC5A0
    Initializing group
    Sending ANNOUNCE 1.1
    Received REGISTER from client 0x0ACA140A
    Late registers:
    Sending REG_CONF 2.1
    ----- FileName.bak -----
    File ID: 0001 Name: Filename.bak
    sending as: FolderName
    Bytes: 160116334592 Blocks: 123166412 Sections: 11843
    Sending FILEINFO 1.1
    Received FILEINFO_ACK from client 0x0ACA140A
    Sending file
    Starting pass 1
    Sending DONE 1.1
    Got 30 NAKs for section 1320 from client 0x0ACA140A
    Got 6 NAKs for section 1380 from client 0x0ACA140A
    Got 3 NAKs for section 1375 from client 0x0ACA140A
    Got 24 NAKs for section 1732 from client 0x0ACA140A
    Got 177 NAKs for section 1753 from client 0x0ACA140A
    Got 129 NAKs for section 1767 from client 0x0ACA140A
    Got 38 NAKs for section 1925 from client 0x0ACA140A
    Got 164 NAKs for section 2049 from client 0x0ACA140A
    Got 62 NAKs for section 2098 from client 0x0ACA140A
    <repeat similar="" messages="" for="" another="" ~800="" lines="">
    Got 124 NAKs for section 11726 from client 0x0ACA140A
    Got 35 NAKs for section 11747 from client 0x0ACA140A
    Got 1 NAKs for section 11803 from client 0x0ACA140A
    Got 14 NAKs for section 11809 from client 0x0ACA140A
    Sending DONE 2.1
    Starting pass 2
    Transfer aborted by 0x0ACA140A: Transfer timed out
    Transfer status:
    Host: 0x0ACA140A Status: Aborted
    Total elapsed time: 0.000 seconds
    Overall throughput: 0.00 KB/s
    uftp: Finishing at Thu May 24 14:37:58 2018

    Client side:

    <snip>
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 76 NAKs for section 11508
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 62 NAKs for section 11522
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 41 NAKs for section 11680
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 66 NAKs for section 11687
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 168 NAKs for section 11699
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 8 NAKs for section 11719
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 124 NAKs for section 11726
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 35 NAKs for section 11747
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 1 NAKs for section 11803
    2018/05/24 14:36:23.590178: [9F7BC5A0/03:0001]: Sent 14 NAKs for section 11809
    2018/05/24 14:36:23.777679: [9F7BC5A0/03:0001]: Got DONE message for section 11842
    2018/05/24 14:36:25.230812: [9F7BC5A0/03:0001]: Transfer timed out
    2018/05/24 14:36:25.230812: [9F7BC5A0/03:0001]: starting file close
    2018/05/24 14:36:25.230812: [9F7BC5A0/03:0001]: done file close
    2018/05/24 14:36:25.230812: [9F7BC5A0/03:0001]: Writing restart file

    I am totally willing to admit that there's just way to much packet loss, but the initial copy runs at a solid 150mbps for 2-3 hours before failing, which gives me hope, then drops me :) lol

     
  • Dennis Bush

    Dennis Bush - 2018-05-24

    Mark,

    Part of the issue with sending large files is that after the first pass the server needs to seek long distances across the file to retrieve the relavant parts of the file to send. This seeking can take a long time on files several GB in size resulting the clients getting starved for packets and timing out.

    There are a few things you can do to address this. First on the client, you'll want to increase the cache size via the -c option from the default of 1048576 (1 MB). At speeds over 100 Mbps, this will reduce the number of NAKs by writing to disk fewer times but in larger chunks. You'll also want to use the -B option to increase the UDP receive buffer size from the default of 262144 (256K) to at least 2097152 (2 MB) or more.

    On the server, I see you've already set the robustness factor to its maximum and increased the minimum GRTT to 0.2 seconds. You may want to increase the minumum GRTT further. This will give the server more time to perform large seeks.

    One thing I might try doing is to seek and read in a separate thread and periodically send heartbeat messages if it's taking too long.

    Regards,
    Dennis

     
  • Dennis Bush

    Dennis Bush - 2018-05-27

    Mark,

    I just released version 4.9.7 which should address the timout issue when sending large files. There was a delay in looping through the list of blocks to send which has been fixed.

    With this version, it shouldn't be necessary to artifically increase the GRTT to prevent timeouts. You'll still want to tune the cache size and UDP buffer size on the client however.

    Regards,
    Dennis

     
  • Markw

    Markw - 2018-05-30

    " on files several GB in size"

    This one is a 156 GB file...

    I will give the new version a try with the above advice and new version.

    Thank you for your help with this! Better support than some commercial products I've used

     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.