
Transfer aborted by ... Transfer timed out

2019-10-21
2019-10-28
  • restlessmindsstudio

    First off I want to thank you for all the work you've put into UFTP. I've been using it for years now and it's been very reliable.

    I'm looking for some guidance as to how to resolve an issue that has suddenly popped up at one site where I use UFTP. I use it to transfer around 15GB of files each week to Linux-based systems on a local network. All systems are hard-wired to a 100Mbps switch. I do this at around 70 sites and have never encountered a problem like this. Based on the log, it appears the data transfer is timing out and then all the clients are aborting.

    I retried the transfer multiple times and even tried rebooting the router and all systems on the network. I then tried halving the transfer speed, and when that also failed I increased the transfer speed, but got the same result. I went ahead and compiled the latest version of UFTP (4.10) and installed it on all systems at the site, but still no change. I also tried adjusting the init_grtt using: "-r 0.5:0.1:1000" but that had no effect either.

    The final thing I tried was to limit the transfer to just one system, which also timed out. This particular site usually has 16 systems that are part of the multicast. Based on the log, it seems to time out when it's nearly done transferring all files. Below is the final output from limiting it to just one system. The timeout occurred after about 2 hours, and again most of the files had been transferred before it timed out.

    ...
    2019/10/21 03:00:57.324062: [21CB473E/00:0016]: Sending section 144
    2019/10/21 03:01:08.269944: [21CB473E/00:0016]: Sending section 145
    2019/10/21 03:01:19.215961: [21CB473E/00:0016]: Sending section 146
    2019/10/21 03:01:30.161832: [21CB473E/00:0016]: Sending section 147
    2019/10/21 03:01:38.484942: [21CB473E/00:0016]: Transfer aborted by 0x0A000015: Transfer timed out
    2019/10/21 03:01:40.909932: [21CB473E/00:0016]: Transfer status:
    2019/10/21 03:01:40.910037: [21CB473E/00:0016]: Host: 0x0A000015 Status: Aborted
    2019/10/21 03:01:40.910093: [21CB473E/00:0016]: Total elapsed time: 0.000 seconds
    2019/10/21 03:01:40.910145: [21CB473E/00:0016]: Overall throughput: 0.00 KB/s
    2019/10/21 03:01:40.910467: [21CB473E/00:0]: Failed to read directory 746183021684: Resource temporarily unavailable

     
  • Dennis Bush

    Dennis Bush - 2019-10-23

    These logs indicate that there is a roughly 8.3 second gap where the receiver doesn't receive any valid packets for the group.
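For anyone following along, here is a minimal Python sketch of how that gap can be read off the sender log. It assumes every line starts with a 26-character timestamp in the form shown in the first post; the function names are made up for illustration and are not part of UFTP. The last interval, between "Sending section 147" and the abort line, appears to be the roughly 8.3 second figure mentioned above.

```python
from datetime import datetime

# Three lines copied from the sender log in the first post.
LOG_LINES = [
    "2019/10/21 03:01:19.215961: [21CB473E/00:0016]: Sending section 146",
    "2019/10/21 03:01:30.161832: [21CB473E/00:0016]: Sending section 147",
    "2019/10/21 03:01:38.484942: [21CB473E/00:0016]: Transfer aborted by 0x0A000015: Transfer timed out",
]


def parse_timestamp(line):
    # Assumption: the first 26 characters are "YYYY/MM/DD HH:MM:SS.ffffff".
    return datetime.strptime(line[:26], "%Y/%m/%d %H:%M:%S.%f")


def gaps_in_seconds(lines):
    # Seconds elapsed between each pair of consecutive log lines.
    times = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_in_seconds(LOG_LINES):
        print(f"{gap:.3f} s")
```

Running the same function over a full log and sorting the gaps is a quick way to see whether the abort follows an unusually long silence or a normal section interval.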

    Start by setting the log level to 3 on the receiver. That will tell you if packets are being dropped by the receiver due to a high level of packet reordering. If that doesn't reveal anything, run wireshark/tcpdump on both the sender and receiver to see if packets are being lost en route or if the receiving server gets the packets but the UFTP receiver does not.

    You might also want to increase the UDP receive buffer size using the -B option on the receiver. If the size you request is above the OS limit for this buffer, raise the limit first by running sysctl -w net.core.rmem_max={value}
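To check whether the OS limit is actually capping the buffer, a small Python sketch like the one below can help. It is independent of UFTP: it opens a throwaway UDP socket, requests a receive buffer with the standard SO_RCVBUF socket option, and reads back what the kernel granted. The 4 MiB request is an arbitrary example value, and the function name is made up for illustration. On Linux the value read back is typically double the request (the kernel adds bookkeeping overhead), and the request itself is capped at net.core.rmem_max.

```python
import socket


def effective_rcvbuf(requested_bytes):
    # Ask for a UDP receive buffer of the given size and return the size
    # the kernel reports afterwards.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()


if __name__ == "__main__":
    requested = 4 * 1024 * 1024  # arbitrary example size, not a recommendation
    granted = effective_rcvbuf(requested)
    print(f"requested {requested} bytes, kernel reports {granted} bytes")
```

If the reported size stays well below the request, the OS limit is probably the bottleneck and raising net.core.rmem_max is worth trying before increasing the buffer on the receiver.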

     
  • restlessmindsstudio

    Hi Dennis, thanks for the reply. I went ahead and retried copying the same set of files to just one receiver with its log level set to 3. It timed out again, so I reviewed the log, but I'm unsure what I should be looking for that would indicate packets are being dropped. I've attached the log from the receiver's session.

    Thanks,
    Jim

     
  • Dennis Bush

    Dennis Bush - 2019-10-24

    Jim,

    I don't see any indication of out of order packets that could be dropped here. The next step would be a wireshark trace on both sides.

    Regards,
    Dennis

     
  • restlessmindsstudio

    Ok, I couldn't get Wireshark onto these machines due to time constraints. I noticed the exit code was 9 on the sender side, which according to the docs means: "All client either dropped out of the session or aborted. Also returned if one client drops out or aborts when -q is specified."

    I'm starting uftp from my own software, which uses the exit code to determine whether the transfer was successful or whether it can restart the copy process after an error. I changed my software to also try the restart file when an exit code of 9 is encountered, and it was able to finish copying all files to the receivers on the second attempt.
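For reference, the retry logic described here could look roughly like the Python sketch below. It is not the actual software: the function name, the attempt limit, and the two command lists are hypothetical, and the caller supplies the real uftp command lines (the initial one and the one that resumes from the restart file), so no uftp options are assumed here. The only detail taken from the thread is that exit code 9 triggers a retry.

```python
import subprocess

# From the docs quoted above: all clients dropped out of the session or aborted.
EXIT_ALL_CLIENTS_ABORTED = 9


def run_with_restart(first_cmd, restart_cmd, max_attempts=2,
                     runner=subprocess.call):
    # Run first_cmd once; on exit code 9, run restart_cmd until it returns
    # something else or max_attempts is reached.
    # Returns (last exit code, number of attempts made).
    # The runner parameter exists only so the logic can be tried without uftp.
    cmd = first_cmd
    code = None
    for attempt in range(1, max_attempts + 1):
        code = runner(cmd)
        if code != EXIT_ALL_CLIENTS_ABORTED:
            return code, attempt
        # Later attempts use the command that resumes from the restart file.
        cmd = restart_cmd
    return code, max_attempts
```

Capping the number of attempts keeps a site with a persistent network fault from looping forever, and returning the attempt count makes it easy to log how often the restart path is taken.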

    I'll see how things go with the next batch of files. I might have time to run Wireshark next time.

     
