Hello,
I have been testing UFTP 5.0 on an embedded Linux system with OpenSSL 1.1.1o, trying to use it over a narrowband wireless link. The idea is to have it run in the background with a severely limited maximum rate (as low as 1 Kbps) and to rely on session restart/resume if the endpoints get disconnected or rebooted.
The results are looking promising, thank you for creating UFTP.
I have run into several issues and created patches to address them. I am sending them as attachments in case you are interested in using them.
1) 01_uftp-5.0-client_atexit_order.patch
The UFTP client uftpd did not store its restart state if terminated by a signal during a running session.
The client and proxy use atexit() to register a cleanup function that is executed when the program terminates. OpenSSL registers its own cleanup function with atexit() during its initialization, and since atexit() handlers run in the reverse order of registration, the order of registration determines the order of execution. Because the cleanup function of uftpd uses OpenSSL to encrypt the ABORT packet it sends back to the server when terminating a session, it must run before OpenSSL's cleanup function, and therefore must be registered with atexit() after OpenSSL has been initialized. Otherwise the client gets an error while encrypting the ABORT and fails before writing the restart file(s). The patch changes the order of the atexit() calls and resolves the issue for the client. I have made the same change for the proxy, but I have not tested it. The server already registers its handlers in the correct order.
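A minimal standalone sketch of the ordering rule (the handler names are mine, not UFTP's):

    #include <stdio.h>
    #include <stdlib.h>

    /* atexit() handlers run in reverse order of registration,
     * so the handler registered last runs first. */
    static void crypto_cleanup(void)  { puts("crypto cleanup (registered first, runs last)"); }
    static void session_cleanup(void) { puts("session cleanup (registered last, runs first)"); }

    int main(void)
    {
        atexit(crypto_cleanup);   /* stands in for OpenSSL's handler, registered at init */
        atexit(session_cleanup);  /* uftpd's cleanup: registered after, so it runs while
                                   * the crypto library is still usable */
        return 0;                 /* session_cleanup fires first, then crypto_cleanup */
    }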
2) 02_uftp-5.0-server_status_flush.patch
While experimenting with status files (-S), I noticed that the server status is written all at once when the server terminates, not immediately when each event is reported. I compared its code with the client's and added calls to fflush().
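The change amounts to flushing the stdio buffer after each status record, along these lines (report_status() and its arguments are illustrative, not the actual UFTP code):

    #include <stdio.h>

    /* Write one status record and flush it immediately, so it is
     * visible even if the server is later killed by a signal. */
    static void report_status(FILE *status_file, const char *event)
    {
        fprintf(status_file, "%s\n", event);
        fflush(status_file);
    }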
3) 03_uftp-5.0-client_cc_rtt.patch
When using congestion control (TFMCC), I experienced absurdly large growth in the measured RTTs that made the session hang during its completion phase. This was caused by a missing update of group->last_server_ts in the client when processing a version 5.0 CONG_CTRL packet. When the timestamp was sent back to the server in CC_ACK, it was an old timestamp from some previous exchange, which made it look like the RTT was growing. The large RTT advertised by the server led to a slower rate and longer timeouts, creating a feedback loop. This is probably not noticeable at higher transmission rates.
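For context, a simplified sketch of the timestamp-echo scheme involved (the function names and the bare timeval variable are illustrative, not UFTP's actual structures):

    #include <sys/time.h>

    static struct timeval last_server_ts;  /* stands in for group->last_server_ts */

    /* On every CONG_CTRL packet, remember the server's send timestamp.
     * This is the update that was missing for version 5.0 packets. */
    void handle_cong_ctrl(struct timeval server_ts)
    {
        last_server_ts = server_ts;
    }

    /* On sending CC_ACK, echo the stored timestamp back. The server
     * derives the RTT from (now - echoed timestamp), so echoing a
     * stale timestamp inflates the measured RTT. */
    struct timeval cc_ack_echo(void)
    {
        return last_server_ts;
    }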
4) 04_uftp-5.0-server_cc_rtt.patch
Tied to the previous patch, there is a fast update of the GRTT in the server when receiving a CC_ACK (TFMCC_ACK_INFO) with a higher RTT. I have added a constraint on it, so it cannot exceed the configured maximum GRTT.
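The constraint is essentially a clamp, something like this sketch (names are mine):

    /* Fast GRTT adjustment on CC_ACK, clamped so that a single large
     * RTT sample cannot push the GRTT past the configured ceiling. */
    double apply_fast_grtt(double grtt, double sample_rtt, double grtt_max)
    {
        if (sample_rtt > grtt) {
            grtt = sample_rtt;
        }
        return (grtt > grtt_max) ? grtt_max : grtt;
    }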
5) 05_uftp-5.0-packet_rate.patch
When using very low rates (<8 Kbps) I repeatedly failed to establish a connection between the server and the clients. The clients terminated the transmission because of a timeout ("Transfer timed out"). This could be somewhat mitigated by severely increasing the robustness factor or the initial value of the GRTT. After checking the code, I think the problem is that the clients compute their timeouts from the advertised GRTT, but that does not take into account the packet delays incurred by rate limiting in the server (variable "packet_wait"). I have reworked the computation of the advertised GRTT to always include the packet_wait delay, and this resolved the issue; see the sketch below. It may need some refactoring (wrapping it into a function?), since it must be done in multiple places in the announce phase and the transmission phase.
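To illustrate why this matters at 1 Kbps (the numbers are mine): a 1300-byte packet takes 1300 * 8 / 1000 = 10.4 s just to serialize, so the inter-packet wait dwarfs any realistic network GRTT, and a timeout derived from the GRTT alone expires long before the next packet can even be sent. The reworked advertisement, as a sketch:

    /* The advertised GRTT should cover both the network round trip and
     * the sender-side pacing delay; otherwise client timeouts computed
     * from it expire between rate-limited packets. (Sketch only.) */
    double advertised_grtt(double grtt, int packet_size, int rate_bps)
    {
        double packet_wait = (double)packet_size * 8.0 / rate_bps;
        return grtt + packet_wait;
    }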
I have also changed the initial rate of the announcement phase (1000 Kbps) when using TFMCC congestion control, so it respects the configured initial or maximum rate. In my narrowband setup, such an aggressive announce phase led to congestion.
I have also noticed that the first packet sent in the transmission phase was delayed by the rate limiting mechanism, even though there had already been a delay after the preceding packet of the announcement phase. I eliminated this extra delay by initializing the variable "overage" to "packet_wait" in the sending thread, roughly as sketched below.
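My mental model of the pacing loop, as a sketch (the helper functions are hypothetical; only "overage" and "packet_wait" come from the source):

    #include <stdint.h>

    extern int  have_packets(void);        /* hypothetical helpers, */
    extern void sleep_usec(int64_t usec);  /* not UFTP functions    */
    extern void send_next_packet(void);

    void sender_loop(int64_t packet_wait)
    {
        /* "overage" is the pacing credit: time already spent waiting.
         * Starting it at packet_wait (instead of 0) accounts for the
         * gap that already elapsed after the announce phase, so the
         * first data packet goes out without an extra delay. */
        int64_t overage = packet_wait;
        while (have_packets()) {
            if (overage >= packet_wait) {
                overage -= packet_wait;    /* gap already covered */
            } else {
                sleep_usec(packet_wait - overage);
                overage = 0;
            }
            send_next_packet();
        }
    }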
I also noticed that when the server is interrupted by a signal, there is a delay before it writes its restart file and terminates. This is caused by the main thread waiting for the sending thread, which is sleeping between packets ("packet_wait"). At higher rates this is not noticeable; at very low rates it may prevent uftp from saving its state in time (for example when the device is about to reboot). I have not attempted to resolve this; it is a more complex multithreading issue and its solution may be platform specific.
Eventually I may experiment with getting enhanced feedback about a running session from both the server and the client. We expect that our sessions may run for days, and we will need some easy-to-access feedback on their progress.
One more thing I have run into and have not attempted to solve yet: if the client is configured to listen on a dynamic interface that can get removed (not just set DOWN) and created again, it will stop listening (lose its registration for multicast reception). From the standpoint of the interface list, it is probably an entirely new interface, possibly with a new index. The only solution that has worked so far is restarting the client and then restarting the transaction from the server. I guess there is no other solution than modifying the code to reload the list of interfaces and re-register for multicast reception. This could be triggered by some outside script that manages the interface. The most complex solution would probably use a netlink socket to detect interface events, as sketched below.
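For reference, a minimal Linux sketch of subscribing to link events over rtnetlink (standard kernel API usage, not code from UFTP; a real implementation would re-scan the interfaces and re-join the multicast group on RTM_NEWLINK):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>

    int main(void)
    {
        /* Subscribe to interface create/remove/up/down notifications. */
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        struct sockaddr_nl addr;
        memset(&addr, 0, sizeof(addr));
        addr.nl_family = AF_NETLINK;
        addr.nl_groups = RTMGRP_LINK;
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));

        char buf[4096];
        for (;;) {
            int len = recv(fd, buf, sizeof(buf), 0);
            if (len <= 0)
                break;
            for (struct nlmsghdr *nh = (struct nlmsghdr *)buf;
                 NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
                if (nh->nlmsg_type == RTM_NEWLINK)
                    printf("link added or changed\n");
                else if (nh->nlmsg_type == RTM_DELLINK)
                    printf("link removed\n");
            }
        }
        return 0;
    }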
A similar event on the server side would lead to send errors and eventual termination of the server, after which an automatic restart of the server may follow and recover the transmission.
Sorry to bother you with a wall of text, but I would appreciate any insight into a possible solution.
Vit,
Thanks for the patches. Most of these I can include as is (with minor formatting changes). The changes to the advertised GRTT will need a little refactoring. One thing in particular I might do with that is to make the inter-packet wait time the lower limit on the advertised GRTT rather than just adding it. That should still be enough for your use case while preventing any potential adverse effects at higher transfer speeds.
As for the issue of interfaces going up and down, there's probably not much that can realistically be done in that case. Probably best to have an external script to check for such cases.
Regards,
Dennis
Thank you for the reply. Fortunately my use case has been simplified to using only one interface; it already has a script that is called whenever the interface is created or destroyed, and that should be a rare event.
Regarding GRTT:
If I understand it correctly, then from the standpoint of the client it represents the expected arrival time of the next packet, and various timeouts are derived from it. If a timeout runs from the reception of one packet to the reception of another, it should be derived from the sending delay + 1/2 * RTT. If it runs from sending a packet to the server until a reply is received, it should be derived from the delay + RTT. Enforcing this distinction strictly is probably not worth the effort, since it would require sending the GRTT and the packet delay separately in packets.
The problem with using the packet delay as a lower limit instead of adding it to the GRTT is that one part of the timeout is lost either way: either the RTT is ignored because the packet delay is larger, or the packet delay is ignored because the RTT is larger. In both cases the robustness of the algorithm is compromised because the timeout is shorter than it should be. Where once it had room for 5 lost packets, it will now have room for only 4. The worst case is when the RTT and the delay are of similar size, in which case the robustness may be effectively halved.
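A concrete worst-case example (my numbers, assuming for illustration that the timeout scales as robustness * advertised GRTT): with robustness 5, RTT = 2 s and packet_wait = 2 s, the additive form advertises 4 s and gives a 20 s timeout, which survives about 10 missed 2-second packet slots; the max-based form advertises 2 s and gives a 10 s timeout, which survives only about 5.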
I am not certain that the GRTT advertised to the client can adversely affect the transmission rate at higher speeds. The sending rate is entirely controlled by the server: it is either fixed or computed by TFMCC. The advertised GRTT will, however, affect the intervals for repeating lost replies from the client.
Regards,
Vít
Hello,
I have noticed a suspicious condition in client_fileinfo.c, handle_fileinfo(), line 539:
if ((group->fileinfo.ftype != FTYPE_DELETE) || (group->fileinfo.ftype != FTYPE_FREESPACE)) {
This condition is always fulfilled, which disables the following code dealing with moving an existing target file to a backup. There are similar-looking conditions in this section that are now never evaluated. I have not attempted to do anything about this, since it requires further analysis and it lies outside my use case anyway.
Regards,
Vít
Yes, that is a bug. The comparisons should each be == instead of !=. I'll fix that as well. The conditions that follow those lines are correct.
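With that fix, the check at line 539 would read:

    if ((group->fileinfo.ftype == FTYPE_DELETE) ||
            (group->fileinfo.ftype == FTYPE_FREESPACE)) {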
Hello,
I am sending another patch, but it is not a fix; it adds new functionality. I do not propose to add it to the project; I am sending it only in case you find it interesting.
Since I will be running some long-term UFTP processes, I need feedback on their progress. So I have devised a new parameter for both the client and the server (I am not using a proxy at this moment): '-G progfile'. It allows setting up a file for progress reporting; this file can also be the logging stream ("@LOG"). When the process receives the signal SIGUSR1, it dumps a progress report to this file. If it is a regular file, it is overwritten each time.
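A minimal sketch of the signal plumbing involved (the file name, report text, and flag name are placeholders; this illustrates the patch's approach, not existing UFTP behavior):

    #include <signal.h>
    #include <stdio.h>

    /* Handler stays async-signal-safe: it only raises a flag; the main
     * loop writes the actual report, since stdio is unsafe in handlers. */
    static volatile sig_atomic_t dump_requested = 0;

    static void on_sigusr1(int sig)
    {
        (void)sig;
        dump_requested = 1;
    }

    int main(void)
    {
        struct sigaction sa = {0};
        sa.sa_handler = on_sigusr1;
        sigaction(SIGUSR1, &sa, NULL);

        for (;;) {                  /* stands in for the main event loop */
            /* ... normal processing ... */
            if (dump_requested) {
                dump_requested = 0;
                FILE *f = fopen("progress.txt", "w");  /* the -G progfile */
                if (f) {
                    fprintf(f, "progress report goes here\n");
                    fclose(f);
                }
            }
        }
    }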
Progress report for the server:
It first lists all known clients (destinations) and their state. Then it lists all files. The file currently being sent reports its transmission rate [KB/s], size [B] and estimated progress [%]. This estimate steadily goes up, but it may jump back down after NAKs are received.
Progress report for client:
I have implemented this for Linux only, since I have no experience with signal handling on Windows and I do not have a build environment for it anyway.
Regards,
Vít