I have written an application to transfer very large files, via either TCP or UDT, from a server to a client.
I am running into issues where the throughput of the transfer keeps slowing down and eventually comes to a complete halt. I have some control over this by choosing the size of the transfer buffer, but I have not yet managed to transfer files larger than 6 GB. The app is intended for files up to 100 GB.
This is my send loop:
while (true) {
    int c1 = -1;
    // fill the buffer completely before sending it out
    while (c1 < buf_.length) {
        int offset = (c1 < 0 ? 0 : c1);
        int c = in.read(buf_, offset, buf_.length - offset);
        if (c < 0) break;                 // end of input reached
        c1 = offset + c;
    }
    if (c1 < 0) break;                    // nothing left to read
    read += c1;
    out.write(buf_, 0, c1);
    out.flush();
    if (read >= size && size > -1) break; // whole file sent
}
A buffer is filled first and then sent out over an OutputStream obtained from a UDTSocket. flush() is called to ensure that the data has been received before continuing.
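For reference, the same fill-then-send logic can be written as a small, self-contained helper over plain java.io streams. This is only a sketch (the method name sendFile and the 64 KB buffer size are my own choices; nothing here is part of the UDT API), but it makes the loop easier to test in isolation:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Copy up to 'size' bytes (or everything, if size < 0) from 'in' to 'out',
// filling the buffer completely before each write, as in the loop above.
static long sendFile(InputStream in, OutputStream out, long size) throws IOException {
    byte[] buf = new byte[64 * 1024];
    long sent = 0;
    while (size < 0 || sent < size) {
        int filled = 0;
        while (filled < buf.length) {
            int n = in.read(buf, filled, buf.length - filled);
            if (n < 0) break;              // end of input
            filled += n;
        }
        if (filled == 0) break;            // nothing left to send
        out.write(buf, 0, filled);
        out.flush();
        sent += filled;
        if (filled < buf.length) break;    // short fill means end of input
    }
    return sent;
}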
The receiver loop keeps reading data until a specified number of bytes has been received. The received data is written directly to a local file:
while (total < size) {
    int l = dataTransferInputStream.read(buffer);
    if (l < 0) break;                     // end of stream
    total += l;
    outputStream.write(buffer, 0, l);
    if (l == 0) {
        try {
            Thread.sleep(2000);           // CPU load reduction (debug only)
        } catch (InterruptedException ex) {
            // ignore and keep reading
        }
    }
}
I am lost at this point: Why does the file transfer slow down after a few GB and come to a complete halt at the end?
Thank you for any help!
Hi,
yes, I think I can reproduce it even in a test case. Some patience is required ;-)
I'll try to see what is behind it.
Bernd.
If it is any help…
I have been monitoring the behavior of the UDTSocket during transmission. Below is an example of a log from the server that is sending the file (the numbers are sequential packet numbers assigned by the sender).
ACK 583972950
NAK: 583972950
NAK: 583972951
NAK: 583972958
NAK: 583972959
NAK: 583972960
NAK: 583972961
NAK: 583972962
NAK: 583972963
Retransmitting 583972950
Retransmitting 583972951
Retransmitting 583972958
Retransmitting 583972959
Retransmitting 583972960
Retransmitting 583972961
Retransmitting 583972962
ACK 583972950
NAK: 583972950
NAK: 583972951
NAK: 583972958
NAK: 583972959
NAK: 583972960
NAK: 583972961
NAK: 583972962
NAK: 583972963
Retransmitting 583972950
Retransmitting 583972951
Retransmitting 583972958
Retransmitting 583972959
Retransmitting 583972960
Retransmitting 583972961
Retransmitting 583972962
Retransmitting 583972963
ACK 583972950
NAK: 583972950
NAK: 583972951
NAK: 583972958
NAK: 583972959
NAK: 583972960
NAK: 583972961
NAK: 583972962
NAK: 583972963
Retransmitting 583972950
Retransmitting 583972951
Retransmitting 583972958
Retransmitting 583972959
Retransmitting 583972960
Retransmitting 583972961
Retransmitting 583972962
Retransmitting 583972963
This pattern repeats until I interrupt the server.
On the receiver side, each of these packets causes 'OK' to be false (in UDTReceiver.onDataPacketReceived):
boolean OK = session.getSocket().getInputStream().haveNewData(currentSequenceNumber, dp.getData());
It seems that invariably a point is reached where the receiver no longer accepts the incoming packets as new data, includes them in a NAK packet, and the server re-sends them. Looking through the logs, the protocol does recover from this occasionally, but eventually one of these events causes a timeout condition on the receiver side.
Thanks! - Alexander
I think I have made some progress with this problem. I found a few issues which appeared to be bugs. Please let me know what you think:
Again, I am debugging the case where I try to send a very large file from a UDTSender object to a UDTReceiver object.
1. ACK2 never received on the client side.
I noticed that in my project the onAck2PacketReceived() method was never executed, even though Acknowledgment2 packets were sent out by the sender. I think the problem was that converting an Acknowledgment2 packet into a UDTPacket and back into an Acknowledgment2 packet discards the packet-type and sequence-number information.
To fix this issue I added/modified this code in the Acknowledgment2 class:
import java.io.ByteArrayOutputStream;

void decode(byte[] data) {
    // restore the ACK sequence number from the packet payload
    ackSequenceNumber = PacketUtil.decode(data, 0);
}

@Override
public byte[] encodeControlInformation() {
    try {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(PacketUtil.encode(ackSequenceNumber));
        return bos.toByteArray();
    } catch (Exception e) {
        return null;
    }
}
With these two functions modified, the packet information remains intact and the receiver properly receives ACK2 packets.
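As a quick sanity check, a packet can be round-tripped through the two methods. This is only a sketch; the setter/getter names setAckSequenceNumber and getAckSequenceNumber are assumed here, not taken from the original code:

Acknowledgment2 sent = new Acknowledgment2();
sent.setAckSequenceNumber(123456789L);                           // assumed setter name
byte[] wire = sent.encodeControlInformation();                   // now returns the encoded bytes

Acknowledgment2 received = new Acknowledgment2();
received.decode(wire);                                           // now restores ackSequenceNumber
assert received.getAckSequenceNumber() == sent.getAckSequenceNumber();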
2. The biggest issue with large files was that the transfer slowly stalled after a while, and eventually the receiver would send a disconnect message because it had not received new data for too long.
I have now traced this to the ReceiveBuffer class, and I think there are two separate issues:
2a. Packets that have already been read are added to the receiver buffer. I changed one line in the offer() method:
if (SequenceNumber.compare(seq, initialSequenceNumber) < 0) return true;
changed to:
if (SequenceNumber.compare(seq, highestReadSequenceNumber) <= 0) return true;
I think the original line improperly allowed older packets to be added to the buffer. With the change, offer() returns immediately when it is presented with a sequence number that the receiving client has already read.
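To illustrate the difference with made-up numbers (plain long comparisons are used here for readability; the real code goes through SequenceNumber.compare, which also handles sequence-number wrap-around):

long initialSequenceNumber = 1000;      // fixed when the session starts
long highestReadSequenceNumber = 5000;  // advances as the client reads
long seq = 4200;                        // a retransmitted, already-read packet

boolean oldGuard = seq < initialSequenceNumber;       // false: duplicate is buffered again
boolean newGuard = seq <= highestReadSequenceNumber;  // true: duplicate is discarded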
2b. I also noticed that the number of valid chunks increased steadily during file transfers, even apart from issue 2a. Eventually the entire buffer would be full, which led to the situation I described in the post above, and data transmission stopped.
I found that the numValidChunks variable kept increasing while the actual number of valid chunks did not, so the count reported by the class became incorrect, eventually reaching the number of slots in the buffer. At that point this code never allowed new data to be added to the receive buffer:
public boolean offer(AppData data) {
    if (numValidChunks.get() == size) {
        return false;
    }
    // ...
I got around this problem by increasing the buffer size, but after transferring a 6.4 GB file, numValidChunks.get() reported over 90,000 valid entries, so this is still a limitation for very large files.
I haven't really found a solution to this yet; I just added a hack: every 100th or 1000th read iteration I am now executing a new method:
public void clean_buffer() {
    int used = 0;
    // recount the slots that actually hold unread data
    for (int i = 0; i < size; i++) {
        if (buffer[i] != null)
            used++;
    }
    numValidChunks.set(used);
}
This, at least, keeps the buffer utilization accurate and low, so that a very large buffer is no longer necessary.
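For completeness, this is roughly how I invoke the hack from the read path. The readCounter field, the receiveBuffer reference, and the interval of 1000 are illustrative only, since the actual code just runs it every 100th or 1000th iteration:

if (++readCounter % 1000 == 0) {
    receiveBuffer.clean_buffer();   // recount occupied slots, resync numValidChunks
}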
Please let me know if you have any thoughts or comments.
Thanks!
Alexander
hi Alexander!
thank you very much, that was some excellent bug-hunting! Once one sees these bugs, it is hard to believe they could remain unnoticed for so long, which in my experience is a trademark of very good bugs :-)
I've committed a fix for these bugs and ran some 10 GB through without any trouble.
Thanks again and best regards,
Bernd.
Excellent!
It works like a charm now, thanks! :)
Alexander