Eric W. Biederman wrote:
> Ok. The important thing is that the stack does not have any nasty
> interaction effects when congestion occurs. Which I know some of the
> proposed TCP algorithms have. The worst case I can imagine is everyone
> assuming they have an infinite TCP window and just blasting packets.
> That only works for one client, and one server, on an unoccupied network.
The following is a discussion of the short-cuts that I took and why I do
not expect them to cause any serious problems; if somebody can point out
why those assumptions are wrong, I'll try to implement fixes.
Technically, my code is broken, because it does not check window sizes
for sending data. It should not be overly complicated to add that code,
but realistically I'd never expect it to trigger.
In the case of HTTP (and iSCSI is probably similar), we only ever send a
very small packet at the beginning of the connection and afterwards read
a large response. If the sending window is originally closed, we
incorrectly keep retransmitting data, whereas we really should wait
until the window opens before attempting to do so. But if we do manage
to trigger this problem, the server has far bigger problems (i.e.
accepting new TCP connections, but not having any available network
buffers). And the packets that we send do not actually cause any
problems other than using up some network bandwidth.
The other bug is that we only send a single packet and then wait for the
ACK. While not completely wrong, waiting for the ACK causes a small
performance penalty. This is a deliberate choice, because the code gets
a lot easier if the only packet which could ever need retransmission is
the very last one we sent. As we typically never send more than one
packet's worth of data during the entire connection (the HTTP GET
request is quite small), implementing a sliding window on the sending
side would be overkill.
As for dealing with congested networks, I believe this is mostly done on
the server side, although I will check my documentation and see if there
is anything we should do (other than sending an early ACK if we see out
of order packets, but the code already handles that case). The server
just keeps sending data at us as fast as it sees fit, and we keep
acknowledging it as we get it, adjusting the window size as needed; I
believe there is not much more that we can do about congestion control.
The last protocol violation that I am aware of is the missing protocol
state for receiving the final ACK packet of the session. This is not
typically a problem; but if our FIN/ACK got dropped, this will keep us
from retransmitting it, and thus the server could stay in FIN_WAIT_1
until the connection times out. I do not expect this bug to show up
unless the network is very congested (and even then, it's not a big
problem), but if anybody ever notices, it can be fixed with a little effort.
Another simplification of the code is that it only ever allows for one
open connection, but none of my use cases are affected by that. It also
does not implement listening for incoming connections, but those are
unlikely to ever be needed in a boot Prom. And finally, it currently
does not allow for sending anything other than one initial block of
data; I think, if needed, this restriction could be removed with a one-
or two-line change, though.
In short, this TCP stack follows the same design philosophy as the rest
of Etherboot. It does not attempt to be a full general purpose
implementation, but rather something very compact which is tailored
exactly to the use patterns that we need. If you think that any of the
above limitations are not acceptable, make a case for it, and I'll see
if I can address it.
> The hard case is the initial block of data. Which is assumed to have
> enough information to identify the file format and to start parsing
> the file. I believe several of the OS loaders will choke if this is
> less than 512 bytes. After that we are pretty safe, variable size TFTP
> blocks need to be handled already. They are all >= 512 bytes but
> there is nothing like a power of two requirement. We should be able
> to handle 513 byte packets without a hitch.
Again, I'd be very surprised if any real-life deployment caused
problems here. Given how all modern network stacks (other than my really
simple-minded one ;-) implement Nagle and/or have applications that try
to fill buffers before sending, the first packet received should almost
certainly contain more than 512 bytes of payload. The only exceptions
that I could see would be unusually small MTUs (unlikely in a LAN) or
unusually large HTTP headers in front of the payload.
It would be easy enough to reblock into 512-byte chunks, but it is
going to take up 512 bytes of space in the BSS or on the stack. I'd
rather wait for bug reports from the field before implementing an
expensive work-around for a bug that never triggers. I could be
persuaded otherwise, if somebody
feels very strongly about this.
Markus