Eric W. Biederman wrote:
> Ok. The important thing is that the stack does not have any nasty
> interaction effects when congestion occurs. Which I know some of the
> proposed TCP algorithms have. The worst case I can imagine is everyone
> assuming they have an infinite TCP window and just blasting packets.
> That only works for one client, and one server, on an unoccupied network.
The following is a discussion of the short-cuts that I took and why I do
not expect them to cause any serious problems; if somebody can point out
why those assumptions are wrong, I'll try to implement fixes.
Technically, my code is broken, because it does not check window sizes
for sending data. It should not be overly complicated to add that code,
but realistically I'd never expect it to trigger.
In the case of HTTP (and iSCSI is probably similar), we only ever send a
very small packet at the beginning of the connection and afterwards read
a large response. If the sending window is originally closed, we
incorrectly keep retransmitting data, whereas we really should wait
until the window opens before attempting to do so. But if we do manage
to trigger this problem, the server has far bigger problems (i.e.
accepting new TCP connections, but not having any available network
buffers). And the packets that we send do not actually cause any
problems other than using up some network bandwidth.
The other bug is that we only send a single packet and then wait for the
ACK. While not completely wrong, waiting for the ACK causes a small
performance penalty. This is a deliberate choice, because the code gets
a lot easier if the only packet which could ever need retransmission is
the very last one we sent. As we typically never send more than one
packet's worth of data during the entire connection (the HTTP GET
request is quite small), implementing a sliding window on the sending
side would be overkill.
As for dealing with congested networks, I believe this is mostly done on
the server side, although I will check my documentation and see if there
is anything we should do (other than sending an early ACK if we see out
of order packets, but the code already handles that case). The server
just keeps sending data at us as fast as it sees fit, and we keep
acknowledging it as we get it, adjusting the window size as needed; I
believe there is not much more that we can do about congestion control.
The last protocol violation that I am aware of is the missing protocol
state for receiving the final ACK packet of the session. This is not
typically a problem; but if our FIN/ACK got dropped, this will keep us
from retransmitting it, and thus the server could stay in FIN_WAIT_1
until the connection times out. I do not expect this bug to show up
unless the network is very congested (and even then, it's not a big
problem), but if anybody ever notices, it can be fixed with a little effort.
Another simplification of the code is that it only ever allows for one
open connection, but none of my use cases are affected by that. It also
does not implement listening for incoming connections, but those are
unlikely to ever be needed in a boot Prom. And finally, it currently
does not allow for sending anything other than one initial block of
data; I think, if needed, this restriction could be removed with a one-
or two-line change, though.
In short, this TCP stack follows the same design philosophy as the rest
of Etherboot. It does not attempt to be a full general purpose
implementation, but rather something very compact which is tailored
exactly to the use patterns that we need. If you think that any of the
above limitations are not acceptable, make a case for it, and I'll see
if I can address it.
> The hard case is the initial block of data. Which is assumed to have
> enough information to identify the file format and to start parsing
> the file. I believe several of the OS loaders will choke if this is
> less than 512 bytes. After that we are pretty safe, variable size TFTP
> blocks need to be handled already. They are all >= 512 bytes but
> there is nothing like a power of two requirement. We should be able
> to handle 513 byte packets without a hitch.
Again, I'd be very surprised if any real-life deployment caused
problems here. Given how all modern network stacks (other than my really
simple-minded one ;-) implement Nagle and/or have applications that try
to fill buffers before sending, the first packet received should almost
certainly contain more than 512 bytes of payload. The only exceptions
that I could see would be unusually small MTUs (unlikely in a LAN) or
unusually large HTTP headers in front of the payload.
It would be easy enough to reblock into 512-byte chunks, but it is
going to take up 512 bytes of space in the BSS or on the stack. I'd
rather wait for bug reports from the field before implementing an
expensive work-around for a bug that never triggers. I could be
persuaded otherwise, if somebody
feels very strongly about this.
Markus