>>>2) How well does the TCP stack work in a congested environment?
>>> - Does it play badly with other simultaneous TCP connections?
>>> - Does the implemented window scaling work well?
>>
>>I don't have much in the way of test environments. Hopefully, somebody else can
>>
>>take a stab at testing the code in more real-life situations.
>
>
> Some of that can be done by simple inspection and theoretical calculation.
> Testing helps two of course. But I know I put in all of the exponential
> back off code into etherboot based on pure inspection.
Apart from the fact that I forgot all my documentation at work and had
to base the code on what I remembered and what I could find online, I
think the algorithm used mostly makes sense.
The window scaling algorithms that you usually find in TCP stacks assume
that the kernel receives packets in interrupt service handlers and
delivers them once the user space application is ready to consume data.
If the application cannot consume fast enough, then flow control must
kick in.
None of this is really true for Etherboot. Receiving data is the
bottleneck, as we can only poll the driver and not all cards support
sufficiently large buffers; but once the data is received, it can be
consumed immediately. Also, we don't really know if we just dropped
packets on the floor because we did not poll fast enough.
That's why my code slowly grows the window size, but as soon as it
discovers problems, shrinks it back to half. It could try to do smarter
things based on timeouts and RTT estimates, but I am not sure this would
really help. On the other hand, in my small test environment, I have
never seen it try to shrink the window; but in more congested networks
or on slower clients, I wouldn't be surprised if it happened occasionally.
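Roughly, the grow-slowly/halve-on-trouble policy described above could be sketched like this (purely an illustration; the constants and function names here are made up, not taken from the actual code):

```c
#include <stdint.h>

/* Hypothetical sketch of the window policy described above.
 * MSS and MAX_WINDOW are illustrative values, not Etherboot's. */

#define MSS        1460u
#define MAX_WINDOW (16u * 1024u)

static uint32_t window = MSS;           /* start small */

/* Call after each successfully received in-order segment. */
static void window_grow(void)
{
	if (window + MSS <= MAX_WINDOW)
		window += MSS;          /* linear growth, one MSS at a time */
}

/* Call when a problem is detected (gap, retransmission, timeout). */
static void window_shrink(void)
{
	window /= 2;                    /* multiplicative decrease */
	if (window < MSS)
		window = MSS;           /* never advertise less than one MSS */
}
```

The asymmetry (slow additive growth, fast multiplicative shrink) is the same additive-increase/multiplicative-decrease idea that full TCP stacks use for congestion control.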
In fact, I do try to correctly compute smoothed estimates of the
round-trip time, but realistically that is just overkill. The client
never sends more than the initial SYN packet, one data packet, and
occasional ACKs. This is definitely not enough to estimate RTT very
precisely.
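For reference, the textbook smoothed-RTT estimator (Jacobson's algorithm, as later codified in RFC 6298) looks roughly like this in integer arithmetic; the variable names here are illustrative, not from the actual code:

```c
#include <stdint.h>

/* Standard smoothed RTT estimator with the usual gains
 * alpha = 1/8 and beta = 1/4, done in integer arithmetic. */

static uint32_t srtt;    /* smoothed RTT, in timer ticks */
static uint32_t rttvar;  /* RTT variance estimate, in timer ticks */

static void rtt_update(uint32_t sample)
{
	if (srtt == 0) {                /* first measurement */
		srtt = sample;
		rttvar = sample / 2;
		return;
	}
	uint32_t diff = (sample > srtt) ? sample - srtt : srtt - sample;
	rttvar = rttvar - rttvar / 4 + diff / 4;    /* beta = 1/4 */
	srtt   = srtt - srtt / 8 + sample / 8;      /* alpha = 1/8 */
}

/* Retransmission timeout: SRTT plus four times the variance. */
static uint32_t rto(void)
{
	return srtt + 4 * rttvar;
}
```

With only a handful of ACK round trips per transfer, the estimator barely gets enough samples to converge, which is the "overkill" point above.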
This would be different if somebody really went to the trouble of
implementing more complicated TCP based protocols (such as the iSCSI
stuff that you mentioned). If there is a lot of data being sent in both
directions, then the TCP implementation probably needs to be more
sophisticated than what I do right now.
As for the more advanced tuning algorithms that modern TCP stacks use
(e.g. slow start, restart after a stopped connection, ECN, ...), I'll
check with my documentation and see if any of those are applicable, but
I don't expect that much of it is usable.
>>>3) Are there alignment considerations that need to be taken care of?
>>
>>??? Can you elaborate what you were thinking of?
>
>
> Compilers on non-x86 platforms assume that certain kinds of data are
> aligned to certain boundaries: 4 bytes for a 32-bit int, for example.
> Network packets have a tendency to misalign structures.
>
> For running the code on the Itanium for example it is very important
> that we don't have that kind of issue.
I see. I don't believe any of my code violates any alignment
requirements (other than the HTTP handler calling getdec() in a slightly
broken way; we really should have something like getndec()), but I am
not sure about the OS loaders. The TCP code just hands the received data
off as it gets it. Can all of our loaders deal with arbitrarily
fragmented blocks or do I have to reassemble blocks before I can hand
them off? If I have to do this, what block size do I have to use? Is any
power of two OK, or does it have to be 512 bytes?
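As an illustration of the alignment issue: on strict-alignment machines like the Itanium, casting a byte pointer into the middle of a packet buffer and dereferencing it as a 32-bit int can fault, so fields should be assembled byte by byte (or memcpy'd into a local). The get_be32() helper below is hypothetical, just to show the pattern:

```c
#include <stddef.h>
#include <stdint.h>

/* UNSAFE on strict-alignment targets if buf + off is misaligned:
 *     uint32_t v = *(uint32_t *)(buf + off);
 * Safe alternative: assemble the value byte by byte, which also
 * handles network (big-endian) byte order at the same time. */

static uint32_t get_be32(const uint8_t *buf, size_t off)
{
	return ((uint32_t)buf[off]     << 24) |
	       ((uint32_t)buf[off + 1] << 16) |
	       ((uint32_t)buf[off + 2] <<  8) |
	        (uint32_t)buf[off + 3];
}
```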
Markus