From: Neil H. <nh...@tu...> - 2006-01-31 13:29:56
On Mon, Jan 30, 2006 at 02:36:51PM -0500, Vlad Yasevich wrote:
> On Mon, 2006-01-30 at 18:33 +0100, Michael Tuexen wrote:
> > Hi Vlad,
> >
> > we compiled a 2.6.15 kernel and applied your patch. It seems to do what
> > it should do, the throughput is better. But we still have problems that
> > a simple bulk transfer does not work... I'll see, if I can provide more
> > details.
>
> If you used the original patch (not the one I sent this morning), then
> it's understandable. The original patch had a bug where it honored the
> congestion window during fast retransmit, but not on new data or t3
> timeout. That would wreak havoc with your setup.
>
> Try the updated patch from today and let us know what happens.
>
> > I'm wondering if there are fixes available for the receiver window
> > problem
> > Sridhar mentioned and provided a work around by setting the receiver
> > window to a large value. If not, is there a plan when they are
> > available?
>
> This one is a bit tough. There are some patches floating around, but
> they don't directly address this problem.
>
> I know Neil was looking at this before. If I find time, I might be
> able to take a whack at it.
>
Yes, there's a good deal of complexity in trying to manage the appropriate
receive window versus the amount of actual receive buffer space you have,
especially given that we wanted to take the ratio of payload to overhead
data into account. Some initial proposed patches are available in the
mailing list archives from 3-4 months back, if you want to see some of what
we tried. I've been meaning to get back to looking at this, but if you have
some more ideas Vlad, I'd love to see them.

Thanks & Regards
Neil

> In the mean time, slow receivers and small packets will cause problems.
>
> -vlad
>
> >
> > Best regards
> > Michael
> >
> > On Jan 27, 2006, at 23:35 Uhr, Vlad Yasevich wrote:
> >
> > > Michael
> > >
> > > OK, here is what lksctp stack is doing.
> > >
> > > Every time a tsn falls into the gap, we count a strike against it.
> > > When the strike count reaches 4 (4 SACKs with a gap not acknowledging
> > > the chunk), we fast retransmit.
> > >
> > > After fast retransmission, we reset the strike count and
> > > fast_retransmit
> > > allowance on the chunk. So, if we get 4 more SACKs listing the TSN as
> > > missing, we'll fast retransmit again.
> > >
> > > Here is a patch that will not do that.
> > >
> > > With this patch, once the chunk has been fast retransmitted, it will
> > > not be fast retransmitted ever again. The only retransmits of this
> > > chunk will be timeouts.
> > >
> > > Is this what you are looking for?
> > >
> > > -vlad
> > >
> > >
> > >> On Jan 27, 2006, at 21:20 Uhr, Vlad Yasevich wrote:
> > >>
> > >>> On Fri, 2006-01-27 at 20:11 +0100, Michael Tuexen wrote:
> > >>>> Hi Vlad,
> > >>>>
> > >>>> see my comments in-line.
> > >>>>
> > >>>> Best regards
> > >>>> Michael
> > >>>>
> > >>>> On Jan 27, 2006, at 16:01 Uhr, Vlad Yasevich wrote:
> > >>>>
> > >>>>> Michael
> > >>>>>
> > >>>>> On Thu, 2006-01-26 at 17:07 +0100, Michael Tuexen wrote:
> > >>>>>> Hi Sridhar,
> > >>>>>>
> > >>>>>> yes you are right, that is a workaround.
> > >>>>>>
> > >>>>>> We found another problem. When is LKSCTP sending fast
> > >>>>>> retransmissions?
> > >>>>>
> > >>>>> Looks like we go into fast retransmit mode when a chunk is reported
> > >>>>> missing 4 times (we haven't changed it to 3 yet as the new IG
> > >>>>> says).
> > >>>> I do not care about the 3 or 4 issue. What I see is that the data
> > >>>> sender
> > >>>> receives a lot of SACK with gap reports. As a consequence several
> > >>>> TSNs
> > >>>> are fast retransmitted several times (within a very short time). As
> > >>>> a
> > >>>> consequence
> > >>>> the bandwidth of link is used by all these duplicate FR.
> > >>>>
> > >>>> The IG says that a TSN can only be FR once. Is this implemented in
> > >>>> LKSCTP? Will
> > >>>> someone fix this?
> > >>>
> > >>> Michael
> > >>>
> > >>> I think I see the problem, but was wondering if you can provide a
> > >>> packet
> > >>> capture so that I can make sure that I am looking at the right thing.
> > >>>
> > >>> Thanks
> > >>> -vlad
> > >>>
> > >>>>
> > >>>> As a result LKSCTP can not fill a 1MBit link between two hosts...
> > >>>>
> > >>>> Best regards
> > >>>> Michael
> > >>>>
> > >>>>>
> > >>>>> What is the problem you are seeing?
> > >>>>>
> > >>>>> -vlad
> > >>>>>
> > >>>>>>
> > >>>>>> Best regards
> > >>>>>> Michael
> > >>>>>>
> > >>>>>> On Jan 26, 2006, at 6:40 Uhr, Sridhar Samudrala wrote:
> > >>>>>>
> > >>>>>>> Michael,
> > >>>>>>>
> > >>>>>>> This seems to be the same issue that we discussed recently on the
> > >>>>>>> mailing list when integrating the receive buffer
> > >>>>>>> accounting patches.
> > >>>>>>> Currently, with small packets, the advertised receive window and
> > >>>>>>> receive buffer can go totally out of sync if the
> > >>>>>>> receiver app cannot keep up with the incoming packets. We only
> > >>>>>>> account
> > >>>>>>> for the actual payload in the receive
> > >>>>>>> window, whereas we include the overhead (could be up to 2K bytes
> > >>>>>>> for
> > >>>>>>> each packet) in the receive buffer
> > >>>>>>> calculations.
> > >>>>>>>
> > >>>>>>> Could you try increasing the default receive buffer limits and
> > >>>>>>> see
> > >>>>>>> if
> > >>>>>>> the problem goes away?
> > >>>>>>> echo 500000 > /proc/sys/net/core/rmem_max
> > >>>>>>> echo 500000 > /proc/sys/net/core/rmem_default
> > >>>>>>>
> > >>>>>>> Thanks
> > >>>>>>> Sridhar
> > >>>>>>>
> > >>>>>>> Michael Tuexen wrote:
> > >>>>>>>> Hi Sridhar,
> > >>>>>>>>
> > >>>>>>>> see my comments in-line.
> > >>>>>>>>
> > >>>>>>>> Best regards
> > >>>>>>>> Michael
> > >>>>>>>>
> > >>>>>>>> On Jan 25, 2006, at 8:07 Uhr, Sridhar Samudrala wrote:
> > >>>>>>>>
> > >>>>>>>>> Michael Tuexen wrote:
> > >>>>>>>>>> Hi Sridhar,
> > >>>>>>>>>>
> > >>>>>>>>>> we are currently doing some performance testing of SCTP kernel
> > >>>>>>>>>> implementations
> > >>>>>>>>>> on Solaris, LKSCTP and *BSD.
> > >>>>>>>>>>
> > >>>>>>>>>> We found a problem when a sender is sending a lot of 10 Byte
> > >>>>>>>>>> messages over
> > >>>>>>>>>> a GB interface and also over a limited (1 Mbit) link.
> > >>>>>>>>>>
> > >>>>>>>>>> It seems that the sending side is OK, but the receiving side
> > >>>>>>>>>> stops
> > >>>>>>>>>> at one
> > >>>>>>>>>> point accepting chunks. Sometimes these chunks are in the
> > >>>>>>>>>> middle
> > >>>>>>>>>> of
> > >>>>>>>>>> a packet.
> > >>>>>>>>>>
> > >>>>>>>>>> I'm attaching two trace files. What kind of information should
> > >>>>>>>>>> I
> > >>>>>>>>>> provide in
> > >>>>>>>>>> addition, such that you can find/fix the bug?
> > >>>>>>>>> Michael,
> > >>>>>>>>>
> > >>>>>>>>> I took a brief look at the traces. It looks like there is
> > >>>>>>>>> packet
> > >>>>>>>>> loss. Is this expected and part of the test?
> > >>>>>>>> I'm attaching also a tracefile, where two computers are
> > >>>>>>>> connected
> > >>>>>>>> via
> > >>>>>>>> a GBit ethernet interface.
> > >>>>>>>> client and server run on different systems. The server just
> > >>>>>>>> discards
> > >>>>>>>> the packets, the client
> > >>>>>>>> sends a number of chunks.
> > >>>>>>>>
> > >>>>>>>> After some time (it depends) the transfer stalls; you can see
> > >>>>>>>> this
> > >>>>>>>> in
> > >>>>>>>> the trace. I then killed
> > >>>>>>>> the client.
> > >>>>>>>>>
> > >>>>>>>>> Are you running lksctp as both the sender and the receiver? Is
> > >>>>>>>>> it
> > >>>>>>>>> possible to provide a test program
> > >>>>>>>>> that demonstrates this problem?
> > >>>>>>>> Both sides run LKSCTP, the sender just sends a lot of 10 byte
> > >>>>>>>> DATA
> > >>>>>>>> chunks. server.c and client.c
> > >>>>>>>> are attached.
> > >>>>>>>>>
> > >>>>>>>>> The other information that would be useful is the linux kernel
> > >>>>>>>>> version and the lksctp-tools version?
> > >>>>>>>> Both systems are configured identically.
> > >>>>>>>> [ruengeler@fc4-2 ~]$ uname -a
> > >>>>>>>> Linux fc4-2.testbed 2.6.15-1.1823_FC4 #1 Fri Jan 6 17:56:32 EST
> > >>>>>>>> 2006
> > >>>>>>>> i686 i686 i386 GNU/Linux
> > >>>>>>>> The LKSCTP tools are 1.0.5, but this should not matter.
> > >>>>>>>>>
> > >>>>>>>>> Also, I will be on vacation most of the next month (Feb). So it
> > >>>>>>>>> is a
> > >>>>>>>>> good idea to post additional details
> > >>>>>>>>> to the lksctp-developers mailing list and other community
> > >>>>>>>>> members
> > >>>>>>>>> who are interested may also be able
> > >>>>>>>>> to respond and look into the issues.
> > >>>>>>>> I CCed the list and appreciate any help.
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>> Sridhar
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> -------------------------------------------------------
> > >>>>>> This SF.net email is sponsored by: Splunk Inc. Do you grep through
> > >>>>>> log files
> > >>>>>> for problems? Stop! Download the new AJAX search engine that
> > >>>>>> makes
> > >>>>>> searching your log files as easy as surfing the web. DOWNLOAD
> > >>>>>> SPLUNK!
> > >>>>>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
> > >>>>>> _______________________________________________
> > >>>>>> Lksctp-developers mailing list
> > >>>>>> Lks...@li...
> > >>>>>> https://lists.sourceforge.net/lists/listinfo/lksctp-developers
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > > <fast_rxt.patch>
> >
>

--
/***************************************************
 *Neil Horman
 *Software Engineer
 *gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
 ***************************************************/
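
For readers following the fast retransmit part of the thread, here is a rough model of the strike counting Vlad describes, comparing the old behaviour (strike count reset after each fast retransmit) with the patched one (a chunk is fast retransmitted at most once). This is only a sketch under stated assumptions: the `Chunk` class, `on_gap_report` and `count_fast_retransmits` are hypothetical names made up for illustration. They are not lksctp structures or functions, and this is not the contents of fast_rxt.patch. Only the threshold of 4 comes from the thread.

```python
from dataclasses import dataclass

# Threshold quoted in the thread: 4 SACKs reporting the TSN as missing.
STRIKE_LIMIT = 4


@dataclass
class Chunk:
    """Hypothetical per-chunk sender state (not the lksctp structure)."""
    tsn: int
    strikes: int = 0
    fast_rtx_done: bool = False  # set once the chunk has been fast retransmitted


def on_gap_report(chunk: Chunk, once_only: bool) -> bool:
    """Record one SACK that lists chunk.tsn as missing.

    Returns True if this SACK triggers a fast retransmit.
    once_only=False models the old behaviour: the strike count is reset
    after a fast retransmit, so 4 more gap reports trigger another one.
    once_only=True models the patched behaviour: after one fast
    retransmit, only a timeout may retransmit the chunk.
    """
    if once_only and chunk.fast_rtx_done:
        return False
    chunk.strikes += 1
    if chunk.strikes < STRIKE_LIMIT:
        return False
    chunk.strikes = 0
    chunk.fast_rtx_done = True
    return True


def count_fast_retransmits(num_sacks: int, once_only: bool) -> int:
    """Count fast retransmits of a single chunk over num_sacks gap reports."""
    chunk = Chunk(tsn=1)
    return sum(on_gap_report(chunk, once_only) for _ in range(num_sacks))
```

With 12 consecutive gap reports for the same TSN, `count_fast_retransmits(12, once_only=False)` gives 3 fast retransmits, while `count_fast_retransmits(12, once_only=True)` gives 1, which is the duplicate-FR bandwidth waste Michael reported versus the once-only rule from the IG.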