From: Michael T. <Mic...@lu...> - 2006-01-30 21:23:53
Hi Vlad,

see my comments in-line.

Best regards
Michael

On Jan 30, 2006, at 20:36, Vlad Yasevich wrote:

> On Mon, 2006-01-30 at 18:33 +0100, Michael Tuexen wrote:
>> Hi Vlad,
>>
>> we compiled a 2.6.15 kernel and applied your patch. It seems to do what
>> it should do, the throughput is better. But we still have the problem that
>> a simple bulk transfer does not work... I'll see if I can provide more
>> details.
>
> If you used the original patch (not the one I sent this morning), then
> it's understandable. The original patch had a bug where it honored the
> congestion window during fast retransmit, but not on new data or t3
> timeout. That would wreak havoc with your setup.

Yes, initially I tested the original patch... Now I'm testing your second one. This looks better.

>
> Try the updated patch from today and let us know what happens.

I'm doing it now.

>
>> I'm wondering if there are fixes available for the receiver window problem
>> Sridhar mentioned and provided a workaround for by setting the receiver
>> window to a large value. If not, is there a plan for when they will be
>> available?
>
> This one is a bit tough. There are some patches floating around, but
> they don't directly address this problem.
>
> I know Neil was looking at this before. If I find time, I might be
> able to take a whack at it.
>
> In the meantime, slow receivers and small packets will cause
> problems.

OK. In our measurements it looks like small is <= 240 bytes... This still covers SIGTRAN packets. So I think this is a serious problem. Hopefully it gets fixed soon.

Thanks for the explanation. I'll redo the measurements for small packets when a patch is available. But if I understand the issue correctly, this only affects the receiver. So a sender should be OK, for example when using a non-Linux receiver. Is this correct? Or is there also a known issue with sending small packets?
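
For illustration, the receiver-side mismatch Sridhar describes further down in this thread (only payload is charged to the advertised receive window, while payload plus per-packet overhead is charged to the receive buffer) can be put into numbers. This is only a rough sketch, not lksctp code: the function names are made up, one DATA chunk per packet is assumed, the initial window is assumed to equal the buffer size, and the 2048-byte overhead is the "up to 2K" worst case quoted in the thread.

```python
def packets_until_buffer_full(rcvbuf_bytes, payload_bytes, overhead_bytes):
    """Packets an unread socket can queue before the receive buffer is used up.

    Assumption: each packet is charged payload + overhead against the buffer.
    """
    return rcvbuf_bytes // (payload_bytes + overhead_bytes)


def advertised_window_left(rwnd_bytes, payload_bytes, packets):
    """Window still advertised when only the payload is charged to it."""
    return rwnd_bytes - payload_bytes * packets


RCVBUF = 500_000  # the rmem value from the workaround in this thread
PAYLOAD = 10      # 10 byte messages, as in the test
OVERHEAD = 2048   # assumed worst case ("up to 2K bytes for each packet")

queued = packets_until_buffer_full(RCVBUF, PAYLOAD, OVERHEAD)
window_left = advertised_window_left(RCVBUF, PAYLOAD, queued)

print(f"buffer exhausted after {queued} packets")
print(f"window still advertised: {window_left} bytes")
```

Under these assumptions the buffer is exhausted after a few hundred packets while almost the whole window is still advertised, so the sender keeps sending data the receiver has to drop.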
>
> -vlad
>
>>
>> Best regards
>> Michael
>>
>> On Jan 27, 2006, at 23:35, Vlad Yasevich wrote:
>>
>>> Michael
>>>
>>> OK, here is what the lksctp stack is doing.
>>>
>>> Every time a TSN falls into the gap, we count a strike against it.
>>> When the strike count reaches 4 (4 SACKs with a gap not acknowledging
>>> the chunk), we fast retransmit.
>>>
>>> After fast retransmission, we reset the strike count and fast_retransmit
>>> allowance on the chunk. So, if we get 4 more SACKs listing the TSN as
>>> missing, we'll fast retransmit again.
>>>
>>> Here is a patch that will not do that.
>>>
>>> With this patch, once the chunk has been fast retransmitted, it will
>>> not be fast retransmitted ever again. The only retransmits of this
>>> chunk will be timeouts.
>>>
>>> Is this what you are looking for?
>>>
>>> -vlad
>>>
>>>
>>>> On Jan 27, 2006, at 21:20, Vlad Yasevich wrote:
>>>>
>>>>> On Fri, 2006-01-27 at 20:11 +0100, Michael Tuexen wrote:
>>>>>> Hi Vlad,
>>>>>>
>>>>>> see my comments in-line.
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>>
>>>>>> On Jan 27, 2006, at 16:01, Vlad Yasevich wrote:
>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>> On Thu, 2006-01-26 at 17:07 +0100, Michael Tuexen wrote:
>>>>>>>> Hi Sridhar,
>>>>>>>>
>>>>>>>> yes you are right, that is a workaround.
>>>>>>>>
>>>>>>>> We found another problem. When is LKSCTP sending fast
>>>>>>>> retransmissions?
>>>>>>>
>>>>>>> Looks like we go into fast retransmit mode when a chunk is reported
>>>>>>> missing 4 times (we haven't changed it to 3 yet as the new IG says).
>>>>>> I do not care about the 3 or 4 issue. What I see is that the data sender
>>>>>> receives a lot of SACKs with gap reports. As a consequence several TSNs
>>>>>> are fast retransmitted several times (within a very short time). As a
>>>>>> consequence the bandwidth of the link is used up by all these duplicate FRs.
>>>>>>
>>>>>> The IG says that a TSN can only be FR once. Is this implemented in
>>>>>> LKSCTP? Will someone fix this?
>>>>>
>>>>> Michael
>>>>>
>>>>> I think I see the problem, but was wondering if you can provide a packet
>>>>> capture so that I can make sure that I am looking at the right thing.
>>>>>
>>>>> Thanks
>>>>> -vlad
>>>>>
>>>>>>
>>>>>> As a result LKSCTP can not fill a 1 MBit link between two hosts...
>>>>>>
>>>>>> Best regards
>>>>>> Michael
>>>>>>
>>>>>>>
>>>>>>> What is the problem you are seeing?
>>>>>>>
>>>>>>> -vlad
>>>>>>>
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Michael
>>>>>>>>
>>>>>>>> On Jan 26, 2006, at 6:40, Sridhar Samudrala wrote:
>>>>>>>>
>>>>>>>>> Michael,
>>>>>>>>>
>>>>>>>>> This seems to be the same issue that we discussed recently on the
>>>>>>>>> mailing list when integrating the receive buffer accounting patches.
>>>>>>>>> Currently, with small packets, the advertised receive window and
>>>>>>>>> receive buffer can go totally out of sync if the receiver app cannot
>>>>>>>>> keep up with the incoming packets. We only account for the actual
>>>>>>>>> payload in the receive window, whereas we include the overhead (could
>>>>>>>>> be up to 2K bytes for each packet) in the receive buffer calculations.
>>>>>>>>>
>>>>>>>>> Could you try increasing the default receive buffer limits and see if
>>>>>>>>> the problem goes away?
>>>>>>>>> echo 500000 > /proc/sys/net/core/rmem_max
>>>>>>>>> echo 500000 > /proc/sys/net/core/rmem_default
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Sridhar
>>>>>>>>>
>>>>>>>>> Michael Tuexen wrote:
>>>>>>>>>> Hi Sridhar,
>>>>>>>>>>
>>>>>>>>>> see my comments in-line.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Michael
>>>>>>>>>>
>>>>>>>>>> On Jan 25, 2006, at 8:07, Sridhar Samudrala wrote:
>>>>>>>>>>
>>>>>>>>>>> Michael Tuexen wrote:
>>>>>>>>>>>> Hi Sridhar,
>>>>>>>>>>>>
>>>>>>>>>>>> we are currently doing some performance testing of SCTP kernel
>>>>>>>>>>>> implementations on Solaris, LKSCTP and *BSD.
>>>>>>>>>>>>
>>>>>>>>>>>> We found a problem when a sender is sending a lot of 10 byte
>>>>>>>>>>>> messages over a GB interface and also over a limited (1 Mbit) link.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems that the sending side is OK, but the receiving side stops
>>>>>>>>>>>> accepting chunks at some point. Sometimes these chunks are in the
>>>>>>>>>>>> middle of a packet.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm attaching two trace files. What kind of information should I
>>>>>>>>>>>> provide in addition, such that you can find/fix the bug?
>>>>>>>>>>> Michael,
>>>>>>>>>>>
>>>>>>>>>>> I took a brief look at the traces. It looks like there is packet
>>>>>>>>>>> loss. Is this expected and part of the test?
>>>>>>>>>> I'm also attaching a tracefile where two computers are connected via
>>>>>>>>>> a GBit ethernet interface. Client and server run on different systems.
>>>>>>>>>> The server just discards the packets, the client sends a number of
>>>>>>>>>> chunks.
>>>>>>>>>>
>>>>>>>>>> After some time (it depends) the transfer stalls; you can see this in
>>>>>>>>>> the trace. I then killed the client.
>>>>>>>>>>>
>>>>>>>>>>> Are you running lksctp as both the sender and the receiver? Is it
>>>>>>>>>>> possible to provide a test program that demonstrates this problem?
>>>>>>>>>> Both sides run LKSCTP, the sender just sends a lot of 10 byte DATA
>>>>>>>>>> chunks. server.c and client.c are attached.
>>>>>>>>>>>
>>>>>>>>>>> The other information that would be useful is the linux kernel
>>>>>>>>>>> version and the lksctp-tools version?
>>>>>>>>>> Both systems are configured identically.
>>>>>>>>>> [ruengeler@fc4-2 ~]$ uname -a
>>>>>>>>>> Linux fc4-2.testbed 2.6.15-1.1823_FC4 #1 Fri Jan 6 17:56:32 EST 2006 i686 i686 i386 GNU/Linux
>>>>>>>>>> The LKSCTP tools are 1.0.5, but this should not matter.
>>>>>>>>>>>
>>>>>>>>>>> Also, I will be on vacation most of the next month (Feb). So it is a
>>>>>>>>>>> good idea to post additional details to the lksctp-developers mailing
>>>>>>>>>>> list, where other community members who are interested may also be
>>>>>>>>>>> able to respond and look into the issues.
>>>>>>>>>> I CCed the list and appreciate any help.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Sridhar
>>>>>>>>>>>
>>>>>>>>
>>>>>>>> -------------------------------------------------------
>>>>>>>> This SF.net email is sponsored by: Splunk Inc. Do you grep through
>>>>>>>> log files for problems? Stop! Download the new AJAX search engine that
>>>>>>>> makes searching your log files as easy as surfing the web. DOWNLOAD
>>>>>>>> SPLUNK!
>>>>>>>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
>>>>>>>> _______________________________________________
>>>>>>>> Lksctp-developers mailing list
>>>>>>>> Lks...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/lksctp-developers
>>>>>>>>
>>> <fast_rxt.patch>
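
As an illustration of the strike counting Vlad describes above (fast retransmit once a chunk has been reported missing 4 times, and with the patch never fast retransmit the same chunk again), here is a small sketch. It is not the lksctp kernel code: `Chunk`, `on_sack_reports_missing` and `count_fast_retransmits` are hypothetical names, and the model looks at a single missing TSN only.

```python
from dataclasses import dataclass

STRIKE_THRESHOLD = 4  # value quoted in the thread; the IG mentioned there uses 3


@dataclass
class Chunk:
    tsn: int
    strikes: int = 0
    fast_retransmitted: bool = False


def on_sack_reports_missing(chunk, retransmit_once=True):
    """Count one strike; return True if the chunk is fast retransmitted now.

    retransmit_once=False models the pre-patch behaviour described above
    (strike count reset, so the chunk can be fast retransmitted again).
    retransmit_once=True models the patched behaviour (at most one fast
    retransmit per chunk, later retransmits only by timeout).
    """
    if retransmit_once and chunk.fast_retransmitted:
        return False
    chunk.strikes += 1
    if chunk.strikes < STRIKE_THRESHOLD:
        return False
    chunk.strikes = 0
    chunk.fast_retransmitted = True
    return True


def count_fast_retransmits(num_sacks, retransmit_once):
    """Fast retransmits caused by num_sacks SACKs that all report the TSN missing."""
    chunk = Chunk(tsn=1)
    return sum(on_sack_reports_missing(chunk, retransmit_once)
               for _ in range(num_sacks))


print(count_fast_retransmits(12, retransmit_once=False))
print(count_fast_retransmits(12, retransmit_once=True))
```

With a burst of SACKs that keep reporting the same gap, the pre-patch model retransmits the chunk repeatedly, which matches the duplicate FRs Michael observed on the 1 MBit link; the patched model sends it once and leaves the rest to the timer.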