From: Vlad Y. <vla...@hp...> - 2009-12-10 22:04:11
atomar wrote:
> On 12/04/2009 10:42 PM, Vlad Yasevich wrote:
>> atomar wrote:
>>> On 12/03/2009 08:30 PM, Vlad Yasevich wrote:
>>>> atomar wrote:
>>>>> On 12/02/2009 09:26 PM, Vlad Yasevich wrote:
>>>>>> atomar wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> In net/sctp/outqueue.c the following functions:
>>>>>>>
>>>>>>>     sctp_check_transmitted(),
>>>>>>>     sctp_retransmit_mark()
>>>>>>> and
>>>>>>>     sctp_generate_fwdtsn()
>>>>>>>
>>>>>>> do not check for underflow of "flight_size" and "outstanding_bytes"
>>>>>>> before performing:
>>>>>>>
>>>>>>>     chunk->transport->flight_size -= sctp_data_size(chunk);
>>>>>>>     q->outstanding_bytes -= sctp_data_size(chunk);
>>>>>>>
>>>>>>> Due to some error in the SCTP stack, flight_size underflows and becomes
>>>>>>> a very large positive value, as does q->outstanding_bytes, and due to
>>>>>>> the following checks:
>>>>>>> 1. data in flight is larger than the current local congestion window
>>>>>>> 2. data chunk to be sent is larger than peer receive window
>>>>>>>
>>>>>>> the SCTP stack is blocked and no data can be sent out forever, as any
>>>>>>> subsequent outstanding acks just decrement a very large number.
>>>>>>>
>>>>>> Is this something that you are seeing or are you just theorizing?
>>>>>>
>>>>> After applying 25% random loss in IP packets on the network in one
>>>>> direction, the impacted SCTP association became "stuck": the sending
>>>>> buffer is full and no data is sent out. Even after removing the packet
>>>>> loss, the association never recovers.
>>>>>
>>>>> I am reporting a probable bug. I am not very conversant with sctp and
>>>>> hence cannot say what could be the root cause of the problem. If we apply
>>>>> checks:
>>>>>
>>>>>     if (q->outstanding_bytes >= sctp_data_size(chunk))
>>>>>         q->outstanding_bytes -= sctp_data_size(chunk);
>>>>>     else
>>>>>         q->outstanding_bytes = 0;
>>>>>
>>>>>     if (transport->flight_size >= sctp_data_size(chunk))
>>>>>         transport->flight_size -= sctp_data_size(chunk);
>>>>>     else
>>>>>         transport->flight_size = 0;
>>>>>
>>>>> then this problem is not observed, but this may not be the real solution.
>>>>>
>>>> Which kernel version are you seeing this problem in?
>>>>
>>>> If you are really seeing flight_size underflow, then the real problem is
>>>> elsewhere. Doing the check above just masks the real bug.
>>>>
>>> I am using a very old kernel 2.6.10 with rt-patches, and most of the sctp
>>> related patches are backported to it. While I tried looking for patches
>>> related to an underflow guard in sctp or window miscalculation, I didn't
>>> find anything related.
>>>
>> If you've added the following commit to your tree
>> (d0ce92910bc04e107b2f3f2048f07e94f570035d SCTP: Do not retransmit chunks that
>> are newer then rtt), please revert it. It introduced a lot of problems and is
>> actually violating the spec.
>>
> No, this patch was not checked-in for that kernel.
> -Anuz

Thanks. I was able to reproduce on an upstream kernel with some BUG_ONs.
I'll investigate and try to figure out what's going on.

-vlad

> _______________________________________________
> Lksctp-developers mailing list
> Lks...@li...
> https://lists.sourceforge.net/lists/listinfo/lksctp-developers
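
For readers following the thread, here is a minimal user-space sketch of the failure mode under discussion. It is not the kernel code: the helper names (sub_wrapping, sub_clamped, sub_checked, flight_exceeds_cwnd) are hypothetical, the 32-bit unsigned counter width is an assumption, and the congestion-window test is a simplified stand-in for check 1 in the original report. It shows why an unguarded subtraction wraps to a huge value, why clamping to zero hides the symptom, and how an assertion (a rough analogue of the BUG_ONs mentioned above) would instead surface the bad accounting at the point it happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unguarded subtraction, as in the reported code paths: if n > counter,
 * unsigned arithmetic wraps around to a very large positive value. */
static uint32_t sub_wrapping(uint32_t counter, uint32_t n)
{
    return counter - n;
}

/* The guard proposed in the thread: clamp at zero. This keeps the sender
 * from stalling, but hides whatever accounting error caused the mismatch. */
static uint32_t sub_clamped(uint32_t counter, uint32_t n)
{
    return counter >= n ? counter - n : 0;
}

/* Debugging variant (user-space analogue of a BUG_ON): abort as soon as
 * the accounting goes wrong, so the real bug can be traced. */
static uint32_t sub_checked(uint32_t counter, uint32_t n)
{
    assert(counter >= n);
    return counter - n;
}

/* Simplified stand-in for "data in flight is larger than the current
 * local congestion window" (assumed form, for illustration only). */
static bool flight_exceeds_cwnd(uint32_t flight_size, uint32_t cwnd)
{
    return flight_size >= cwnd;
}
```

With a flight_size of 1000 and a 1452-byte chunk, sub_wrapping yields a value close to 2^32, so flight_exceeds_cwnd stays true for any realistic window and later acks only chip away at that huge number. sub_clamped returns 0 and sending resumes, which matches the behaviour reported with the proposed checks, but the counter no longer reflects what is actually in flight.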