From: Vlad Y. <vla...@hp...> - 2008-12-23 16:35:55
Wei Yongjun wrote:
> Hi, Vlad Yasevich:
>
> Can you send me your patch? I have a similar problem, and I want
> to check whether your patch can fix it.
> My problem never happened before, but I do not know which patch
> introduced it.
>

Wei

Here are 3 patches that resolve all of the stalling association issues for me.
I have them queued to go out as soon as all my tests pass.

-vlad

> Regards.
>
>
> Vlad Yasevich wrote:
>> Steven Brown wrote:
>>
>>> Steven Brown wrote:
>>>
>>>> Vlad Yasevich wrote:
>>>>
>>>>> Also, if your setup is still up, can you run this test single-homed
>>>>> and see if this situation still occurs?
>>>>>
>>>> Same stalls occur, although it was a bit resistant to it occurring
>>>> (probably due to less network badness due to being directly
>>>> connected to each other).
>>>>
>>> On looking into this further, although the wedging in multihoming and
>>> singlehoming looked similar, the singlehoming had a different cause.
>>> The stack had started ignoring ICMP MTU updates, causing it to get
>>> stuck in retransmit with a too-large packet. Disabling the SCTP
>>> conntrack code seemed to fix it.
>>>
>>> The line it was tested on had a 1500 byte MTU on the local interface,
>>> but the PMTU was being restricted to 1492 on the next hop. The test
>>> would then cause an MTU update shortly after starting, and SCTP would
>>> process it properly and continue. However, as the cached MTU would
>>> time out and SCTP would start trying a larger MTU and cause MTU
>>> updates again, it would ignore all of those updates until the
>>> connection would eventually die from not being able to get anything
>>> through.
>>>
>>> On instrumenting the kernel to figure out where they were going, the
>>> ones that would get lost would wind up never seeming to make it to
>>> SCTP's ICMP error handler. Unlike the first ICMP MTU updates that
>>> made it, the ones that wouldn't would never make it to ip_rcv_finish,
>>> instead being eaten in ip_input.c here (2.6.28-rc3):
>>>
>>>     return NF_HOOK(PF_INET, NF_INET_PRE_ROUTING, skb, dev, NULL,
>>>                    ip_rcv_finish);
>>>
>>> I'm not all that familiar with how netfilter and conntrack interact,
>>> but disabling SCTP's conntrack got around the problem, so I assume
>>> it's a bug in SCTP's conntrack.
>>>
>>>
>>
>> While working on a different bug, I think I found the solution to this
>> problem.
>>
>> Can you apply the attached patch and see if you still see the stall?
>>
>> The patch was generated against 2.6.28-rc series, but should apply to
>> 2.6.27 as well.
>>
>> Thanks
>> -vlad
>>
>