From: Vlad Y. <vla...@hp...> - 2006-04-28 13:17:19
|
On Thu, 2006-04-27 at 17:05 -0700, Sridhar Samudrala wrote: > On Thu, 2006-04-27 at 14:44 -0400, Vlad Yasevich wrote: > > There is a rare situation that causes lksctp to go into infinite > > recursion and crash the system. The trigger is a packet that > > contains at least the first two DATA fragments of a message bundled > > together. The recursion is triggered when the user data buffer is > > smaller that the full data message. The problem is that we clone > > the skb for every fragment in the message. When reassembling the full > > message, we try to link skbs from the "first fragment" clone using > > the frag_list. However, since the frag_list is shared between two > > clones in this rare situation, we end up setting the frag_list pointer > > of the second fragment to point to itself. This causes sctp_skb_pull() > > to potentially recurs indefinitely. > > > > Proposed solution is to make a copy of the skb when attempting to link > > things using frag_list. > > But it is not clear to me why you are checking for cloned skb only if > f_frag has no frag_list. Don't we need this even if f_frag has a > frag_list(i.e last is not NULL)? If the skb already has frags then it was fragmented at IP layer. This means that there shouldn't be multiple DATA chunks bundled. If there are, we have other much bigger problems (like not processing the bundled DATA chunk). This is another problem, and actually causes system crashes when I set MTU low enough. The INIT-ACK or COOKIE-ECHO could end up fragmented and the receiver dies because it always does skb_pull()s. Easy fix is to call skb_linearize(), but if someone is using jumbo-grams, that will be very slow. I think we need to convert to pskb_pull() in the control chunk processing, and may be for DATA chunk header as well. > > Also why can't we use skb_unshare()? skb_unshare() does kfree_skb() if it successfully unshared. However, f_frag is still linked on the reassembly queue. I was being careful here and make sure that we don't corrupt the queue in any way. I didn't want to unlink the f_frag, only to have unshare fail. I also didn't want unshare to free the f_frag before we had a chance to unlink it properly. So, I had to do skb_copy myself, and free the f_frag only once we have successfully unlinked it and had a copy made. Thanks -vlad |