From: Jon M. <jm...@re...> - 2020-10-09 12:48:17
On 10/9/20 12:12 AM, Hoang Huu Le wrote:
> Hi Jon, Jakub,
>
> I tried with your comment. But looks like we got into circular locking and deadlock could happen like this:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&n->lock#2);
>                                lock(&tn->nametbl_lock);
>                                lock(&n->lock#2);
>   lock(&tn->nametbl_lock);
>
>  *** DEADLOCK ***
>
> Regards,
> Hoang

Ok. So although your solution is not optimal, we know it is safe.
Again:
Acked-by: Jon Maloy <jm...@re...>

>> -----Original Message-----
>> From: Jon Maloy <jm...@re...>
>> Sent: Friday, October 9, 2020 1:01 AM
>> To: Jakub Kicinski <ku...@ke...>; Hoang Huu Le <hoa...@de...>
>> Cc: ma...@do...; yin...@wi...; tip...@li...; ne...@vg...
>> Subject: Re: [net] tipc: fix NULL pointer dereference in tipc_named_rcv
>>
>>
>>
>> On 10/8/20 1:25 PM, Jakub Kicinski wrote:
>>> On Thu, 8 Oct 2020 14:31:56 +0700 Hoang Huu Le wrote:
>>>> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
>>>> index 2f9c148f17e2..fe4edce459ad 100644
>>>> --- a/net/tipc/name_distr.c
>>>> +++ b/net/tipc/name_distr.c
>>>> @@ -327,8 +327,13 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
>>>>  	struct tipc_msg *hdr;
>>>>  	u16 seqno;
>>>>
>>>> +	spin_lock_bh(&namedq->lock);
>>>>  	skb_queue_walk_safe(namedq, skb, tmp) {
>>>> -		skb_linearize(skb);
>>>> +		if (unlikely(skb_linearize(skb))) {
>>>> +			__skb_unlink(skb, namedq);
>>>> +			kfree_skb(skb);
>>>> +			continue;
>>>> +		}
>>>>  		hdr = buf_msg(skb);
>>>>  		seqno = msg_named_seqno(hdr);
>>>>  		if (msg_is_last_bulk(hdr)) {
>>>> @@ -338,12 +343,14 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
>>>>
>>>>  		if (msg_is_bulk(hdr) || msg_is_legacy(hdr)) {
>>>>  			__skb_unlink(skb, namedq);
>>>> +			spin_unlock_bh(&namedq->lock);
>>>>  			return skb;
>>>>  		}
>>>>
>>>>  		if (*open && (*rcv_nxt == seqno)) {
>>>>  			(*rcv_nxt)++;
>>>>  			__skb_unlink(skb, namedq);
>>>> +			spin_unlock_bh(&namedq->lock);
>>>>  			return skb;
>>>>  		}
>>>>
>>>> @@ -353,6 +360,7 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
>>>>  			continue;
>>>>  		}
>>>>  	}
>>>> +	spin_unlock_bh(&namedq->lock);
>>>>  	return NULL;
>>>>  }
>>>>
>>>> diff --git a/net/tipc/node.c b/net/tipc/node.c
>>>> index cf4b239fc569..d269ebe382e1 100644
>>>> --- a/net/tipc/node.c
>>>> +++ b/net/tipc/node.c
>>>> @@ -1496,7 +1496,7 @@ static void node_lost_contact(struct tipc_node *n,
>>>>
>>>>  	/* Clean up broadcast state */
>>>>  	tipc_bcast_remove_peer(n->net, n->bc_entry.link);
>>>> -	__skb_queue_purge(&n->bc_entry.namedq);
>>>> +	skb_queue_purge(&n->bc_entry.namedq);
>>> Patch looks fine, but I'm not sure why not hold
>>> spin_unlock_bh(&tn->nametbl_lock) here instead?
>>>
>>> Seems like node_lost_contact() should be relatively rare,
>>> so adding another lock to tipc_named_dequeue() is not the
>>> right trade off.
>> Actually, I agree with previous speaker here. We already have the
>> nametbl_lock when tipc_named_dequeue() is called, and the same lock is
>> accessible from node.c where node_lost_contact() is executed. The patch
>> and the code becomes simpler.
>> I suggest you post a v2 of this one.
>>
>> ///jon
>>
>>>>  	/* Abort any ongoing link failover */
>>>>  	for (i = 0; i < MAX_BEARERS; i++) {
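The locking trade-off in this thread may be easier to see in a small userspace analogue of the patch as posted. This is a hypothetical sketch using POSIX threads: `struct msg`, `struct msg_queue` and the `queue_*` helpers are illustrative names, not TIPC or kernel APIs, and the last-bulk and stale-seqno handling of the real tipc_named_dequeue() is left out. It mirrors the idea of the patch: the dequeue path walks and unlinks under the queue's own lock and releases it on every return path, and the purge path takes that same lock, the role skb_queue_purge() (unlike __skb_queue_purge()) plays in node_lost_contact().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff carrying a name-distribution message. */
struct msg {
	struct msg *next;
	unsigned short seqno;
	bool bulk;		/* plays the role of msg_is_bulk() || msg_is_legacy() */
};

/* Hypothetical stand-in for struct sk_buff_head: the queue owns its lock. */
struct msg_queue {
	pthread_mutex_t lock;
	struct msg *head;
};

static void queue_init(struct msg_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
}

/* Append at the tail while holding the queue lock. */
static void queue_push(struct msg_queue *q, struct msg *m)
{
	struct msg **pp;

	assert(m != NULL);
	m->next = NULL;
	pthread_mutex_lock(&q->lock);
	for (pp = &q->head; *pp; pp = &(*pp)->next)
		;
	*pp = m;
	pthread_mutex_unlock(&q->lock);
}

/* Walk and unlink under the queue's own lock; release it on every return
 * path. Bulk messages are returned unconditionally, others only when the
 * stream is open and the seqno is the next expected one. */
static struct msg *queue_dequeue(struct msg_queue *q, unsigned short *rcv_nxt,
				 bool open)
{
	struct msg **pp, *m;

	pthread_mutex_lock(&q->lock);
	for (pp = &q->head; (m = *pp) != NULL; pp = &m->next) {
		if (m->bulk || (open && m->seqno == *rcv_nxt)) {
			if (!m->bulk)
				(*rcv_nxt)++;
			*pp = m->next;	/* unlink, like __skb_unlink() */
			pthread_mutex_unlock(&q->lock);
			return m;
		}
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Locked purge: detach the whole list under the same lock, then free it
 * outside the lock. Returns the number of messages freed. */
static size_t queue_purge(struct msg_queue *q)
{
	struct msg *m, *next;
	size_t n = 0;

	pthread_mutex_lock(&q->lock);
	m = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);
	for (; m; m = next) {
		next = m->next;
		free(m);
		n++;
	}
	return n;
}
```

In this model the queue lock is only ever taken innermost, so a caller already holding some other lock (the analogue of the node lock in node_lost_contact()) can purge the queue without acquiring the name-table lock, and the ordering cycle from the lockdep report cannot form. The cost, as Jakub points out, is an extra lock acquisition on every dequeue.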
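The lockdep report quoted in this thread is the classic AB-BA pattern: CPU0 holds the node lock and waits for the name-table lock while CPU1 holds them in the opposite order. The thread resolves it by not nesting the two locks at all. A different, generic remedy, when two locks genuinely must be held together, is to impose one global acquisition order. The sketch below is a hypothetical POSIX-threads illustration of that swapped-in technique, not what the TIPC patch does: `lock_pair()`, `unlock_pair()` and `run_opposite_orders()` are illustrative names, and ordering by address is only one possible ordering rule.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Take two distinct mutexes in one global order (lower address first),
 * so no thread can hold the second while waiting for the first. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	assert(a != b);
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);
	pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}

struct worker_args {
	pthread_mutex_t *first, *second;	/* order as named by the caller */
	long iters;
	long *counter;				/* only touched with both held */
};

static void *worker(void *p)
{
	struct worker_args *w = p;

	for (long i = 0; i < w->iters; i++) {
		lock_pair(w->first, w->second);
		(*w->counter)++;
		unlock_pair(w->first, w->second);
	}
	return NULL;
}

/* Two threads name the locks in opposite orders, like CPU0 and CPU1
 * in the lockdep report; returns the final counter value. */
static long run_opposite_orders(long iters)
{
	pthread_mutex_t node_lock, table_lock;
	long counter = 0;
	pthread_t t0, t1;

	pthread_mutex_init(&node_lock, NULL);
	pthread_mutex_init(&table_lock, NULL);
	struct worker_args a0 = { &node_lock, &table_lock, iters, &counter };
	struct worker_args a1 = { &table_lock, &node_lock, iters, &counter };

	pthread_create(&t0, NULL, worker, &a0);
	pthread_create(&t1, NULL, worker, &a1);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	pthread_mutex_destroy(&node_lock);
	pthread_mutex_destroy(&table_lock);
	return counter;
}
```

Without the swap in lock_pair(), the two workers could block each other exactly as in the lockdep trace. In kernel code the ordering is normally a documented lock hierarchy rather than an address comparison, which is why simply adding nametbl_lock inside node_lost_contact() conflicts with the existing order.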