From: Ying X. <yin...@wi...> - 2017-02-27 11:46:39
|
Hi Partha, it's very luck that you have a chance to review the series before they are submitted to mainline. Please take a look at my comments below. On 02/23/2017 08:10 PM, Parthasarathy Bhuvaragan wrote: > Hi Ying, > > > Thanks for the patches. I introduced these bugs and was sick for more > than 2 weeks, so you had to fix them. > > > I have few comments for the entire series. > > [tipc-discussion] [net 2/5] tipc: adjust the policy of holding > subscription kref > <https://sourceforge.net/p/tipc/mailman/message/35676908/>*<https://sourceforge.net/p/tipc/mailman/message/35676908/>* > > The above commit looks good and should fix the issue reported by John. > > > [tipc-discussion] [net 1/5] tipc: advance the time of deleting > subscription from subscriber->subscrp_list > <https://sourceforge.net/p/tipc/mailman/message/35676906/> > > In the above commit, we miss to delete sub->subscrp_list at subscription > timeout. We placed this common code in refcount cleanup to avoid code > duplication. > > > [tipc-discussion] [net 3/5] tipc: adjust policy that sub->timer holds > subscription kref > <https://sourceforge.net/p/tipc/mailman/message/35676904/>*<https://sourceforge.net/p/tipc/mailman/message/35676904/>* > > With this patch, we will never trigger a subscription cleanup at > tipc_subscrp_timeout() as the refcount will be one when we > exittipc_subscrp_timeout(). When I made the changes, in fact I was aware of the case, but I supposed that user should explicitly cancel subscription with TIPC_SUB_CANCEL to destroy subscription instance. However, I was unaware that new subscription events still happen e even if they are expired. I understood your proposal. But as you said, the approach looks very odd for us. However, I have another idea to a bit elegantly fix the issue while we don't need to make a big change for the whole series. Please wait for v2. Thanks, Ying > > Thus the subscription is still present in nametable and hence the > subscriber will receive new events for this subscription after receiving > the subscription timeout until the subscriber terminates. > > We need to update our design policy that at subscription timeout, we > remove all references to the subscription and it is no longer > accessible. This conflicts with the change proposed by the above commit. > > > If we still need to include this commit, we need to adapt > tipc_subscrp_timeout(), as: > > 1. tipc_nametbl_unsubscribe() > 2. tipc_subscrp_put() > 3. tipc_subscrp_put() > > static void tipc_subscrp_delete(struct tipc_subscription *sub) > { > u32 timeout = htohl(sub->evt.s.timeout, sub->swap); > > - if (timeout == TIPC_WAIT_FOREVER || del_timer(&sub->timer)) > + if (timeout != TIPC_WAIT_FOREVER && del_timer(&sub->timer)) { > tipc_subscrp_put(sub); > + tipc_subscrp_put(sub); > } else { > + tipc_subscrp_put(sub); > } > } > > But this makes the functions look odd as we adjust refcount twice in the > same function. To get the subscription timeout cleanup working > correctly, we need to sqash this commit together :*[tipc-discussion] > [net 4/5] tipc: advance the time of calling tipc_nametbl_unsubscribe > <https://sourceforge.net/p/tipc/mailman/message/35676905/>* > > *[tipc-discussion] [net 5/5] tipc: remove unnecessary increasement of > subscription refcount > <https://sourceforge.net/p/tipc/mailman/message/35676917/>* > > In this commit message, you need to edit the stacktraces as the scenario > you describe will not happen due to your previous patches. As we never > perform a tipc_subscrp_get() in tipc_subscrp_report_overlap(). > > > /Partha > ------------------------------------------------------------------------ > *From:* John Thompson <tho...@gm...> > *Sent:* Tuesday, February 21, 2017 10:15 PM > *To:* Ying Xue > *Cc:* Jon Maloy; Parthasarathy Bhuvaragan; > tip...@li... > *Subject:* Re: [net 5/5] tipc: remove unnecessary increasement of > subscription refcount > > Sorry, I was mistaken. You are right that you have removed the > sub->lock from > tipc_subscrp_kref_release() and so there is no chance of a deadlock. > > JT > > On Tue, Feb 21, 2017 at 11:22 PM, Ying Xue <yin...@wi... > <mailto:yin...@wi...>> wrote: > > On 02/21/2017 06:13 AM, John Thompson wrote: > > > > > > On Tue, Feb 21, 2017 at 12:39 AM, Ying Xue <yin...@wi... <mailto:yin...@wi...> > > <mailto:yin...@wi... <mailto:yin...@wi...>>> > wrote: > > > > As the policy of holding subscription in subscriber time has been > > adjusted, it's unnecessary to hold subscription refcount before > > tipc_subscrp_delete() is called. As a consequence, > subscriber->lock > > can be safely removed to avoid the following two deadlock > scenarios: > > > > CPU1: CPU2: > > ---------- ---------------- > > tipc_nametbl_publish > > spin_lock_bh(&tn->nametbl_lock) > > tipc_nametbl_insert_publ > > tipc_nameseq_insert_publ > > tipc_subscrp_report_overlap > > tipc_subscrp_get > > tipc_subscrp_send_event > > tipc_close_conn > > tipc_subscrb_release_cb > > tipc_subscrb_delete > > tipc_subscrp_put > > tipc_subscrp_put > > tipc_subscrp_kref_release > > tipc_nametbl_unsubscribe > > spin_lock_bh(&tn->nametbl_lock) > > <<grab nametbl_lock again>> > > > > CPU1: CPU2: > > ---------- ---------------- > > tipc_nametbl_stop > > spin_lock_bh(&tn->nametbl_lock) > > tipc_purge_publications > > tipc_nameseq_remove_publ > > tipc_subscrp_report_overlap > > tipc_subscrp_get > > tipc_subscrp_send_event > > tipc_close_conn > > tipc_subscrb_release_cb > > tipc_subscrb_delete > > tipc_subscrp_put > > tipc_subscrp_put > > tipc_subscrp_kref_release > > tipc_nametbl_unsubscribe > > spin_lock_bh(&tn->nametbl_lock) > > <<grab nametbl_lock again>> > > > > Reported-by: John Thompson <tho...@gm... > <mailto:tho...@gm...> > > <mailto:tho...@gm... <mailto:tho...@gm...>>> > > Reported-by: Jon Maloy <jon...@er... > <mailto:jon...@er...> > > <mailto:jon...@er... <mailto:jon...@er...>>> > > Fixes: d094c4d5f5 ("tipc: add subscription refcount to avoid invalid > > delete") > > Signed-off-by: Ying Xue <yin...@wi... <mailto:yin...@wi...> > > <mailto:yin...@wi... <mailto:yin...@wi...>>> > > --- > > net/tipc/subscr.c | 6 ------ > > 1 file changed, 6 deletions(-) > > > > diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c > > index 6624232..ea67cce 100644 > > --- a/net/tipc/subscr.c > > +++ b/net/tipc/subscr.c > > @@ -168,9 +168,7 @@ static void tipc_subscrp_kref_release(struct > > kref *kref) > > struct tipc_net *tn = net_generic(sub->net, tipc_net_id); > > struct tipc_subscriber *subscriber = sub->subscriber; > > > > - spin_lock_bh(&subscriber->lock); > > atomic_dec(&tn->subscription_count); > > - spin_unlock_bh(&subscriber->lock); > > kfree(sub); > > tipc_subscrb_put(subscriber); > > } > > @@ -202,11 +200,7 @@ static void > tipc_subscrb_subscrp_delete(struct > > tipc_subscriber *subscriber, > > list_del(&sub->subscrp_list); > > > > tipc_nametbl_unsubscribe(sub); > > - tipc_subscrp_get(sub); > > - spin_unlock_bh(&subscriber->lock); > > > > > > By removing the unlocking at this point couldn't you end up with a > > deadlock on subscriber->lock > > due to tipc_subscrp_delete() putting the subscrp twice (which > could end > > up with kref == 0) and > > as a result calling tipc_subscrp_kref_release() which gets > subscriber->lock? > > > > I think there is no deadlock risk even if tipc_subscrp_put() will be > called twice in tipc_subscrp_delete() because > tipc_subscrp_kref_release() doesn't hold subscriber->lock after the > change. > > Please take a look at tipc_subscrp_kref_release() code: > static void tipc_subscrp_kref_release(struct kref *kref) > { > struct tipc_subscription *sub = container_of(kref, > struct > tipc_subscription, > kref); > struct tipc_net *tn = net_generic(sub->net, tipc_net_id); > struct tipc_subscriber *subscriber = sub->subscriber; > > atomic_dec(&tn->subscription_count); > kfree(sub); > tipc_subscrb_put(subscriber); > } > > > > > > > tipc_subscrp_delete(sub); > > - tipc_subscrp_put(sub); > > - spin_lock_bh(&subscriber->lock); > > > > if (s) > > break; > > -- > > 2.7.4 > > > > > > |