From: Neil H. <nh...@tu...> - 2007-02-28 01:50:27
|
On Tue, Feb 27, 2007 at 04:29:39PM -0500, Vlad Yasevich wrote: > Hi Neil > > Neil Horman wrote: > > Hey all- > > I'm currently tracking down a potential association reference count > > leak, and in my searching I ran accross something that troubled me (which may or > > may not be related to the problem I am currently diagnosing). > > > > It involves the endpoint strucutres list of associations. Specifically, > > I see two problems: > > > > 1) I see no lock protecting that list. I see in sctp_endpoint_lookup_assoc we > > do a sctp_local_bh_disable which would protect us from modifications that occur > > during local receive operations, but not from receive operations that happen on > > other cpu's > > hmm... I don't think this is strictly necessary. > > Things that add an association to the endpoint's list: > 1) sctp_assoc_migrate(). Done when the socket lock for both source and destination socket > is held. > > 2) sctp_cmd_interpreter(). Done when the socket spin-lock is held. > > Things that delete an association from the endpoint's list: > 1) sctp_association_free(). This is called froma 9 different places: > sctp_make_temp_asoc() - not an issue since temporary associations do not get linked to > the endpoint. (check if this is your issue). > > sctp_unpack_cookie() - in the fail case, we never link the association to the endpoint. > > sctp_cmd_delete_tcb() - since this is part of the command interpreter, we are holding the > socket spin lock. No-one else should have access to the endpoint. > > sctp_sf_do_5_1B_init() - failure case, never link. holding socket lock anyway. > > sctp_sf_do_5_1D_ce() - failure case, never linkg. holding socket lock. > > sctp_sf_do_unexpected_init() - same > > __sctp_connect() - we are under the the sctp_lock_sock(). > > sctp_close() - we are under sctp_lock_sock(). > > sctp_sendmsg() - we are under sctp_lock_sock(). > > > Hmm... looks like a there might be a bug in sctp_shutdown()!!! We generate traffic and change the association > contents without holding a lock!!! > Ok, thank you for the analysis. In fact, based on your comments here, it seems that sctp_shutdown should be ok too, since we take the socket lock in inet_shutdown before we call the protocols specifc shutdown handler (sctp_shutdown). About the only thing that worries me beyond that then would be readers of the list accessing the list contents while adds or deletes are happening. not sure if that is a problem, but I can look at that further. > > > > 2) I see no method for removing items from the endpoint->asocs list. I imagine > > this could be particularly problematic as associations refcounts reached zero, > > since it appears that all associations are allocated in sctp_association_new > > (which sets malloced to 1), but in sctp_association_destroy we just call kfree > > if malloced is one, instead of sctp_association_free, which appears responsible > > for deleting the association from the endpoint->assocs list. The result is bad > > entries on the asocs list. > > Actually, sctp_association_free() removes the item from the assocs list. It does: > list_del(&asoc->asocs); > > which unlinkes the assoc from the list. > > > Can you describe the scenario a bit. > Don't actually have a secnario for this problem, my questions were based only on visual inspection. About the only situation I can see it being a problem in is if sctp_association_put gets called and decrements the refcount to 0 from a location other than sctp_association_free, since that would result in the aformentioned unlinking never happening. not sure if thats possible though. Shifting gears for a moment, I do have a (still hypothetical) scenario that results in an association getting deleted while still having frames attahced to it. In the event that sctp_association_put decrements a refcount to zero, there is a critical section between the time the refcount hits zero and the time we call idr_remove from sctp_association_destroy, in which the association can still be found. That would suggest to me that if a frame arrived for an association id during that window, that sctp_rcv could could lookup the association and enqueue frames to it while we were in the process of deleting it, which would potentially result in failing the rmem_alloc assertion in sctp_association_destroy (the symptom of the bug). I'm trying to prevent the problem by holding the association id list lock while I decrement the reference count and releasing it after the destroy is done. Not sure if that is a valid solution though, given that, again based on your description above, it seem as though the last refcount decrement should always occur from the association_put at the end of sctp_association_free I'll let you all know if I find anything else Thanks & Regards Neil > Later > -vlad |