|
From: praveen m. <pra...@or...> - 2015-10-14 09:34:57
|
Hi Lennart,
I have published v2 of the patch with following changes:
1)Added a cautionary Note in the header of saNtfFinalize().
Commit log has enough details, I think, about the problem and solution
and it will get committed with the fix, so a separate README is not needed.
2)Added two test cases for saNtfFinalize().
Thanks,
Praveen
On 13-Oct-15 2:13 PM, Lennart Lund wrote:
> Comments:
> 1.
> The explanation indicates that the problem is not very easy to understand. In order to save future maintainers some work the explanation below can be added to for example the README file.
> In the saNtfFinalize() you can add a comment saying that the mutex must not be locked and a reference to the README file can be given.
>
> 2.
> Add test-case(s) in the UML test suite. Maybe the test code attached to the ticket can be used.
>
> Thanks
> Lennart
>
>> -----Original Message-----
>> From: pra...@or... [mailto:pra...@or...]
>> Sent: den 12 oktober 2015 13:23
>> To: Lennart Lund; Minh Chau H
>> Cc: ope...@li...
>> Subject: [PATCH 1 of 1] ntfa: fix deadlock situation due to saNtfFinalize
>> [#1521]
>>
>> osaf/libs/agents/saf/ntfa/ntfa_api.c | 4 ++--
>> 1 files changed, 2 insertions(+), 2 deletions(-)
>>
>>
>> Generally any API called after finalize returns with BAD_HANDLE.
>> But in a multi-threaded application, deadlock situation may arise between
>> saNtfFinalize()
>> with saNtfNotificationUnsubscribe(),
>> or with saNtfStateChangeNotificationAllocate(),
>> or with and saNtfPtrValAllocate().
>>
>> After getting successful response from the server, saNtfFinalize() takes lock
>> on ntfa_cb.cb_lock
>> and tries to free all the handles and resources. Here it gets block in
>> ntfa_hdl_rec_del()
>> ---->ncshm_destroy_hdl() (actually in in ncshm_destroy_hdl()---
>>> hm_block_me())) as client_handle
>> is in use by other threads and waits for it to get released. At the same time,
>> one of the above three APIS may be waiting for taking the lock on
>> ntfa_cb.cb_lock which is
>> not avialable as it is taken by saNtfFinalize().
>>
>> From the solution perspective, the LEAP API where santfFinalize() gets
>> blocked wants other threads to
>> complete thier functionality and release the handle. In this situation, new API
>> calls
>> will return with BAD_HANDLE (LEAP functionality take_handle() fails) and
>> existing API calls which has
>> already taken the client_handle will be able to give back client_handle (LEAP
>> functionality
>> give_handle()). So if the lock is removed from Finalize(), other APIS will be
>> able to complete their
>> call and will return handle one by one which eventually will unblock
>> saNtfFinalize().
>>
>> diff --git a/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> --- a/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> +++ b/osaf/libs/agents/saf/ntfa/ntfa_api.c
>> @@ -1187,7 +1187,6 @@ SaAisErrorT saNtfFinalize(SaNtfHandleT n
>> rc = SA_AIS_ERR_NO_RESOURCES;
>>
>> if (rc == SA_AIS_OK) {
>> -
>> osafassert(pthread_mutex_lock(&ntfa_cb.cb_lock) == 0);
>>
>> /** delete the hdl rec
>> ** including all resources allocated by client if MDS send is
>> ** succesful.
>> @@ -1197,7 +1196,6 @@ SaAisErrorT saNtfFinalize(SaNtfHandleT n
>> TRACE_1("ntfa_hdl_rec_del
>> failed");
>> rc = SA_AIS_ERR_BAD_HANDLE;
>> }
>> -
>> osafassert(pthread_mutex_unlock(&ntfa_cb.cb_lock) == 0);
>> }
>>
>> done_give_hdl:
>> @@ -2739,6 +2737,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>>
>> ntfHandle = ntfHandleGet(subscriptionId);
>> if (ntfHandle == 0) {
>> + TRACE_1("ntfHandleGet failed, subscription not
>> exist");
>> rc = SA_AIS_ERR_NOT_EXIST;
>> goto done;
>> }
>> @@ -2772,6 +2771,7 @@ SaAisErrorT saNtfNotificationUnsubscribe
>> /* Send a sync MDS message to obtain a log
>> stream id */
>> rc = ntfa_mds_msg_sync_send(&ntfa_cb,
>> &msg, &o_msg, timeout);
>> if (rc != NCSCC_RC_SUCCESS) {
>> +
>> TRACE_1("ntfa_mds_msg_sync_send failed with: %u",rc);
>> rc = SA_AIS_ERR_TRY_AGAIN;
>> goto done_give_hdl;
>> }
|