This issue is seen on the 46FC tag changeset; it may also be relevant to all older versions of OpenSAF (not verified).
The spec says on page 67:
The operation fails if an administrative operation is currently in progress on one of the targeted objects. An administrative operation is considered to be in progress on an object if the SaImmOiAdminOperationCallbackT_2 Object Implementer's callback has been invoked for that operation and the Object Implementer is still registered but has not yet called saImmOiAdminOperationResult() to provide the operation results.
To simulate the above case, AdminOperationAsync was invoked on an object in the test application. After the AdminOperationCallback was invoked, and without the object's OI responding with AdminOperationResult, adminOwnerRelease was invoked from the OM side and the API succeeded.
According to the spec, ERR_BUSY should be returned in response to the AdminOwnerRelease operation. The same applies to the AdminOwnerClear() API.
IMMND trace on that node:
Mar 24 11:02:13.611054 osafimmnd [4131:ImmModel.cc:10998] >> adminOperationInvoke
Mar 24 11:02:13.611072 osafimmnd [4131:ImmModel.cc:11005] T5 Admin op on objectName:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxattrName_testAdminOwnerRelease_Failures_1012
Mar 24 11:02:13.611111 osafimmnd [4131:ImmModel.cc:11114] T5 IMPLEMENTER FOR ADMIN OPERATION INVOKE 19 conn:55 node:2030f name:implementer_testAdminOwnerRelease_Failures_101
Mar 24 11:02:13.611139 osafimmnd [4131:ImmModel.cc:11122] T5 Updating req invocation inv:34359738367 conn:54 timeout:0
Mar 24 11:02:13.611163 osafimmnd [4131:ImmModel.cc:11129] TR Located pre request continuation 34359738367 adjusting timeout to 0
Mar 24 11:02:13.611182 osafimmnd [4131:ImmModel.cc:11157] T5 Storing impl invocation 55 for inv: 34359738367
Mar 24 11:02:13.611215 osafimmnd [4131:ImmModel.cc:11226] << adminOperationInvoke
Mar 24 11:02:13.611252 osafimmnd [4131:immnd_evt.c:4984] T2 IMMND sending Agent upcall
Mar 24 11:02:13.613901 osafimmnd [4131:immnd_evt.c:4990] T2 IMMND UPCALL TO AGENT SEND SUCCEEDED
Mar 24 11:02:13.614270 osafimmnd [4131:immnd_evt.c:5128] T2 Delayed reply, wait for reply from implementer
Mar 24 11:02:13.614547 osafimmnd [4131:immnd_evt.c:5132] << immnd_evt_proc_admop
Mar 24 11:02:13.614873 osafimmnd [4131:immnd_evt.c:8658] >> dequeue_outgoing
Mar 24 11:02:13.615112 osafimmnd [4131:immnd_evt.c:8664] TR Pending replies:0 space:16 out list?:(nil)
Mar 24 11:02:13.615396 osafimmnd [4131:immnd_evt.c:8693] << dequeue_outgoing
Mar 24 11:02:13.615829 osafimmnd [4131:immnd_evt.c:8777] << immnd_evt_proc_fevs_rcv
Mar 24 11:02:14.496009 osafimmnd [4131:ImmModel.cc:12450] T5 Did not timeout now - start < 0(1)
Mar 24 11:02:14.609660 osafimmnd [4131:immsv_evt.c:5500] T8 Received: IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
Mar 24 11:02:14.609724 osafimmnd [4131:immnd_evt.c:2837] T2 sender_count: 1 size: 268
Mar 24 11:02:14.609761 osafimmnd [4131:immnd_evt.c:3118] >> immnd_fevs_local_checks
Mar 24 11:02:14.609808 osafimmnd [4131:immnd_evt.c:3575] << immnd_fevs_local_checks
Mar 24 11:02:14.609838 osafimmnd [4131:immnd_evt.c:3036] T2 SENDING FEVS TO IMMD
Mar 24 11:02:14.609863 osafimmnd [4131:immsv_evt.c:5481] T8 Sending: IMMD_EVT_ND2D_FEVS_REQ to 0
Mar 24 11:02:14.616600 osafimmnd [4131:immnd_evt.c:8716] >> immnd_evt_proc_fevs_rcv
Mar 24 11:02:14.616745 osafimmnd [4131:immnd_evt.c:8732] T2 FEVS from myself, still pending:0
Mar 24 11:02:14.616815 osafimmnd [4131:immsv_evt.c:5500] T8 Received: IMMND_EVT_A2ND_ADMO_RELEASE (10) from 0
Mar 24 11:02:14.616860 osafimmnd [4131:ImmModel.cc:4549] >> adminOwnerChange
Mar 24 11:02:14.616893 osafimmnd [4131:ImmModel.cc:4576] T5 Release admin owner 'exowner'
Mar 24 11:02:14.634875 osafimmnd [4131:ImmModel.cc:4681] TR Cutoff in admo-change-loop by childCount
Mar 24 11:02:14.635431 osafimmnd [4131:ImmModel.cc:4589] T5 Release Admin Owner for object xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxattrName_testAdminOwnerRelease_Failures_1012
Mar 24 11:02:14.641743 osafimmnd [4131:ImmModel.cc:4681] TR Cutoff in admo-change-loop by childCount
Mar 24 11:02:14.642150 osafimmnd [4131:ImmModel.cc:4694] << adminOwnerChange
This is relevant for all older versions. The behavior has not changed, and no one has noticed or cared about this issue before.
The quotes from the SAF spec are correct and I would say they make some
sense for saImmOmAdminOwnerRelease.
However, I am not sure it makes sense for saImmOmAdminOwnerClear().
That operation is an "emergency override" that should be extremely
rare in its use. It is needed to force removal of the admin-owner
from objects where the client that set the admin-owner is either
dead (with release-on-finalize set to false) or hung.
So I would argue that we keep the current behavior for saImmOmAdminOwnerClear
and make a note of it in the spec deviations section of the OpenSAF_IMMSV_PR.
I have analyzed the implications of this reported defect, both in
terms of the possible negative effects of leaving it unfixed and in
terms of the implementation aspects of fixing it.
1) Possible negative effects of not having this defect fixed:
For real users - no effect that I can think of.
For test cases that exercise this particular case, which is according to
the SAF spec - they fail.
2) Implementation aspects of fixing this.
Current implementation:
The request is sent over fevs to all processors; this is only to
ensure that admin-operation requests arrive and are processed
fevs-synchronously.
The admin-ownership is checked to match at admin-operation invoke.
If it matches at that (fevs) time then the admin operation proceeds
with a callback towards the OI at only the processor where the OI resides.
Continuation records are created at the requesting processor (for the
reply to the request) and at the OI processor (for the OI callback reply).
At the other processors, receiving the admin-op request has no effect.
No access to the admin-owner data is made after that as part of
processing the admin operation, since the admin-owner mechanism is only
an access-control mechanism and access has now been verified.
The reply from the OI is matched against the callback-reply continuation
and the reply is forwarded *directly*, i.e. not over fevs, to the
requesting processor. The reply arrives at the requesting processor
(which could be identical to the OI's processor) and is forwarded
back to the OM client.
The current data structures and message protocol make it impossible to
fix this defect as-is. Fixing it would require a relatively large change
to the implementation of the admin-op mechanism: both the data structures
and the message protocol would need to change. The new implementation
would have poorer performance, both in terms of response time
(the reply would need to go over fevs) and memory (either a new member in
ObjectInfo, increasing the memory cost for all objects, or a new
continuation record stored at all nodes).
My conclusion from this analysis is (a) that this defect is minor,
since it has no known impact on real usage; and (b) that the cost of
implementing the fix is too high (both in added complexity and in
reduced performance) for us to do it without a sensible real use case.
But the documentation should be updated to reflect the discrepancy
relative to the SAF spec. This discrepancy has either not been noticed,
or been noticed but ignored, during the several years of OpenSAF's
existence. This in itself illustrates that the reported problem is
(as far as we know) academic, i.e. a case of overspecification by SAF.