#1278 IMM: admin owner clear/release on an object is allowed when admin operation is in progress for the object

never
wontfix
None
defect
imm
nd
1.0
minor
2015-11-02
2015-03-24
No

This issue is seen on the 4.6 FC tag changeset. It may also be relevant to all older versions of OpenSAF (not verified).

Spec says on Page 67:

The operation fails if an administrative operation is currently in progress on one of the targeted objects. An administrative operation is considered to be in progress on an object if the SaImmOiAdminOperationCallbackT_2 Object Implementer's callback has been invoked for that operation and the Object Implementer is still registered but has not yet called saImmOiAdminOperationResult() to provide the operation results.

To simulate the above case, the test application invoked AdminOperationAsync on an object. After AdminOperationCallback was invoked, and without the object's OI responding with AdminOperationResult, adminOwnerRelease was invoked from the OM, and the API succeeded.

According to the spec, ERR_BUSY should be returned in response to the AdminOwnerRelease operation. The same applies to the AdminOwnerClear() API.
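
The behavior the spec asks for can be sketched with a small toy model. This is not OpenSAF code: `ToyImm`, `ObjectState`, and the `Result` enum are hypothetical names invented for illustration, and `Result::ErrBusy` merely stands in for the ERR_BUSY return code. The sketch only encodes the quoted definition of "in progress" and the check that release would need to perform.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical result codes for this sketch; they stand in for the real return codes.
enum class Result { Ok, ErrBusy, ErrNotExist };

// Hypothetical per-object state; this is not the OpenSAF ObjectInfo structure.
struct ObjectState {
    std::string adminOwner;          // empty means no admin owner is set
    bool oiCallbackInvoked = false;  // admin-op callback was delivered to the OI
    bool oiRegistered = false;       // the OI is still registered
    bool resultReported = false;     // the OI has reported the operation result
};

// Spec definition of "in progress": callback invoked, OI still registered,
// and no result reported yet.
bool adminOpInProgress(const ObjectState& o) {
    return o.oiCallbackInvoked && o.oiRegistered && !o.resultReported;
}

class ToyImm {
public:
    void addObject(const std::string& dn, const std::string& owner) {
        assert(!dn.empty());
        ObjectState s;
        s.adminOwner = owner;
        s.oiRegistered = true;
        objects_[dn] = s;
    }

    // Models the admin-operation callback being invoked at the OI.
    Result invokeAdminOp(const std::string& dn) {
        auto it = objects_.find(dn);
        if (it == objects_.end()) return Result::ErrNotExist;
        it->second.oiCallbackInvoked = true;
        it->second.resultReported = false;
        return Result::Ok;
    }

    // Models the OI reporting the admin-operation result.
    Result reportResult(const std::string& dn) {
        auto it = objects_.find(dn);
        if (it == objects_.end()) return Result::ErrNotExist;
        it->second.resultReported = true;
        return Result::Ok;
    }

    // Release as the spec describes it: refuse while an admin op is in progress.
    Result releaseAdminOwner(const std::string& dn) {
        auto it = objects_.find(dn);
        if (it == objects_.end()) return Result::ErrNotExist;
        if (adminOpInProgress(it->second)) return Result::ErrBusy;
        it->second.adminOwner.clear();
        return Result::Ok;
    }

private:
    std::map<std::string, ObjectState> objects_;
};
```

In this model, a release issued between `invokeAdminOp` and `reportResult` yields `Result::ErrBusy`, which is the outcome the test case expected and did not get.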

IMMND trace on that node:

Mar 24 11:02:13.611054 osafimmnd [4131:ImmModel.cc:10998] >> adminOperationInvoke
Mar 24 11:02:13.611072 osafimmnd [4131:ImmModel.cc:11005] T5 Admin op on objectName:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxattrName_testAdminOwnerRelease_Failures_1012
Mar 24 11:02:13.611111 osafimmnd [4131:ImmModel.cc:11114] T5 IMPLEMENTER FOR ADMIN OPERATION INVOKE 19 conn:55 node:2030f name:implementer_testAdminOwnerRelease_Failures_101
Mar 24 11:02:13.611139 osafimmnd [4131:ImmModel.cc:11122] T5 Updating req invocation inv:34359738367 conn:54 timeout:0
Mar 24 11:02:13.611163 osafimmnd [4131:ImmModel.cc:11129] TR Located pre request continuation 34359738367 adjusting timeout to 0
Mar 24 11:02:13.611182 osafimmnd [4131:ImmModel.cc:11157] T5 Storing impl invocation 55 for inv: 34359738367
Mar 24 11:02:13.611215 osafimmnd [4131:ImmModel.cc:11226] << adminOperationInvoke
Mar 24 11:02:13.611252 osafimmnd [4131:immnd_evt.c:4984] T2 IMMND sending Agent upcall
Mar 24 11:02:13.613901 osafimmnd [4131:immnd_evt.c:4990] T2 IMMND UPCALL TO AGENT SEND SUCCEEDED
Mar 24 11:02:13.614270 osafimmnd [4131:immnd_evt.c:5128] T2 Delayed reply, wait for reply from implementer
Mar 24 11:02:13.614547 osafimmnd [4131:immnd_evt.c:5132] << immnd_evt_proc_admop
Mar 24 11:02:13.614873 osafimmnd [4131:immnd_evt.c:8658] >> dequeue_outgoing
Mar 24 11:02:13.615112 osafimmnd [4131:immnd_evt.c:8664] TR Pending replies:0 space:16 out list?:(nil)
Mar 24 11:02:13.615396 osafimmnd [4131:immnd_evt.c:8693] << dequeue_outgoing
Mar 24 11:02:13.615829 osafimmnd [4131:immnd_evt.c:8777] << immnd_evt_proc_fevs_rcv
Mar 24 11:02:14.496009 osafimmnd [4131:ImmModel.cc:12450] T5 Did not timeout now - start < 0(1)
Mar 24 11:02:14.609660 osafimmnd [4131:immsv_evt.c:5500] T8 Received: IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
Mar 24 11:02:14.609724 osafimmnd [4131:immnd_evt.c:2837] T2 sender_count: 1 size: 268
Mar 24 11:02:14.609761 osafimmnd [4131:immnd_evt.c:3118] >> immnd_fevs_local_checks
Mar 24 11:02:14.609808 osafimmnd [4131:immnd_evt.c:3575] << immnd_fevs_local_checks
Mar 24 11:02:14.609838 osafimmnd [4131:immnd_evt.c:3036] T2 SENDING FEVS TO IMMD
Mar 24 11:02:14.609863 osafimmnd [4131:immsv_evt.c:5481] T8 Sending: IMMD_EVT_ND2D_FEVS_REQ to 0
Mar 24 11:02:14.616600 osafimmnd [4131:immnd_evt.c:8716] >> immnd_evt_proc_fevs_rcv
Mar 24 11:02:14.616745 osafimmnd [4131:immnd_evt.c:8732] T2 FEVS from myself, still pending:0
Mar 24 11:02:14.616815 osafimmnd [4131:immsv_evt.c:5500] T8 Received: IMMND_EVT_A2ND_ADMO_RELEASE (10) from 0
Mar 24 11:02:14.616860 osafimmnd [4131:ImmModel.cc:4549] >> adminOwnerChange
Mar 24 11:02:14.616893 osafimmnd [4131:ImmModel.cc:4576] T5 Release admin owner 'exowner'
Mar 24 11:02:14.634875 osafimmnd [4131:ImmModel.cc:4681] TR Cutoff in admo-change-loop by childCount
Mar 24 11:02:14.635431 osafimmnd [4131:ImmModel.cc:4589] T5 Release Admin Owner for object xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxattrName_testAdminOwnerRelease_Failures_1012
Mar 24 11:02:14.641743 osafimmnd [4131:ImmModel.cc:4681] TR Cutoff in admo-change-loop by childCount
Mar 24 11:02:14.642150 osafimmnd [4131:ImmModel.cc:4694] << adminOwnerChange

Discussion

  • Sirisha Alla

    Sirisha Alla - 2015-03-24
    • Version: --> 4.6 FC
     
  • Anders Bjornerstedt

    • summary: admin owner clear/release on an object is allowed when admin operation is in progress for the object --> IMM: admin owner clear/release on an object is allowed when admin operation is in progress for the object
    • Part: - --> nd
    • Version: 4.6 FC --> 1.0
     
  • Anders Bjornerstedt

    This is relevant for all older versions.
    Behavior has not changed and no one has noticed or cared about this issue before.

    The quotes from the SAF spec are correct and I would say they make some
    sense for saImmOmAdminOwnerRelease.

    However, I am not sure it makes sense for saImmOmAdminOwnerClear().
    That operation is an "emergency override" operation that should be
    extremely rare in its use. It is needed for doing a forced remove of
    admin-owner from objects where the client that set admin-owner is
    either dead (release-on-finalize set to false) or hung.

    So I would argue that we keep current behavior for saImmOmAdminOwnerClear
    and make a note of it in the spec deviations of the OpenSAF_IMMSV_PR.

     
  • Anders Bjornerstedt

    • Milestone: 4.6.RC1 --> 4.4.2
     
  • Anders Bjornerstedt

    • status: unassigned --> accepted
     
  • Anders Bjornerstedt

    • assigned_to: Anders Bjornerstedt
     
  • Anders Bjornerstedt

    • Priority: major --> minor
     
  • Anders Bjornerstedt

    I have analyzed the implications of this reported defect both on the
    possible current negative effects of not having this defect fixed and
    on the implementation aspects of fixing it.


    1) Possible negative effects of not having this defect fixed:
    For real users - no effect that I can think of.
    For test cases that test this particular case, which is according to
    the SAF spec - they fail.


    2) Implementation aspects of fixing this.
    Current implementation:
    The request is sent over fevs to all processors; this is only to
    ensure that admin-operation requests arrive and are processed
    fevs-synchronously.
    The admin-ownership is checked to match at admin-operation invoke.
    If it matches at that (fevs) time then the admin operation proceeds
    with a callback towards the OI at only the processor where the OI resides.
    Continuation records are created at the requesting processor (for the
    reply to the request) and at the OI processor (for the OI callback reply).
    At other processors, receiving the admin-op request has no effect.

    No access is done after that to the admin-owner data as part of
    processing the admin-operation since the admin-owner mechanism is only
    an access control mechanism and access has now been verified.

    The reply from the OI is matched against the callback-reply continuation and
    the reply is forwarded directly, i.e. not over fevs, to the requesting
    processor. The reply arrives at the requesting processor (which could
    be identical to the OI residence processor) and is forwarded
    back to the om-client.
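
The flow described above can be sketched as a toy cluster model. This is an illustration only, not the OpenSAF implementation: `Node`, `Cluster`, and all member names are hypothetical. It shows that after an admin-op invoke only the requesting node and the OI node hold any record of the invocation, so a later release, which every node processes using only its replicated admin-owner data, goes through.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-node state for this sketch.
struct Node {
    int id = 0;
    std::map<std::string, std::string> adminOwner;  // replicated on every node: dn -> owner
    std::set<long> requestContinuations;            // only populated at the requesting node
    std::set<long> implContinuations;               // only populated at the OI node
};

struct Cluster {
    std::vector<Node> nodes;

    explicit Cluster(int n) {
        assert(n > 0);
        for (int i = 0; i < n; ++i) {
            Node node;
            node.id = i;
            nodes.push_back(node);
        }
    }

    // Admin-owner set, delivered to every node (models a fevs message).
    void setOwner(const std::string& dn, const std::string& owner) {
        for (Node& n : nodes) n.adminOwner[dn] = owner;
    }

    // Admin-op invoke, delivered to every node. Ownership is checked at invoke
    // time; continuations are created only at the requesting and OI nodes.
    bool invokeAdminOp(long inv, const std::string& dn, const std::string& owner,
                       int reqNode, int oiNode) {
        bool ok = true;
        for (Node& n : nodes) {
            auto it = n.adminOwner.find(dn);
            if (it == n.adminOwner.end() || it->second != owner) {
                ok = false;
                continue;
            }
            if (n.id == reqNode) n.requestContinuations.insert(inv);
            if (n.id == oiNode) n.implContinuations.insert(inv);
            // On all other nodes the request has no effect.
        }
        return ok;
    }

    // Admin-owner release, delivered to every node. Each node consults only
    // the replicated admin-owner data, never the continuation records.
    bool releaseOwner(const std::string& dn, const std::string& owner) {
        bool ok = true;
        for (Node& n : nodes) {
            auto it = n.adminOwner.find(dn);
            if (it == n.adminOwner.end() || it->second != owner) {
                ok = false;
                continue;
            }
            n.adminOwner.erase(it);
        }
        return ok;
    }

    // Number of nodes that hold any record of the given invocation.
    int nodesAwareOf(long inv) const {
        int count = 0;
        for (const Node& n : nodes) {
            if (n.requestContinuations.count(inv) || n.implContinuations.count(inv)) ++count;
        }
        return count;
    }
};
```

With three nodes, a requester on node 0 and an OI on node 1, node 2 never learns that an operation is pending, so the nodes could not all reach the same ERR_BUSY verdict from their local state alone.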

    The current data structures and message protocol make it impossible to
    fix this defect. Fixing it requires a relatively large change
    to the implementation of the admin-op mechanism. Both data structures and
    message protocol would need to be changed. That new implementation
    would have poorer performance, both in terms of response time
    (reply needs to go over fevs) and memory (either a new member in ObjectInfo
    increasing the memory cost for all objects, or a new continuation record
    stored at all nodes).
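
A rough sketch of the second alternative (a record stored at all nodes, with the reply going over fevs) might look as follows. Everything here is hypothetical: `ReplicatedNode`, `FixedCluster`, and the `fevsMessages` counter are invented for illustration and do not correspond to any OpenSAF data structure or protocol message.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical replicated state held by every node in this sketch.
struct ReplicatedNode {
    std::map<std::string, std::string> adminOwner;  // dn -> owner
    std::map<std::string, int> pendingAdminOps;     // dn -> in-progress count (the extra memory)
};

struct FixedCluster {
    std::vector<ReplicatedNode> nodes;
    std::size_t fevsMessages = 0;  // counts cluster-wide broadcasts

    explicit FixedCluster(std::size_t n) : nodes(n) { assert(n > 0); }

    void setOwner(const std::string& dn, const std::string& owner) {
        ++fevsMessages;
        for (auto& n : nodes) n.adminOwner[dn] = owner;
    }

    // Every node records the pending operation, not just requester and OI node.
    void invokeAdminOp(const std::string& dn) {
        ++fevsMessages;
        for (auto& n : nodes) ++n.pendingAdminOps[dn];
    }

    // The OI reply must now also be broadcast so that all nodes clear the record.
    void reportResult(const std::string& dn) {
        ++fevsMessages;
        for (auto& n : nodes) {
            auto it = n.pendingAdminOps.find(dn);
            if (it != n.pendingAdminOps.end() && --it->second == 0) {
                n.pendingAdminOps.erase(it);
            }
        }
    }

    // All nodes reach the same verdict because the pending record is replicated.
    // Returns false (busy) while an admin op is in progress on the object.
    bool releaseOwner(const std::string& dn) {
        ++fevsMessages;
        bool released = true;
        for (auto& n : nodes) {
            if (n.pendingAdminOps.count(dn)) {
                released = false;
            } else {
                n.adminOwner.erase(dn);
            }
        }
        return released;
    }
};
```

The sketch makes both costs visible: every node carries a `pendingAdminOps` entry for the duration of the operation, and the result path adds one more broadcast per admin operation.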


    My conclusion from this analysis is (a) that this defect is minor
    since it has no known impact on real usage; and (b) the cost of
    implementing this is too high (both in added complexity and in reduced
    performance) for us to do it without any sensible real use-case.

    But the documentation should be updated to reflect the discrepancy
    relative to the SAF spec. This discrepancy has either gone unnoticed or
    been noticed but ignored during the several years of OpenSAF's existence.
    This in itself illustrates that the reported problem (as far as we know)
    is academic, i.e. a case of overspecification by SAF.

     
  • Anders Bjornerstedt

    • status: accepted --> wontfix
     
  • Anders Widell

    Anders Widell - 2015-11-02
    • Milestone: 4.4.2 --> never
     