#1504 imm: Implicit class/object-applier checked by OiImplementeSet is not globally consistent

4.5.2
fixed
None
defect
imm
nd
major
2015-10-13
2015-09-28
Hung Nguyen
No

Set an applier to a class. Then exit immapplier to detach the applier.

root@SC1:~# immapplier -a @whatever Test

Let another node join the cluster.
Create a CCB which is active on an object of 'Test' class. Don't commit the CCB.

root@SC1:~# immcfg
> immcfg -c Test test=1
>

Try to set applier again.

root@SC1:/srv/shared# immapplier -a @whatever Test
Implementer: @whatever
ImmVersion: A 2 16
error - saImmOiImplementerSet FAILED: SA_AIS_ERR_TRY_AGAIN (6)

SC-1

osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 225
osafimmnd [419:ImmModel.cc:12967] >> implementerSet
osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [419:ImmModel.cc:13040] TR TRY_AGAIN: ccb 2 is active on object 'test=1' bound to class applier '@whatever'. Can not re-attach applier
osafimmnd [419:ImmModel.cc:13156] << implementerSet

PL-3

osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 225
osafimmnd [392:ImmModel.cc:12967] >> implementerSet
osafimmnd [392:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [392:ImmModel.cc:13087] NO Implementer (applier) connected: 5 (@whatever) <0, 2010f>
osafimmnd [392:ImmModel.cc:13156] << implementerSet

IMMND on SC-1 rejected the implSet request but IMMND on PL-3 accepted it.
The applier was not synced to PL-3 (mAppliers.empty() returned true) so the implSet request passed the ccb check.

            if( ! obj->mClassInfo->mAppliers.empty()) {
                ImplementerSet::iterator ii = obj->mClassInfo->mAppliers.begin();
                for(; ii != obj->mClassInfo->mAppliers.end(); ++ii) {
                    if((*ii) == info) {
                        TRACE("TRY_AGAIN: ccb %u is active on object '%s' "
                           "bound to class applier '%s'. Can not re-attach applier",
                           ccb->mId, omit->first.c_str(), implName.c_str());
                        err = SA_AIS_ERR_TRY_AGAIN;
                        goto done;
                    }
                }
            }
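The divergence between the two nodes can be illustrated with a minimal sketch. The types and the function `checkReattach` below are hypothetical simplifications, not the actual ImmModel code; the only assumption taken from the ticket is that each node evaluates the check against its own local class-applier set.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for the per-class data held by one IMMND.
struct ClassInfo {
    std::set<std::string> appliers;  // class appliers known on THIS node only
};

enum class CheckResult { Ok, TryAgain };

// Same shape as the check quoted above: the re-attach is rejected only if
// this node knows the applier is bound to the class of an object that an
// active CCB is operating on.
CheckResult checkReattach(const ClassInfo& cls, const std::string& implName,
                          bool ccbActiveOnObject) {
    assert(!implName.empty());
    if (ccbActiveOnObject && cls.appliers.count(implName) != 0) {
        return CheckResult::TryAgain;
    }
    return CheckResult::Ok;
}
```

With `@whatever` present in the set on SC-1 but missing on PL-3, the same request yields `TryAgain` on one node and `Ok` on the other, which is the inconsistency shown in the traces.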

Now commit the CCB and try to set the applier again.

SC-1

osafimmnd [419:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [419:immnd_evt.c:9527] T2 originated here?:1 nodeId:2010f conn: 226
osafimmnd [419:ImmModel.cc:12967] >> implementerSet
osafimmnd [419:ImmModel.cc:13008] T7 Re-using implementer for @whatever
osafimmnd [419:ImmModel.cc:13087] NO Implementer (applier) connected: 6 (@whatever) <226, 2010f>
osafimmnd [419:ImmModel.cc:13156] << implementerSet

PL-3

osafimmnd [392:immsv_evt.c:5414] T8 Received: IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
osafimmnd [392:immnd_evt.c:9527] T2 originated here?:0 nodeId:2010f conn: 226
osafimmnd [392:ImmModel.cc:12967] >> implementerSet
osafimmnd [392:ImmModel.cc:13003] T7 ERR_EXIST: Registered implementer already exists: @whatever
osafimmnd [392:ImmModel.cc:13005] << implementerSet

The applier had different ids on SC-1 and PL-3. When a new node joins the cluster, IMMND on PL-3 will crash when verifying the implementers.

PL3 osafimmnd[392]: ER Sync-verify: Established node has different Implementer-id: 5 for name: @whatever, sync says 6.
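The kind of comparison that trips here can be sketched as follows. This is only an illustration under the assumption that sync-verify compares implementer ids by name; `ImplementerIds` and `findIdMismatches` are hypothetical names, not IMMND functions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical view of implementer state on a node: name -> implementer id.
using ImplementerIds = std::map<std::string, unsigned>;

// Returns the names for which an established node holds a different id than
// the one carried by the sync. Names unknown to the established node are
// ignored in this sketch.
std::vector<std::string> findIdMismatches(const ImplementerIds& established,
                                          const ImplementerIds& synced) {
    std::vector<std::string> mismatches;
    for (const auto& entry : synced) {
        auto it = established.find(entry.first);
        if (it != established.end() && it->second != entry.second) {
            mismatches.push_back(entry.first);
        }
    }
    return mismatches;
}
```

In the scenario above, PL-3 holds id 5 for `@whatever` while the sync carries id 6, so the name is reported as a mismatch.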

Related

Tickets: #1504
Wiki: ChangeLog-4.5.2
Wiki: ChangeLog-4.6.1

Discussion

  • Hung Nguyen

    Hung Nguyen - 2015-09-28

    Not only class appliers but also object appliers have this problem.
    sObjAppliersMap is not built up on the sync client.

     
  • Anders Bjornerstedt

    I have a problem with this ticket.
    Appliers are intentionally not synced.
    They should not need to be synced.
    The question here is how you manage to execute a sync with a CCB being active.
    Non-empty CCBs are terminated before the actual sync can start.

    So a bug seems to have been introduced somewhere.

     
    • Hung Nguyen

      Hung Nguyen - 2015-09-28

      The CCB was created after the sync.

       
  • Anders Bjornerstedt

    The applier names are synced, but the class/object-applier data is not synced.
    That is intentional and I don't want a solution that tries to sync all applier information to all nodes.

    The class-applier and object-applier mechanism is inherently local, i.e. only used at the node where
    the applier exists. Remember that an applier is a listener and not a true participant in CCBs, so
    its existence should only matter locally. The only thing that is global is the existence of an
    applier with a certain name and the current location, if any, of that exact applier with that name.

    Having said that, it is still important that the local class/object applier is not allowed to attach in such
    a way that it can see an incomplete ccb.

    I am thinking about what the best approach for a fix would be.
    Don't start doing some complex implementation of this yet.

     
  • Anders Bjornerstedt

    The problem is the feature of implicit class-implementer-set and implicit object-implementer-set.
    Ironically, this feature is practically useless for appliers.

    One possible (and relatively simple) solution would be to only do the ccb interference checks for
    appliers at the node where the applier is actually attaching. That would almost be in
    fevs_local_checks, except that implementer-set is not a regular fevs message at the sending side.
    So instead it would be in immnd_evt_proc_impl_set in immnd_evt.c.

    If the check fails then the local IMMND simply rejects the request with TRY_AGAIN (or ERR_BUSY
    would in reality be better here since the immsv has no control over how long the wait will be).

    The current applier check at the fevs receiving side for implementer-set is simply removed.

     
  • Anders Bjornerstedt

    With the above solution there is the issue that the check is then not done in fevs order.
    By the time the implementer-set arrives over fevs at all nodes, a ccb-operation that interferes
    may have been created, resulting in the implementer-set having to be aborted anyway.

    The local immnd thus has to run the applier checks again in the receiving fevs for implementer-set.
    If that check fails, it rejects the operation, replies with an error to the client and broadcasts an implementer_clear over fevs.
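The two-step check proposed in the comments above could be sketched roughly as below. This illustrates the proposal only, not the committed fix, and every name in it (`Outcome`, `handleImplementerSetAtOriginator`, `handleImplementerSetAtOtherNode`) is hypothetical; the CCB interference check itself is reduced to two booleans.

```cpp
#include <cassert>

// Hypothetical outcomes of an applier implementer-set request.
enum class Outcome {
    RejectedLocally,   // failed the check before being sent over fevs
    AbortedWithClear,  // failed the re-check on fevs receipt; error reply to
                       // the client and an implementer_clear broadcast
    Attached,          // passed both checks at the originating node
    AcceptedRemotely   // other nodes do not run the applier CCB check
};

// Originating node: check once before sending and once more when the
// message comes back in fevs order, because an interfering ccb-operation
// may have been created in between.
Outcome handleImplementerSetAtOriginator(bool interferesBeforeSend,
                                         bool interferesAtFevsReceipt) {
    if (interferesBeforeSend) {
        return Outcome::RejectedLocally;
    }
    if (interferesAtFevsReceipt) {
        return Outcome::AbortedWithClear;
    }
    return Outcome::Attached;
}

// Any other node: the applier CCB check is skipped, so all non-originating
// nodes reach the same decision.
Outcome handleImplementerSetAtOtherNode() {
    return Outcome::AcceptedRemotely;
}
```

The point of the design is that only the originating node can reject, and when it rejects after the fevs broadcast it tells the other nodes via implementer_clear, so the nodes do not end up with different views of the applier.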

     
  • Anders Bjornerstedt

    • summary: imm: Appliers for classes and objects are not synced to sync-client --> imm: Implicit class/object-applier checked by OiImplementeSet is incorrect
     
  • Anders Bjornerstedt

    • summary: imm: Implicit class/object-applier checked by OiImplementeSet is incorrect --> imm: Implicit class/object-applier checked by OiImplementeSet is not globally consistent
     
  • Anders Bjornerstedt

    The actual local check done for the locally attaching applier is correct. The problem is that the same check may fail at other nodes (where the applier is not currently attaching), resulting in the
    applier attach being handled as rejected at some other node. This will not impact the correct
    behavior of the applier mechanism where it attached. That logic is purely node local.

    A detrimental effect that can happen is that a future sync may detect an inconsistency in the
    applier-id for a given applier name and escalate this to a restart of the node director(s) that detect this inconsistency. That is a serious side effect, so the problem needs to be fixed.
    One quick solution would be to disarm the check, or at least reduce its severity to a warning.
    That would of course not remove the inconsistency.

    The worst-case scenario resulting from the inconsistency in applier existence/identity is that two appliers could be allowed to attach under the same applier name, at two or more different nodes.
    That in itself would not be a problem for the applier mechanism. A problem would only occur
    if a third party tries to communicate with the supposedly uniquely named applier using a direct
    admin-operation with the applier name as the target. Such an admin-operation would end up
    arbitrarily reaching one of the appliers under that name.

     
  • Hung Nguyen

    Hung Nguyen - 2015-10-01
    • status: assigned --> accepted
     
  • Zoran Milinkovic

    • assigned_to: Hung Nguyen --> Zoran Milinkovic
     
  • Zoran Milinkovic

    • status: accepted --> review
     
  • Zoran Milinkovic

    • status: review --> fixed
     
  • Zoran Milinkovic

    opensaf-4.5.x:

    changeset: 7001:32075f5c5570
    branch: opensaf-4.5.x
    parent: 6998:bebc2783183f
    user: Zoran Milinkovic zoran.milinkovic@ericsson.com
    date: Tue Oct 13 10:56:50 2015 +0200
    summary: imm: synchronize applier set on all nodes [#1504]


    opensaf-4.6.x:

    changeset: 7002:fa5da6b01f61
    branch: opensaf-4.6.x
    tag: tip
    parent: 6997:ae65b0ffa596
    user: Zoran Milinkovic zoran.milinkovic@ericsson.com
    date: Tue Oct 13 11:00:39 2015 +0200
    summary: imm: synchronize applier set on all nodes [#1504]


    opensaf-4.7.x:

    changeset: 6999:2767588ed092
    branch: opensaf-4.7.x
    parent: 6996:e5bb3f7120eb
    user: Zoran Milinkovic zoran.milinkovic@ericsson.com
    date: Wed Oct 07 16:35:59 2015 +0200
    summary: imm: synchronize applier set on all nodes [#1504]


    default(4.7):

    changeset: 7000:9d30fa46a7a5
    parent: 6995:fe634f270a98
    user: Zoran Milinkovic zoran.milinkovic@ericsson.com
    date: Wed Oct 07 16:35:59 2015 +0200
    summary: imm: synchronize applier set on all nodes [#1504]

     
