
#1002 FileSync assertion in immnd resulted in smf mw-rollback failure(4.4 - 4.5)

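Milestone: never
Status: invalid
Owner: nobody
Labels: None
Type: defect
Component: imm
Part: -
Version: 4.5.FC
Severity: major
Updated: 2015-11-02
Created: 2014-08-21
Creator: Hrishikesh
Private: No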

Setup: SLES 64-bit, 4 nodes
Changesets: 5608 on branch opensaf-4.5.x, along with patches for #938, #994 and #997;
5044 on branch opensaf-4.4.x

Test case: Middleware upgrade between 4.4 - 4.5

Test procedure:

  1. With the cluster up and running on 4.4, merge OpensafImm_Upgrade_4.5.xml (immcfg -f, XML taken from 4.5) after setting NoStdFlags to allow the schema change:

        immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:1 opensafImm=opensafImm,safApp=safImmService

  2. Trigger the upgrade to 4.5.
  3. After the upgrade succeeds, trigger a rollback to bring the cluster back to 4.4.

Failure description: Steps 1 and 2 of the procedure were successful. After the rollback was triggered, SC-1 and SC-2 were rolled back successfully to 4.4. At the end of the rollback of SC-2, a "finalizeSync: Assertion" error was observed on PL-3 and PL-4, as shown in the log snippet below.

After the limit of 10 immnd restarts was exceeded, SU failover was triggered on PL-3 and PL-4
and the nodes rebooted.

As PL-3 and PL-4 never rejoined the cluster, SMF failed its campaign after timing out
while waiting for the nodes to join.
SC-1 Active
SC-2 Standby

syslog snippet:

Aug 20 20:47:35 SLES2-3 osafimmnd[2269]: ER ccb->mState:11 != ol->ccbState:9 for CCB:16
Aug 20 20:47:35 SLES2-3 osafimmnd[2269]: ImmModel.cc:16878: finalizeSync: Assertion 'ccb->mState == (ImmCcbState) ol->ccbState' failed.
Aug 20 20:47:35 SLES2-3 osafamfnd[2289]: NO 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Aug 20 20:47:35 SLES2-3 osafamfnd[2289]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Aug 20 20:47:35 SLES2-3 osafamfnd[2289]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Aug 20 20:47:35 SLES2-3 osafimmnd[2842]: Started

Aug 20 20:47:39 SLES2-3 osafimmnd[2842]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Aug 20 20:47:40 SLES2-3 osafimmnd[2842]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Aug 20 20:47:40 SLES2-3 osafimmnd[2842]: ER Can not sync Ccb that is active
Aug 20 20:47:40 SLES2-3 osafimmnd[2842]: ER Unexpected local error 21 in finalizeSync for sync client - aborting
Aug 20 20:47:40 SLES2-3 osafamfnd[2289]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 2)
Aug 20 20:47:40 SLES2-3 osafamfnd[2289]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Aug 20 20:47:40 SLES2-3 osafimmnd[2862]: Started
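
The restart loop in the snippet above can be tallied directly from syslog. The following is a minimal sketch, assuming the osafamfnd message format shown in this snippet; the function name `max_comp_restart_count` and the log file argument are illustrative and not part of OpenSAF.

```shell
# Print the highest "comp restart count" that osafamfnd logged for one SU.
# Assumes the message format seen in the snippet above; names are illustrative.
max_comp_restart_count() {
    su_dn="$1"    # e.g. safSu=PL-3,safSg=NoRed,safApp=OpenSAF
    logfile="$2"  # e.g. a copy of the node's syslog
    grep -F "Restarting a component of '$su_dn'" "$logfile" |
        sed -n 's/.*(comp restart count: \([0-9][0-9]*\)).*/\1/p' |
        sort -n | tail -n 1
}
```

Called with the PL-3 SU name and the node's syslog file, this prints the last restart count reached before the SU failover.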


Discussion

  • Hrishikesh

    Hrishikesh - 2014-08-21
     

    Last edit: Hrishikesh 2014-08-21
  • Anders Bjornerstedt

    I don't think this is a fair test.
    The rollback fails because there is a protocol incompatibility between
    4.4 and 4.5. The 4.5 protocol is turned on only as the last step of a
    successful upgrade.

    If the upgrade to 4.5 is successful, then why would anyone do a rollback?
    The 4.5 protocol is switched on by:

        immadm -o 1 -p opensafImmNostdFlags:SA_UINT32_T:16 \
           opensafImm=opensafImm,safApp=safImmService
    

    If you are to roll back to 4.4 after a successful upgrade, then at least you
    must toggle off the 4.5 protocol flag first:

        immadm -o 2 -p opensafImmNostdFlags:SA_UINT32_T:16 \
           opensafImm=opensafImm,safApp=safImmService
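
    The two commands above differ only in the admin operation id (1 to switch a flag on, 2 to switch it off). A minimal sketch of a helper that builds either command without running it is shown here; the function name `nostd_flags_cmd` is hypothetical, and the flag values 1 (schema change) and 16 (4.5 protocol) are the ones used in this ticket.

    ```shell
    # Build (but do not run) the immadm command that switches bits of
    # opensafImmNostdFlags on or off. Admin operation 1 = on and 2 = off,
    # as in the two commands quoted above. The function name is illustrative.
    nostd_flags_cmd() {
        case "$1" in
            on)  op=1 ;;
            off) op=2 ;;
            *)   echo "usage: nostd_flags_cmd on|off <value>" >&2; return 1 ;;
        esac
        echo "immadm -o $op -p opensafImmNostdFlags:SA_UINT32_T:$2" \
             "opensafImm=opensafImm,safApp=safImmService"
    }

    # Order suggested in this thread: clear the 4.5 protocol flag (16)
    # first, and only then start the rollback campaign to 4.4.
    nostd_flags_cmd off 16
    ```

    Because the helper only prints the command, it can be reviewed first and then executed on a node where `immadm` is available.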
    
     
  • Hrishikesh

    Hrishikesh - 2014-08-21

    Tried resetting the 4.5 protocol flag at the start of the rollback (to 4.4), and this time the rollback succeeded. But for the upgrade I have set the 4.5 protocol flag at the campaign completion stage.

    Another observation: during the failed attempt (when the ticket was raised), it was observed that even though the 4.5 protocol flag was not reset at the start of the rollback, its value was already reset at the time of the failure.
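
    An observation like this can be checked by reading the current value of `opensafImmNostdFlags` (for example with `immlist`) and testing the 4.5 protocol bit. The following is a minimal sketch, assuming the attribute is a bit mask in which 16 is the 4.5 protocol flag; the thread uses the values 1 and 16 but does not spell out the encoding. The helper name `flag_is_set` and the example value 17 are illustrative only.

    ```shell
    # Succeeds (exit status 0) if every bit of $2 is set in the flags value $1.
    # Assumes opensafImmNostdFlags is a bit mask; helper name is illustrative.
    flag_is_set() {
        [ $(( $1 & $2 )) -eq "$2" ]
    }

    flags=17   # example value only: schema change (1) + 4.5 protocol (16)
    if flag_is_set "$flags" 16; then
        echo "4.5 protocol flag is on; clear it before rolling back to 4.4"
    else
        echo "4.5 protocol flag is off"
    fi
    ```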

     
    • Anders Bjornerstedt

      Hrishikesh wrote:

      > Tried resetting the 4.5 protocol flag at the start of the rollback (to 4.4), and this time the rollback succeeded.

      Good.

      > But for the upgrade I have set the 4.5 protocol flag at the campaign completion stage.

      Yes, that is the way it should be done.

      > Another observation: during the failed attempt (when the ticket was raised), it was observed that even though the 4.5 protocol flag was not reset at the start of the rollback, its value was already reset at the time of the failure.

      I can't follow what you are saying in that last sentence.

      /AndersBj


      [tickets:#1002] FileSync assertion in immnd resulted in smf mw-rollback failure(4.4 - 4.5)

      Status: unassigned
      Milestone: 4.3.3
      Created: Thu Aug 21, 2014 06:02 AM UTC by Hrishikesh
      Last Updated: Thu Aug 21, 2014 09:11 AM UTC
      Owner: nobody


  • Anders Bjornerstedt

    • status: unassigned --> invalid
    • Version: --> 4.5.FC
    • Milestone: 4.3.3 --> 4.5.0
     
  • Anders Widell

    Anders Widell - 2015-11-02
    • Milestone: 4.5.0 --> never
     
