Menu

#1304 imm:FAILED_OPERATION is not returned in case of response timeout from OI callback

4.6.1
fixed
None
defect
imm
nd
4.6 FC
major
2015-04-22
2015-04-06
No

The issue is seen on cs6377(46 FC Tag). According to the fix of #57, failed operation is returned in case of timeout on response from OI. But timeout is being returned.

IMMA Trace:

Apr 6 15:31:37.931007 imma [7642:imma_om_api.c:1597] >> ccb_object_create_common
Apr 6 15:31:37.931031 imma [7642:imma_om_api.c:1863] TR attr:attrName1
Apr 6 15:31:37.934663 imma [7642:imma_proc.c:1346] TR Event type:15
Apr 6 15:31:37.934700 imma [7642:imma_proc.c:1142] TR Posted IMMA_CALLBACK_OI_CCB_CREATE for ccb 2
Apr 6 15:31:37.934712 imma [7642:imma_db.c:0209] >> imma_oi_ccb_record_add
Apr 6 15:31:37.934720 imma [7642:imma_db.c:0187] >> imma_oi_ccb_record_find
Apr 6 15:31:37.934727 imma [7642:imma_db.c:0196] TR Record for ccbid:0x2 handle:330002030f client:0x89adf0 NOT found
Apr 6 15:31:37.934735 imma [7642:imma_db.c:0198] << imma_oi_ccb_record_find
Apr 6 15:31:37.934740 imma [7642:imma_db.c:0244] TR Record for ccbid:0x2 handle:330002030f client:0x89adf0 opCount:0 added
Apr 6 15:31:37.934744 imma [7642:imma_db.c:0245] << imma_oi_ccb_record_add
Apr 6 15:31:37.934748 imma [7642:imma_proc.c:1239] >> imma_proc_free_pointers
Apr 6 15:31:37.934754 imma [7642:imma_proc.c:1332] << imma_proc_free_pointers
Apr 6 15:31:37.934774 imma [7642:imma_db.c:0187] >> imma_oi_ccb_record_find
Apr 6 15:31:37.934779 imma [7642:imma_db.c:0194] TR Record for ccbid:0x2 handle:330002030f client:0x89adf0 found
Apr 6 15:31:37.934784 imma [7642:imma_db.c:0198] << imma_oi_ccb_record_find
Apr 6 15:31:37.934788 imma [7642:imma_proc.c:1914] >> imma_process_callback_info
Apr 6 15:31:37.934793 imma [7642:imma_proc.c:2233] TR Ccb-object-create callback
Apr 6 15:31:37.934797 imma [7642:imma_db.c:0187] >> imma_oi_ccb_record_find
Apr 6 15:31:37.934801 imma [7642:imma_db.c:0194] TR Record for ccbid:0x2 handle:330002030f client:0x89adf0 found
Apr 6 15:31:37.934804 imma [7642:imma_db.c:0198] << imma_oi_ccb_record_find
Apr 6 15:31:37.934813 imma [7642:imma_proc.c:2336] TR ccb-object-create make the callback
Apr 6 15:31:43.889216 imma [7642:imma_proc.c:1346] TR
Event type:20
Apr 6 15:31:43.889296 imma [7642:imma_proc.c:0957] TR Posted IMMA_CALLBACK_OI_CCB_ABORT for ccb 2
Apr 6 15:31:43.889327 imma [7642:imma_db.c:0302] >> imma_oi_ccb_record_abort
Apr 6 15:31:43.889349 imma [7642:imma_db.c:0187] >> imma_oi_ccb_record_find
Apr 6 15:31:43.889857 imma [7642:imma_db.c:0194] TR Record for ccbid:0x2 handle:330002030f client:0x89adf0 found
Apr 6 15:31:43.889888 imma [7642:imma_db.c:0198] << imma_oi_ccb_record_find
Apr 6 15:31:43.889909 imma [7642:imma_db.c:0313] << imma_oi_ccb_record_abort
Apr 6 15:31:43.889928 imma [7642:imma_proc.c:0960] T2 CCB-ABORT-UC: oi_ccb_record for 2 aborted
Apr 6 15:31:43.889950 imma [7642:imma_proc.c:1239] >> imma_proc_free_pointers
Apr 6 15:31:43.889970 imma [7642:imma_proc.c:1332] << imma_proc_free_pointers
Apr 6 15:31:47.941573 imma [7642:imma_om_api.c:1961] TR objectCreate send RETURNED:5
Apr 6 15:31:47.941671 imma [7642:imma_om_api.c:2092] << ccb_object_create_common
Apr 6 15:31:49.950709 imma [7642:imma_proc.c:2396] TR ccb-object-create callback returned RC:1

IMMND Trace:

Apr 6 15:31:37.934582 osafimmnd [7422:immnd_evt.c:5978] T2 MAKING OI-IMPLEMENTER OBJ CREATE upcall towards agent
Apr 6 15:31:37.935214 osafimmnd [7422:immnd_evt.c:6064] << immnd_evt_proc_object_create
Apr 6 15:31:37.935231 osafimmnd [7422:immnd_evt.c:8658] >> dequeue_outgoing
Apr 6 15:31:37.935236 osafimmnd [7422:immnd_evt.c:8664] TR Pending replies:0 space:16 out list?:(nil)
Apr 6 15:31:37.935241 osafimmnd [7422:immnd_evt.c:8693] << dequeue_outgoing
Apr 6 15:31:37.935244 osafimmnd [7422:immnd_evt.c:8777] << immnd_evt_proc_fevs_rcv
Apr 6 15:31:38.607788 osafimmnd [7422:immnd_proc.c:1637] T5 tmout:1000 ste:10 ME:24 RE:24 crd:0 rim:KEEP_REPO 4.3A:1 2Pbe:0 VetA/B: 1/0 othsc:0/0
Apr 6 15:31:43.632884 osafimmnd [7422:immnd_proc.c:1637] T5 tmout:1000 ste:10 ME:24 RE:24 crd:0 rim:KEEP_REPO 4.3A:1 2Pbe:0 VetA/B: 1/0 othsc:0/0
Apr 6 15:31:43.888439 osafimmnd [7422:immnd_evt.c:8716] >> immnd_evt_proc_fevs_rcv
Apr 6 15:31:43.888514 osafimmnd [7422:immnd_evt.c:8734] T2 REMOTE FEVS received. Messages from me still pending:0
Apr 6 15:31:43.888605 osafimmnd [7422:immsv_evt.c:5500] T8 Received: IMMND_EVT_D2ND_ABORT_CCB (62) from 0
Apr 6 15:31:43.888650 osafimmnd [7422:immnd_evt.c:7364] >> immnd_evt_proc_ccb_finalize
Apr 6 15:31:43.888676 osafimmnd [7422:immnd_evt.c:6621] >> immnd_evt_ccb_abort
Apr 6 15:31:43.888696 osafimmnd [7422:immnd_evt.c:6625] TR We expect there to be a PBE
Apr 6 15:31:43.888723 osafimmnd [7422:ImmModel.cc:5562] >> ccbAbort
Apr 6 15:31:43.888750 osafimmnd [7422:ImmModel.cc:5571] T5 ABORT CCB 2
Apr 6 15:31:43.888829 osafimmnd [7422:ImmModel.cc:5580] NO Aborting ccb 2 while waiting for reply from implementer on CREATE-OP
Apr 6 15:31:43.888891 osafimmnd [7422:ImmModel.cc:5640] NO Ccb 2 ABORTED (exowner)

Also the environment variable for IMMA_OI_CALLBACK_TIMEOUT is set to 8 seconds. But the timeout happened at 6 seconds. IMMA library code snippet:

    if(cl_node->isImmA2e && (timeout_env_value = getenv("IMMA_OI_CALLBACK_TIMEOUT"))!=NULL) {
            char *endp = NULL;
            cl_node->oiTimeout = strtol(timeout_env_value, &endp, 10);
            if(!endp || *endp) {
                    TRACE_2("Failed to parse IMMA_OI_CALLBACK_TIMEOUT. "
                                    "OI timeout will be set to the default value");
                    cl_node->oiTimeout = 0;
            } else {
                    TRACE_2("IMMA library OI timeout set to:%u", cl_node->oiTimeout);
            }
    } else {
            cl_node->oiTimeout = 0;
    }

The IMMA OI agent needs to be A.2.14 to set the variable according to the implementation. But there is no information on the dependency on version in the documentation.

Related

Tickets: #1304
Wiki: ChangeLog-4.6.1

Discussion

  • Anders Bjornerstedt

    Is the issue reported by this ticket just a documentation issue ?

     
  • Neelakanta Reddy

    • status: unassigned --> assigned
    • assigned_to: Neelakanta Reddy
    • Part: - --> nd
    • Milestone: 4.5.2 --> 4.6.0
     
  • Neelakanta Reddy

    • Milestone: 4.6.0 --> 4.7-Tentative
     
  • Neelakanta Reddy

    I tried to re-produce in sirisha nodes, After OI timeout FAILED_OPERATION is returned. Sirisha is checking the test code to confirm it.

     
  • Anders Bjornerstedt

    • Milestone: 4.7-Tentative --> 4.6.1
     
  • Anders Bjornerstedt

    Defect introduced on 4.6 branch.
    CAn not have milestone on 4.7.tentative

     
  • Anders Bjornerstedt

    I think I understand this problem better now.
    The problem occurs if the om-client is not on the same processor as
    the immnd-coord. The IMMND that is elected the coordinator is the only
    one executing the periodic cleanTheBasement cleanup function. This function
    among other things checks for timeout on oi-ccb-callbacks. When it detects
    timeout it broadcasts an abort-ccb for that such a ccb over fevs.

    For some reason the FAILED_OPERATION then only reaches the om client if
    the om-client is colocated with the IMMND coord (at SC1 or SC2).

    The problem can be reproduced by executing 'immoitest 2 29'.
    If it is executed on a payload, the test fails.
    If it is executed on the SC where the immnd coord resides (same as where
    the immpbed process resides, whichis forked by the immnd coord, then
    the test succeeds.

     
  • Anders Bjornerstedt

     
  • Neelakanta Reddy

    • status: assigned --> accepted
     
  • Neelakanta Reddy

    • summary: FAILED_OPERATION is not returned in case of response timeout from OI callback --> imm:FAILED_OPERATION is not returned in case of response timeout from OI callback
    • status: accepted --> review
     
  • Neelakanta Reddy

    [staging:655311]
    [staging:9bdd45]

    changeset: 6486:9bdd453e433a
    tag: tip
    parent: 6484:6c21badd0111
    user: Neelakanta Reddy reddy.neelakanta@oracle.com
    date: Wed Apr 22 10:52:49 2015 +0530
    summary: imm :return ERR_FAILED_OPERATION to the client when the CCB is aborted [#1304]

    changeset: 6485:655311de27f7
    branch: opensaf-4.6.x
    parent: 6482:2e9c417cca42
    user: Neelakanta Reddy reddy.neelakanta@oracle.com
    date: Wed Apr 22 10:52:49 2015 +0530
    summary: imm :return ERR_FAILED_OPERATION to the client when the CCB is aborted [#1304]

     

    Related

    Tickets: #1304
    Commit: [655311]
    Commit: [9bdd45]

  • Neelakanta Reddy

    • status: review --> fixed
     

Log in to post a comment.