This ticket is similar to #1096:
http://sourceforge.net/p/opensaf/tickets/1096/
The PBE detaches after it has received the ccb-operations for a ccb but before
it has received the completed callback. In this case there are no OIs, so
the completed callback to the PBE is sent directly while handling the apply
downcall from the user.
Detachment itself (of the PBE or of any IMM client) arrives over fevs, so that
is not actually the problem. The client node is only removed in conjunction
with clearing the implementer in ImmModel. Thus a non-null pbeConn returned
from ImmModel means that the client node must exist. This is an invariant,
i.e. an assertable condition.
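The invariant can be illustrated with a small, self-contained C sketch. All names here (`ClientNode`, `find_client_node`, `lookup_pbe_client`, the static client table) are hypothetical stand-ins, not the actual IMMND types or functions; the point is only that a failed lookup for a non-null pbeConn is treated as a broken invariant (`assert`) rather than as a runtime error path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IMMND client node; not the real OpenSAF type. */
typedef struct {
    unsigned int conn;  /* connection handle */
    int attached;       /* non-zero while the client node exists */
} ClientNode;

/* Hypothetical client table. Per the ticket, in IMMND the client node is
 * only removed together with clearing the implementer in ImmModel. */
static ClientNode clients[] = { { 42u, 1 }, { 7u, 1 } };

/* Linear lookup of an attached client node by connection handle. */
static ClientNode *find_client_node(unsigned int conn)
{
    for (size_t i = 0; i < sizeof clients / sizeof clients[0]; ++i) {
        if (clients[i].conn == conn && clients[i].attached)
            return &clients[i];
    }
    return NULL;
}

/* A non-null pbeConn from ImmModel implies the client node exists, so a
 * failed lookup is a programming error: document it with assert(). */
static ClientNode *lookup_pbe_client(unsigned int pbeConn)
{
    ClientNode *cl = find_client_node(pbeConn);
    assert(cl != NULL && "invariant: non-null pbeConn implies client node exists");
    return cl;
}
```

Note that `assert` is compiled out when `NDEBUG` is defined, so it expresses the invariant rather than acting as error handling.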
The problem that does exist in immnd_evt_proc_ccb_apply is that the send
itself over MDS may fail, due to a race with a PBE going down. In that case
the code in immnd_evt_proc_ccb_apply explicitly aborts. Since this happens
on all nodes, the result is a cluster restart.
It is this abort() on send failure that is wrong. The other abort, on client
node not found, should be changed to an assert.
So the fix is to remove the abort on send failure and instead "drop" the
ccb apply to the recovery case, letting the apply result be resolved by the
PBE restart/recovery.
Indeed, it is conceivable that the PBE has received the completed & commit
message even if the sending IMMND gets an error from MDS on the send.
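A minimal C sketch of the intended control-flow change, assuming hypothetical names throughout (`SendResult`, `ApplyOutcome`, `send_completed_to_pbe`, and the `send_ok`/`send_fails` stubs stand in for the real MDS send and for the logic inside immnd_evt_proc_ccb_apply). On send failure it logs and leaves the ccb to PBE recovery instead of calling abort().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical send result; the real code checks the MDS return code. */
typedef enum { SEND_OK = 0, SEND_FAILED = 1 } SendResult;

typedef enum {
    APPLY_COMPLETED_SENT,       /* completed message handed to the transport */
    APPLY_LEFT_TO_PBE_RECOVERY  /* send failed; outcome resolved by PBE recovery */
} ApplyOutcome;

/* Hypothetical signature for the send to the PBE. */
typedef SendResult (*send_fn)(unsigned int ccbId, unsigned int pbeConn);

/* Previously a send failure here led to abort() on every node, i.e. a
 * cluster restart. Now the failure is logged and the ccb is left for
 * PBE restart/recovery to resolve. */
static ApplyOutcome send_completed_to_pbe(unsigned int ccbId,
                                          unsigned int pbeConn,
                                          send_fn send_msg)
{
    assert(send_msg != NULL);
    if (send_msg(ccbId, pbeConn) != SEND_OK) {
        /* The PBE may be going down, and it may even have received the
         * message despite the error, so no outcome is assumed here. */
        fprintf(stderr, "Failed to send completed for ccb %u to PBE; "
                        "deferring to ccb recovery\n", ccbId);
        return APPLY_LEFT_TO_PBE_RECOVERY;
    }
    return APPLY_COMPLETED_SENT;
}

/* Test stubs simulating a successful and a failing send. */
static SendResult send_ok(unsigned int ccbId, unsigned int pbeConn)
{
    (void)ccbId; (void)pbeConn;
    return SEND_OK;
}

static SendResult send_fails(unsigned int ccbId, unsigned int pbeConn)
{
    (void)ccbId; (void)pbeConn;
    return SEND_FAILED;
}
```

Returning an outcome instead of terminating keeps the decision local to the sending IMMND; since the PBE may or may not have seen the message, recovery is the only place where the actual apply result can be determined.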
Diff:
http://sourceforge.net/p/opensaf/mailman/message/32863407/
changeset: 5933:bb53270bfe18
tag: tip
parent: 5929:468f7cf19611
user: Anders Bjornerstedt anders.bjornerstedt@ericsson.com
date: Wed Sep 24 15:48:12 2014 +0200
summary: IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]
changeset: 5932:2505c06b19ca
branch: opensaf-4.5.x
parent: 5928:3cd62e8831a7
user: Anders Bjornerstedt anders.bjornerstedt@ericsson.com
date: Wed Sep 24 15:48:12 2014 +0200
summary: IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]
changeset: 5931:3fff80ea7b42
branch: opensaf-4.4.x
parent: 5927:832244b78b65
user: Anders Bjornerstedt anders.bjornerstedt@ericsson.com
date: Wed Sep 24 15:52:11 2014 +0200
summary: IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]
changeset: 5930:214972614415
branch: opensaf-4.3.x
parent: 5926:72def88cf2f8
user: Anders Bjornerstedt anders.bjornerstedt@ericsson.com
date: Wed Sep 24 15:52:11 2014 +0200
summary: IMM: Failure to send completed to PBE defaulted to ccb-recovery [#1127]
Related tickets: #1127