When the standby clmsv starts, a message-based checkpoint may arrive before cold sync has completed. As a result, ckpt_proc_reg_rec() is called twice for the same client_id. The global clms_cb->last_client_id then ends up incorrect. On the second call, during cold sync, the client already exists, so last_client_id is not updated.
After a role change from standby to active, proc_initialize_msg tries to add a new client_id that already exists, because of the inconsistency described above. The clma agent then receives the somewhat misleading error SA_AIS_ERR_NO_MEMORY.
A patch that avoids sending message-based checkpoints before cold sync is complete solves this problem.
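The failure mode described above is a stale last_client_id that lags behind the highest client id already in the database. A toy C model can make it concrete. The model assumes that a new id is handed out as last_client_id + 1 and that adding an existing id fails. Everything in it is hypothetical: struct model_db, the model_* functions and MODEL_ERR_EXISTS are not OpenSAF code.

The last helper shows one possible way to keep the invariant "last_client_id >= every id in use": advance the counter for every processed record, even a duplicate. It is an illustration only and is not necessarily what the committed fix does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: MODEL_MAX_CLIENTS, struct model_db and the model_*
 * functions are hypothetical and are not part of the OpenSAF CLM code. */
#define MODEL_MAX_CLIENTS 64

enum model_rc { MODEL_OK, MODEL_ERR_EXISTS };

struct model_db {
	uint32_t last_client_id;        /* highest id handed out so far */
	bool in_use[MODEL_MAX_CLIENTS]; /* client ids present in the database */
};

/* Add a client with an explicit id, as a checkpointed registration
 * record would; fails if the id is already present. */
static enum model_rc model_add_client(struct model_db *db, uint32_t id)
{
	assert(id < MODEL_MAX_CLIENTS);
	if (db->in_use[id])
		return MODEL_ERR_EXISTS;
	db->in_use[id] = true;
	return MODEL_OK;
}

/* Allocate the next id for a new initialize request at the active,
 * loosely playing the role of proc_initialize_msg after a role change.
 * Assumption: the next id is simply last_client_id + 1. */
static enum model_rc model_initialize(struct model_db *db, uint32_t *out_id)
{
	uint32_t id = db->last_client_id + 1;
	enum model_rc rc = model_add_client(db, id);
	if (rc == MODEL_OK) {
		db->last_client_id = id;
		*out_id = id;
	}
	return rc;
}

/* Hypothetical guard: advance last_client_id for every processed record,
 * including one whose client already exists, so the counter never lags. */
static void model_note_client_id(struct model_db *db, uint32_t id)
{
	if (id > db->last_client_id)
		db->last_client_id = id;
}
```

For example, suppose ids 1 and 2 are in the database but last_client_id is still 1. Then model_initialize() tries id 2 and gets MODEL_ERR_EXISTS. This is the collision that, per the description above, surfaces at the clma as SA_AIS_ERR_NO_MEMORY. After model_note_client_id(&db, 2), the next initialize succeeds with id 3.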
changeset: 7327:4ed808a6af5b
tag: qparent
user: Hans Nordeback hans.nordeback@ericsson.com
date: Tue Mar 15 13:59:58 2016 +0100
files: osaf/services/saf/clmsv/clms/clms_mbcsv.c
description:
clmd: Wait for cold sync to complete before sending message based checkpoints [#1701]
Related Tickets: #1701

The fix is causing the issue https://sourceforge.net/p/opensaf/tickets/1738/
It looks like it is causing https://sourceforge.net/p/opensaf/tickets/1760/ as well.
I was perhaps too hasty in acking this patch. I think this patch could cause the issues described in tickets 1738 and 1760. It probably does so in combination with the behavioural changes introduced in 79 (i.e. the longer election timeout).
We need to check whether 1762 is also related to this.
This may not be related to 1762, and 1760 looks different too.
Evaluating 1738 now!
The only checkpointing-related change was introduced by the fix for ticket 1701.
So, 1738 and 1762 are not reproduced when the 1701 patch is removed, and this ticket shows the same pattern. Given that, I have reverted the 1701 patch as the fix for the three tickets 1738, 1726 and 1777.
This ticket has been reopened for tracking, with its priority set to minor.
Additional information such as steps to reproduce, logs, and traces would help.
Attached is v3 of the fix provided by HansN and GaryL. Thanks for retesting, sharing traces, and providing a fix. Unfortunately, I cannot reproduce the issue myself.
Pushed after making a trivial change to the log string.
[staging:ee924d]
[staging:2f8255]
[staging:06f276]
[staging:742b76]
changeset: 7579:ee924d090c55
branch: opensaf-5.0.x
parent: 7574:af17916c8873
user: Hans Nordeback hans.nordeback@ericsson.com
date: Tue May 03 12:04:49 2016 +0530
summary: clm: fix handling of last_client_id at standby|quiesced [#1701]
changeset: 7580:2f8255951950
parent: 7578:fa4a30bbdd27
user: Hans Nordeback hans.nordeback@ericsson.com
date: Tue May 03 12:04:49 2016 +0530
summary: clm: fix handling of last_client_id at standby|quiesced [#1701]
changeset: 7581:06f2761901e6
branch: opensaf-5.0.x
parent: 7579:ee924d090c55
user: Gary Lee gary.lee@dektech.com.au
date: Tue May 03 12:12:31 2016 +0530
summary: clm: change log category from ER to NO when clms_client_delete fails at standby [#1701]
changeset: 7582:742b765c3d97
tag: tip
parent: 7580:2f8255951950
user: Gary Lee gary.lee@dektech.com.au
date: Tue May 03 12:12:31 2016 +0530
summary: clm: change log category from ER to NO when clms_client_delete fails at standby [#1701]
Related Commits: [06f276], [2f8255], [742b76], [ee924d]
Related Tickets: #1701