OpenSAF / Tickets / #2265 clm: clmd coredump

Praveen - 2017-01-19

Hi Hung,

Please share traces/logs and steps to reproduce.

Thanks,
Praveen


Hung Nguyen - 2017-01-23

Hi,
Here's the syslog; trace was not enabled.


Praveen - 2017-01-23

status: unassigned --> assigned

assigned_to: Praveen

Milestone: 5.2.FC --> 5.0.2

Praveen - 2017-01-23

Hi Hung,

Thanks for the logs. I am going through them.

Thanks,
Praveen


Praveen - 2017-01-25

Hi,
The syslogs in systemlogs.tgz indicate that the cluster was coming up with SC-1, SC-2 and PL-3, and that some CCB operations were initiated while SC-2 and PL-3 were still joining.
If the CCB operations are related to scale-out, there is a very thin window of time in which this issue can occur. I widened this window by adding some sleeps and reproduced the issue. Logs, traces and the configuration file used to add a payload to the CLM cluster are attached in clm_issue.tgz. I have not used the scale-out script, but I think the issue can be reproduced with it as well.

Steps to reproduce:
1) Bring the first controller up with the attached imm.xml. It contains all the MW objects for PL-3 except the CLM node object.
2) While the standby controller is coming up and its CLMS is reading cluster information from IMM, add the PL-3 configuration with immcfg -f pl_3.xml. The active CLMS will not checkpoint this node to the standby CLMS because the standby is not yet visible via MBCSV.
3) Now, while the standby is trying to encode the MBCSV request for COLD sync, modify an attribute of CLM node PL-3 with the command:
immcfg -a saClmNodeLockCallbackTimeout=50000 safNode=PL-3,safCluster=myClmCluster
4) Since the standby CLMS is now visible, the active will try to send this runtime information to the standby. PL-3 was added at runtime after the standby had read the configuration from IMM, so the standby asserts because it cannot find PL-3.
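
The ordering problem in the steps above can be modelled with a short sketch. This is a toy model in Python, not clmd code: the class `StandbyClms`, the function `reproduce`, and the SC-1/SC-2 DNs (written by analogy with the PL-3 DN in step 3) are assumptions made only for illustration.

```python
PL3 = "safNode=PL-3,safCluster=myClmCluster"  # DN taken from step 3


class StandbyClms:
    """Hypothetical stand-in for the standby's node database."""

    def __init__(self, nodes_in_imm):
        # Snapshot of the node names present in IMM at the time of the read.
        self.nodes = {name: {} for name in nodes_in_imm}

    def on_async_update(self, name, attr, value):
        # An update for a node missing from the snapshot is treated as fatal,
        # standing in for the assert described in step 4.
        if name not in self.nodes:
            raise AssertionError("standby has no record for " + name)
        self.nodes[name][attr] = value


def reproduce():
    # Assumed DNs for the two controllers.
    imm = ["safNode=SC-1,safCluster=myClmCluster",
           "safNode=SC-2,safCluster=myClmCluster"]
    standby = StandbyClms(imm)  # step 2: standby reads IMM
    imm.append(PL3)             # step 2: pl_3.xml is applied after that read
    try:
        # Steps 3 and 4: the attribute change is forwarded as an async update.
        standby.on_async_update(PL3, "saClmNodeLockCallbackTimeout", 50000)
    except AssertionError as err:
        return str(err)
    return None
```

Calling `reproduce()` returns the failure message rather than `None`, because the standby's snapshot was taken before PL-3 existed.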

Solution: One option is that the active does not send async updates until cold sync is completed. Another option is that the standby CLMS ignores async update requests until cold sync is completed; it will get the updated state in the cold sync messages anyway. Need to evaluate.
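
Both options can be sketched in the same toy model. The classes `Active` and `Standby` and the flags `cold_sync_sent` and `cold_sync_done` are hypothetical names, not clmd identifiers; the sketch only shows where each guard would sit.

```python
class Standby:
    def __init__(self):
        self.cold_sync_done = False
        self.nodes = {}

    def on_cold_sync(self, snapshot):
        # Cold sync carries the active's complete, current node state.
        self.nodes = {name: dict(attrs) for name, attrs in snapshot.items()}
        self.cold_sync_done = True

    def on_async_update(self, name, attr, value):
        # Second option: drop async updates that arrive before cold sync ends.
        if not self.cold_sync_done:
            return False
        self.nodes[name][attr] = value
        return True


class Active:
    def __init__(self, standby):
        self.standby = standby
        self.nodes = {}
        self.cold_sync_sent = False

    def modify(self, name, attr, value):
        self.nodes.setdefault(name, {})[attr] = value
        # First option: send async updates only after cold sync has been sent.
        if self.cold_sync_sent:
            self.standby.on_async_update(name, attr, value)

    def cold_sync(self):
        self.standby.on_cold_sync(self.nodes)
        self.cold_sync_sent = True
```

With either guard, a change made before cold sync is not lost: it is part of the active's state when the cold sync snapshot is taken, so the standby receives it there.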

Thanks,
Praveen

clm_issue.tgz


Praveen - 2017-01-27

status: assigned --> accepted

Praveen - 2017-02-03

status: accepted --> review

Praveen - 2017-02-10

status: review --> fixed

Praveen - 2017-02-10

changeset: 8576:fcaa10e6a992
tag: tip
parent: 8571:38f9e7f6ec5b
user: Praveen Malviya praveen.malviya@oracle.com
date: Fri Feb 10 16:07:10 2017 +0530
summary: clmd: checkpoint full node record in CCB modify cbk [#2265]

changeset: 8575:1473bf4b1214
branch: opensaf-5.1.x
parent: 8572:c7e402c9e46b
user: Praveen Malviya praveen.malviya@oracle.com
date: Fri Feb 10 16:06:28 2017 +0530
summary: clmd: checkpoint full node record in CCB modify cbk [#2265]

changeset: 8574:8840a5cd12b8
branch: opensaf-5.0.x
user: Praveen Malviya praveen.malviya@oracle.com
date: Fri Feb 10 16:05:52 2017 +0530
summary: clmd: checkpoint full node record in CCB modify cbk [#2265]

[staging:8840a5]
[staging:1473bf]
[staging:fcaa10]
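
The commit summary says only that the full node record is checkpointed in the CCB modify callback; the patch itself is not quoted in this thread. One possible reading, offered purely as an assumption, is that the standby can then create or overwrite its copy of the node from the record instead of looking up an existing entry. The function `checkpoint_full_record` and the record layout below are hypothetical.

```python
def checkpoint_full_record(standby_nodes, record):
    """Insert or overwrite the standby's copy of a node from a full record."""
    # Upsert: no prior entry for the node is required, unlike a
    # single-attribute update that must find the node first.
    standby_nodes[record["name"]] = dict(record["attrs"])
    return standby_nodes
```

Under this reading, a modify for a node the standby has never seen would populate the node rather than fail the lookup.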

Related

Commit: [1473bf]
Commit: [8840a5]
Commit: [fcaa10]
Tickets: ~~#2265~~

