In proc_node_up_msg(), where clmd handles node_up from the node agent, clmd dumps core if the node_up arrives without an IP attached. This happens, for reasons not yet understood, in the test case that simulates loss of the SCs by disabling TIPC (for the resilience feature).
#0 proc_node_up_msg (cb=<optimized out>, evt=0x44006550) at clms_evt.c:369
#1 0x00000000004051c5 in process_api_evt (evt=0x44006550) at clms_evt.c:1333
#2 0x0000000000408910 in clms_process_mbx (mbx=<optimized out>) at clms_evt.c:1373
#3 0x00000000004042ee in main (argc=<optimized out>, argv=<optimized out>) at clms_main.c:499
(gdb) bt full
#0 proc_node_up_msg (cb=<optimized out>, evt=0x44006550) at clms_evt.c:369
nodeup_info = 0x440065a0
node = 0x65f820
nodeid = 131855
rc = 1
node_name = {length = 36, value = "safNode=PL-3,safCluster=myClmCluster", '\000' <repeats 219 times>}
clm_msg = {next = 0x1, evt_type = CLMSV_CLMS_TO_CLMA_API_RESP_MSG, info = {api_info = {type = CLMSV_CLUSTER_JOIN_REQ, param = {
init = {version = {releaseCode = 36 '$', majorVersion = 0 '\000', minorVersion = 115 's'}}, finalize = {
client_id = 1634926628}, track_start = {client_id = 1634926628, flags = 102 'f', sync_resp = 78 'N'}, track_stop = {
client_id = 1634926628}, node_get = {client_id = 1634926628, node_id = 1685016166}, node_get_async = {
client_id = 1634926628, inv = 8299064482983853413, node_id = 1816356449}, clm_resp = {client_id = 1634926628,
resp = 1685016166, inv = 8299064482983853413}, nodeup_info = {node_id = 1634926628, node_name = {length = 20070,
value = "ode=PL-3,safCluster=myClmCluster", '\000' <repeats 223 times>}}}}, cbk_info = {client_id = 7,
type = CLMSV_NODE_ASYNC_GET_CBK, param = {track = {buf_info = {viewNumber = 7237089327836233764,
numberOfItems = 1280327013, notification = 0x657473756c436661}, mem_num = 2037202290, inv = 125780070987116,
root_cause_ent = 0x0, cor_ids = 0x0, step = 0, time_super = 0, err = 0}, node_get = {err = 1634926628,
inv = 8299064482983853413, info = {nodeId = 1816356449, nodeAddress = {family = (SA_CLM_AF_INET | unknown: 1702130548),
length = 15730, value = "myClmCluster", '\000' <repeats 51 times>}, nodeName = {length = 0,
value = '\000' <repeats 255 times>}, executionEnvironment = {length = 0,
value = '\000' <repeats 65 times>, "\271n\243\264\017\020\000\000\000\000\000\000\000\000\000Pe\000D\000\000\000\000\r\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\200k\252\277\177", '\000' <repeats 19 times>, "+\301#@\000\000\000\000\001\000\000\000\000\000\000\000(\000\000\000\060\000\000\000j\252\277\177\000\000\000\240i\252\277\177\000\000\000\314\271d\000\000\000\000\000V\366$@\000\000\000\000\200j\252\277\177\000\000\000\001\000\000\000\000\000\000\000"...}, member = (unknown: 6529616),
bootTimestamp = 1078576288, initialViewNumber = 1076151337}}}}, api_resp_info = {type = CLMSV_CLUSTER_JOIN_RESP,
rc = SA_AIS_OK, param = {client_id = 1634926628, node_get = {nodeId = 1634926628, nodeAddress = {
family = (SA_CLM_AF_INET6 | unknown: 1685016164), length = 15717,
value = "PL-3,safCluster=myClmCluster", '\000' <repeats 35 times>}, nodeName = {length = 0,
value = '\000' <repeats 255 times>}, executionEnvironment = {length = 0,
value = '\000' <repeats 81 times>, "\271n\243\264\017\020\000\000\000\000\000\000\000\000\000Pe\000D\000\000\000\000\r\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\200k\252\277\177", '\000' <repeats 19 times>, "+\301#@\000\000\000\000\001\000\000\000\000\000\000\000(\000\000\000\060\000\000\000j\252\277\177\000\000\000\240i\252\277\177\000\000\000\314\271d\000\000\000\000\000V\366$@\000\000\000\000"...}, member = (unknown: 2741942528), bootTimestamp = 1078576288, initialViewNumber = 6529616},
inv = 7237089327836233764, track = {notify_info = 0x646f4e6661730024, num = 15717}, node_name = {length = 36,
value = "safNode=PL-3,safCluster=myClmCluster", '\000' <repeats 219 times>}}}, is_member_info = {
is_member = (SA_TRUE | unknown: 6), is_configured = SA_TRUE, client_id = 1634926628}}}
check_member = SA_FALSE
ip = 0x0
FUNCTION = "proc_node_up_msg"
There is presumably an MDS/TIPC problem in the first place, but this ticket only covers correcting the error handling in clmd to avoid the coredump. When clmd finds a null ip, it sets the error code to SA_AIS_ERR_NOT_EXIST, but the error code is later overwritten back to SA_AIS_OK.
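The intended error-handling pattern can be sketched as follows. This is only an illustration, not the committed patch: every demo_* name, the address table and the node ids are hypothetical stand-ins, and the real SA_AIS_* codes and clmd structures are not used. The point is that once a null ip is detected, the handler keeps the error code and skips every later step that would dereference ip or reset rc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the AIS return codes; not the real SA_AIS_* definitions. */
typedef enum { DEMO_OK = 1, DEMO_ERR_NOT_EXIST = 12 } demo_rc_t;

/* Hypothetical per-node address record. */
typedef struct {
    unsigned node_id;
    char addr[16];
} demo_ip_t;

/* Hypothetical table of known addresses; only one node is registered. */
static const demo_ip_t demo_table[] = { { 0x2010f, "10.0.0.1" } };

/* Returns NULL when no address is known for node_id. */
static const demo_ip_t *demo_lookup_ip(unsigned node_id)
{
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (demo_table[i].node_id == node_id)
            return &demo_table[i];
    return NULL;
}

/* Copies the node's address into out (capacity out_len) and returns the status. */
static demo_rc_t demo_handle_node_up(unsigned node_id, char *out, size_t out_len)
{
    demo_rc_t rc = DEMO_OK;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    const demo_ip_t *ip = demo_lookup_ip(node_id);
    if (ip == NULL) {
        rc = DEMO_ERR_NOT_EXIST;
        goto done; /* skip all later steps that dereference ip or assign rc */
    }

    strncpy(out, ip->addr, out_len - 1);
    out[out_len - 1] = '\0';

done:
    return rc; /* the error set above survives to the caller */
}
```

With this shape, a node_up for an unknown node returns DEMO_ERR_NOT_EXIST with an empty address buffer instead of crashing or reporting success.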
Changed to "enhancement". We can't treat this problem as a bug, since the cluster reboots if the SCs are gone.
changeset: 7233:5fdd071344f2
branch: opensaf-4.6.x
parent: 7230:09518bc57ca4
user: Mathivanan N.P. <mathi.naickan@oracle.com>
date: Tue Jan 19 00:13:55 2016 +0530
summary: clmd: Fix coredump in handling node_up message due to null ip [#1647]
changeset: 7234:3214314462ec
branch: opensaf-4.7.x
parent: 7231:551804135446
user: Mathivanan N.P. <mathi.naickan@oracle.com>
date: Tue Jan 19 00:13:55 2016 +0530
summary: clmd: Fix coredump in handling node_up message due to null ip [#1647]
changeset: 7235:172dc1af0cca
tag: tip
parent: 7232:3958b289cab4
user: Mathivanan N.P. <mathi.naickan@oracle.com>
date: Tue Jan 19 00:13:55 2016 +0530
summary: clmd: Fix coredump in handling node_up message due to null ip [#1647]