#1647 clm: Incorrect error code handling in processing node_up due to null ip

4.6.2
fixed
None
enhancement
clm
d
minor
2016-01-18
2015-12-17
No

In proc_node_up_msg(), where clmd handles node_up from node_agent, clmd dumps core if the node_up arrives without an ip attached. This somehow happens in the test case that simulates the SCs being gone due to disabled tipc (for the resilience feature).

#0 proc_node_up_msg (cb=<optimized out>, evt=0x44006550) at clms_evt.c:369
#1 0x00000000004051c5 in process_api_evt (evt=0x44006550) at clms_evt.c:1333
#2 0x0000000000408910 in clms_process_mbx (mbx=<optimized out>) at clms_evt.c:1373
#3 0x00000000004042ee in main (argc=<optimized out>, argv=<optimized out>) at clms_main.c:499
(gdb) bt full
#0 proc_node_up_msg (cb=<optimized out>, evt=0x44006550) at clms_evt.c:369
nodeup_info = 0x440065a0
node = 0x65f820
nodeid = 131855
rc = 1
node_name = {length = 36, value = "safNode=PL-3,safCluster=myClmCluster", '\000' <repeats 219 times>}
clm_msg = {next = 0x1, evt_type = CLMSV_CLMS_TO_CLMA_API_RESP_MSG, info = {api_info = {type = CLMSV_CLUSTER_JOIN_REQ, param = {
init = {version = {releaseCode = 36 '$', majorVersion = 0 '\000', minorVersion = 115 's'}}, finalize = {
client_id = 1634926628}, track_start = {client_id = 1634926628, flags = 102 'f', sync_resp = 78 'N'}, track_stop = {
client_id = 1634926628}, node_get = {client_id = 1634926628, node_id = 1685016166}, node_get_async = {
client_id = 1634926628, inv = 8299064482983853413, node_id = 1816356449}, clm_resp = {client_id = 1634926628,
resp = 1685016166, inv = 8299064482983853413}, nodeup_info = {node_id = 1634926628, node_name = {length = 20070,
value = "ode=PL-3,safCluster=myClmCluster", '\000' <repeats 223 times>}}}}, cbk_info = {client_id = 7,
type = CLMSV_NODE_ASYNC_GET_CBK, param = {track = {buf_info = {viewNumber = 7237089327836233764,
numberOfItems = 1280327013, notification = 0x657473756c436661}, mem_num = 2037202290, inv = 125780070987116,
root_cause_ent = 0x0, cor_ids = 0x0, step = 0, time_super = 0, err = 0}, node_get = {err = 1634926628,
inv = 8299064482983853413, info = {nodeId = 1816356449, nodeAddress = {family = (SA_CLM_AF_INET | unknown: 1702130548),
length = 15730, value = "myClmCluster", '\000' <repeats 51 times>}, nodeName = {length = 0,
value = '\000' <repeats 255 times>}, executionEnvironment = {length = 0,
value = '\000' <repeats 65 times>, "\271n\243\264\017\020\000\000\000\000\000\000\000\000\000Pe\000D\000\000\000\000\r\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\200k\252\277\177", '\000' <repeats 19 times>, "+\301#@\000\000\000\000\001\000\000\000\000\000\000\000(\000\000\000\060\000\000\000j\252\277\177\000\000\000\240i\252\277\177\000\000\000\314\271d\000\000\000\000\000V\366$@\000\000\000\000\200j\252\277\177\000\000\000\001\000\000\000\000\000\000\000"...}, member = (unknown: 6529616), bootTimestamp = 1078576288, initialViewNumber = 1076151337}}}}, api_resp_info = {type = CLMSV_CLUSTER_JOIN_RESP, rc = SA_AIS_OK, param = {client_id = 1634926628, node_get = {nodeId = 1634926628, nodeAddress = { family = (SA_CLM_AF_INET6 | unknown: 1685016164), length = 15717, value = "PL-3,safCluster=myClmCluster", '\000' <repeats 35 times>}, nodeName = {length = 0, value = '\000' <repeats 255 times>}, executionEnvironment = {length = 0, value = '\000' <repeats 81 times>, "\271n\243\264\017\020\000\000\000\000\000\000\000\000\000Pe\000D\000\000\000\000\r\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\200k\252\277\177", '\000' <repeats 19 times>, "+\301#@\000\000\000\000\001\000\000\000\000\000\000\000(\000\000\000\060\000\000\000j\252\277\177\000\000\000\240i\252\277\177\000\000\000\314\271d\000\000\000\000\000V\366$@\000\000\000\000"...}, member = (unknown: 2741942528), bootTimestamp = 1078576288, initialViewNumber = 6529616},
inv = 7237089327836233764, track = {notify_info = 0x646f4e6661730024, num = 15717}, node_name = {length = 36,
value = "safNode=PL-3,safCluster=myClmCluster", '\000' <repeats 219 times>}}}, is_member_info = {
is_member = (SA_TRUE | unknown: 6), is_configured = SA_TRUE, client_id = 1634926628}}}
check_member = SA_FALSE
ip = 0x0
FUNCTION = "proc_node_up_msg"

There is presumably an mds/tipc problem in the first place, but this ticket only corrects the error handling in clmd to avoid the coredump. When clmd finds a null ip, it sets the error code to SA_AIS_ERR_NOT_EXIST, but the error code is later overwritten back to SA_AIS_OK.
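
To illustrate the pattern described above, here is a minimal sketch in C. It is not the actual clmd code or the committed patch: the type demo_rc_t, the struct demo_node_up, and both handler functions are hypothetical names made up for this example, and the result codes only stand in for SA_AIS_OK and SA_AIS_ERR_NOT_EXIST. It assumes the fix takes the usual form of leaving the function as soon as the null ip is detected, so that no later assignment can reset the error.

```c
#include <stddef.h>

/* Hypothetical result codes standing in for SA_AIS_OK and SA_AIS_ERR_NOT_EXIST. */
typedef enum { DEMO_OK, DEMO_ERR_NOT_EXIST } demo_rc_t;

/* Hypothetical node_up message: ip is NULL when no address was attached. */
struct demo_node_up {
    const char *ip;
};

/* Buggy shape: the error set in the NULL branch is lost by a later assignment. */
demo_rc_t handle_node_up_buggy(const struct demo_node_up *msg)
{
    demo_rc_t rc = DEMO_OK;

    if (msg->ip == NULL)
        rc = DEMO_ERR_NOT_EXIST;   /* error detected here ... */

    /* ... other processing would run here, still on the error path ... */
    rc = DEMO_OK;                  /* ... and the error is overwritten here */
    return rc;
}

/* Fixed shape: leave as soon as the error is known so nothing can reset it. */
demo_rc_t handle_node_up_fixed(const struct demo_node_up *msg)
{
    demo_rc_t rc = DEMO_OK;

    if (msg->ip == NULL) {
        rc = DEMO_ERR_NOT_EXIST;
        goto done;
    }

    /* Any code that uses msg->ip would only run on this path. */

done:
    return rc;
}
```

With a message whose ip is NULL, the buggy variant reports success to the caller, while the fixed variant returns the error and never reaches code that would use the missing ip.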

Related

Tickets: #1647
Wiki: ChangeLog-4.6.2

Discussion

  • Minh Hon Chau

    Minh Hon Chau - 2015-12-17

    Changed to "enhancement". We can't treat this problem as a bug, since the cluster reboots anyway if the SCs are gone.

     
  • Minh Hon Chau

    Minh Hon Chau - 2015-12-17
    • Type: defect --> enhancement
    • Milestone: 4.6.2 --> 5.0.FC
     
  • Minh Hon Chau

    Minh Hon Chau - 2015-12-17
    • status: assigned --> review
     
  • Mathi Naickan

    Mathi Naickan - 2016-01-18
    • status: review --> fixed
     
  • Mathi Naickan

    Mathi Naickan - 2016-01-18

    changeset: 7233:5fdd071344f2
    branch: opensaf-4.6.x
    parent: 7230:09518bc57ca4
    user: Mathivanan N.P. <mathi.naickan@oracle.com>
    date: Tue Jan 19 00:13:55 2016 +0530
    summary: clmd: Fix coredump in handling node_up message due to null ip [#1647]

    changeset: 7234:3214314462ec
    branch: opensaf-4.7.x
    parent: 7231:551804135446
    user: Mathivanan N.P. <mathi.naickan@oracle.com>
    date: Tue Jan 19 00:13:55 2016 +0530
    summary: clmd: Fix coredump in handling node_up message due to null ip [#1647]

    changeset: 7235:172dc1af0cca
    tag: tip
    parent: 7232:3958b289cab4
    user: Mathivanan N.P. <mathi.naickan@oracle.com>
    date: Tue Jan 19 00:13:55 2016 +0530
    summary: clmd: Fix coredump in handling node_up message due to null ip [#1647]

     

  • Mathi Naickan

    Mathi Naickan - 2016-01-18
    • Milestone: 5.0.FC --> 4.6.2
     