
#242 cpsv: ckptnd crashed while running a multithreaded application during section iteration getnext

4.5.2
fixed
None
defect
ckpt
-
4.2
major
2015-10-13
2013-05-16
No

from http://devel.opensaf.org/ticket/2864

The issue is seen on SLES 64-bit VMs.

There are two threads in the application, a writer thread and a reader thread.

The writer thread does the following:
1) Creates the checkpoint.
2) In a loop, opens the same checkpoint in write mode, creates a section, writes into the section, and closes the checkpoint.

The reader thread does the following:

1) In a loop, opens the checkpoint created by the writer thread, initializes a section iteration, reads the section identified by the section descriptor returned by iterationNext(), and closes the checkpoint.

Backtrace observed:

(gdb) bt
0 0x0000000000417606 in cpnd_proc_fill_sec_desc (pTmpSecPtr=0x0, sec_des=0x7fffa9c28530) at cpnd_proc.c:1637
1 0x0000000000417b42 in cpnd_proc_getnext_section (cp_node=0x64a810, get_next=0x654bb0, sec_des=0x7fffa9c28530, n_secs_trav=0x7fffa9c2852c) at cpnd_proc.c:1756
2 0x000000000040f680 in cpnd_evt_proc_ckpt_iter_getnext (cb=0x637f30, evt=0x654ba0, sinfo=0x6551f8) at cpnd_evt.c:4122
3 0x00000000004059df in cpnd_process_evt (evt=0x654b90) at cpnd_evt.c:241
4 0x0000000000411619 in cpnd_main_process (cb=0x637f30) at cpnd_init.c:544
5 0x00000000004118e3 in main (argc=1, argv=0x7fffa9c28e68) at cpnd_main.c:72

(gdb) fr 2
2 0x000000000040f680 in cpnd_evt_proc_ckpt_iter_getnext (cb=0x637f30, evt=0x654ba0, sinfo=0x6551f8) at cpnd_evt.c:4122
4122 cpnd_evt.c: No such file or directory.
in cpnd_evt.c

(gdb) p *evt
$1 = {dont_free_me = false, error = 0, type = CPND_EVT_A2ND_CKPT_ITER_GETNEXT, info = {initReq = {version = {releaseCode = 51 '3',
majorVersion = 0 '\0', minorVersion = 0 '\0'}}, finReq = {client_hdl = 51}, openReq = {client_hdl = 51, lcl_ckpt_hdl = 11,
ckpt_name = {length = 61664, value = "d\000\000\000\000\000�\202a\000\000\000\000\000\005\000\000\000\t", '\0' <repeats 236 times>},
ckpt_attrib = {creationFlags = 0, checkpointSize = 0, retentionDuration = 0, maxSections = 0, maxSectionSize = 0,
maxSectionIdSize = 0}, ckpt_flags = 0, invocation = 0, timeout = 0}, closeReq = {client_hdl = 51, ckpt_id = 11,
ckpt_flags = 6615264}, ulinkReq = {ckpt_name = {length = 51,
value = "\000\000\000\000\000\000\v\000\000\000\000\000\000\000��d\000\000\000\000\000�\202a\000\000\000\000\000\005\000\000\000\t", '\0' <repeats 220 times>}}, rdsetReq = {ckpt_id = 51, reten_time = 11}, arsetReq = {ckpt_id = 51}, statReq = {ckpt_id = 51},
refCntsetReq = {no_of_nodes = 51, ref_cnt_array = {{ckpt_id = 11, ckpt_ref_cnt = 6615264}, {ckpt_id = 6390432, ckpt_ref_cnt = 5}, {
ckpt_id = 0, ckpt_ref_cnt = 0} <repeats 98 times>}}, sec_creatReq = {ckpt_id = 51, lcl_ckpt_id = 11, agent_mdest = 6615264,
sec_attri = {sectionId = 0x6182a0, expirationTime = 38654705669}, init_data = 0x0, init_size = 0}, sec_delReq = {ckpt_id = 51,
sec_id = {idLen = 11, id = 0x64f0e0 "section_4_1"}, lcl_ckpt_id = 6390432, agent_mdest = 38654705669}, sec_expset = {ckpt_id = 51,
sec_id = {idLen = 11, id = 0x64f0e0 "section_4_1"}, exp_time = 6390432}, iter_getnext = {ckpt_id = 51, section_id = {idLen = 11,
id = 0x64f0e0 "section_4_1"}, iter_id = 6390432, filter = SA_CKPT_SECTIONS_ANY, n_secs_trav = 9, exp_tmr = 0}, arr_ntfy = {
client_hdl = 51}, ckpt_write = {type = 51, ckpt_id = 11, lcl_ckpt_id = 6615264, agent_mdest = 6390432, num_of_elmts = 5,
all_repl_evt_flag = 9, data = 0x0, seqno = 0, last_seq = 0 '\0', ckpt_sync = {ckpt_id = 0, lcl_ckpt_hdl = 0, client_hdl = 0,
invocation = 0, cpa_sinfo = {to_svc = 0, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0',
data = '\0' <repeats 11 times>}}, is_ckpt_open = false}}, ckpt_read = {type = 51, ckpt_id = 11, lcl_ckpt_id = 6615264,
agent_mdest = 6390432, num_of_elmts = 5, all_repl_evt_flag = 9, data = 0x0, seqno = 0, last_seq = 0 '\0', ckpt_sync = {ckpt_id = 0,
lcl_ckpt_hdl = 0, client_hdl = 0, invocation = 0, cpa_sinfo = {to_svc = 0, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {
length = 0 '\0', data = '\0' <repeats 11 times>}}, is_ckpt_open = false}}, ckpt_sync = {ckpt_id = 51, lcl_ckpt_hdl = 11,
client_hdl = 6615264, invocation = 6390432, cpa_sinfo = {to_svc = 5, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0',
data = '\0' <repeats 11 times>}}, is_ckpt_open = false}, ckpt_read_ack = {ckpt_id = 51, mds_dest = 11}, ckpt_info = {error = 51,
ckpt_id = 11, is_active_exists = 224, active_dest = 6390432, dest_cnt = 5, dest_list = 0x0, attributes = {creationFlags = 0,
checkpointSize = 0, retentionDuration = 0, maxSections = 0, maxSectionSize = 0, maxSectionIdSize = 0}, ckpt_rep_create = false},
ckpt_mem_size = {ckpt_id = 51, ckpt_used_size = 11, error = 0}, ckpt_sections = {ckpt_id = 51, ckpt_num_sections = 11, error = 0},
ckpt_add = {ckpt_id = 51, mds_dest = 11, active_dest = 6615264, attributes = {creationFlags = 6390432, checkpointSize = 38654705669,
retentionDuration = 0, maxSections = 0, maxSectionSize = 0, maxSectionIdSize = 0}, ckpt_flags = 0, is_cpnd_restart = false,
dest_cnt = 0, dest_list = 0x0}, ckpt_del = {ckpt_id = 51, mds_dest = 11}, ckpt_create = {ckpt_name = {length = 51,
value = "\000\000\000\000\000\000\v\000\000\000\000\000\000\000��d\000\000\000\000\000�\202a\000\000\000\000\000\005\000\000\000\t", '\0' <repeats 220 times>}, ckpt_info = {error = 0, ckpt_id = 0, is_active_exists = false, active_dest = 0, dest_cnt = 0, dest_list = 0x0,
attributes = {creationFlags = 0, checkpointSize = 0, retentionDuration = 0, maxSections = 0, maxSectionSize = 0,
maxSectionIdSize = 0}, ckpt_rep_create = false}}, ckpt_destroy = {ckpt_id = 51}, ckpt_ulink = {ckpt_id = 51}, rdset = {
ckpt_id = 51, reten_time = 11, type = 6615264}, active_set = {ckpt_id = 51, mds_dest = 11}, cl_ack = {error = 51}, ulink_ack = {
error = 51}, rdset_ack = {error = 51}, crset_ack = {error = 51}, arep_ack = {error = 51}, destroy_ack = {error = 51},
cpnd_restart = {ckpt_id = 51}, cpnd_restart_done = {ckpt_id = 51, mds_dest = 11, active_dest = 6615264, attributes = {
creationFlags = 6390432, checkpointSize = 38654705669, retentionDuration = 0, maxSections = 0, maxSectionSize = 0,
maxSectionIdSize = 0}, ckpt_flags = 0, is_cpnd_restart = false, dest_cnt = 0, dest_list = 0x0}, stat_get = {ckpt_id = 51},
status = {error = 51, ckpt_id = 11, status = {checkpointCreationAttributes = {creationFlags = 6615264, checkpointSize = 6390432,
retentionDuration = 38654705669, maxSections = 0, maxSectionSize = 0, maxSectionIdSize = 0}, numberOfSections = 0,
memoryUsed = 0}}, active_sec_creat = {ckpt_id = 51, lcl_ckpt_id = 11, agent_mdest = 6615264, sec_attri = {sectionId = 0x6182a0,
expirationTime = 38654705669}, init_data = 0x0, init_size = 0}, sec_creat_rsp = {error = 51}, active_sec_creat_rsp = {
ckpt_id = 51, sec_id = {idLen = 11, id = 0x64f0e0 "section_4_1"}, error = 6390432, lcl_ckpt_id = 38654705669, agent_mdest = 0},
sec_delete_req = {ckpt_id = 51, sec_id = {idLen = 11, id = 0x64f0e0 "section_4_1"}, error = 6390432, lcl_ckpt_id = 38654705669,
agent_mdest = 0}, sec_delete_rsp = {error = 51}, sec_iter_req = {ckpt_id = 51}, sec_exp_set = {ckpt_id = 51, sec_id = {idLen = 11,
id = 0x64f0e0 "section_4_1"}, exp_time = 6390432}, sec_exp_rsp = {error = 51}, sync_req = {ckpt_id = 51, lcl_ckpt_hdl = 11,
client_hdl = 6615264, invocation = 6390432, cpa_sinfo = {to_svc = 5, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0',
data = '\0' <repeats 11 times>}}, is_ckpt_open = false}, ckpt_nd2nd_sync = {type = 51, ckpt_id = 11, lcl_ckpt_id = 6615264,
agent_mdest = 6390432, num_of_elmts = 5, all_repl_evt_flag = 9, data = 0x0, seqno = 0, last_seq = 0 '\0', ckpt_sync = {ckpt_id = 0,
lcl_ckpt_hdl = 0, client_hdl = 0, invocation = 0, cpa_sinfo = {to_svc = 0, dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {
length = 0 '\0', data = '\0' <repeats 11 times>}}, is_ckpt_open = false}}, active_sync_rsp = {error = 51}, ckpt_nd2nd_data = {
type = 51, ckpt_id = 11, lcl_ckpt_id = 6615264, agent_mdest = 6390432, num_of_elmts = 5, all_repl_evt_flag = 9, data = 0x0,
seqno = 0, last_seq = 0 '\0', ckpt_sync = {ckpt_id = 0, lcl_ckpt_hdl = 0, client_hdl = 0, invocation = 0, cpa_sinfo = {to_svc = 0,
dest = 0, stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0', data = '\0' <repeats 11 times>}}, is_ckpt_open = false}},
ckpt_nd2nd_data_rsp = {type = 51, num_of_elmts = 0, size = 11, error = 0, ckpt_id = 6615264, error_index = 6390432,
from_svc = 38654705669, info = {write_err_index = 0x0, read_mapping = 0x0, read_data = 0x0, ovwrite_error = {error = 0}}},
getnext_req = {ckpt_id = 51, section_id = {idLen = 11, id = 0x64f0e0 "section_4_1"}, iter_id = 6390432, filter = SA_CKPT_SECTIONS_ANY,
n_secs_trav = 9, exp_tmr = 0}, ckpt_nd2nd_getnext_rsp = {ckpt_id = 51, iter_id = 11, error = 6615264, sect_desc = {sectionId = {
idLen = 33440, id = 0x900000005 <Address 0x900000005 out of bounds>}, expirationTime = 0, sectionSize = 0, sectionState = 0,
lastUpdate = 0}, n_secs_trav = 0}, mds_info = {change = 51, dest = 11, svc_id = 6615264, node_id = 0, role = 6390432}, tmr_info = {
type = 51, ckpt_id = 11, lcl_sec_id = 6615264, agent_dest = 6390432, write_type = 5, sinfo = {to_svc = 0, dest = 0,
stype = MDS_SENDTYPE_SND, ctxt = {length = 0 '\0', data = '\0' <repeats 11 times>}}, invocation = 0, lcl_ckpt_hdl = 0,
cpnd_tmr = 0x0}, ckptListUpdate = {client_hdl = 51, ckpt_name = {length = 11,
value = "\000\000\000\000\000\000��d\000\000\000\000\000�\202a\000\000\000\000\000\005\000\000\000\t", '\0' <repeats 228 times>}}}}

(gdb) p *sinfo
$2 = {to_svc = 18, dest = 566314965155865, stype = MDS_SENDTYPE_SNDRSP, ctxt = {length = 12 '\f',
data = "\000\000\001\n\000\002\003\017zT@\031"}}

The issue is reproducible with the attached application.

2 Attachments

Related

Tickets: #242
Wiki: ChangeLog-4.5.2
Wiki: ChangeLog-4.6.1

Discussion

  • Maxim

    Maxim - 2014-05-07

    Hi,

    I faced the same problem in the 4.4.0 release. I use the attached simple patch to avoid the core dump. Initial testing shows no problems after patching.

    Could you please confirm that the patch is valid? Could the patch cause any side effects?

    When do you plan to set the milestone for the fix?

     
  • Anders Bjornerstedt

    • Type: defect --> enhancement
     
  • A V Mahesh (AVM)

    • Type: enhancement --> defect
    • Milestone: future --> 4.7-Tentative
     
  • A V Mahesh (AVM)

    This needs to be reproduced on the current staging; if the issue still exists, it must be fixed in the 4.7 release.

     
  • A V Mahesh (AVM)

    • Milestone: 4.7-Tentative --> 4.5.2
     
  • A V Mahesh (AVM)

    • status: assigned --> review
    • Attachments have changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1,2 @@
     checkpoint_app1.c (12.9 kB; application/octet-stream)
    +ticket242_app.c (10.0 kB; text/plain)
    
     
  • A V Mahesh (AVM)

    The attached test application (ticket242_app.c) reproduces the problem.

    #gcc ticket242_app.c -o ckpt_sect_threads -lSaCkpt
    #./ckpt_sect_threads

     
  • A V Mahesh (AVM)

    changeset: 6998:bebc2783183f
    branch: opensaf-4.5.x
    tag: tip
    parent: 6982:3cc375475384
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 11:32:25 2015 +0530
    summary: cpsv: introduced null check for SecPtr to prevent ckptnd crashed (multithreading)app [#242]

    changeset: 6997:ae65b0ffa596
    branch: opensaf-4.6.x
    parent: 6990:edc41730df45
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 11:32:11 2015 +0530
    summary: cpsv: introduced null check for SecPtr to prevent ckptnd crashed (multithreading)app [#242]

    changeset: 6996:e5bb3f7120eb
    branch: opensaf-4.7.x
    parent: 6993:0b8a5474d3f7
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 11:31:54 2015 +0530
    summary: cpsv: introduced null check for SecPtr to prevent ckptnd crashed (multithreading)app [#242]

    changeset: 6995:fe634f270a98
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 10:12:18 2015 +0530
    summary: cpsv: introduced null check for SecPtr to prevent ckptnd crashed (multithreading)app [#242]

     

    Related

    Tickets: #242

  • A V Mahesh (AVM)

    • status: review --> fixed
     
  • A V Mahesh (AVM)

    searching for changes
    changeset: 7003:fced4b3c5341
    parent: 7000:9d30fa46a7a5
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 14:35:57 2015 +0530
    summary: cpsv: review comments on changeset 6995 [#242]

    changeset: 7004:99e62804cd2c
    branch: opensaf-4.7.x
    parent: 6999:2767588ed092
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 14:36:45 2015 +0530
    summary: cpsv: review comments on changeset 6996 [#242]

    changeset: 7005:a9be7ddf73f3
    branch: opensaf-4.6.x
    parent: 7002:fa5da6b01f61
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 14:37:00 2015 +0530
    summary: cpsv: review comments on changeset 6997 [#242]

    changeset: 7006:4e4f9241483d
    branch: opensaf-4.5.x
    tag: tip
    parent: 7001:32075f5c5570
    user: A V Mahesh mahesh.valla@oracle.com
    date: Tue Oct 13 14:37:19 2015 +0530
    summary: cpsv: review comments on changeset 6998 [#242]

     

    Related

    Tickets: #242

