When IMMND goes down or unregisters from MDS (the headless case), PBE may call exit() from both the main thread and the MDS thread.
09:26:40.343 SC-1 osafimmpbed: NO IMM PBE received SIG_TERM, closing db handle
09:26:40.343 SC-1 osafimmpbed: IN IMM PBE process EXITING... ### main thread
09:26:40.347 SC-1 osafimmnd[213]: WA SC Absence IS allowed:900 IMMD service is DOWN
09:26:40.347 SC-1 osafimmnd[213]: NO IMMD SERVICE IS DOWN, HYDRA IS CONFIGURED => UNREGISTERING IMMND form MDS
09:26:40.348 SC-1 osafimmnd[213]: NO Removing client id:1050002010f sv_id:27
09:26:40.349 SC-1 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting ### MDS thread
11:34:20.183 SC-2 osafimmpbed: NO IMM PBE received SIG_TERM, closing db handle
11:34:20.184 SC-2 osafimmpbed: IN IMM PBE process EXITING... ### main thread
11:34:20.194 SC-2 osafimmnd[213]: exiting for shutdown
11:34:20.195 SC-2 osafimmpbed: WA PBE lost contact with parent IMMND - Exiting ### MDS thread
exit() ends up calling gcov_do_dump(). When both threads run gcov_do_dump() at the same time, PBE crashes.
Thread 3 (Thread 0x7f3a4bc74740 (LWP 242)):
#0 0x00007f3a4a62270b in do_fcntl (arg=<optimized out>, cmd=7, fd=22) at ../sysdeps/unix/sysv/linux/fcntl.c:31
#1 __libc_fcntl (fd=22, cmd=<optimized out>) at ../sysdeps/unix/sysv/linux/fcntl.c:75
#2 0x00007f3a4aea1621 in __gcov_open () from /usr/local/lib/libopensaf_core.so.0
#3 0x00007f3a4aea220e in gcov_do_dump () from /usr/local/lib/libopensaf_core.so.0
#4 0x00007f3a4aea3172 in gcov_exit () from /usr/local/lib/libopensaf_core.so.0
#5 0x00007f3a4a28336a in __cxa_finalize (d=0x7f3a4b0cbe20) at cxa_finalize.c:56
#6 0x00007f3a4ae05da3 in __do_global_dtors_aux () from /usr/local/lib/libopensaf_core.so.0
#7 0x00007ffe44419580 in ?? ()
#8 0x00007f3a4ba6ec17 in _dl_fini () at dl-fini.c:235
Backtrace stopped: frame did not save the PC
Thread 2 (Thread 0x7f3a4bc71b00 (LWP 245)):
...
Thread 1 (Thread 0x7f3a4bc51b00 (LWP 247)):
#0 0x00007f3a49b1ee78 in __gcov_read_summary () from /usr/local/lib/opensaf/libimm_common.so.0
#1 0x00007f3a49b1fefe in gcov_do_dump () from /usr/local/lib/opensaf/libimm_common.so.0
#2 0x00007f3a49b20592 in gcov_exit () from /usr/local/lib/opensaf/libimm_common.so.0
#3 0x00007f3a4a282ff8 in __run_exit_handlers (status=1, listp=0x7f3a4a60c5f8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#4 0x00007f3a4a283045 in __GI_exit (status=<optimized out>) at exit.c:104
#5 0x00007f3a4b38c6b0 in imma_mark_clients_stale (cb=0x7f3a4b5d33c0 <imma_cb>, mark_exposed=false) at src/imm/agent/imma_db.cc:690
#6 0x00007f3a4b392973 in imma_mds_svc_evt (cb=0x7f3a4b5d33c0 <imma_cb>, svc_evt=0x7f3a44000a90) at src/imm/agent/imma_mds.cc:413
#7 0x00007f3a4b39228d in imma_mds_callback (info=0x7f3a44000a80) at src/imm/agent/imma_mds.cc:221
#8 0x00007f3a4ae61287 in mds_mcm_user_event_callback (local_svc_hdl=562945658454043, pwe_id=1, svc_id=25, role=V_DEST_RL_ACTIVE, vdest_id=65535, adest=564113889558741, event_type=NCSMDS_DOWN, svc_sub_part_ver=1 '\001', archword_type=10 '\n') at src/mds/mds_c_api.c:4555
#9 0x00007f3a4ae5ed76 in mds_mcm_svc_down (pwe_id=1, svc_id=25, role=V_DEST_RL_ACTIVE, scope=NCSMDS_SCOPE_NONE, vdest_id=65535, vdest_policy=NCS_VDEST_TYPE_N_WAY_ROUND_ROBIN, adest=564113889558741, my_pcon=false, local_svc_hdl=562945658454043, subtn_ref_val=2, svc_sub_part_ver=1 '\001', archword_type=10 '\n') at src/mds/mds_c_api.c:3583
#10 0x00007f3a4ae87ca0 in mds_mdtm_process_recvdata (rcv_bytes=34, buff_in=0x7f3a44003110 "V\022\064V\001\002V\001\004\031\240\033\377\377\240\033\377\377") at src/mds/mds_dt_trans.c:1150
#11 0x00007f3a4ae86ad4 in mdtm_process_poll_recv_data_tcp () at src/mds/mds_dt_trans.c:815
#12 0x00007f3a4ae87599 in mdtm_process_recv_events_tcp () at src/mds/mds_dt_trans.c:995
#13 0x00007f3a4a6196ba in start_thread (arg=0x7f3a4bc51b00) at pthread_create.c:333
#14 0x00007f3a4a34f82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
In imma_mark_clients_stale(), we should not call exit() directly.
Instead, we should mark the handle as exposed, so that the main thread of PBE exits on its own when it gets ERR_BAD_HANDLE.
In general, exit() shouldn't be called in any library/agent.
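A minimal sketch of the proposed pattern, for illustration only. ClientHandle, Status, mark_client_stale(), dispatch(), run_main_loop() and run_demo() are hypothetical stand-ins, not the real IMMA types or functions. The callback thread only records that the handle is exposed, and the main thread is the only one that decides to terminate:

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <thread>

// Hypothetical stand-ins; these are NOT the real IMMA types or error codes.
enum class Status { kOk, kBadHandle };

struct ClientHandle {
  std::atomic<bool> stale{false};    // connection to the server was lost
  std::atomic<bool> exposed{false};  // handle can no longer be recovered
};

// Runs on the transport (MDS-like) thread when the server goes down.
// It only records the state; it never calls exit().
void mark_client_stale(ClientHandle& h, bool mark_exposed) {
  h.stale.store(true);
  if (mark_exposed) h.exposed.store(true);
}

// Any call made through the handle reports the failure to its caller.
Status dispatch(const ClientHandle& h) {
  return h.exposed.load() ? Status::kBadHandle : Status::kOk;
}

// Main-thread loop: leaves the loop once the handle is reported as bad.
// The returned code is meant to be passed to exit() by the main thread only.
int run_main_loop(const ClientHandle& h) {
  while (dispatch(h) == Status::kOk) {
    std::this_thread::yield();
  }
  return EXIT_FAILURE;
}

// Simulates the "server down" event arriving on a second thread.
int run_demo() {
  ClientHandle h;
  std::thread transport([&h] { mark_client_stale(h, /*mark_exposed=*/true); });
  int rc = run_main_loop(h);
  transport.join();
  assert(h.stale.load());
  return rc;
}
```

With this shape the exit handlers, including the gcov dump, can only be entered from one thread, because the library thread never terminates the process itself.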
develop (5.17.10) [code:bc4979]
release [code:9094ca]