Menu

#3322 log: log agent in main process is disabled after child process exits

5.23.03
fixed
None
defect
log
-
minor
False
2022-11-25
2022-10-05
No

While using log agent, if the new process was created by forking current process, both processes have a same mds address. If the child process exits, the destructor of log agent is invoked. It will unregister with mds so all other subscribing services will detect this log agent is down. However the main process is still using that mds address, all requests from main process become invalid because logd thinks this log agent has already been down.

Log analysis:

  • Main process initializes the mds address:
<143>1 2022-10-05T10:41:05.537677+02:00 SC-2 logtest 763 loga [meta sequenceId="89"] 763:log/agent/lga_mds.cc:1287 >> lga_mds_init 
<143>1 2022-10-05T10:41:05.537782+02:00 SC-2 logtest 763 loga [meta sequenceId="90"] 763:log/agent/lga_mds.cc:1334 << lga_mds_init 
<143>1 2022-10-05T10:41:05.541101+02:00 SC-2 logtest 763 loga [meta sequenceId="156"] 772:log/agent/lga_agent.cc:167 >> ~LogAgent 
<143>1 2022-10-05T10:41:05.54126+02:00 SC-2 logtest 763 loga [meta sequenceId="157"] 772:log/agent/lga_state.cc:160 >> stop_recovery2_thread 
<143>1 2022-10-05T10:41:05.541297+02:00 SC-2 logtest 763 loga [meta sequenceId="158"] 772:log/agent/lga_state.cc:166 TR stop_recovery2_thread RecoveryState::kNormal no thread to stop
<143>1 2022-10-05T10:41:05.541315+02:00 SC-2 logtest 763 loga [meta sequenceId="159"] 772:log/agent/lga_state.cc:183 << stop_recovery2_thread 
<143>1 2022-10-05T10:41:05.541322+02:00 SC-2 logtest 763 loga [meta sequenceId="160"] 772:log/agent/lga_util.cc:125 >> lga_shutdown 
<143>1 2022-10-05T10:41:05.541329+02:00 SC-2 logtest 763 loga [meta sequenceId="161"] 772:log/agent/lga_mds.cc:1351 >> lga_mds_deinit 
<143>1 2022-10-05T10:41:05.541573+02:00 SC-2 logtest 763 loga [meta sequenceId="162"] 772:log/agent/lga_mds.cc:1362 << lga_mds_deinit 
  • Logd detects this log agent is down and delete all clients of this log agent:
<143>1 2022-10-05T10:41:05.541593+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3386"] 452:log/logd/lgs_mds.cc:1230 T8 MDS DOWN dest: 2020f3aafb5a7, node ID: 2020f, svc_id: 21
<143>1 2022-10-05T10:41:05.541636+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3387"] 447:log/logd/lgs_evt.cc:415 >> proc_lga_updn_mds_msg 
<143>1 2022-10-05T10:41:05.541648+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3388"] 447:log/logd/lgs_evt.cc:436 TR proc_lga_updn_mds_msg: LGSV_LGS_EVT_LGA_DOWN mds_dest = 2020f3aafb5a7
<143>1 2022-10-05T10:41:05.541656+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3389"] 447:log/logd/lgs_evt.cc:338 >> lgs_client_delete_by_mds_dest: mds_dest 2020f3aafb5a7
<143>1 2022-10-05T10:41:05.541663+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3390"] 447:log/logd/lgs_evt.cc:191 >> lgs_client_delete: client_id 9
<143>1 2022-10-05T10:41:05.541678+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3391"] 447:log/logd/lgs_evt.cc:213 T4 client_id: 9, REMOVE stream id: 2
<143>1 2022-10-05T10:41:05.541686+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3392"] 447:log/logd/lgs_stream.cc:856 >> log_stream_close: safLgStrCfg=saLogSystem,safApp=safLogService
<143>1 2022-10-05T10:41:05.541713+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3393"] 447:log/logd/lgs_stream.cc:922 << log_stream_close: rc=0, numOpeners=7
<143>1 2022-10-05T10:41:05.541737+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3394"] 447:log/logd/lgs_evt.cc:239 << lgs_client_delete 
<143>1 2022-10-05T10:41:05.541744+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3395"] 447:log/logd/lgs_evt.cc:348 << lgs_client_delete_by_mds_dest 
  • The main process send a writting request to logd:
<143>1 2022-10-05T10:41:07.541457+02:00 SC-2 logtest 763 loga [meta sequenceId="182"] 763:log/agent/lga_mds.cc:1439 >> lga_mds_msg_async_send 
<143>1 2022-10-05T10:41:07.541488+02:00 SC-2 logtest 763 loga [meta sequenceId="183"] 763:log/agent/lga_mds.cc:789 >> lga_mds_enc 
<143>1 2022-10-05T10:41:07.541516+02:00 SC-2 logtest 763 loga [meta sequenceId="184"] 763:log/agent/lga_mds.cc:820 T2 msgtype: 0
<143>1 2022-10-05T10:41:07.541524+02:00 SC-2 logtest 763 loga [meta sequenceId="185"] 763:log/agent/lga_mds.cc:834 T2 api_info.type: 4
<143>1 2022-10-05T10:41:07.541533+02:00 SC-2 logtest 763 loga [meta sequenceId="186"] 763:log/agent/lga_mds.cc:862 << lga_mds_enc: total_bytes = 108
<143>1 2022-10-05T10:41:07.541581+02:00 SC-2 logtest 763 loga [meta sequenceId="187"] 763:log/agent/lga_mds.cc:1462 << lga_mds_msg_async_send 
  • Logd skips the request due to invalid client id:
<143>1 2022-10-05T10:41:07.541929+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3420"] 447:log/logd/lgs_evt.cc:1313 >> proc_write_log_async_msg: client_id 9, stream ID 2, node_name = SC-2
<143>1 2022-10-05T10:41:07.541941+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3421"] 447:log/logd/lgs_evt.cc:109 TR clm_node_id delete  failed : 9
<143>1 2022-10-05T10:41:07.541948+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3422"] 447:log/logd/lgs_evt.cc:114 TR client_id: 9 lookup failed
<143>1 2022-10-05T10:41:07.541954+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3423"] 447:log/logd/lgs_evt.cc:1323 TR Bad client ID: 9
<143>1 2022-10-05T10:41:07.541962+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3424"] 447:log/logd/lgs_util.cc:387 >> lgs_send_write_log_ack 
<141>1 2022-10-05T10:41:07.542052+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3425"] 447:log/logd/lgs_util.cc:406 NO Failed (2) to send of WRITE ack to: 2020f3aafb5a7
<143>1 2022-10-05T10:41:07.542073+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3426"] 447:log/logd/lgs_util.cc:408 << lgs_send_write_log_ack 
<143>1 2022-10-05T10:41:07.54208+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3427"] 447:log/logd/lgs_util.cc:416 >> lgs_free_write_log 
<143>1 2022-10-05T10:41:07.542087+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3428"] 447:log/logd/lgs_util.cc:438 << lgs_free_write_log 
<143>1 2022-10-05T10:41:07.542093+02:00 SC-1 osaflogd 447 osaflogd [meta sequenceId="3429"] 447:log/logd/lgs_evt.cc:0 << proc_write_log_async_msg
2 Attachments

Related

Wiki: ChangeLog-5.23.03

Discussion

  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-10-06
    • status: assigned --> review
     
  • Gary Lee

    Gary Lee - 2022-11-18
    • Milestone: 5.22.11 --> 5.23.03
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-11-25

    commit 4e1b84f156bd23965b706784a1ebd7ad0b5b4cfc (HEAD -> develop, origin/develop, ticket-3322)
    Author: hieu.h.hoang hieu.h.hoang@dektech.com.au
    Date: Thu Nov 17 14:08:21 2022 +0700

    log: Testcase for closing log stream in destructor [#3322]
    
    Add a testcase for closing log stream in the destructor of a cpp object.
    

    commit ec25bd1c91b3efa5d041ced24b5f23453cebb8ea
    Author: hieu.h.hoang hieu.h.hoang@dektech.com.au
    Date: Wed Nov 2 10:53:36 2022 +0700

    log: Create a test case for forked processes [#3322]
    
    Create a test case that the process running log agent is forked
    to a new process.
    

    commit cfb1e6436b0209f82ccd3d0c1989d67051b31c8e
    Author: hieu.h.hoang hieu.h.hoang@dektech.com.au
    Date: Thu Nov 17 08:51:57 2022 +0700

    log: Update the destructor and constructor of log agent [#3322]
    
    - The destructor of the static objects will be called in the reverse
    order of constructors. If other destructors call a function of the
    log agent after the destructor of the log agent, it could be crashed.
    Solution is to construct the log agent object early.
    
    - If a process is forked, both processes share all created sockets.
    Cleaning resources in one process causes undefined behaviours in
    other process. Solution is to empty the destructor of log agent.
    Those resources are manually cleaned by the api functions of the
    log agent.
    
     
  • Hieu Hong Hoang

    Hieu Hong Hoang - 2022-11-25
    • status: review --> fixed
     

Log in to post a comment.