Menu

#2412 log: refactor handling log client database in log agent

5.17.07
fixed
nobody
None
enhancement
log
lib
major
False
2017-07-27
2017-04-04
Canh Truong
No

In log agent, there is a link list holding all log clients of an application process. Also, in each log client, there is an additional link list holding all log streams which belongs to each log client.

Adding, modifying or deleing the link lists' elements or on sub-items of the client dabases are distrubuted in a lot of places, this could easily cause troubles regarding race condition, deadlock, or risks when adding code that do changes the databases.

So, this ticket intends to remove that concern by doing:
1) Centralizing read/write accesses to the database to one place with its private mutex
2) Use C++ containters to contain and handle databases

And will push the ticket in 02 increments:
1) Convert agent code to C++ without touching any existing logic (looks like what AMF has done it in [#1673])
2) Do #1 and #2 above

Related

Tickets: #1673
Tickets: #2439
Tickets: #2457
Wiki: ChangeLog-5.17.07

Discussion

  • Canh Truong

    Canh Truong - 2017-04-04
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,4 @@
     Program terminated with signal SIGSEGV, Segmentation fault.
     #0 0x00007f5bb0159d68 in lga_hdl_rec_del (list_head=list_head@entry=0x7f5bb0360288 <lga_cb+40>, rm_node=rm_node@entry=0x7f5b740029e0) at src/log/agent/lga_util.c:651
    
    -When 2 threads do the same action finalize client, log agent remove the client in the list. in one thread, it access to the node of the list and the node maybe delete by another thread. it causes the coredump happen.
    +When 2 threads do the same action finalize client, log agent remove the client in the list. in one thread, it access to the node in the list and this node maybe deleted by another thread. it causes the coredump happen.
    
     
  • Canh Truong

    Canh Truong - 2017-04-04
    • Milestone: 5.0.2 --> future
     
  • elunlen

    elunlen - 2017-04-05
    • Milestone: future --> 5.2.0
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-05
    • summary: log: coredump while finalize client in parallel --> log: refactor handling log client database in log agent
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,7 @@
    -Program terminated with signal SIGSEGV, Segmentation fault.
    -#0 0x00007f5bb0159d68 in lga_hdl_rec_del (list_head=list_head@entry=0x7f5bb0360288 <lga_cb+40>, rm_node=rm_node@entry=0x7f5b740029e0) at src/log/agent/lga_util.c:651
    +In log agent, there is a link list holding all log clients of an pplication process. Also, in each log client, there is an additional link list holding all log streams which belongs to each log client.
    
    -When 2 threads do the same action finalize client, log agent remove the client in the list. in one thread, it access to the node in the list and this node maybe deleted by another thread. it causes the coredump happen.
    +Adding, modifying or deleing the link lists' elements or on sub-items of the client dabases are distrubuted in a lot of places, this could easily cause troubles regarding race condition, deadlock, or risks when adding code that do changes the databases.
    +
    +So, this ticket intends to remove that concern by doing:
    +1) Centralizing read/write accesses to the database to one place with its private mutex
    +2) Use C++ containters to contain and handle databases
    
    • assigned_to: Canh Truong --> Vu Minh Nguyen
    • Type: defect --> enhancement
    • Milestone: 5.2.0 --> future
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-05
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,4 @@
    -In log agent, there is a link list holding all log clients of an pplication process. Also, in each log client, there is an additional link list holding all log streams which belongs to each log client.
    +In log agent, there is a link list holding all log clients of an application process. Also, in each log client, there is an additional link list holding all log streams which belongs to each log client.
    
     Adding, modifying or deleing the link lists' elements or on sub-items of the client dabases are distrubuted in a lot of places, this could easily cause troubles regarding race condition, deadlock, or risks when adding code that do changes the databases.
    
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-05
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -5,3 +5,7 @@
     So, this ticket intends to remove that concern by doing:
     1) Centralizing read/write accesses to the database to one place with its private mutex
     2) Use C++ containters to contain and handle databases
    +
    +And will push the ticket in 02 increments:
    +1) Convert agent code to C++ without touching any existing logic (looks like what AMF has done it in [#1673])
    +2) Do #1 and #2 above
    
     

    Related

    Tickets: #1673

  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-12
    • Milestone: future --> 5.17.08
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-12

    First increment is just sent out for review.

     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-19
    • status: accepted --> review
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-24

    commit 7945cef0efaf4f1e8d72722c9c8d3109d2783c9c
    Author: Vu Minh Nguyen vu.m.nguyen@dektech.com.au
    Date: Wed Apr 12 09:25:48 2017 +0200

    log: refactor log agent [#2412]
    
    This is the first increment, major changes:
    1) Convert to C++
    2) Fix compiling error messages
    3) Fix all cpplint & cppcheck messages (few remained, will be fixed in 2nd)
    4) Reorganize header files
    
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-04-28

    Hi,

    With current LOG Agent design, if N log clients open the same log stream, it will create N log stream object, instead of refering to one.

    I am going to change that point, all log clients will share same LOG stream object. That stream object only closes when no client refer to it.

    Will send an V2 patch when done, and I will add more description for each class for more readability.

    /Vu

     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-05-02

    After analyzing, log stream info data is not big. Move that data to shared one could introduce unwanted troubles (e.g: race condition, performance - as we have to create a map b/w log stream handle and its own client, do lookup whenever handling log stream, etc.). So, it could be better to keep current design.

    Instead, the V2 patch will deal with global lga_cb log agent control block. No more lga_cb in any place.

     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-05-08

    I got few comments from Lennart and V3 just sent out to cover them:

    1) Introduce LogServerState - reflect LOG server states.
    2) Improve some method's names for better self-explanation (eg: LogServerIsUp() instead of ScIsUp)
    3) Change "ObjectStatus" to "ObjectState" & convert enum to enum class.
    4) Change "kBeingFree" to "kDeleting"
    5) Don't hide state change when getting MDS events.

    And few other improvement such as fixing typo, or adding more documentation to classes.

    /Vu

     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-05-15

    Update version #4.

    https://sourceforge.net/u/winhvu/review/ci/ab6cf7db983cc97ad4e4b5e39327bd9c72004d11/

    Major Changes:
    1) Remove failed-recovery client is done in LOG application thread instead of recovery thread
    2) Do not lock application thread while recovery thread is on-going if it is operating on successful recovery clients
    3) Make Search<xx>ByHandle() to FetchAndUpdateObjectState() critical section.
    4) Introduce ScopeData class, which is internally to lga_agent.cc, to avoid using goto - easy missing restore the data back if it runs out of scope (e.g: unlock recovery mutex, or restore the object state back).
    5) Enhance documentation.

    /Vu

     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-05-18

    Updated version is just sent out.
    https://sourceforge.net/u/winhvu/review/ci/744109058ebfd01911e62627dcae3a80bd222902/

    Using ObjectState not help to solve completely race condition.

    In this version, introduce object reference counter instead (refer to ref_counter_ attribute). Basically, it has same meaning as previous ObjectState. One different and more strong point is that, reference counter can tell how many theads are refering to the object.
    1) ref_counter_ > 0: have ref_counter_ number of threads are referring to the object
    2) ref_counter_ = -1: object is being deleted
    3) ref_counter_0: same as previous idle state.

    /Vu

     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-05-24
    • status: review --> fixed
    • assigned_to: Vu Minh Nguyen --> nobody
    • Blocker: --> False
     
  • Vu Minh Nguyen

    Vu Minh Nguyen - 2017-05-24

    [Develop]
    commit e9af9fb5721b137ef6e0a05441dd85569e5faf50
    Author: Vu Minh Nguyen vu.m.nguyen@dektech.com.au
    Date: Wed Apr 12 12:58:32 2017 +0200

    log: refactor log agent - 2nd increment [#2412]
    
     

Log in to post a comment.