OpenSAF / Tickets / #952 immsv: sync data Mbcsv Check pointing can be optimized

A V Mahesh (AVM) - 2014-07-08

On 7/7/2014 1:28 PM, Anders Bjornerstedt wrote:
Yes one could optimize this in the sense of not checkpointing the whole message, but one has to checkpoint part of it and the resend done at failover would need to resend the truncated message.

The receiving IMMNDs would need to detect the message as a truncated sync message and discard it.

All this is necessary to not have gaps in the fevs count.

Most fevs messages pass through the IMMD without being unpacked, so the IMMD does not know whether a particular message is a sync message or not.

The standby IMMD must be in sync with the active IMMD before the active broadcasts the fevs to the IMMNDs to prevent any gap in the fevs message sequence.

As with all non-trivial optimizations, whether to do it or not depends on how highly it is prioritized.

You can test the relative difference in performance by executing the same sync with or without any standby SC, to sync payloads.
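The sequencing argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FevsMsg`, `Receiver`, and the handling of duplicates are hypothetical names and behaviour chosen for illustration, not the actual immsv code. It shows why a truncated (header-only) resend still has to carry its sequence number: the receiver discards the payload but advances its count, so no gap appears.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FevsMsg:
    """Hypothetical stand-in for a fevs message: sequence number plus payload."""
    count: int
    payload: Optional[bytes]  # None models a truncated (header-only) resend


class Receiver:
    """Toy IMMND: accepts messages strictly in sequence and rejects any gap."""

    def __init__(self) -> None:
        self.last_count = 0
        self.applied: List[bytes] = []

    def receive(self, msg: FevsMsg) -> str:
        if msg.count <= self.last_count:
            # Assumption: a message already seen (e.g. resent at failover) is ignored.
            return "duplicate"
        if msg.count != self.last_count + 1:
            raise RuntimeError(
                f"gap in fevs count: expected {self.last_count + 1}, got {msg.count}"
            )
        # The count advances even when the payload was truncated.
        self.last_count = msg.count
        if msg.payload is None:
            return "discarded"  # truncated sync message: nothing to apply
        self.applied.append(msg.payload)
        return "applied"
```

In this model, dropping the truncated message entirely at failover (instead of resending its header) would make the next message raise the gap error.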

A V Mahesh (AVM) - 2014-07-16

Attached temporary workaround patch I used for my multicast testing.

A V Mahesh (AVM) - 2014-10-02

We can also reduce some overhead by optimizing the FinalizeSync message.
While debugging #1036 I observed that in some scenarios FinalizeSync contains committed CCB data, which I think does not need to be sent/synced to the peer IMMD and IMMNDs as part of the FinalizeSync message.

- Anders Bjornerstedt - 2014-10-02
  
  Ccb outcome (commit/abort) is synced.
  This should be very little data compared with all the rest.
  
  It is used to redundantly verify the outcome of CCBs.
  The nightmare scenario for me would be some state mismatch within the IMMNDs that resulted
  in a CCB getting derailed (and aborted) at one IMMND but not at others.
  
  Such an error, if it impacted CCBs, would at least be caught sooner or later by this check.
  
  /AndersBj
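The redundancy check described above can be pictured as comparing a node's locally recorded CCB outcomes against the outcomes carried in the sync. The following is a minimal sketch only; the function name `mismatched_ccbs` and the dictionary representation are assumptions made for illustration and do not reflect how immsv stores or compares CCB state.

```python
from typing import Dict, List


def mismatched_ccbs(local: Dict[int, str], synced: Dict[int, str]) -> List[int]:
    """Return ids of CCBs whose local outcome differs from the synced outcome.

    Both arguments map a (hypothetical) CCB id to "commit" or "abort".
    CCBs known to only one side are not reported, since there is nothing to compare.
    """
    return sorted(
        ccb_id
        for ccb_id, outcome in synced.items()
        if ccb_id in local and local[ccb_id] != outcome
    )
```

A CCB that was aborted at one node but committed elsewhere would show up in the returned list the next time outcomes are compared.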
  
  [tickets:#952] http://sourceforge.net/p/opensaf/tickets/952 immsv: sync data Mbcsv Check pointing can be optimized
  
  Status: unassigned
  Milestone: future
  Created: Tue Jul 08, 2014 03:31 AM UTC by A V Mahesh (AVM)
  Last Updated: Wed Jul 16, 2014 04:16 AM UTC
  Owner: nobody
  
  On 7/7/2014 9:58 AM, A V Mahesh wrote:
  While the IMMD broadcasts a data-sync message (IMMND_EVT_D2ND_GLOB_FEVS_REQ_2)
  to the IMMNDs, the same message (including the large sync data) is also checkpointed to the standby director. This means that for each data-sync FEVS message
  (IMMND_EVT_D2ND_GLOB_FEVS_REQ_2) the IMMD sends two messages: one to the IMMNDs as BCAST and one to the peer IMMD as R-BCAST (MBCSV).
  
  As far as I have observed, if a fault happens while a sync is in progress, that sync is aborted
  and the new active starts a fresh sync. If my understanding is right,
  why do we need to checkpoint the IMMND_EVT_D2ND_GLOB_FEVS_REQ_2 message, including the large sync data, to the peer IMMD?
  
  - A V Mahesh (AVM) - 2014-10-02
    
    In one of my test cases I was continuously creating objects on both Active and Standby, then triggered a fail-over, and observed very large data (size:92808, see below) in the FinalizeSync message. If I stop object creation on both nodes, wait about 5 minutes, and then do a fail-over, the IMMD sends much smaller data (size:11682, see below).
    
    ====================================================================================
    Sep 30 9:53:25.977339 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68734 size:60596
    Sep 30 9:53:26.072524 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68735 size:60594 <==========Sync
    Sep 30 9:53:26.142643 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68736 size:50486
    .........................................
    Sep 30 9:53:26.204038 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68748 size:628
    Sep 30 9:53:26.207858 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68749 size:16
    Sep 30 9:53:26.211459 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68750 size:1716
    Sep 30 9:53:26.215674 osafimmd [868:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:68751 size:11682 <====== FinalizeSync, with some delay
    
    ====================================================================================
    Sep 30 10:05:31.284906 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78183 size:60560
    Sep 30 10:05:31.365129 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78184 size:60598 <==========Sync
    Sep 30 10:05:31.404772 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78185 size:17820
    Sep 30 10:05:31.407538 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78186 size:16
    .....................
    Sep 30 10:05:31.473186 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78198 size:16
    Sep 30 10:05:31.475273 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78199 size:1716
    Sep 30 10:05:31.480167 osafimmd [21206:immd_evt.c:0273] T5 immd_evt_proc_fevs_req send_count:78200 size:92808 <====== FinalizeSync, without delay
    ====================================================================================
    
    - Anders Bjornerstedt - 2014-10-02
      
      Yes that makes sense.
      But continuous small CCBs of this kind are not really a realistic test of immsv.
      
      Real CCB usage is typically a burst of small changes or in some cases one large CCB.
      Very rarely is there a continuous stream of new CCBs generated over a long time.
      In fact you could argue that it is incorrect use of the IMM, since such streaming updates of data
      indicate that it is not really config data.
      
      We do of course then have the case of persistent runtime data.
      But PRT is a strange concept that no one knows what it is good for and no one should really use :-)
      
      It is quite important to limit the scope of the intended function for the IMM.
      It is not a high write transaction throughput database.
      For reads, though, faster is always better and worth working more on.
      
      /AndersBj
      
      - Anders Bjornerstedt - 2014-10-02
        
        But I should also add that a test using a continuous stream of CCBs is a good stress test of the IMM.
        The point is only that we don't need to optimize the implementation a lot to cater for that use case.
        
        /AndersBj
        

A V Mahesh (AVM) - 2014-10-02

Attaching workaround patch provided by Neel for reference.

FinilizeSyncOptimizingWorkaround.patch

Neelakanta Reddy - 2015-06-04

status: unassigned --> assigned

assigned_to: Neelakanta Reddy

Part: - --> d

Neelakanta Reddy - 2015-06-10

When a sync happens, the following events are broadcast by the IMMD:

1. Class create: at the time of sync, IMMND_EVT_A2ND_CLASS_CREATE is broadcast. The IMMD sets IMMND_EVT_D2ND_GLOB_FEVS_REQ for MBCSV checkpointing and broadcasting.

2. When searchInit is called to sync the objects of a class, IMMD_EVT_ND2D_SYNC_FEVS_BASE is broadcast. The IMMD sets IMMND_EVT_D2ND_GLOB_FEVS_REQ for MBCSV checkpointing and broadcasting.

3. For each object, the sync process calls immsv_sync. immsv_sync sends the messages in batches of IMMSV_DEFAULT_MAX_SYNC_BATCH_SIZE (58931 bytes). The event sent is IMMND_EVT_A2ND_OBJ_SYNC_2. The IMMD uses IMMND_EVT_D2ND_GLOB_FEVS_REQ_2 for MBCSV checkpointing and broadcasting.

From a buffering perspective, the first and second event types above carry small message buffers and can be ignored when considering performance, whereas the third event carries a maximum-size buffer and has a performance impact, which delays the sync.

The solution: when checkpointing IMMND_EVT_D2ND_GLOB_FEVS_REQ_2 to the standby IMMD, the fevs message buffer is set to NULL and the message size is set to 0, so that the MBCSV checkpointing happens only for the header.
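The proposed header-only checkpoint can be sketched as follows. This is an illustrative model, not the committed patch: the event names come from the ticket, but the `FevsReq` layout, its field names, and `checkpoint_copy` are assumptions introduced here. The broadcast copy keeps its full buffer; only the copy destined for the standby IMMD is stripped.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Event names are taken from the ticket; representing them as strings and the
# FevsReq layout below are illustrative assumptions, not the real immsv structs.
GLOB_FEVS_REQ = "IMMND_EVT_D2ND_GLOB_FEVS_REQ"
GLOB_FEVS_REQ_2 = "IMMND_EVT_D2ND_GLOB_FEVS_REQ_2"


@dataclass(frozen=True)
class FevsReq:
    evt_type: str
    sender_count: int      # header field: fevs sequence number
    buf: Optional[bytes]   # message buffer (None models a NULL buffer)
    size: int              # buffer size in bytes


def checkpoint_copy(msg: FevsReq) -> FevsReq:
    """Return the message as it would be checkpointed to the standby IMMD.

    Sync messages (GLOB_FEVS_REQ_2) are reduced to their header: the buffer
    is dropped and the size set to 0. Other messages are checkpointed whole.
    """
    if msg.evt_type == GLOB_FEVS_REQ_2:
        return replace(msg, buf=None, size=0)
    return msg
```

Because `FevsReq` is frozen and `replace` builds a new object, the original message that is broadcast to the IMMNDs is left untouched, and the sequence number survives in the checkpointed header.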

Anders Bjornerstedt - 2015-06-10

Sounds like a possible, simple and safe enhancement.

It does of course require rigorous system test, of fail-over in particular.

Neelakanta Reddy - 2015-06-10

status: assigned --> accepted

Milestone: future --> 4.7-Tentative

Neelakanta Reddy - 2015-06-12

While testing the patch, a larger performance improvement was observed when MDS_TIPC_MCAST_ENABLED=0 (i.e. MCAST is disabled).

Neelakanta Reddy - 2015-06-12

status: accepted --> review

Neelakanta Reddy - 2015-06-15

status: review --> accepted

Neelakanta Reddy - 2015-06-24

status: accepted --> review

Neelakanta Reddy - 2015-07-08

changeset: 6642:3647c1ea307b
tag: tip
user: Neelakanta Reddy reddy.neelakanta@oracle.com
date: Wed Jul 08 12:01:18 2015 +0530
summary: imm :checkpoint only FEVS header for sync messages [#952]

Neelakanta Reddy - 2015-07-08

status: review --> fixed