Re: [jgroups-users] Can probe be used to recover network splits?
From: Questions/problems related to using JGroups <jav...@li...> - 2016-03-18 17:42:48
On Fri, Mar 18, 2016 at 7:07 AM, Questions/problems related to using JGroups <jav...@li...> wrote:

> Hi Pooja,
>
> On 17/03/16 17:09, Questions/problems related to using JGroups wrote:
> >
> > On Thu, Mar 17, 2016 at 10:59 AM, Questions/problems related to using
> > JGroups <jav...@li...> wrote:
> >
> > On 17/03/16 15:47, Questions/problems related to using JGroups wrote:
> > > Hi Bela,
> > > Thank you for your response. This is very helpful. I will try it out!
> > >
> > > The cluster does recover automatically with MERGE3. The only thing is that
> > > it took 6 hours in one case and 2 hours in the other case to finally merge
> > > into one cluster.
> >
> > This should not be the case. Did you look at TRACE logs for GMS and
> > MERGE3?
> > MERGE3 with min and max around 10 and 30 seconds should take 1-2 rounds,
> > so ca. 60 secs max to merge when the network is fine.
> >
> > I have good enough min_interval and max_interval. See below:
> >
> > <MERGE3
> >     max_interval="200000"
> >     max_participants_in_merge="200"
> >     min_interval="20000" />
>
> [Bela] OK
>
> > [PoojaK]: I could not enable TRACE for GMS and MERGE3 since it joined in
> > late, after the network had recovered.
>
> [Bela] GMS has an operation to print the previous N views (N defined by
> GMS.num_prev_views, default: 10):
>
>     probe.sh op=GMS.printPreviousViews
>
> You could also look at the number of merges in MERGE3. If this number
> was small, then something is wrong, e.g. check_interval might be too small.
>
> > I'm sure there's something else going on, preventing a merge. Have you
> > checked your thread pools? Do they have queues enabled perhaps? Do you
> > configure the internal and/or timer pools as well?
> >
> > The thread pool queue is enabled for the regular thread pool but disabled
> > for OOB.
> > I have the internal and timer pools configured too.
>
> [Bela] OK. An idea is to check with probe.sh (next time this occurs)
> what the size of the pools is (plus the active sizes).

*[PoojaK]: The internal_queue_size and timer_queue_size are 0 everywhere, but for
OOB_queue_size most of the nodes reported a queue size of around 1000, and
oob_thread_pool.queue_max_size is 1000. Does views() go through the OOB channel?
I think I should disable the queue on the OOB thread pool for sure, so that
messages get processed right away.*

> > What's your config?
> >
> > Below is my config:
> >
> > <TCP
> >     thread_pool.enabled="true"
> >     thread_pool.min_threads="50"
> >     thread_pool.max_threads="100"
> >     thread_pool.keep_alive_time="5000"
> >     thread_pool.queue_enabled="true"
> >     thread_pool.queue_max_size="50000"
> >     thread_pool.rejection_policy="Discard"
>
> [Bela] So if you have more than 50 threads active at the same time, new
> messages will be queued up to 50000. Is 50 the max number of nodes that
> are sending?
> An idea might be to disable the queue and increase max_size to a large number.
> OTOH, you won't need more than (max_nodes + B) threads, where B is a
> small buffer (say 3), as messages from the same sender S will simply get
> added to S's table if a thread is already processing a message (or batch)
> from S.

*[PoojaK]: 50 was kept as a higher limit since this system was going to grow to 50
nodes, but currently it has 32 members. However, when I was checking this system
again, I noticed that min_threads is only 10 in the system where the issue
happened, whereas we have 32 nodes. This might be the main culprit :(. That must
be a manual error (and I will get it corrected right away to 35+).*
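For reference, a sketch of how the regular thread pool might look with Bela's
suggestion applied (queue disabled, min_threads at or above the member count plus
a small buffer). The numbers below are illustrative, not values from the thread:

    <TCP
        thread_pool.enabled="true"
        thread_pool.min_threads="35"
        thread_pool.max_threads="100"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="false"
        thread_pool.rejection_policy="Discard"
        ... />

With the queue disabled, a message that finds no free thread is handled directly
by the rejection policy, so the pool bounds have to accommodate the expected
number of concurrent senders.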
*Also, most nodes are running with rejection_policy "Run" for the OOB and regular
thread pools. This might be another culprit; I need to get this changed to
"Discard" or "Run".*

*For the queues, do you suggest disabling the OOB, regular, timer and internal
thread pool queues, or just regular and OOB?*

> > <TCPPING
> >     timeout="200000"
>
> [Bela] 3+ minutes? Why so high? This means the first member will take 3+
> minutes to start!

*[PoojaK]: This is due to the design in the current version of JGroups that we are
using (3.3.5.Final), where discovery isn't multi-threaded and misses some members
when we have a large number of nodes and most of them are powered off. This is
already changed in later versions.*

> >     initial_hosts=""
>
> [Bela] I assume you're setting this value at startup time?

*[PoojaK]: Yes, and we have been very accurate with this, so as not to see "No
physical address found" :-)*

> > <MERGE3
> >     max_interval="200000"
> >     max_participants_in_merge="200"
> >     min_interval="20000" />
>
> [Bela] So on average it will take 200000 * 1.6 (check_interval, computed) to
> start a merge.

*[PoojaK]: I checked: check_interval is 320 seconds for us right now. Do you think
this is adding to the delay in merging?*

> > <FD_ALL
> >     timeout="600000"
> >     interval="60000" />
>
> [Bela] 10 minutes to detect a hung or crashed member?

*[PoojaK]: That was done keeping in mind the nature of the application. This
application is data intensive and needs to store and re-transmit data if the
outage is within 10 minutes. I do remember from your workshop that this shouldn't
be very high; I will keep this point in mind for a change.*

> > <UNIGY/>
>
> [Bela] That's the CULPRIT!!!! ha ha :-)

*[PoojaK]: I am sure it's not, but I hope not ;-)*

> > <pbcast.GMS
> >     print_local_addr="true"
> >     join_timeout="6000"
> >     view_bundling="true"
> >     merge_timeout="60000"
> >     view_ack_collection_timeout="10000" />
>
> [Bela] Not a good idea; it might take up to 10s to install a view unless
> VIEW-ACKs from all members are received quickly.

*[PoojaK]: If I recollect correctly, this variable is not blocking anything. It's
asynchronous, right?*

> > <FRAG2
> >     frag_size="60000" />
>
> [Bela] Oh the horror, why aren't you using FRAG2?

*[PoojaK]: I think I am using FRAG2 only. Did you mean something else?*

> > <pbcast.STATE_TRANSFER/>
> >
> > </config>
> >
> > > This is a cluster of 30 nodes running with TCP as transport.
> > > The nodes are geographically distributed across the globe.
> > >
> > > On Thu, Mar 17, 2016 at 4:27 AM, JGroups - General mailing list [via
> > > JGroups] <[hidden email]> wrote:
> > >
> > > Hi Pooja,
> > >
> > > On 16/03/16 21:25, Questions/problems related to using JGroups wrote:
> > > > Hi,
> > > >
> > > > Say if there is a split in the cluster and it is not recovering
> > > > automatically in spite of there being no network issues, has anyone
> > > > tried any probe commands related to MERGE3 to heal the cluster
> > > > automatically?
> > >
> > > There's a @ManagedOperation MERGE3.sendInfo() that can be triggered, so
> > > everyone in a cluster sends their information, allowing the coordinator
> > > to start a merge.
> > >
> > > However, this doesn't automatically merge the cluster, e.g. if everyone
> > > has the same view, then nothing will happen.
> > >
> > > If there is no network problem, and MERGE3 does *not* recover the
> > > cluster, then that would be a bug.
> > > In such a case, what would be required for diagnosis is:
> > > - TRACE logs for all members of GMS and MERGE3
> > > - Views of all members; this could be used for a reproducer
> > >
> > > I haven't yet come across a scenario with MERGE3 (*not* MERGE2!) that
> > > doesn't heal a network partition when the network is functioning OK
> > > again.
> > >
> > > > Thanks
> > > > Pooja
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
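Pulling together the probe commands mentioned in this thread, a rough sketch of a
diagnostic session might look as follows. This assumes probe.sh from the JGroups
distribution can reach the cluster's diagnostics address and that the op= and jmx=
query keys behave as in the 3.x line; exact attribute names vary by version:

    # print the last N views kept by GMS on every member (mentioned above)
    probe.sh op=GMS.printPreviousViews

    # ask every member to broadcast its MERGE3 info, so the coordinator can
    # detect differing views and start a merge (the @ManagedOperation above)
    probe.sh op=MERGE3.sendInfo

    # dump MERGE3 attributes, e.g. to see how many merges were triggered
    probe.sh jmx=MERGE3

As Bela notes above, sendInfo() only helps if views actually differ; if everyone
already has the same view, nothing will happen.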
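And a sketch of MERGE3 with the 10/30-second intervals Bela mentions earlier in
the thread. Whether such short intervals suit a globe-spanning cluster of this
size is a separate question, so the values are illustrative only:

    <MERGE3
        min_interval="10000"
        max_interval="30000" />
    <!-- check_interval is computed as max_interval * 1.6: ~48 s here,
         versus ~320 s with the current max_interval of 200000 ms -->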