Re: [jgroups-users] Can probe be used to recover network splits?
From: Questions/problems related to using JGroups <jav...@li...> - 2016-03-18 11:10:19
Hi Pooja,

On 17/03/16 17:09, Questions/problems related to using JGroups wrote:
> On Thu, Mar 17, 2016 at 10:59 AM, Questions/problems related to using
> JGroups <jav...@li...> wrote:
>
>> On 17/03/16 15:47, Questions/problems related to using JGroups wrote:
>>> Hi Bela,
>>>
>>> Thank you for your response. This is very helpful. I will try it out!
>>>
>>> The cluster does recover automatically with MERGE3. The only thing is
>>> that it took 6 hours in one case and 2 hours in another to finally
>>> merge into one cluster.
>>
>> This should not be the case. Did you look at TRACE logs for GMS and
>> MERGE3? MERGE3 with min and max around 10 and 30 seconds should take
>> 1-2 rounds, so ca. 60 secs max to merge when the network is fine.
>
> I have good enough min_interval and max_interval. See below:
>
> <MERGE3 max_interval="200000"
>         max_participants_in_merge="200"
>         min_interval="20000" />

[Bela] OK

> [PoojaK]: I could not enable TRACE for GMS and MERGE3 since it joined in
> late, after the network had recovered.

[Bela] GMS has an operation to print the previous N views (N defined by
GMS.num_prev_views, default: 10):

probe.sh op=GMS.printPreviousViews

You could also look at the number of merges in MERGE3. If this number was
small, then something is wrong, e.g. check_interval might be too small.

>> I'm sure there's something else going on, preventing a merge. Have you
>> checked your thread pools? Do they have queues enabled perhaps? Do you
>> configure the internal and/or timer pools as well?
>
> The thread pool queue is enabled for the regular thread pool but disabled
> for OOB. I have internal and timer threads configured too.

[Bela] OK. An idea is to check with probe.sh (next time this occurs) what
the size of the pools is (plus the active sizes).

>> What's your config?
>
> Below is my config:
>
> <TCP thread_pool.enabled="true"
>      thread_pool.min_threads="50"
>      thread_pool.max_threads="100"
>      thread_pool.keep_alive_time="5000"
>      thread_pool.queue_enabled="true"
>      thread_pool.queue_max_size="50000"
>      thread_pool.rejection_policy="Discard"

[Bela] So if you have more than 50 threads active at the same time, new
messages will be queued, up to 50000. Is 50 the max number of nodes that
are sending? An idea might be to disable the queue and up max_threads to a
large number. OTOH, you won't need more than (max_nodes + B) threads,
where B is a small buffer (say 3), as messages from the same sender S will
simply get added to S's table if a thread is already processing a message
(or batch) from S.

> <TCPPING timeout="200000"

[Bela] 3+ minutes? Why so high? This means the first member will take 3+
minutes to start!

>          initial_hosts=""

[Bela] I assume you're setting this value at startup time?

> <MERGE3 max_interval="200000"
>         max_participants_in_merge="200"
>         min_interval="20000" />

[Bela] So on average it will take 200000 * 1.6 ms (check_interval,
computed) to start a merge.

> <FD_ALL timeout="600000"
>         interval="60000" />

[Bela] 10 minutes to detect a hung or crashed member?

> <UNIGY/>

[Bela] That's the CULPRIT!!!! ha ha :-)

> <pbcast.GMS print_local_addr="true"
>             join_timeout="6000"
>             view_bundling="true"
>             merge_timeout="60000"
>             view_ack_collection_timeout="10000" />

[Bela] Not a good idea; it might take up to 10s to install a view unless
VIEW-ACKs from all members are received quickly.

> <FRAG frag_size="60000" />

[Bela] Oh the horror, why aren't you using FRAG2?

> <pbcast.STATE_TRANSFER/>
> </config>
>
> This is a cluster of 30 nodes running with TCP as transport.
> The nodes are geographically distributed across the globe.
>
> On Thu, Mar 17, 2016 at 4:27 AM, JGroups - General mailing list [via
> JGroups] <[hidden email]> wrote:
>
>> Hi Pooja,
>>
>> On 16/03/16 21:25, Questions/problems related to using JGroups wrote:
>>> Hi,
>>>
>>> Say there is a split in the cluster and it is not recovering
>>> automatically in spite of no network issues: has anyone tried any
>>> probe commands related to MERGE3 to heal the cluster automatically?
>>
>> There's a @ManagedOperation MERGE3.sendInfo() that can be triggered,
>> so everyone in a cluster sends their information, allowing the
>> coordinator to start a merge.
>>
>> However, this doesn't automatically merge the cluster; e.g. if
>> everyone has the same view, then nothing will happen.
>>
>> If there is no network problem, and MERGE3 does *not* recover the
>> cluster, then that would be a bug. In such a case, what would be
>> required for diagnosis is:
>> - TRACE logs for all members of GMS and MERGE3
>> - Views of all members. This could be used for a reproducer.
>>
>> I haven't yet come across a scenario with MERGE3 (*not* MERGE2!) that
>> doesn't heal a network partition when the network is functioning ok
>> again.
>>
>>> Thanks
>>> Pooja

--
Bela Ban, JGroups lead (http://www.jgroups.org)
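
Since probe was the original question: the MERGE3.sendInfo() operation Bela
describes above can be invoked from the command line with the same op=...
pattern as the probe.sh op=GMS.printPreviousViews call earlier in the
thread. A minimal sketch, assuming the probe.sh script shipped with the
JGroups distribution and a default-named MERGE3 protocol:

    # Ask every member to broadcast its MERGE3 view info right away, so a
    # coordinator that sees differing views can start a merge without
    # waiting for the next check_interval to expire.
    probe.sh op=MERGE3.sendInfo

As Bela notes, this only helps when members actually hold differing views;
if everyone already agrees on the view, nothing will happen.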
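
Similarly, for the thread-pool check suggested above, probe can also dump a
protocol's JMX attributes. A sketch, assuming the jmx=<protocol> query form
(the exact attribute names in the output vary across JGroups versions):

    # Dump the transport's JMX attributes on all reachable nodes; for the
    # TCP config above this includes thread pool sizes and active counts,
    # showing whether the regular pool is saturated and queueing messages.
    probe.sh jmx=TCP

Running this while a split is ongoing would show whether messages are
piling up in the 50000-element regular-pool queue.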
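
Finally, to make the timing comments concrete: with max_interval="200000",
the computed check_interval is 200000 * 1.6 = 320000 ms, so a merge round
starts only about every 5.3 minutes on average, and FD_ALL needs a further
10 minutes just to detect a dead member. A revised stack along the lines of
Bela's comments might look like the sketch below; the values are
illustrative, not tested against this cluster, and the initial_hosts
placeholder is hypothetical:

    <!-- No queue; per Bela, ~(max_nodes + small buffer) threads suffice:
         30 nodes + 3 = 33 -->
    <TCP thread_pool.enabled="true"
         thread_pool.min_threads="33"
         thread_pool.max_threads="100"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.rejection_policy="Discard" />
    <!-- Discovery in seconds, not minutes; 200000 stalls the first member -->
    <TCPPING timeout="3000"
             initial_hosts="${jgroups.tcpping.initial_hosts}" />
    <!-- Bela's 10-30s window: 1-2 rounds, ca. 60 secs max to merge -->
    <MERGE3 min_interval="10000"
            max_interval="30000"
            max_participants_in_merge="200" />
    <!-- Detect a hung or crashed member in ~1 minute instead of 10 -->
    <FD_ALL timeout="60000" interval="10000" />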