Re: [jgroups-users] Can probe be used to recover network splits?
From: Questions/problems related to using JGroups <jav...@li...> - 2016-03-19 09:01:40
On 18/03/16 18:42, Questions/problems related to using JGroups wrote:
> > I'm sure there's something else going on, preventing a merge. Have you
> > checked your thread pools? Do they have queues enabled perhaps? Do you
> > configure the internal and/or timer pools as well?
> >
> > Thread pool queue is enabled for Regular thread pools but disabled for
> > OOB.
> > I have internal and timer threads configured too.
> >
> [Bela] OK. An idea is to check with probe.sh (next time this occurs)
> what the size of the pools is (plus the active sizes)
>
> /[PoojaK]: the internal_queue_size and timer_queue_size are all 0

[Bela] I meant pool size, not queue size.

> everywhere, but the OOB_queue_size: most of the nodes reported a queue
> size of around 1000, and oob.thread_pool.queue_max_size is 1000.

[Bela] The OOB queue size is ignored as the queue is disabled, but the
regular queue size should always be more or less 0!

> Does views() go through the OOB channel?

[Bela] No, a new view is sent as a regular message. So, if your regular
queue size is > 0, then that could mean views are delayed.

> I think I should disable the queue on the OOB thread pool for sure, so
> that messages reach right away.

[Bela] I thought you were doing that already. But I'd agree that the
regular pool's queue could be disabled, too.

> > What's your config?
> >
> > Below is my config:
> >
> > <TCP
> >   thread_pool.enabled="true"
> >   thread_pool.min_threads="50"
> >   thread_pool.max_threads="100"
> >   thread_pool.keep_alive_time="5000"
> >   thread_pool.queue_enabled="true"
> >   thread_pool.queue_max_size="50000"
> >   thread_pool.rejection_policy="Discard"
>
> [Bela]
> So if you have more than 50 threads active at the same time, new
> messages will be queued up to 50000. Is 50 the max number of nodes that
> are sending?
> An idea might be to disable the queue and up max_size to a large number.
> OTOH, you won't need more than (max_nodes + B) threads, where B is a
> small buffer (say 3), as messages from the same sender S will simply get
> added to S's table if a thread is already processing a message (or
> batch) from S.
>
> /[PoojaK]: 50 was kept as a higher limit since this system was going to
> grow to 50, but currently it's 32 members. But when I was checking this
> system again, I noticed min_threads is only 10 in the current system
> where the issue happened, whereas we have 32 nodes. This might be the
> main culprit :(. That must be a manual error (and I will get this
> corrected right away to 35+). /

[Bela] Yes, unless you have 50'000 messages in the queue, you'll never
have more than 10 threads processing messages from 32 nodes. So, in the
worst case, messages (and views) get delayed a lot.

> /Also, most nodes are running with rejection_policy as Run for OOB and
> regular thread pools. This might be another culprit. I need to get this
> changed to "Discard" or "Run". /

[Bela] Yes, +1, definitely don't use "run"!

> /For the queues, do you suggest disabling the OOB, regular, timer and
> internal thread pool queues? Or just regular and OOB?/

[Bela] Just regular and OOB. The other thread pools have tasks that are
never supposed to block.
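To make that concrete, here is a rough sketch of the thread-pool section
of TCP with both queues disabled, min_threads above the member count and
a non-"run" rejection policy. The numbers are only placeholders for a
~32-50 node cluster, not a recommendation; keep your other TCP
attributes as they are:

    <TCP
         thread_pool.enabled="true"
         thread_pool.min_threads="35"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="35"
         oob_thread_pool.max_threads="200"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.rejection_policy="discard"
         ... />

The pool sizes (and whatever is left in the queues) can then be watched
at runtime with probe, e.g.

    probe.sh jmx=TCP | grep -i thread_pool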
> > initial_hosts=""
>
> [Bela] I assume you're setting this value at startup time?
>
> /[PoojaK]: Yes, and we have been very careful with this, so that we
> don't see "No Physical Address found" :-)/
>
> > <MERGE3
> >   max_interval="200000"
> >   max_participants_in_merge="200"
> >   min_interval="20000" />
>
> [Bela] So on average it will take 200000 * 1.6 (check_interval,
> computed) to start a merge
>
> /[PoojaK]: I checked; the check_interval is 320 seconds for us right
> now. Do you think this is adding to the delay in merging? /

[Bela] Not necessarily, but if you only check every ~5 minutes, then you
can have 12 merges per hour _max_. Since merge detection is done with
unreliable messages (not retransmitted since they're below NAKACK2 /
UNICAST3), a merge might not happen at all, or happen only partially
(not merging the entire cluster).
I don't think this is the main culprit, but it doesn't help having to
wait for 5+ minutes for merge detection.

> > <FD_ALL
> >   timeout="600000"
> >   interval="60000" />
>
> [Bela] 10 minutes to detect a hung or crashed member?
>
> /[PoojaK]: that was done keeping in mind the nature of the application.
> This application is data intensive and needs to store and re-transmit
> data if the outage is within 10 minutes. I do remember from your
> workshop that this shouldn't be very high. I will keep this point in
> mind for a change. /

[Bela] Yes, but I also suggested pairing FD_ALL with FD_SOCK. The latter
would catch 90% of all crashes.

> > <pbcast.GMS
> >   print_local_addr="true"
> >   join_timeout="6000"
> >   view_bundling="true"
> >   merge_timeout="60000"
> >   view_ack_collection_timeout="10000" />
>
> [Bela] Not a good idea; it might take up to 10s to install a view unless
> VIEW-ACKs from all members are received quickly
>
> /[PoojaK]: If I recollect, this variable is not stopping anything. It's
> asynchronous, right? /

[Bela] Nope. The view installer (coordinator) waits for 10s until it has
acks from all members of the new view. So this will delay view
installation (also merge views).

> > <FRAG2
> >   frag_size="60000" />
>
> [Bela] Oh the horror, why aren't you using FRAG2?
>
> /[PoojaK]: I think I am using FRAG2 only. Did you mean something else? /

[Bela] Sorry, my bad, I thought I read FRAG... :-)

> > <pbcast.STATE_TRANSFER/>
> >
> > </config>
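Purely as an illustration (the concrete numbers depend on your
requirements, so treat this as a sketch rather than recommended values),
the merge and failure-detection part of the stack discussed above could
be tightened to something like:

    <MERGE3 min_interval="10000"
            max_interval="30000"
            max_participants_in_merge="200"/>
    <FD_SOCK/>
    <FD_ALL timeout="60000"
            interval="10000"/>
    <!-- other protocols (NAKACK2, UNICAST3, ...) as in your existing stack -->
    <pbcast.GMS print_local_addr="true"
                join_timeout="6000"
                view_bundling="true"
                merge_timeout="60000"
                view_ack_collection_timeout="2000"/>

With max_interval="30000" the computed check_interval is around 48s
(30000 * 1.6), so merge detection runs roughly once a minute instead of
every ~5 minutes; FD_SOCK catches most crashed members almost
immediately while FD_ALL (60s timeout, 10s interval) handles hung ones;
and a 2s view_ack_collection_timeout keeps a slow member from delaying
every view installation by 10s.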
> > > This is a cluster of 30 Nodes running with TCP as transport.
> > > The nodes are geographically co-located across the globe.
> > >
> > > On Thu, Mar 17, 2016 at 4:27 AM, JGroups - General mailing list
> > > [via JGroups] <[hidden email]> wrote:
> > >
> > > Hi Pooja,
> > >
> > > On 16/03/16 21:25, Questions/problems related to using JGroups wrote:
> > > > Hi,
> > > >
> > > > Say if there is a split in the cluster and it is not recovering
> > > > automatically in spite of no network issues, has anyone tried any
> > > > probe commands related to MERGE3 to heal the cluster automatically?
> > >
> > > There's a @ManagedOperation MERGE3.sendInfo() that can be triggered,
> > > so everyone in a cluster sends their information, allowing the
> > > coordinator to start a merge.
> > >
> > > However, this doesn't automatically merge the cluster, e.g. if
> > > everyone has the same view, then nothing will happen.
> > >
> > > If there is no network problem, and MERGE3 does *not* recover the
> > > cluster, then that would be a bug. In such a case, what would be
> > > required for diagnosis is:
> > > - TRACE logs for all members of GMS and MERGE3
> > > - Views of all members. This could be used for a reproducer.
> > >
> > > I haven't yet come across a scenario with MERGE3 (*not* MERGE2!)
> > > that doesn't heal a network partition when the network is
> > > functioning ok again.
> > >
> > > > Thanks
> > > > Pooja
> > >
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
> javagroups-users mailing list
> jav...@li...
> https://lists.sourceforge.net/lists/listinfo/javagroups-users

--
Bela Ban, JGroups lead (http://www.jgroups.org)
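For completeness, the MERGE3.sendInfo() operation mentioned in the
quoted mail can be invoked from the command line with probe (assuming
the members' diagnostics socket is enabled, which it is by default); a
sketch, using probe's op=Protocol.method syntax:

    # ask MERGE3 on every reachable member to broadcast its INFO,
    # so that coordinators can spot differing views and start a merge
    probe.sh op=MERGE3.sendInfo

    # dump the GMS attributes (including each member's current view)
    probe.sh jmx=GMS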