Re: [jgroups-users] Can probe be used to recover network splits?
From: Questions/problems related to using JGroups <jav...@li...> - 2016-03-19 09:01:40
On 18/03/16 18:42, Questions/problems related to using JGroups wrote:
> > I'm sure there's something else going on, preventing a merge. Have you
> > checked your thread pools? Do they have queues enabled perhaps? Do you
> > configure the internal and/or timer pools as well?
> >
> > Thread pool queue is enabled for Regular thread pools but disabled for
> > OOB.
> > I have internal and timer threads configured too.
> >
> [Bela] OK. An idea is to check with probe.sh (next time this occurs)
> what the size of the pools is (plus the active sizes)
>
> /[PoojaK]: the internal_queue_size and timer_queue_size are all 0

[Bela] I meant pool size, not queue size.

> everywhere, but the OOB_queue_size: most of the nodes reported a queue
> size of around 1000, and oob.thread_pool.queue_max_size is 1000.

[Bela] The OOB queue size is ignored as the queue is disabled, but the
regular queue size should always be more or less 0!

> Does views() go through the OOB channel?

[Bela] No, a new view is sent as a regular message. So, if your regular
queue size is > 0, then that could mean views are delayed.

> I think I should disable the queue on the OOB thread pool for sure, so
> that messages reach right away.

[Bela] I thought you were doing that already. But I'd agree that the
regular pool's queue could be disabled, too.

> > What's your config?
> >
> > Below is my config:
> >
> > <TCP
> >   thread_pool.enabled="true"
> >   thread_pool.min_threads="50"
> >   thread_pool.max_threads="100"
> >   thread_pool.keep_alive_time="5000"
> >   thread_pool.queue_enabled="true"
> >   thread_pool.queue_max_size="50000"
> >   thread_pool.rejection_policy="Discard"
>
> [Bela]
> So if you have more than 50 threads active at the same time, new
> messages will be queued up to 50000. Is 50 the max number of nodes that
> are sending?
> An idea might be to disable the queue and up max_size to a large number.
> OTOH, you won't need more than (max_nodes + B) threads, where B is a
> small buffer (say 3), as messages from the same sender S will simply get
> added to S's table if a thread is already processing a message (or
> batch) from S.
>
> /[PoojaK]: 50 was kept as a higher limit since this system was going to
> grow to 50, but currently it's 32 members. But when I was checking this
> system again, I noticed min_threads is only 10 in the current system
> where the issue happened, whereas we have 32 nodes. This might be the
> main culprit :(. That must be a manual error (and I will get this
> corrected right away to 35+). /

[Bela] Yes, unless you have 50'000 messages in the queue, you'll never
have more than 10 threads processing messages from 32 nodes. So, in the
worst case, messages (and views) get delayed a lot.

> /Also, most nodes are running with rejection_policy as Run for OOB and
> regular thread pools. This might be another culprit. I need to get this
> changed to "Discard" or "Run". /

[Bela] Yes, +1, definitely don't use "run"!

> /For the queues, do you suggest disabling the OOB, regular, timer and
> internal thread pool queues? Or just regular and OOB?/

[Bela] Just regular and OOB. The other thread pools have tasks that are
never supposed to block.
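To make that concrete, here is a rough sketch of the thread-pool section
of TCP with both queues disabled, min_threads above the member count and
a non-"run" rejection policy. The numbers are only placeholders for a
~32-50 node cluster, not a recommendation; keep your other TCP
attributes as they are:

    <TCP
         thread_pool.enabled="true"
         thread_pool.min_threads="35"
         thread_pool.max_threads="200"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="35"
         oob_thread_pool.max_threads="200"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.rejection_policy="discard"
         ... />

The pool sizes (and whatever is left in the queues) can then be watched
at runtime with probe, e.g.

    probe.sh jmx=TCP | grep -i thread_pool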
> > initial_hosts=""
>
> [Bela] I assume you're setting this value at startup time?
>
> /[PoojaK]: Yes, and we have been very careful with this, so that we
> don't see "No Physical Address found" :-)/
>
> > <MERGE3
> >   max_interval="200000"
> >   max_participants_in_merge="200"
> >   min_interval="20000" />
>
> [Bela] So on average it will take 200000 * 1.6 (check_interval,
> computed) to start a merge
>
> /[PoojaK]: I checked; the check_interval is 320 seconds for us right
> now. Do you think this is adding to the delay in merging? /

[Bela] Not necessarily, but if you only check every ~5 minutes, then you
can have 12 merges per hour _max_. Since merge detection is done with
unreliable messages (not retransmitted since they're below NAKACK2 /
UNICAST3), a merge might not happen at all, or happen only partially
(not merging the entire cluster).
I don't think this is the main culprit, but it doesn't help having to
wait for 5+ minutes for merge detection.

> > <FD_ALL
> >   timeout="600000"
> >   interval="60000" />
>
> [Bela] 10 minutes to detect a hung or crashed member?
>
> /[PoojaK]: that was done keeping in mind the nature of the application.
> This application is data intensive and needs to store and re-transmit
> data if the outage is within 10 minutes. I do remember from your
> workshop that this shouldn't be very high. I will keep this point in
> mind for a change. /

[Bela] Yes, but I also suggested pairing FD_ALL with FD_SOCK. The latter
would catch 90% of all crashes.

> > <pbcast.GMS
> >   print_local_addr="true"
> >   join_timeout="6000"
> >   view_bundling="true"
> >   merge_timeout="60000"
> >   view_ack_collection_timeout="10000" />
>
> [Bela] Not a good idea; it might take up to 10s to install a view unless
> VIEW-ACKs from all members are received quickly
>
> /[PoojaK]: If I recollect, this variable is not stopping anything. It's
> asynchronous, right? /

[Bela] Nope. The view installer (coordinator) waits for 10s until it has
acks from all members of the new view. So this will delay view
installation (also merge views).

> > <FRAG2
> >   frag_size="60000" />
>
> [Bela] Oh the horror, why aren't you using FRAG2?
>
> /[PoojaK]: I think I am using FRAG2 only. Did you mean something else? /

[Bela] Sorry, my bad, I thought I read FRAG... :-)

> > <pbcast.STATE_TRANSFER/>
> >
> > </config>
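Purely as an illustration (the concrete numbers depend on your
requirements, so treat this as a sketch rather than recommended values),
the merge and failure-detection part of the stack discussed above could
be tightened to something like:

    <MERGE3 min_interval="10000"
            max_interval="30000"
            max_participants_in_merge="200"/>
    <FD_SOCK/>
    <FD_ALL timeout="60000"
            interval="10000"/>
    <!-- other protocols (NAKACK2, UNICAST3, ...) as in your existing stack -->
    <pbcast.GMS print_local_addr="true"
                join_timeout="6000"
                view_bundling="true"
                merge_timeout="60000"
                view_ack_collection_timeout="2000"/>

With max_interval="30000" the computed check_interval is around 48s
(30000 * 1.6), so merge detection runs roughly once a minute instead of
every ~5 minutes; FD_SOCK catches most crashed members almost
immediately while FD_ALL (60s timeout, 10s interval) handles hung ones;
and a 2s view_ack_collection_timeout keeps a slow member from delaying
every view installation by 10s.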
> > > This is a cluster of 30 Nodes running with TCP as transport.
> > > The nodes are geographically co-located across the globe.
> > >
> > > On Thu, Mar 17, 2016 at 4:27 AM, JGroups - General mailing list
> > > [via JGroups] <[hidden email]> wrote:
> > >
> > > Hi Pooja,
> > >
> > > On 16/03/16 21:25, Questions/problems related to using JGroups wrote:
> > > > Hi,
> > > >
> > > > Say if there is a split in the cluster and it is not recovering
> > > > automatically in spite of no network issues, has anyone tried any
> > > > probe commands related to MERGE3 to heal the cluster automatically?
> > >
> > > There's a @ManagedOperation MERGE3.sendInfo() that can be triggered,
> > > so everyone in a cluster sends their information, allowing the
> > > coordinator to start a merge.
> > >
> > > However, this doesn't automatically merge the cluster, e.g. if
> > > everyone has the same view, then nothing will happen.
> > >
> > > If there is no network problem, and MERGE3 does *not* recover the
> > > cluster, then that would be a bug. In such a case, what would be
> > > required for diagnosis is:
> > > - TRACE logs for all members of GMS and MERGE3
> > > - Views of all members. This could be used for a reproducer.
> > >
> > > I haven't yet come across a scenario with MERGE3 (*not* MERGE2!)
> > > that doesn't heal a network partition when the network is
> > > functioning ok again.
> > >
> > > > Thanks
> > > > Pooja
> > >
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
> javagroups-users mailing list
> jav...@li...
> https://lists.sourceforge.net/lists/listinfo/javagroups-users

--
Bela Ban, JGroups lead (http://www.jgroups.org)
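For completeness, the MERGE3.sendInfo() operation mentioned in the
quoted mail can be invoked from the command line with probe (assuming
the members' diagnostics socket is enabled, which it is by default); a
sketch, using probe's op=Protocol.method syntax:

    # ask MERGE3 on every reachable member to broadcast its INFO,
    # so that coordinators can spot differing views and start a merge
    probe.sh op=MERGE3.sendInfo

    # dump the GMS attributes (including each member's current view)
    probe.sh jmx=GMS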