Re: [jgroups-users] Can probe be used to recover network splits?
From: Questions/problems related to using JGroups <jav...@li...> - 2016-03-18 17:42:48
On Fri, Mar 18, 2016 at 7:07 AM, Questions/problems related to using JGroups <jav...@li...> wrote:

> Hi Pooja,
>
> On 17/03/16 17:09, Questions/problems related to using JGroups wrote:
> >
> > On Thu, Mar 17, 2016 at 10:59 AM, Questions/problems related to using
> > JGroups <jav...@li...> wrote:
> >
> > On 17/03/16 15:47, Questions/problems related to using JGroups wrote:
> > > Hi Bela,
> > > Thank you for your response. This is very helpful. I will try it out!
> > >
> > > The cluster does recover automatically with MERGE3. The only thing is that
> > > it took 6 hours in one case and 2 hours in the other case to finally merge
> > > into one cluster.
> >
> > This should not be the case. Did you look at TRACE logs for GMS and
> > MERGE3?
> > MERGE3 with min and max around 10 and 30 seconds should take 1-2 rounds,
> > so ca. 60 secs max to merge when the network is fine.
> >
> > I have good enough min_interval and max_interval. See below:
> >
> > <MERGE3
> >     max_interval="200000"
> >     max_participants_in_merge="200"
> >     min_interval="20000" />
>
> [Bela] OK
>
> > [PoojaK]: I could not enable TRACE for GMS and MERGE3 since it joined in
> > late, after the network had recovered.
>
> [Bela] GMS has an operation to print the previous N views (N defined by
> GMS.num_prev_views, default: 10):
>
>     probe.sh op=GMS.printPreviousViews
>
> You could also look at the number of merges in MERGE3. If this number
> was small, then something is wrong, e.g. check_interval might be too small.
>
> > I'm sure there's something else going on, preventing a merge. Have you
> > checked your thread pools? Do they have queues enabled perhaps? Do you
> > configure the internal and/or timer pools as well?
> >
> > The thread pool queue is enabled for the regular thread pool but disabled
> > for OOB.
> > I have the internal and timer pools configured too.
>
> [Bela] OK. An idea is to check with probe.sh (next time this occurs)
> what the size of the pools is (plus the active sizes).

*[PoojaK]: The internal_queue_size and timer_queue_size are 0 everywhere, but for
OOB_queue_size most of the nodes reported a queue size of around 1000, and
oob_thread_pool.queue_max_size is 1000. Does views() go through the OOB channel?
I think I should disable the queue on the OOB thread pool for sure, so that
messages get processed right away.*

> > What's your config?
> >
> > Below is my config:
> >
> > <TCP
> >     thread_pool.enabled="true"
> >     thread_pool.min_threads="50"
> >     thread_pool.max_threads="100"
> >     thread_pool.keep_alive_time="5000"
> >     thread_pool.queue_enabled="true"
> >     thread_pool.queue_max_size="50000"
> >     thread_pool.rejection_policy="Discard"
>
> [Bela] So if you have more than 50 threads active at the same time, new
> messages will be queued up to 50000. Is 50 the max number of nodes that
> are sending?
> An idea might be to disable the queue and increase max_size to a large number.
> OTOH, you won't need more than (max_nodes + B) threads, where B is a
> small buffer (say 3), as messages from the same sender S will simply get
> added to S's table if a thread is already processing a message (or batch)
> from S.

*[PoojaK]: 50 was kept as a higher limit since this system was going to grow to 50
nodes, but currently it has 32 members. However, when I was checking this system
again, I noticed that min_threads is only 10 in the system where the issue
happened, whereas we have 32 nodes. This might be the main culprit :(. That must
be a manual error (and I will get it corrected right away to 35+).*
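For reference, a sketch of how the regular thread pool might look with Bela's
suggestion applied (queue disabled, min_threads at or above the member count plus
a small buffer). The numbers below are illustrative, not values from the thread:

    <TCP
        thread_pool.enabled="true"
        thread_pool.min_threads="35"
        thread_pool.max_threads="100"
        thread_pool.keep_alive_time="5000"
        thread_pool.queue_enabled="false"
        thread_pool.rejection_policy="Discard"
        ... />

With the queue disabled, a message that finds no free thread is handled directly
by the rejection policy, so the pool bounds have to accommodate the expected
number of concurrent senders.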
*Also, most nodes are running with rejection_policy "Run" for the OOB and regular
thread pools. This might be another culprit; I need to get this changed to
"Discard" or "Run".*

*For the queues, do you suggest disabling the OOB, regular, timer and internal
thread pool queues, or just regular and OOB?*

> > <TCPPING
> >     timeout="200000"
>
> [Bela] 3+ minutes? Why so high? This means the first member will take 3+
> minutes to start!

*[PoojaK]: This is due to the design in the current version of JGroups that we are
using (3.3.5.Final), where discovery isn't multi-threaded and misses some members
when we have a large number of nodes and most of them are powered off. This is
already changed in later versions.*

> >     initial_hosts=""
>
> [Bela] I assume you're setting this value at startup time?

*[PoojaK]: Yes, and we have been very accurate with this, so as not to see "No
physical address found" :-)*

> > <MERGE3
> >     max_interval="200000"
> >     max_participants_in_merge="200"
> >     min_interval="20000" />
>
> [Bela] So on average it will take 200000 * 1.6 (check_interval, computed) to
> start a merge.

*[PoojaK]: I checked: check_interval is 320 seconds for us right now. Do you think
this is adding to the delay in merging?*

> > <FD_ALL
> >     timeout="600000"
> >     interval="60000" />
>
> [Bela] 10 minutes to detect a hung or crashed member?

*[PoojaK]: That was done keeping in mind the nature of the application. This
application is data intensive and needs to store and re-transmit data if the
outage is within 10 minutes. I do remember from your workshop that this shouldn't
be very high; I will keep this point in mind for a change.*

> > <UNIGY/>
>
> [Bela] That's the CULPRIT!!!! ha ha :-)

*[PoojaK]: I am sure it's not, but I hope not ;-)*

> > <pbcast.GMS
> >     print_local_addr="true"
> >     join_timeout="6000"
> >     view_bundling="true"
> >     merge_timeout="60000"
> >     view_ack_collection_timeout="10000" />
>
> [Bela] Not a good idea; it might take up to 10s to install a view unless
> VIEW-ACKs from all members are received quickly.

*[PoojaK]: If I recollect correctly, this variable is not blocking anything. It's
asynchronous, right?*

> > <FRAG2
> >     frag_size="60000" />
>
> [Bela] Oh the horror, why aren't you using FRAG2?

*[PoojaK]: I think I am using FRAG2 only. Did you mean something else?*

> > <pbcast.STATE_TRANSFER/>
> >
> > </config>
> >
> > > This is a cluster of 30 nodes running with TCP as transport.
> > > The nodes are geographically distributed across the globe.
> > >
> > > On Thu, Mar 17, 2016 at 4:27 AM, JGroups - General mailing list [via
> > > JGroups] <[hidden email]> wrote:
> > >
> > > Hi Pooja,
> > >
> > > On 16/03/16 21:25, Questions/problems related to using JGroups wrote:
> > > > Hi,
> > > >
> > > > Say if there is a split in the cluster and it is not recovering
> > > > automatically in spite of there being no network issues, has anyone
> > > > tried any probe commands related to MERGE3 to heal the cluster
> > > > automatically?
> > >
> > > There's a @ManagedOperation MERGE3.sendInfo() that can be triggered, so
> > > everyone in a cluster sends their information, allowing the coordinator
> > > to start a merge.
> > >
> > > However, this doesn't automatically merge the cluster, e.g. if everyone
> > > has the same view, then nothing will happen.
> > >
> > > If there is no network problem, and MERGE3 does *not* recover the
> > > cluster, then that would be a bug.
> > > In such a case, what would be required for diagnosis is:
> > > - TRACE logs for all members of GMS and MERGE3
> > > - Views of all members; this could be used for a reproducer
> > >
> > > I haven't yet come across a scenario with MERGE3 (*not* MERGE2!) that
> > > doesn't heal a network partition when the network is functioning OK
> > > again.
> > >
> > > > Thanks
> > > > Pooja
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
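Pulling together the probe commands mentioned in this thread, a rough sketch of a
diagnostic session might look as follows. This assumes probe.sh from the JGroups
distribution can reach the cluster's diagnostics address and that the op= and jmx=
query keys behave as in the 3.x line; exact attribute names vary by version:

    # print the last N views kept by GMS on every member (mentioned above)
    probe.sh op=GMS.printPreviousViews

    # ask every member to broadcast its MERGE3 info, so the coordinator can
    # detect differing views and start a merge (the @ManagedOperation above)
    probe.sh op=MERGE3.sendInfo

    # dump MERGE3 attributes, e.g. to see how many merges were triggered
    probe.sh jmx=MERGE3

As Bela notes above, sendInfo() only helps if views actually differ; if everyone
already has the same view, nothing will happen.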
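And a sketch of MERGE3 with the 10/30-second intervals Bela mentions earlier in
the thread. Whether such short intervals suit a globe-spanning cluster of this
size is a separate question, so the values are illustrative only:

    <MERGE3
        min_interval="10000"
        max_interval="30000" />
    <!-- check_interval is computed as max_interval * 1.6: ~48 s here,
         versus ~320 s with the current max_interval of 200000 ms -->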