Re: [javagroups-users] Strange MergeViews / are they expected?
From: Bela B. <be...@ya...> - 2012-03-24 11:45:56
When you put load on a node, then depending on the config, that node may be excluded, only to be merged back into the cluster later. For instance, if you have FD or FD_ALL in your config, heartbeats won't be received from a stressed member, and that member will then be excluded. If a stressed member recovers, e.g. the CPU stress stops, there will be a simple merge. In most cases, a stressed member will get excluded and remain excluded.

On 3/23/12 11:46 PM, Kohsuke Kawaguchi wrote:
> I'm doing various monkey-testing of the GMS behaviours by putting
> stress on member nodes and how the cluster reacts to it.
>
> If I kill a member (say via 'kill -9'), it works well, but when I
> choke a node more slowly, strange things happen. When I say "choke
> more slowly", I do things like:
>
>     // CPU saturation
>     while (true) {
>         new Thread() {
>             public void run() {
>                 while (true) ;
>             }
>         }.start();
>     }
>
> or:
>
>     // memory saturation attack
>     while (true) {
>         int i=0;
>         try {
>             System.setProperty("foo"+(i++),new String(new byte[10240]));
>         } catch (Throwable t) { ; }
>     }
>
> Mostly, a cluster splits (between everyone else and one node that's
> stressed), presumably because everyone else thinks this one node is
> dead. Sometimes they try to merge, and when I look at what's going on
> (by looking at the view changes from a healthy member of a cluster), I
> see a lot of strange merge views.
>
> For example, this one has two clusters merging, but notice that the
> [alpha-7250|18] view doesn't have its coordinator in the member:
>
> MergeView::[alpha-7250|20] [alpha-7250, bravo-11837, bravo-60953,
> alpha-25509], subgroups=[alpha-7250|19] [alpha-7250, bravo-11837,
> alpha-25509], [alpha-7250|18] [bravo-60953]
>
> Or here is another one, where 1 cluster managed to merge all by itself
> without 2nd cluster:
>
> MergeView::[alpha-7250|17] [alpha-7250, bravo-11837, bravo-60953],
> subgroups=[alpha-7250|16] [alpha-7250, bravo-11837, bravo-60953]
>
> Are these a symptom of some bugs in JGroups? Or are they expected?
>

--
Bela Ban, JGroups lead (http://www.jgroups.org)
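
To illustrate the heartbeat mechanism described in the reply, here is a minimal sketch of timeout-based failure detection. It is a toy model only and not JGroups code: the class name HeartbeatMonitor, its methods, and the 10-second timeout are all hypothetical, and it leaves out view installation and merging entirely. It only shows why a member that is too starved to send heartbeats ends up suspected, and why it looks healthy again once heartbeats resume.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of heartbeat-based failure detection. This is NOT JGroups code;
 * the class, its methods, and the timeout value are hypothetical.
 */
final class HeartbeatMonitor {
    private final long timeoutMillis;
    // Last time (in ms) a heartbeat was seen from each member.
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat from a member at the given time. */
    void heartbeat(String member, long nowMillis) {
        lastSeen.put(member, nowMillis);
    }

    /** Returns, sorted by name, the members whose last heartbeat is older than the timeout. */
    List<String> suspects(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10_000);
        monitor.heartbeat("alpha-7250", 0);
        monitor.heartbeat("bravo-60953", 0);

        // alpha keeps sending heartbeats; bravo is CPU-starved and goes silent.
        monitor.heartbeat("alpha-7250", 8_000);
        monitor.heartbeat("alpha-7250", 16_000);
        System.out.println(monitor.suspects(16_000)); // [bravo-60953]

        // bravo recovers and is heard from again, so it is no longer suspected.
        monitor.heartbeat("bravo-60953", 17_000);
        System.out.println(monitor.suspects(18_000)); // []
    }
}
```

Timestamps are passed in explicitly so the example is deterministic. In a real cluster, a suspected member is removed from the view, and getting it back takes a merge rather than just a fresh heartbeat.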