Thread: [jgroups-users] removing unknown address from cluster? JGRP000032
From: Questions/problems r. to u. J. <jav...@li...> - 2021-04-05 21:43:18
Hi,

Our product uses the TCP stack with jgroups 4.1.8. It gets set up by end users through a configuration file that contains (among other things) a list of IP addresses for a node to connect to when joining a cluster. We set this for TCPPING.initial_hosts.

If they have a wrong address at startup, they end up with JGRP000032 warnings filling the logs. For instance, the following leads to logs filling on two nodes, one of which was set up correctly:

1. Start cluster A/B. A is the coordinator.
2. Start a one-node cluster C.
3. On node D, include addresses for D and B in the initial hosts list and attempt to join.
4. D will join C for a cluster C/D and, obviously, not join A/B, since it didn't attempt to connect to the coordinator.

After this, the logs for D will fill with:
WARN: JGRP000032: <D>: no physical address for <A>, dropping message

...and B's logs will fill with:
WARN: JGRP000032: <B>: no physical address for <C>, dropping message

I know this is a setup error on the user's side, but was wondering if there's anything we could add programmatically to stop it. For instance, when they see the logs on X filling up with messages about Y in another cluster, is there something we could do to tell X to forget Y exists? It's not enough just to stop/fix/start that cluster, as (in the case of A/B above) the cluster that was started correctly could be showing this problem. For some customers, getting a maintenance window to shut down all related clusters and restart them is a problem.

For that matter, is there anything we could do programmatically to detect that this is happening? Besides parsing the jgroups logging output, I mean.

Thank you,
Bobby

From: Questions/problems r. to u. J. <jav...@li...> - 2021-04-06 11:44:59
You can always change the list of initial hosts in TCPPING programmatically, via getInitialHosts() / setInitialHosts().

Detecting that an address is wrong is outside the scope of JGroups, and should be done (IMO) by your application, e.g. at config/installation/startup time. This can of course be arbitrarily difficult, e.g.
* See if a symbolic name resolves correctly
* Check if a host is pingable

You could also disallow users from entering hostnames/IP addresses directly and instead generate them yourself, e.g. by recording all hosts on which an installation was performed and using this as initial_hosts.

You could also think of adding a protocol which checks (in init() or start()) that the hostnames/addresses in TCPPING.initial_hosts resolve, and possibly pings all entries before starting the stack.

On a related note, take a look at [1] (added in 4.2.12): it skips unresolved/unresolvable entries until an entry finally does resolve.

Hope this helps,

[1] https://issues.redhat.com/browse/JGRP-2535

On 05.04.21 22:50, Questions/problems related to using JGroups wrote:
> [...]

--
Bela Ban | http://www.jgroups.org
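For reference, a minimal sketch of that programmatic update, using the getInitialHosts()/setInitialHosts() accessors mentioned above. The channel handling, addresses and port are assumptions for illustration:

import java.util.Arrays;
import java.util.List;
import org.jgroups.JChannel;
import org.jgroups.PhysicalAddress;
import org.jgroups.protocols.TCPPING;
import org.jgroups.stack.IpAddress;

public final class InitialHostsUtil {
    private InitialHostsUtil() {}

    // Replace TCPPING's initial_hosts on a running channel once a
    // corrected host list is known; discovery uses the new list from
    // then on.
    public static void updateInitialHosts(JChannel channel) throws Exception {
        TCPPING ping = channel.getProtocolStack().findProtocol(TCPPING.class);
        List<PhysicalAddress> hosts = Arrays.asList(
                new IpAddress("192.168.1.128", 7800),   // hypothetical members
                new IpAddress("192.168.1.129", 7800));
        ping.setInitialHosts(hosts);
    }
}
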
From: Questions/problems r. to u. J. <jav...@li...> - 2021-05-03 15:03:51
Hi again,

Thanks for this. I have more information from the customer now, and see that the problem they're having isn't due to incorrect host information at startup like I thought. The setup to reproduce is pretty simple, and I understand their point that it doesn't look like user error.

1. Set up cluster A/B/C (A is coordinator).
2. At some point they don't need C in the cluster anymore and shut down the application there. It's a regular shutdown, not going suspect first. We use JChannel#close and then exit.
3. Later they use a node with the same address to join a different cluster with the same name. When C starts it only has D's address, and forms cluster D/C.

After the above, the A/B cluster is getting a merge view change every ~minute, always including only A/B in the view. The log on A is also filling with:
JGRP000032: <A>: no physical address for <D>, dropping message

Because it's a merge view, we do extra processing to handle potential rejoin cases, which causes a couple of other warnings every minute.

I also see every ~minute that A tries to authorize itself with C. C's log has messages from our custom AuthToken class.

If I use a different cluster name for C/D, that avoids a lot of the issues. There are no longer view changes and warnings in the first cluster, but the new one, D/C, has this in C's log constantly:
JGRP000012: discarded message from different cluster <old> (our cluster is <new>). Sender was <A>

That will help them some, but it's a large organization and they have a lot of clusters, since we thought it would be ok to reuse the name as long as the addresses weren't shared. Is there anything we can do to make a cluster forget a member that has left gracefully?

Thanks,
Bobby

On Tue, Apr 6, 2021 at 7:46 AM Questions/problems related to using JGroups via javagroups-users <jav...@li...> wrote:
> [...]

From: Questions/problems r. to u. J. <jav...@li...> - 2021-05-04 05:53:29
On 03.05.21 17:02, Questions/problems related to using JGroups wrote:
> Hi again,
>
> Thanks for this. I have more information from the customer now, and see
> that the problem they're having isn't due to incorrect host information
> at startup like I thought. The setup to reproduce is pretty simple, and
> I understand their point that it doesn't look like user error.
>
> 1. Set up cluster A/B/C (A is coordinator).
> 2. At some point they don't need C in the cluster anymore and shut down
> the application there. It's a regular shutdown, not going suspect first.
> We use JChannel#close and then exit.
OK
> 3. Later they use a node with the same address to join a different
> cluster with the same name.
Can you post an example? Note that discovery requests from different
clusters are discarded.
> When C starts it only has D's address, and forms cluster D/C.
>
> After the above, the A/B cluster is getting a merge view change every
> ~minute, always including only A/B in the view. The log on A is also
> filling with:
> JGRP000032: <A>: no physical address for <D>, dropping message
> Because it's a merge view, we do extra processing to handle potential
> rejoin cases, which causes a couple other warnings every minute.
>
> I also see every ~minute that A tries to authorize itself with C. C's
> log has messages from our custom AuthToken class.
>
>
> If I use a different cluster name for C/D, that avoids a lot of the issues.
> There are no longer view changes and warnings in the first cluster, but
> the new one D/C has this in C's log constantly:
> JGRP000012: discarded message from different cluster <old> (our cluster
> is <new>). Sender was <A>
>
> That will help them some, but it's a large organization and they have a
> lot of clusters, since we thought it would be ok to reuse the name as
> long as the addresses weren't shared. Is there anything we can do to
> make a cluster forget a member that has left gracefully?
You lost me early in your description of the case... can you post a
simple example, with 2 configs including TCPPING?
In general, I recommend separating the sets of {TCP.bind_addr,
TCPPING.initial_hosts} cleanly for each cluster, plus including *all* of
the members of a cluster in TCPPING.initial_hosts.
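For illustration (addresses and ports are made up), cleanly separated clusters would look like:

  cluster "one", nodes .128/.129:
    TCP.bind_addr = 192.168.1.128 (or .129 on the other node)
    TCP.bind_port = 7800
    TCPPING.initial_hosts = 192.168.1.128[7800],192.168.1.129[7800]

  cluster "two", nodes .130/.131 -- different port, no shared addresses:
    TCP.bind_addr = 192.168.1.130 (or .131)
    TCP.bind_port = 7900
    TCPPING.initial_hosts = 192.168.1.130[7900],192.168.1.131[7900]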
If you can't do that, then look into using a dynamic discovery mechanism.
Cheers
> Thanks,
> Bobby
> [...]
--
Bela Ban | http://www.jgroups.org
From: Questions/problems r. to u. J. <jav...@li...> - 2021-05-04 21:00
On Tue, May 4, 2021 at 1:53 AM Questions/problems related to using JGroups
via javagroups-users <jav...@li...> wrote:
> [...]
>
>
> > 3. Later they use a node with the same address to join a different
> > cluster with the same name.
>
> Can you post an example? Note that discovery requests from different
> clusters are discarded.
>
Sure, but in summary: I can't reuse an IP address after it's already been
in a cluster. The customer is trying to run separate clusters, but the
address of a node in one of them was previously in a different one, and
that is causing problems.
My config is programmatic; I've included it below. We use a custom
authentication class. When authenticate() is called it will output the
source and the response it's returning. I've set the jgroups logging to
DEBUG level; my application only logs the initial_hosts it sets and the
authentication calls. The member addresses end in: 128, 129, 130, and 131.
1. I start a cluster with (started in this order) 128, 129, 130. Each of
them has all three of those addresses in initial_hosts.
2. I shut down the application running on 130. The logs for 128 and 129
have "*** stopping application on .130" in them right before this.
3. I start an application on 131 that has 130/131 in initial_hosts.
4. I start a new application on the node with the 130 address. It has 130
and 131 in initial hosts. The logs on 128 and 129 have "*** new application
on .130 starting and will join new cluster with .131" in them to show when
it happens.
About a minute later, the errors start showing up. The 128 application is
trying to contact the member that previously ran on 130, even though that
member shut down and left the cluster. The new application on 130 doesn't
let it join, and merge views repeat with warning messages throughout.
There is a merge view change every minute or so in the original cluster
(128/129).
The stack we create (comments and text changes for sharing):
public JChannel createJChannel() throws Exception {
    Logger logger = <...>
    logger.log(Level.DEBUG, "Creating default JChannel.");
    List<Protocol> stack = new ArrayList<>();
    final Protocol tcp = new TCP()
            // bind_addr will be the same address, e.g. .128, .129, etc. that we use in initial_hosts
            .setValue("bind_addr", InetAddress.getByName(getBindingAddress()))
            .setValue("bind_port", bindingPort)
            .setValue("thread_pool_min_threads", 1)
            .setValue("thread_pool_keep_alive_time", 5000)
            .setValue("send_buf_size", 640000)
            .setValue("sock_conn_timeout", 300)
            .setValue("recv_buf_size", 5000000);
    // some optional things we could add to tcp removed; not used in this example
    stack.add(tcp);
    stack.add(new TCPPING()
            // the parseHostList method will output the list for this example at ERROR level
            .setValue("initial_hosts", parseHostList())
            .setValue("send_cache_on_join", true)
            .setValue("port_range", 0));
    stack.add(new MERGE3()
            .setValue("min_interval", 10000)
            .setValue("max_interval", 30000));
    FD_ALL fdAll = new FD_ALL();
    final long jgroupsTimeout = <>
    fdAll.setValue("timeout", jgroupsTimeout);
    final long maxInterval = jgroupsTimeout / 3L; // to have ~3 heartbeats before going suspect. <jira number removed>
    if (maxInterval < fdAll.getInterval()) {
        logger.log(Level.WARN, ".......");
        fdAll.setValue("interval", maxInterval);
    }
    stack.add(fdAll);
    stack.add(new VERIFY_SUSPECT()
            .setValue("timeout", 1500));
    stack.add(new BARRIER());
    if (getBoolean(<an application property>)) {
        logger.debug("adding jgroups asym encryption");
        stack.add(new ASYM_ENCRYPT()
                .setValue("sym_keylength", 128)
                .setValue("sym_algorithm", "AES/CBC/PKCS5Padding")
                .setValue("sym_iv_length", 16)
                .setValue("asym_keylength", 2048)
                .setValue("asym_algorithm", "RSA")
                .setValue("change_key_on_leave", true));
    }
    stack.add(new NAKACK2()
            .setValue("use_mcast_xmit", false));
    stack.add(new UNICAST3());
    stack.add(new STABLE()
            .setValue("desired_avg_gossip", 50000)
            .setValue("max_bytes", 4000000));
    // protocol will log auth request source and response
    stack.add(createAuthProtocol());
    stack.add(new GMS()
            .setValue("join_timeout", 3000));
    stack.add(new MFC()
            .setValue("max_credits", 2000000)
            .setValue("min_credits", 800000));
    stack.add(new FRAG2());
    stack.add(new STATE_TRANSFER());
    return new JChannel(stack);
}
Thanks again,
Bobby
From: Questions/problems r. to u. J. <jav...@li...> - 2021-05-25 14:30:37
Hi Bobby
apologies for the delay!
You cannot have the old cluster's initial_hosts be 128,129,130 while the
new one has the overlapping range 130,131. The old cluster will try to
contact 130 (e.g. trying to merge), thereby sending its information to 130.

Depending on traffic patterns, everybody will know everyone else's
address, or not. For example, it could be that 128 and 130 know everyone
else, but 129 and 131 don't know each other.

In the former case, there will be a merge to {128,129,130,131}. In the
latter case, members will fail to talk to other members, as they don't
have the other members in their logical address cache.
If the old cluster didn't have 130 in its initial_hosts, everything
would be fine.
What is it you're trying to achieve?
If you're trying to start a new cluster, then either give it a new
cluster name and/or a new set of (unused) ports. Both cluster names and
ports could be dished out by a server accessible to all.
Cheers
On 04.05.21 21:00, Questions/problems related to using JGroups wrote:
> [...]
--
Bela Ban | http://www.jgroups.org
From: Questions/problems r. to u. J. <jav...@li...> - 2021-05-25 17:59:07
On Tue, May 25, 2021 at 10:30 AM Questions/problems related to using JGroups via javagroups-users <jav...@li...> wrote:
> Hi Bobby
> apologies for the delay!

No problem -- thanks for looking.

> You cannot have the old cluster's initial_hosts be 128,129,130 while the
> new one has the overlapping range 130,131.

That's the problem. The customer has lots of nodes, clusters that grow and shrink, and they're going to reuse the same IP addresses eventually.

> The old cluster will try to contact 130 (e.g. trying to merge), thereby
> sending its information to 130.

Right, and what they want is some way to fully remove a node from a cluster, i.e. the cluster stops trying to contact that address.

> What is it you're trying to achieve?

Simply to take a node out of a cluster when it's not needed, then later reuse the address of that node with a different cluster. If I change the cluster names (same port though), then I still get constant warnings, like:
JGRP000012: discarded message from different cluster <old> (our cluster is <new>). Sender was <some addr>

We can suggest that they restart the cluster after removing a node, but I don't know if that will work for them. I'll also try using different ports for different clusters and see how that works for them. Given the size of the company in question, I can see that it might be hard to coordinate that, and eventually they'll get back in the same situation where a previously used address is being used again with the same port it used the last time.

Thanks,
Bobby

From: Questions/problems r. to u. J. <jav...@li...> - 2021-05-26 07:26:10
On 25.05.21 18:59, Questions/problems related to using JGroups wrote:
> That's the problem. The customer has lots of nodes, clusters that grow
> and shrink, and they're going to reuse the same IP addresses eventually.

Then using TCPPING for discovery is the wrong solution; it is designed
for a static cluster with a fixed and known membership. For the above
requirements, I'd rather recommend:
* A dynamic discovery mechanism (TCPGOSSIP, FILE_PING, GOOGLE_PING etc.)
* Ephemeral ports
* A new (different) cluster name for each new cluster that is started

> Right, and what they want is some way to fully remove a node from a
> cluster, i.e. the cluster stops trying to contact that address.

Then you would have to remove the 130 node from the old cluster's
initial_hosts (TCPPING) and TCP's logical address cache, either by
restarting or by programmatically removing it. This can get complex
quickly though, as you'd have to maintain a list of ports per cluster.
The first solution above is much better IMO.

> We can suggest that they restart the cluster after removing a node, but
> I don't know if that will work for them. I'll also try using different
> ports for different clusters and see how that works for them.

That will certainly work, but - again - you'd have to maintain port
numbers for each cluster. Registration service? Excel spreadsheet?

> Given the size of the company in question, I can see that it might be
> hard to coordinate that, and eventually they'll get back in the same
> situation where a previously used address is being used again with the
> same port it used the last time.

Right. So I have to come back to my suggestion of not using TCPPING!

Cheers,

--
Bela Ban | http://www.jgroups.org
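To make the first recommendation concrete: in the programmatic stack posted earlier in the thread, the TCPPING block could be swapped for FILE_PING. A sketch only -- the shared directory path is made up, and remove_all_data_on_view_change is one optional cleanup knob:

import org.jgroups.protocols.FILE_PING;

// Replaces the stack.add(new TCPPING()...) block from createJChannel().
// All members must be able to read/write the shared directory (e.g. NFS).
stack.add(new FILE_PING()
        .setValue("location", "/mnt/shared/jgroups")
        .setValue("remove_all_data_on_view_change", true));
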
From: Questions/problems r. to u. J. <jav...@li...> - 2022-06-06 19:35:18
Hi again Bela et al,

We've finally come back to this issue after not working on the product for a while. I'm keeping the context all below, but the short version was that we use TCPPING and, if someone removes a node with address X and, later, starts a new cluster that includes that address, the old cluster keeps trying to find its lost buddy at X.

We're still back on v4.1.8 and I wanted to ask if the suggestion below, i.e. use TCPGOSSIP or FILE_PING (this is for in-house deployments on their own networks), is the most appropriate, and if there would be any benefit for this particular issue in moving to v5.x.

The way they run things now is to put host:port info for each node in a file and then start the applications, which read that file to set initial hosts. So FILE_PING might be the best for them, so that we don't need to have any new processes running.

Thanks,
Bobby

On Wed, May 26, 2021 at 3:26 AM Questions/problems related to using JGroups via javagroups-users <jav...@li...> wrote:
> [...]

From: Questions/problems r. to u. J. <jav...@li...> - 2022-06-10 09:47:52
Hi Bobby

On 06.06.22 21:08, Questions/problems related to using JGroups wrote:
> We've finally come back to this issue after not working on the product
> for a while. I'm keeping the context all below, but the short version
> was that we use TCPPING and, if someone removes a node with address X
> and, later, starts a new cluster that includes that address, the old
> cluster keeps trying to find its lost buddy at X.

Right, and I suggested using a dynamic discovery protocol, *not* TCPPING.

> We're still back on v4.1.8 and I wanted to ask if the suggestion below,
> i.e. use TCPGOSSIP or FILE_PING (this is for in-house deployments on
> their own networks), is the most appropriate, and if there would be any
> benefit for this particular issue in moving to v5.x.

There are loads of benefits to moving to 5.x :-) But, specific to this case, only the ability to have multiple discovery protocols in the same stack would be beneficial here. I guess MULTI_PING in 4.x might do the same job though...

> The way they run things now is to put host:port info for each node in a
> file and then start the applications, which read that file to set
> initial hosts. So FILE_PING might be the best for them, so that we
> don't need to have any new processes running.

Yes. The benefits/drawbacks of FILE_PING are:
+ No additional process needed
+ All processes access a shared dir, e.g. on NFS
- NFS adds overhead (but only for discovery)
+ The discovery info is human-readable, and can thus be modified manually (if needed)

> [...]

--
Bela Ban | http://www.jgroups.org
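Pictorially, the shared directory is organized per cluster name, so separate clusters never touch each other's discovery data. The layout below is illustrative only; the path is made up and exact file naming varies by JGroups version:

/mnt/shared/jgroups/          <- FILE_PING.location
    clusterA/                 <- one subdirectory per cluster name
        <member>.list         <- one entry per member: logical name, UUID,
        ...                      physical address, coordinator flag
    clusterB/
        <member>.list
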
From: Questions/problems r. to u. J. <jav...@li...> - 2022-06-06 20:40:05
Although, looking at this again, I think we might not be talking about the same setup. From this:

On Wed, May 26, 2021 at 3:26 AM Questions/problems related to using JGroups via javagroups-users <jav...@li...> wrote:
> [...]
>
> > Right, and what they want is some way to fully remove a node from a
> > cluster, i.e. the cluster stops trying to contact that address.
>
> Then you would have to remove the 130 node from the old cluster's
> initial_hosts (TCPPING) and TCP's logical address cache, either by
> restarting or by programmatically removing it. This can get complex
> quickly though, as you'd have to maintain a list of ports per cluster.

Each cluster is separate from all the others, so I don't know what I would need to keep in this list or why a cluster would need it. If a cluster has A/B/C/D in it, and the code sees that D leaves the cluster without going suspect first, can I programmatically do these?
- set new initial_hosts on the existing TCPPING protocol in my stack to include only A/B/C
- access the logical address cache and remove the address

I mean, I know I can hack the TCPPING again, but didn't know that would have any effect on the existing channel and members. I don't know offhand how to access the address cache, which I think is all I'm missing to experiment with this. If I can do the above then I think that solves the issue -- if a suspect member leaves the view I won't do anything, because we want to keep trying it in case it was disconnected and reconnected. But if a member leaves gracefully and the above is all I need to make the cluster forget about it, that's great and means we wouldn't have to change any startup features for the customers.

Thanks again,
Bobby

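The "left gracefully vs. went suspect" distinction described above can be prototyped with the standard MembershipListener callbacks. A sketch, with the class name and bookkeeping invented:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.jgroups.Address;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

// Record suspicions so that a later view change can tell a graceful
// leave (member gone, never suspected) from a crash (suspected first).
public class LeaveTracker extends ReceiverAdapter {
    private final Set<Address> suspected = ConcurrentHashMap.newKeySet();
    private volatile View lastView;

    @Override
    public void suspect(Address mbr) {
        suspected.add(mbr);
    }

    @Override
    public void viewAccepted(View view) {
        if (lastView != null) {
            for (Address left : View.leftMembers(lastView, view)) {
                if (!suspected.remove(left)) {
                    // Left gracefully: candidate for the forget/cleanup
                    // steps discussed in the next message.
                }
            }
        }
        lastView = view;
    }
}

It would be installed with channel.setReceiver(new LeaveTracker()) before connecting.
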
From: Questions/problems r. to u. J. <jav...@li...> - 2022-06-10 10:25:09
On 06.06.22 22:15, Questions/problems related to using JGroups wrote:
> Each cluster is separate from all the others, so I don't know what I
> would need to keep in this list or why a cluster would need it.

Referring to my previous email: if you use FILE_PING, each cluster has a _separate_ directory (the cluster name) under which the discovery info is stored.

> If a cluster has A/B/C/D in it, and the code sees that D leaves the
> cluster without going suspect first, can I programmatically do these?

For TCP, it's complicated, but doable. Among other things, you'd have to:
- Close all TCP connections to D
- Close all connections to D in UNICAST3, too
- Remove D's info from the logical address cache (contents: 'probe.sh uuids')
- Remove D's information from all instances of TCPPING (initial_hosts and dynamic_hosts)

Again, using a dynamic discovery protocol such as FILE_PING makes more sense here.

> - set new initial_hosts on the existing TCPPING protocol in my stack to
>   include only A/B/C
> - access the logical address cache and remove the address

Yes, but this is not enough (see above).

> I don't know offhand how to access the address cache, which I think is
> all I'm missing to experiment with this.

Pseudo code:

TP tp = channel.getProtocolStack().getTransport();
LazyRemovalCache cache = tp.getLogicalAddressCache();
cache.remove(address, true); // force removal

> [...]

--
Bela Ban | http://www.jgroups.org
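Bela's pseudo code, filled out as a compilable helper. Note this covers only the address-cache step of the checklist above; the class and method names are invented:

import org.jgroups.Address;
import org.jgroups.JChannel;
import org.jgroups.PhysicalAddress;
import org.jgroups.protocols.TP;
import org.jgroups.util.LazyRemovalCache;

public final class AddressCacheUtil {
    private AddressCacheUtil() {}

    // Force-remove a departed member's entry from the transport's
    // logical address cache, so the stack stops resolving its UUID
    // to the stale physical address.
    public static void forgetPhysicalAddress(JChannel channel, Address member) {
        TP tp = channel.getProtocolStack().getTransport();
        LazyRemovalCache<Address, PhysicalAddress> cache = tp.getLogicalAddressCache();
        cache.remove(member, true); // true == force removal
    }
}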