Join problems

  • Heiko Tappe

    Heiko Tappe - 2019-11-26

    Let's start with: I am a newbie to JGroups, so be patient with me ;-)

    What I'm trying to achieve is distributed locks.
    Right now I'm doing my first tests locally with one node, and some things seem to work already (a bit):
    I can get a lock, and a second attempt to get the lock in the same session (same lock service instance) fails, as expected.
    But another instance of the lock service does not seem to "see" any locks held by the other "session".
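
    To make this concrete, the kind of test being run looks roughly like the following sketch (the config file name and lock name are illustrative, and both "sessions" run in one JVM here):

    import java.util.concurrent.locks.Lock;
    import org.jgroups.JChannel;
    import org.jgroups.blocks.locking.LockService;

    public class LockVisibilityTest {
        public static void main(String[] args) throws Exception {
            // Two channels in one JVM, simulating two "sessions"
            JChannel ch1 = new JChannel("my-props.xml"); // config file name is illustrative
            JChannel ch2 = new JChannel("my-props.xml");
            LockService svc1 = new LockService(ch1);
            LockService svc2 = new LockService(ch2);
            ch1.connect("lock-cluster");
            ch2.connect("lock-cluster");

            Lock l1 = svc1.getLock("demo"); // lock name is illustrative
            l1.lock();                      // session 1 now holds "demo"

            // Expected once the cluster forms: false, since session 1 holds the lock
            Lock l2 = svc2.getLock("demo");
            System.out.println("session 2 tryLock: " + l2.tryLock());

            l1.unlock();
            ch2.close();
            ch1.close();
        }
    }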

    The problem might be related to the warnings I see in the server log:

    WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could not join /228.8.8.8:45588 on interface net0
    WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could not join /228.8.8.8:45588 on interface eth0
    ...
    ...

    Any idea?

    Thanks in advance,
    Heiko

    BTW, my properties look like this:

    <config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
        <UDP
            mcast_port="${jgroups.udp.mcast_port:45588}"
            receive_on_all_interfaces="true"
            enable_diagnostics="true"
        />
        <PING />
        <MERGE3
            max_interval="30000"
            min_interval="10000" />
        <FD_SOCK />
        <FD_ALL />
        <VERIFY_SUSPECT />
        <BARRIER />
        <pbcast.NAKACK2
            xmit_interval="500"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="2000"
            xmit_table_max_compaction_time="30000"
            use_mcast_xmit="false"
            discard_delivered_msgs="true"/>
        <UNICAST3
            xmit_interval="500"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="2000"
            xmit_table_max_compaction_time="60000"
            conn_expiry_timeout="0"/>
        <pbcast.STABLE />
        <pbcast.GMS print_local_addr="true" join_timeout="2000"/>
        <UFC />
        <MFC />
        <FRAG2 />
        <RSVP resend_interval="2000" timeout="10000"/>
        <pbcast.STATE_TRANSFER />
        <CENTRAL_LOCK />
    </config>
    
     
    • Bela Ban

      Bela Ban - 2019-11-26

      On 26.11.19 11:27 AM, Heiko Tappe wrote:

      Let's start with: I am a newbie to JGroups, so be patient with me ;-)

      always am... :-)

      What I'm trying to achieve is distributed locks.
      Right now I'm doing my first tests locally with one node, and some
      things seem to work already (a bit):
      I can get a lock, and a second attempt to get the lock in the same
      session (same lock service instance) fails, as expected.
      But another instance of the lock service does not seem to "see" any
      locks held by the other "session".

      The problem might be related to the warnings I see in the server log:

      WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could
      not join /228.8.8.8:45588 on interface net0
      WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could
      not join /228.8.8.8:45588 on interface eth0

      Could be that your cluster doesn't form. I suggest using probe.sh
      (check the manual for details) to see if the cluster forms correctly.
      I also suggest setting UDP.bind_addr/mcast_addr, and verifying that
      you have a multicast route. See
      https://github.com/belaban/JGroups/wiki/Multicast-routing-on-Mac-OS for
      details. It is for macOS, but I guess other systems have this issue, too.
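
      For example, from the bin directory of the JGroups distribution (the
      script is just a wrapper around org.jgroups.tests.Probe):

      $ ./probe.sh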

      [rest of quoted message and config snipped; see the original post above]


      --
      Bela Ban, JGroups lead (http://www.jgroups.org)
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Here is what I got...

    I explicitly defined UDP like this:

    <UDP
            mcast_port="${jgroups.udp.mcast_port:45588}"
            mcast_addr="${jgroups.udp.mcast_addr:224.0.0.0}"
            bind_addr="${jgroups.bind_addr:192.168.1.21}"
            receive_on_all_interfaces="true"
            enable_diagnostics="true"
        />
    

    When I start my server (WildFly 17.0.1.Final) and run probe like this:

    $ java -cp jgroups-4.0.19.Final.jar -Djava.net.preferIPv4Stack=true org.jgroups.tests.Probe

    I get

    0 responses (0 matches, 0 non matches)

    If I then init the LockService like this:

    JChannel ch = new JChannel(props);
    lockService = new LockService(ch);
    ch.connect("lock-cluster");
    

    I see the following lines in the server log:

    JGRP000048: Could not join /224.0.0.0:45588 on interface net0
    JGRP000048: Could not join /224.0.0.0:45588 on interface eth0
    JGRP000048: Could not join /224.0.0.0:45588 on interface net1
    JGRP000048: Could not join /224.0.0.0:45588 on interface eth2
    JGRP000048: Could not join /224.0.0.0:45588 on interface eth3
    ...
    

    Probing again gives me:

    #1 (158 bytes):
    local_addr=tdprg01-43803
    physical_addr=192.168.1.21:60684
    view=[tdprg01-43803|0] (1) [tdprg01-43803]
    cluster=lock-cluster
    version=4.0.19.Final (Schiener Berg)
    
    
    1 responses (1 matches, 0 non matches)
    

    Does that help? I am clueless :-(

     
    • Bela Ban

      Bela Ban - 2019-11-27

      You probably have no route (netstat -nr will show you) from
      192.168.1.21 to 224.0.0.0.

      I suggest not using 224.x.x.x, as routers may even discard application
      traffic to these addresses. Use something like 232.5.5.5 instead, and
      perhaps add a route to it:

      sudo route add -net 232.0.0.0/5 192.168.1.21
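
      (If the box is Windows rather than macOS/Linux, the equivalent should
      be something like

      route ADD 232.0.0.0 MASK 248.0.0.0 192.168.1.21

      where MASK 248.0.0.0 corresponds to the /5 above.)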

      On 27.11.19 9:00 AM, Heiko Tappe wrote:

      [quoted message snipped; see the post above]


      --
      Bela Ban, JGroups lead (http://www.jgroups.org)
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    I switched to 232.5.5.5 with no success.

    netstat -nr gives me:

    IPv4 Route Table
    ===========================================================================
    Active Routes:
      Network Destination        Netmask          Gateway     Interface Metric
                0.0.0.0          0.0.0.0      192.168.1.7  192.168.1.21     25
              127.0.0.0        255.0.0.0          On-link     127.0.0.1    331
              127.0.0.1  255.255.255.255          On-link     127.0.0.1    331
        127.255.255.255  255.255.255.255          On-link     127.0.0.1    331
            192.168.1.0    255.255.255.0          On-link  192.168.1.21    281
           192.168.1.21  255.255.255.255          On-link  192.168.1.21    281
          192.168.1.255  255.255.255.255          On-link  192.168.1.21    281
              224.0.0.0        240.0.0.0          On-link     127.0.0.1    331
              224.0.0.0        240.0.0.0          On-link  192.168.1.21    281
        255.255.255.255  255.255.255.255          On-link     127.0.0.1    331
        255.255.255.255  255.255.255.255          On-link  192.168.1.21    281
    ===========================================================================
    Persistent Routes:
      None
    

    $ netstat -an | find ":45588" returns:

    UDP 0.0.0.0:45588 *:*

    Why '0.0.0.0'? I expected to see some multicast address!?
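
    (Note: netstat -an only shows the socket's bind address, not its
    multicast group memberships, so the 0.0.0.0 by itself may be normal. On
    Windows the joined groups can be listed with:

    netsh interface ipv4 show joins

    If the multicast address does not appear there, the join failed.)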

    Just to clarify again:
    I am still "local" with just one node/server - no cluster yet! This is a Windows 10 client.

     
  • Bela Ban

    Bela Ban - 2019-11-27

    Try setting receive_on_all_interfaces to false, and use 4.1.8.Final
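
    Applied to the UDP element posted earlier, that would look something
    like this (keeping the 232.5.5.5 address suggested above):

    <UDP
        mcast_port="${jgroups.udp.mcast_port:45588}"
        mcast_addr="${jgroups.udp.mcast_addr:232.5.5.5}"
        bind_addr="${jgroups.bind_addr:192.168.1.21}"
        receive_on_all_interfaces="false"
        enable_diagnostics="true"
    />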

     
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Hmm. Things improved. A lot :-)
    But I am not exactly sure why.
    In the last tests I was so focused on the log messages ("Could not join ...") that I didn't check the lock functionality itself. And surprise - it works now, as far as I can tell! But the warnings in the log still remain.
    One thing I noticed was some strange socket binding with a multicast address of "192.168.1.21". Maybe things got better after removing that. I will recheck...
    As for the lock functionality... What is the expected behaviour if client 1 gets a lock on "asdf", then client 2 tries to get a lock on "asdf" with a 10-second timeout, and client 1 releases the lock within those 10 seconds? Does tryLock return true on client 2 after the unlock on client 1?

     
    • Bela Ban

      Bela Ban - 2019-11-27

      On 27.11.19 12:40, Heiko Tappe wrote:

      Hmm. Things improved. A lot :-)
      But I am not exactly sure why.
      In the last tests I was so focused on the log messages ("Could not join
      ...") that I didn't check the lock functionality itself. And surprise -
      it works now, as far as I can tell! But the warnings in the log still
      remain.

      Perhaps you should post the full logs of the 2 members' startup

      One thing I noticed was some strange socket binding with a multicast
      address of "192.168.1.21". Maybe things got better after removing that.

      What are you referring to? You posted mcast_addr="224.0.0.0", not
      192.168.1.21!

      I will recheck...
      As for the lock functionality... What is the expected behaviour if
      client 1 gets a lock on "asdf", then client 2 tries to get a lock on
      "asdf" with a 10-second timeout, and client 1 releases the lock within
      those 10 seconds? Does tryLock return true on client 2 after the
      unlock on client 1?

      Yes


      --
      Bela Ban | http://www.jgroups.org
      • Heiko Tappe

        Heiko Tappe - 2019-11-27

        As for the mcast_addr - I found this in my WildFly socket-bindings config and removed it:

        <socket-binding name="jgroups-diagnostics" multicast-address="${jboss.jgroups.diagnostics_addr:192.168.1.21}" multicast-port="${jboss.jgroups.diagnostics_port:7500}"/>

         
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Oh. I just noticed that releasing the lock on client 1 does not succeed if client 2 is trying to get a lock with a timeout. Is that the expected behaviour?

     
    • Bela Ban

      Bela Ban - 2019-11-27

      On 27.11.19 12:42, Heiko Tappe wrote:

      Oh. I just noticed that releasing the lock on client 1 does not succeed
      if client 2 is trying to get a lock with timeout. Is that the expected
      behaviour?

      • Client1 holds lock X
      • Client2 does a trylock X 20000 // 20secs
      • Client1 unlocks X within 20 secs
      • Client2 will be able to acquire lock X
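
      In code, that scenario looks roughly like the following sketch (the
      lock name and the LockService setup are as discussed earlier in the
      thread; everything else is illustrative):

      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.locks.Lock;
      import org.jgroups.blocks.locking.LockService;

      // Each client runs in its own process, with its own JChannel +
      // LockService, as set up earlier in the thread.
      class LockScenario {

          // Client1: holds X, then releases it within the 20 secs
          static void client1(LockService svc) {
              Lock x = svc.getLock("X");
              x.lock();
              // ... do work ...
              x.unlock();
          }

          // Client2: waits up to 20 secs for X to become free
          static void client2(LockService svc) throws InterruptedException {
              Lock x = svc.getLock("X");
              if (x.tryLock(20, TimeUnit.SECONDS)) {
                  try {
                      // Client2 now holds X
                  } finally {
                      x.unlock();
                  }
              }
          }
      }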

      --
      Bela Ban | http://www.jgroups.org
      • Heiko Tappe

        Heiko Tappe - 2019-11-27

        I see a different behaviour: client 1 apparently does not release the lock while a tryLock from client 2 is pending. There is no error, but I can tell from calling printLocks afterwards.
        Without a pending tryLock from client 2, releasing the lock works as expected.

         
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Tried with CENTRAL_LOCK2. Same behaviour.
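
    (i.e. with <CENTRAL_LOCK2 /> in place of <CENTRAL_LOCK /> at the top of the stack)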

    As for 4.1.8 - I don't know if I can just replace the JGroups WildFly module 4.0.19.Final with 4.1.8. A first attempt gives me a

    java.lang.NoSuchMethodError: org.jgroups.conf.XmlConfigurator.getInstance(Ljava/net/URL;)Lorg/jgroups/conf/XmlConfigurator;

     
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Oops, the missing method is:

    org.jgroups.conf.XmlConfigurator.getInstance(Ljava/net/URL;)

    And it's correct: there is no getInstance with a URL param any more.

     
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Tried 4.0.21.Final. Unfortunately without success.

     
