Join problems

  • Heiko Tappe

    Heiko Tappe - 2019-11-26

    Let's start with: I am a newbie to JGroups, so be patient with me ;-)

    What I'm trying to achieve is distributed locks.
    Right now I'm doing my first tests locally with one node, and some things seem to work already (a bit):
    I can get a lock, and a second attempt to get the lock in the same session (same lock service instance) fails, as expected.
    But another instance of the lock service does not seem to "see" any locks held by the other "session".
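
    To make this concrete, the kind of test being run looks roughly like the following sketch (the config file name and lock name are illustrative, and both "sessions" run in one JVM here):

    import java.util.concurrent.locks.Lock;
    import org.jgroups.JChannel;
    import org.jgroups.blocks.locking.LockService;

    public class LockVisibilityTest {
        public static void main(String[] args) throws Exception {
            // Two channels in one JVM, simulating two "sessions"
            JChannel ch1 = new JChannel("my-props.xml"); // config file name is illustrative
            JChannel ch2 = new JChannel("my-props.xml");
            LockService svc1 = new LockService(ch1);
            LockService svc2 = new LockService(ch2);
            ch1.connect("lock-cluster");
            ch2.connect("lock-cluster");

            Lock l1 = svc1.getLock("demo"); // lock name is illustrative
            l1.lock();                      // session 1 now holds "demo"

            // Expected once the cluster forms: false, since session 1 holds the lock
            Lock l2 = svc2.getLock("demo");
            System.out.println("session 2 tryLock: " + l2.tryLock());

            l1.unlock();
            ch2.close();
            ch1.close();
        }
    }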

    The problem might be related to the warnings I see in the server log:

    WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could not join /228.8.8.8:45588 on interface net0
    WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could not join /228.8.8.8:45588 on interface eth0
    ...
    ...

    Any idea?

    Thanks in advance,
    Heiko

    BTW, my properties look like this:

    <config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
        <UDP
            mcast_port="${jgroups.udp.mcast_port:45588}"
            receive_on_all_interfaces="true"
            enable_diagnostics="true"
        />
        <PING />
        <MERGE3
            max_interval="30000"
            min_interval="10000" />
        <FD_SOCK />
        <FD_ALL />
        <VERIFY_SUSPECT />
        <BARRIER />
        <pbcast.NAKACK2
            xmit_interval="500"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="2000"
            xmit_table_max_compaction_time="30000"
            use_mcast_xmit="false"
            discard_delivered_msgs="true"/>
        <UNICAST3
            xmit_interval="500"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="2000"
            xmit_table_max_compaction_time="60000"
            conn_expiry_timeout="0"/>
        <pbcast.STABLE />
        <pbcast.GMS print_local_addr="true" join_timeout="2000"/>
        <UFC />
        <MFC />
        <FRAG2 />
        <RSVP resend_interval="2000" timeout="10000"/>
        <pbcast.STATE_TRANSFER />
        <CENTRAL_LOCK />
    </config>
    
     
    • Bela Ban

      Bela Ban - 2019-11-26

      On 26.11.19 11:27 AM, Heiko Tappe wrote:

      Let's start with: I am a newbie to JGroups, so be patient with me ;-)

      always am... :-)

      What I'm trying to achieve is distributed locks.
      Right now I'm doing my first tests locally with one node, and some
      things seem to work already (a bit):
      I can get a lock, and a second attempt to get the lock in the same
      session (same lock service instance) fails, as expected.
      But another instance of the lock service does not seem to "see" any
      locks held by the other "session".

      The problem might be related to the warnings I see in the server log:

      WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could
      not join /228.8.8.8:45588 on interface net0
      WARN [org.jgroups.protocols.UDP] (default task-2) JGRP000048: Could
      not join /228.8.8.8:45588 on interface eth0

      Could be that your cluster doesn't form. I suggest using probe.sh
      (check the manual for details) to see if the cluster forms correctly.
      I also suggest setting UDP.bind_addr/mcast_addr, and verifying that
      you have a multicast route. See
      https://github.com/belaban/JGroups/wiki/Multicast-routing-on-Mac-OS for
      details. It is for macOS, but I guess other systems have this issue, too.
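
      For example, from the bin directory of the JGroups distribution (the
      script is just a wrapper around org.jgroups.tests.Probe):

      $ ./probe.sh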

      [rest of quoted message and config snipped; see the original post above]


      --
      Bela Ban, JGroups lead (http://www.jgroups.org)
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Here is what I got...

    I explicitly defined UDP like this:

    <UDP
            mcast_port="${jgroups.udp.mcast_port:45588}"
            mcast_addr="${jgroups.udp.mcast_addr:224.0.0.0}"
            bind_addr="${jgroups.bind_addr:192.168.1.21}"
            receive_on_all_interfaces="true"
            enable_diagnostics="true"
        />
    

    When I start my server (WildFly 17.0.1.Final) and run probe like this:

    $ java -cp jgroups-4.0.19.Final.jar -Djava.net.preferIPv4Stack=true org.jgroups.tests.Probe

    I get

    0 responses (0 matches, 0 non matches)

    If I then init the LockService like this:

    JChannel ch = new JChannel(props);
    lockService = new LockService(ch);
    ch.connect("lock-cluster");
    

    I see the following lines in the server log:

    JGRP000048: Could not join /224.0.0.0:45588 on interface net0
    JGRP000048: Could not join /224.0.0.0:45588 on interface eth0
    JGRP000048: Could not join /224.0.0.0:45588 on interface net1
    JGRP000048: Could not join /224.0.0.0:45588 on interface eth2
    JGRP000048: Could not join /224.0.0.0:45588 on interface eth3
    ...
    

    Probing again gives me:

    #1 (158 bytes):
    local_addr=tdprg01-43803
    physical_addr=192.168.1.21:60684
    view=[tdprg01-43803|0] (1) [tdprg01-43803]
    cluster=lock-cluster
    version=4.0.19.Final (Schiener Berg)
    
    
    1 responses (1 matches, 0 non matches)
    

    Does that help? I am clueless :-(

     
    • Bela Ban

      Bela Ban - 2019-11-27

      You probably have no route (netstat -nr will show you) from
      192.168.1.21 to 224.0.0.0.

      I suggest not using 224.x.x.x, as routers may even discard application
      traffic to these addresses. Use something like 232.5.5.5 instead, and
      perhaps add a route to it:

      sudo route add -net 232.0.0.0/5 192.168.1.21
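
      (If the box is Windows rather than macOS/Linux, the equivalent should
      be something like

      route ADD 232.0.0.0 MASK 248.0.0.0 192.168.1.21

      where MASK 248.0.0.0 corresponds to the /5 above.)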

      On 27.11.19 9:00 AM, Heiko Tappe wrote:

      [quoted message snipped; see the post above]


      --
      Bela Ban, JGroups lead (http://www.jgroups.org)
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    I switched to 232.5.5.5 with no success.

    netstat -nr gives me:

    IPv4 Route Table
    ===========================================================================
    Active Routes:
      Network Destination        Netmask          Gateway     Interface Metric
                0.0.0.0          0.0.0.0      192.168.1.7  192.168.1.21     25
              127.0.0.0        255.0.0.0          On-link     127.0.0.1    331
              127.0.0.1  255.255.255.255          On-link     127.0.0.1    331
        127.255.255.255  255.255.255.255          On-link     127.0.0.1    331
            192.168.1.0    255.255.255.0          On-link  192.168.1.21    281
           192.168.1.21  255.255.255.255          On-link  192.168.1.21    281
          192.168.1.255  255.255.255.255          On-link  192.168.1.21    281
              224.0.0.0        240.0.0.0          On-link     127.0.0.1    331
              224.0.0.0        240.0.0.0          On-link  192.168.1.21    281
        255.255.255.255  255.255.255.255          On-link     127.0.0.1    331
        255.255.255.255  255.255.255.255          On-link  192.168.1.21    281
    ===========================================================================
    Persistent Routes:
      None
    

    $ netstat -an | find ":45588" returns:

    UDP 0.0.0.0:45588 *:*

    Why '0.0.0.0'? I expected to see some multicast address!?
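
    (Note: netstat -an only shows the socket's bind address, not its
    multicast group memberships, so the 0.0.0.0 by itself may be normal. On
    Windows the joined groups can be listed with:

    netsh interface ipv4 show joins

    If the multicast address does not appear there, the join failed.)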

    Just to clarify again:
    I am still "local" with just one node/server - no cluster yet! This is a Windows 10 client.

     
  • Bela Ban

    Bela Ban - 2019-11-27

    Try setting receive_on_all_interfaces to false, and use 4.1.8.Final
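
    Applied to the UDP element posted earlier, that would look something
    like this (keeping the 232.5.5.5 address suggested above):

    <UDP
        mcast_port="${jgroups.udp.mcast_port:45588}"
        mcast_addr="${jgroups.udp.mcast_addr:232.5.5.5}"
        bind_addr="${jgroups.bind_addr:192.168.1.21}"
        receive_on_all_interfaces="false"
        enable_diagnostics="true"
    />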

     
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Hmm. Things improved. A lot :-)
    But I am not exactly sure why.
    In the last tests I was so focused on the log messages ("Could not join ...") that I didn't check the lock functionality itself. And surprise - it works now, as far as I can tell! But the warnings in the log still remain.
    One thing I noticed was some strange socket binding with a multicast address of "192.168.1.21". Maybe things got better after removing that. I will recheck...
    As for the lock functionality... What is the expected behaviour if client 1 gets a lock on "asdf", then client 2 tries to get a lock on "asdf" with a 10-second timeout, and client 1 releases the lock within those 10 seconds? Does tryLock return true on client 2 after the unlock on client 1?

     
    • Bela Ban

      Bela Ban - 2019-11-27

      On 27.11.19 12:40, Heiko Tappe wrote:

      Hmm. Things improved. A lot :-)
      But I am not exactly sure why.
      In the last tests I was so focused on the log messages ("Could not join
      ...") that I didn't check the lock functionality itself. And surprise -
      it works now, as far as I can tell! But the warnings in the log still
      remain.

      Perhaps you should post the full logs of the 2 members' startup

      One thing I noticed was some strange socket binding with a multicast
      address of "192.168.1.21". Maybe things got better after removing that.

      What are you referring to? You posted mcast_addr="224.0.0.0", not
      192.168.1.21!

      I will recheck...
      As for the lock functionality... What is the expected behaviour if
      client 1 gets a lock on "asdf", then client 2 tries to get a lock on
      "asdf" with a 10-second timeout, and client 1 releases the lock within
      those 10 seconds? Does tryLock return true on client 2 after the
      unlock on client 1?

      Yes


      --
      Bela Ban | http://www.jgroups.org
      • Heiko Tappe

        Heiko Tappe - 2019-11-27

        As for the mcast_addr - I found this in my WildFly socket-bindings config and removed it:

        <socket-binding name="jgroups-diagnostics" multicast-address="${jboss.jgroups.diagnostics_addr:192.168.1.21}" multicast-port="${jboss.jgroups.diagnostics_port:7500}"/>

         
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Oh. I just noticed that releasing the lock on client 1 does not succeed if client 2 is trying to get a lock with a timeout. Is that the expected behaviour?

     
    • Bela Ban

      Bela Ban - 2019-11-27

      On 27.11.19 12:42, Heiko Tappe wrote:

      Oh. I just noticed that releasing the lock on client 1 does not succeed
      if client 2 is trying to get a lock with timeout. Is that the expected
      behaviour?

      • Client1 holds lock X
      • Client2 does a trylock X 20000 // 20secs
      • Client1 unlocks X within 20 secs
      • Client2 will be able to acquire lock X
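
      In code, that scenario looks roughly like the following sketch (the
      lock name and the LockService setup are as discussed earlier in the
      thread; everything else is illustrative):

      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.locks.Lock;
      import org.jgroups.blocks.locking.LockService;

      // Each client runs in its own process, with its own JChannel +
      // LockService, as set up earlier in the thread.
      class LockScenario {

          // Client1: holds X, then releases it within the 20 secs
          static void client1(LockService svc) {
              Lock x = svc.getLock("X");
              x.lock();
              // ... do work ...
              x.unlock();
          }

          // Client2: waits up to 20 secs for X to become free
          static void client2(LockService svc) throws InterruptedException {
              Lock x = svc.getLock("X");
              if (x.tryLock(20, TimeUnit.SECONDS)) {
                  try {
                      // Client2 now holds X
                  } finally {
                      x.unlock();
                  }
              }
          }
      }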

      --
      Bela Ban | http://www.jgroups.org
      • Heiko Tappe

        Heiko Tappe - 2019-11-27

        I see a different behaviour: client 1 apparently does not release the lock while a tryLock from client 2 is pending. There is no error, but I can tell from calling printLocks afterwards.
        Without a pending tryLock from client 2, releasing the lock works as expected.

         
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Tried with CENTRAL_LOCK2. Same behaviour.
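
    (i.e. with <CENTRAL_LOCK2 /> in place of <CENTRAL_LOCK /> at the top of the stack)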

    As for 4.1.8 - I don't know if I can just replace the JGroups WildFly module 4.0.19.Final with 4.1.8. A first attempt gives me a

    java.lang.NoSuchMethodError: org.jgroups.conf.XmlConfigurator.getInstance(Ljava/net/URL;)Lorg/jgroups/conf/XmlConfigurator;

     
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Oops, the missing method is:

    org.jgroups.conf.XmlConfigurator.getInstance(Ljava/net/URL;)

    And it's correct: there is no getInstance with a URL param any more.

     
  • Heiko Tappe

    Heiko Tappe - 2019-11-27

    Tried 4.0.21.Final. Unfortunately without success.

     
