HA-JDBC / Discussion / Help: Activating on the second node hangs

I have two nodes (applications) that are connected via JGroups, both pointing to two databases.

Once both applications are up, triggering deactivate/activate from the first node works fine: I can deactivate a database and activate it again any number of times. But after that, if I call "deactivate" on the second node and then try to activate the database again, the activate call never returns.

I am using HA-JDBC 3.0.3 with JGroups 3.4.3.

My ha-jdbc-db.xml for the first node:

<ha-jdbc xmlns="urn:ha-jdbc:cluster:3.0">
    <distributable id="jgroups">
        <property name="stack">jgroups.xml</property>
    </distributable>
    <sync id="passive"/>
    <state id="simple"/>
    <cluster default-sync="passive" dialect="newderby">
       <database id="db1" location="jdbc:derby://localhost:1527//tmp/db;ssl=basic;create=true">
         <user>app</user>
         <password>tySC+TrkVrI=</password>
       </database>
       <database id="db2" location="jdbc:derby://192.168.56.67:1527//tmp/db;ssl=basic">
         <user>app</user>
         <password>tySC+TrkVrI=</password>
       </database>
    </cluster>
</ha-jdbc>

And for the second node:

<ha-jdbc xmlns="urn:ha-jdbc:cluster:3.0">
    <distributable id="jgroups">
        <property name="stack">jgroups.xml</property>
    </distributable>
    <sync id="passive"/>
    <state id="simple"/>
    <cluster default-sync="passive" dialect="newderby">
       <database id="db1" location="jdbc:derby://192.168.56.61:1527//tmp/db;ssl=basic">
         <user>app</user>
         <password>tySC+TrkVrI=</password>
       </database>
       <database id="db2" location="jdbc:derby://localhost:1527//tmp/db;ssl=basic">
         <user>app</user>
         <password>tySC+TrkVrI=</password>
       </database>
    </cluster>
</ha-jdbc>

My jgroups.xml, which is the same on both nodes:

<config xmlns="urn:org:jgroups"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <UDP
         mcast_port="${jgroups.udp.mcast_port:45588}"
         ip_ttl="4"
         tos="8"
         ucast_recv_buf_size="5M"
         ucast_send_buf_size="5M"
         mcast_recv_buf_size="5M"
         mcast_send_buf_size="5M"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_diagnostics="true"
         thread_naming_pattern="cl"

         timer_type="new3"
         timer.min_threads="2"
         timer.max_threads="4"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.enabled="true"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>

    <PING />
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500"
                xmit_table_num_rows="100"
                xmit_table_msgs_per_row="2000"
                xmit_table_max_compaction_time="30000"
                max_msg_batch_size="500"
                use_mcast_xmit="false"
                discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
               max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000"
            view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <RSVP resend_interval="2000" timeout="10000"/>
    <pbcast.STATE_TRANSFER />
    <!-- pbcast.FLUSH  /-->
</config>

On the primary node, I see this at DEBUG level:
2015-04-13 18:27:57,261 DEBUG net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher [JGroupsCommandDispatcher.java:225] Received CoordinatorAcquireLockCommand(writeLock()) from company-app-dr-55817

And on the secondary node, I see this at DEBUG level:
2015-04-13 18:27:57,301 DEBUG net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher [JGroupsCommandDispatcher.java:225] Received MemberAcquireLockCommand(writeLock()) from company-app-27881

It seems that the secondary node is asking the primary node to acquire a write lock, but the request never completes and the activate call just hangs there.

After restarting both applications, activate/deactivate works again.
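
To narrow down where the activate call is blocked, a thread dump taken on each node while it hangs might help. The following is only a minimal sketch using the standard java.lang.management API; the class name HangDiagnostics and its two methods are hypothetical helpers of my own, not part of HA-JDBC or JGroups, and it assumes the code can be run inside the application JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of HA-JDBC or JGroups): collects a thread dump
// from inside the JVM so the thread stuck in the activate call can be located.
public class HangDiagnostics {

    // Returns name, state, awaited lock and stack trace for every live thread.
    public static String dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Only request monitor/synchronizer details if this JVM supports them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    // Returns the ids of threads in a JVM-local deadlock (empty array if none).
    public static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

Note that findDeadlockedThreads() only detects deadlocks inside a single JVM. A lock request that is waiting on the other node would not be reported as a deadlock, but the waiting thread and its stack trace should still appear in the dump.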

Last edit: Yee_Keat Phuah 2015-04-13
