Activating on the second node hangs

2015-04-13
2015-04-15
  • Yee_Keat Phuah

    Yee_Keat Phuah - 2015-04-13

    I have two nodes/applications connected via JGroups, each pointing to its own database.

    Once both applications are up, activating and deactivating from the first node works fine; I can deactivate and reactivate as many times as I like. But if I then call "deactivate" on the second node and try to activate it again, the activate call never returns.

    I am using HA-JDBC 3.0.3 with JGroups 3.4.3.

    My ha-jdbc-db.xml for the first node:

    <ha-jdbc xmlns="urn:ha-jdbc:cluster:3.0">
        <distributable id="jgroups">
            <property name="stack">jgroups.xml</property>
        </distributable>
        <sync id="passive"/>
        <state id="simple"/>
        <cluster default-sync="passive" dialect="newderby">
           <database id="db1" location="jdbc:derby://localhost:1527//tmp/db;ssl=basic;create=true">
             <user>app</user>
             <password>tySC+TrkVrI=</password>
           </database>
           <database id="db2" location="jdbc:derby://192.168.56.67:1527//tmp/db;ssl=basic">
             <user>app</user>
             <password>tySC+TrkVrI=</password>
           </database>
        </cluster>
    </ha-jdbc>
    

    And for the second node:

    <ha-jdbc xmlns="urn:ha-jdbc:cluster:3.0">
        <distributable id="jgroups">
            <property name="stack">jgroups.xml</property>
        </distributable>
        <sync id="passive"/>
        <state id="simple"/>
        <cluster default-sync="passive" dialect="newderby">
           <database id="db1" location="jdbc:derby://192.168.56.61:1527//tmp/db;ssl=basic">
             <user>app</user>
             <password>tySC+TrkVrI=</password>
           </database>
           <database id="db2" location="jdbc:derby://localhost:1527//tmp/db;ssl=basic">
             <user>app</user>
             <password>tySC+TrkVrI=</password>
           </database>
        </cluster>
    </ha-jdbc>
    

    My jgroups.xml, used on both nodes:

    <config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
        <UDP
             mcast_port="${jgroups.udp.mcast_port:45588}"
             ip_ttl="4"
             tos="8"
             ucast_recv_buf_size="5M"
             ucast_send_buf_size="5M"
             mcast_recv_buf_size="5M"
             mcast_send_buf_size="5M"
             max_bundle_size="64K"
             max_bundle_timeout="30"
             enable_diagnostics="true"
             thread_naming_pattern="cl"
    
             timer_type="new3"
             timer.min_threads="2"
             timer.max_threads="4"
             timer.keep_alive_time="3000"
             timer.queue_max_size="500"
    
             thread_pool.enabled="true"
             thread_pool.min_threads="2"
             thread_pool.max_threads="8"
             thread_pool.keep_alive_time="5000"
             thread_pool.queue_enabled="true"
             thread_pool.queue_max_size="10000"
             thread_pool.rejection_policy="discard"
    
             oob_thread_pool.enabled="true"
             oob_thread_pool.min_threads="1"
             oob_thread_pool.max_threads="8"
             oob_thread_pool.keep_alive_time="5000"
             oob_thread_pool.queue_enabled="false"
             oob_thread_pool.queue_max_size="100"
             oob_thread_pool.rejection_policy="discard"/>
    
        <PING />
        <MERGE3 max_interval="30000"
                min_interval="10000"/>
        <FD_SOCK/>
        <FD_ALL/>
        <VERIFY_SUSPECT timeout="1500"  />
        <BARRIER />
        <pbcast.NAKACK2 xmit_interval="500"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    max_msg_batch_size="500"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
        <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
        <pbcast.GMS print_local_addr="true" join_timeout="2000"
                view_bundling="true"/>
        <UFC max_credits="2M"
             min_threshold="0.4"/>
        <MFC max_credits="2M"
             min_threshold="0.4"/>
        <FRAG2 frag_size="60K"  />
        <RSVP resend_interval="2000" timeout="10000"/>
        <pbcast.STATE_TRANSFER />
        <!-- pbcast.FLUSH  /-->
    </config>
    

    On the primary node, I see this at DEBUG level:
    2015-04-13 18:27:57,261 DEBUG net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher [JGroupsCommandDispatcher.java:225] Received CoordinatorAcquireLockCommand(writeLock()) from company-app-dr-55817

    and on the secondary node I see this at DEBUG level:
    2015-04-13 18:27:57,301 DEBUG net.sf.hajdbc.distributed.jgroups.JGroupsCommandDispatcher [JGroupsCommandDispatcher.java:225] Received MemberAcquireLockCommand(writeLock()) from company-app-27881

    It seems that the secondary node asks the primary node to acquire a lock, but the request just hangs there.

    After restarting both applications, activate/deactivate works again.

     

    Last edit: Yee_Keat Phuah 2015-04-13
  • Yee_Keat Phuah

    Yee_Keat Phuah - 2015-04-15

    It seems the problem occurs when we try to activate a database from a non-coordinator node. I switched to tcp.xml and removed UNICAST3, as suggested in this thread:
    https://sourceforge.net/p/ha-jdbc/discussion/383397/thread/9b3dc233/

    With that change, the problem is gone.
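
    For reference, here is a minimal sketch of what a TCP-based stack without UNICAST3 might look like for these two nodes. It mirrors the UDP stack posted above, with only the transport and discovery protocols swapped. The `bind_port` value 7800 and the `TCPPING` host list are assumptions for illustration (the addresses are taken from the database URLs above), and this is not the exact file that was used.

    ```xml
    <config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
        <!-- TCP transport instead of UDP; port 7800 is an assumed value -->
        <TCP bind_port="7800"/>
        <!-- Static discovery; the host list is assumed from the two node addresses -->
        <TCPPING initial_hosts="192.168.56.61[7800],192.168.56.67[7800]"
                 port_range="0"/>
        <MERGE3 max_interval="30000"
                min_interval="10000"/>
        <FD_SOCK/>
        <FD_ALL/>
        <VERIFY_SUSPECT timeout="1500"/>
        <BARRIER/>
        <pbcast.NAKACK2 use_mcast_xmit="false"
                        discard_delivered_msgs="true"/>
        <!-- No UNICAST3 element here: it is left out on purpose -->
        <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                       max_bytes="4M"/>
        <pbcast.GMS print_local_addr="true" join_timeout="2000"
                    view_bundling="true"/>
        <UFC max_credits="2M"
             min_threshold="0.4"/>
        <MFC max_credits="2M"
             min_threshold="0.4"/>
        <FRAG2 frag_size="60K"/>
        <RSVP resend_interval="2000" timeout="10000"/>
        <pbcast.STATE_TRANSFER/>
    </config>
    ```

    The `stack` property of the `distributable` element in ha-jdbc-db.xml would then point at this file instead of jgroups.xml.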
