
DatabaseClusterImpl.activate() hangs if called on non-primary application node

2015-11-03
2015-11-04
  • Justin Cranford

    Justin Cranford - 2015-11-03

    I ran into this problem on HA-JDBC 3.0.4-SNAPSHOT. It is running on Tomcat 7u62, Java 7u80 x32, MySQL 5.1.58, and Linux. Each server runs a copy of Tomcat and MySQL.

    Steps to reproduce:
    1) Start MySQL on each server.
    2) Start Tomcat on server 1. HA-JDBC starts with active MySQL on server 1 and server 2.
    3) Start Tomcat on server 2. HA-JDBC queries state from Tomcat 1, both MySQL servers still active.
    4) Trigger deactivation of MySQL on server 2 (e.g., /etc/init.d/mysql restart)
    5) Invoke DatabaseClusterImpl.activate(String databaseId,String strategyId)

    If I execute step 5) on Tomcat 1, HA-JDBC successfully reactivates MySQL using the truncate/insert re-synchronization strategy. However, if I repeat all the steps and execute step 5) on Tomcat 2, HA-JDBC hangs indefinitely.
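
    While the hang is being diagnosed, one way to keep the calling thread from blocking forever is to run the call on a worker thread with a timeout. The following is a minimal sketch, not HA-JDBC code: the class name ActivateWithTimeout and the method runWithTimeout are hypothetical, and the Runnable stands in for the real activate(databaseId, strategyId) call from step 5.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long the caller waits for a blocking call.
public class ActivateWithTimeout {

    /**
     * Runs the task on a worker thread.
     * Returns true if it completed within the timeout, false if it timed out.
     */
    public static boolean runWithTimeout(Runnable task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = executor.submit(task);
            try {
                future.get(timeout, unit);
                return true;
            } catch (TimeoutException e) {
                // Ask the worker to stop; this only helps if the task reacts to interrupts.
                future.cancel(true);
                return false;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder for the real call, which would be something like
        // cluster.activate(databaseId, strategyId) (assumption: not shown here).
        Runnable slowCall = new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(5000); // simulates a call that does not return in time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        boolean finished = runWithTimeout(slowCall, 1, TimeUnit.SECONDS);
        System.out.println(finished ? "activation returned" : "activation timed out");
    }
}
```

    This only bounds the caller's wait. If the underlying call ignores interrupts, the worker thread stays blocked and any lock it requested may remain outstanding, so it is a diagnostic aid rather than a fix.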

    I can provide FINE logging from HA-JDBC and JGroups.

     

    Last edit: Justin Cranford 2015-11-03
  • Justin Cranford

    Justin Cranford - 2015-11-03

    I noticed the writeLock() messages are not balanced if initiating from Tomcat 1 (coordinator) versus Tomcat 2 (non-coordinator). I am wondering if there is a problem with my JGroups or HA-JDBC config.

    a) If initiating reactivate() from Tomcat 1 (coordinator), the writeLock() only appears on Tomcat 2 (non-coordinator). There are no writeLock() messages on Tomcat 1. This was a successful reactivation.

    Tomcat 1

    <no writeLock() messages>

    Tomcat 2

    Nov 03, 2015 8:15:02 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.214
    Nov 03, 2015 8:15:10 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    FINE: Received ReleaseLockCommand(writeLock()) from 10.0.0.214

    b) If initiating reactivate() from Tomcat 2 (non-coordinator), the writeLock() appears on both Tomcat 1 and Tomcat 2. This was an unsuccessful reactivation because HA-JDBC or JGroups is hung inside the call to reactivate().

    Tomcat 1

    Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
    Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215

    Tomcat 2

    Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
    FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
    <hung>
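
    The balance check described above can be sketched as a small log scan that counts, per sender, AcquireLockCommand lines minus ReleaseLockCommand lines. This is an illustrative sketch only: the class name LockBalance and the method outstanding are hypothetical, and the regular expression assumes the exact message format shown in the excerpts above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports unmatched writeLock() acquisitions in a log excerpt.
public class LockBalance {

    // Matches the FINE messages shown in the log excerpts (assumed format).
    private static final Pattern LOCK_LINE = Pattern.compile(
            "Received (Acquire|Release)LockCommand\\(writeLock\\(\\)\\) from (\\S+)");

    /** Returns, for each sender address, the number of acquires minus releases. */
    public static Map<String, Integer> outstanding(List<String> lines) {
        Map<String, Integer> balance = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            Matcher m = LOCK_LINE.matcher(line);
            if (!m.find()) {
                continue; // timestamp lines and unrelated messages are skipped
            }
            int delta = "Acquire".equals(m.group(1)) ? 1 : -1;
            Integer current = balance.get(m.group(2));
            balance.put(m.group(2), (current == null ? 0 : current) + delta);
        }
        return balance;
    }

    public static void main(String[] args) {
        // Case a), Tomcat 2: one acquire followed by one release.
        List<String> caseA = Arrays.asList(
                "FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.214",
                "FINE: Received ReleaseLockCommand(writeLock()) from 10.0.0.214");
        // Case b), Tomcat 1: two acquires and no release.
        List<String> caseB = Arrays.asList(
                "FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215",
                "FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215");
        System.out.println(outstanding(caseA)); // prints {10.0.0.214=0}
        System.out.println(outstanding(caseB)); // prints {10.0.0.215=2}
    }
}
```

    A non-zero count for a sender means that node requested the write lock more often than it released it in the scanned excerpt, which matches the hung case b).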

     

    Last edit: Justin Cranford 2015-11-03
  • Justin Cranford

    Justin Cranford - 2015-11-04

    Hi Paul,

    Ignore the previous attachments. I re-ran with FINER logging for net.sf.hajdbc and org.jgroups so there is much more detail.
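
    For anyone reproducing this, FINER logging for those two packages could be enabled roughly as follows. This is a sketch under assumptions: it presumes the log output ends up in java.util.logging (the timestamp format above suggests so, with HA-JDBC going through SLF4J), and the class name FinerLogging is hypothetical. In a Tomcat deployment the same levels would more typically be set in logging.properties.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: raises java.util.logging verbosity for two packages.
public class FinerLogging {

    // Strong references: java.util.logging may otherwise garbage-collect configured loggers.
    private static final Logger HAJDBC = Logger.getLogger("net.sf.hajdbc");
    private static final Logger JGROUPS = Logger.getLogger("org.jgroups");

    public static void enableFiner() {
        // The default ConsoleHandler level is INFO, so the handler must be raised as well.
        Handler handler = new ConsoleHandler();
        handler.setLevel(Level.FINER);
        for (Logger logger : new Logger[] { HAJDBC, JGROUPS }) {
            logger.setLevel(Level.FINER);
            logger.addHandler(handler);
            // Avoid duplicate output through the root logger's handlers.
            logger.setUseParentHandlers(false);
        }
    }

    public static void main(String[] args) {
        enableFiner();
        HAJDBC.finer("FINER logging enabled for net.sf.hajdbc");
        JGROUPS.finer("FINER logging enabled for org.jgroups");
    }
}
```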

    Tomcat #1 - 10.0.0.214 (coordinator)
    Tomcat #2 - 10.0.0.215 (non-coordinator)

    I tried to invoke reactivate on 10.0.0.215 and it hung. I am attaching the log files. If you could look at the 10.0.0.215 log first, maybe you can figure out whether this is an HA-JDBC bug or a JGroups bug/config issue. Thanks.

     

    Last edit: Justin Cranford 2015-11-04
