I ran into this problem on HA-JDBC 3.0.4-SNAPSHOT running on Tomcat 7u62, Java 7u80 x32, MySQL 5.1.58, and Linux. Each server runs its own copy of Tomcat and MySQL.
Steps to reproduce:
1) Start MySQL on each server.
2) Start Tomcat on server 1. HA-JDBC starts with the MySQL databases on server 1 and server 2 both active.
3) Start Tomcat on server 2. HA-JDBC on Tomcat 2 requests the cluster state from Tomcat 1; both MySQL servers are still active.
4) Trigger deactivation of MySQL on server 2 (e.g. /etc/init.d/mysql restart).
5) Invoke DatabaseClusterImpl.activate(String databaseId, String strategyId).
If I execute step 5) on Tomcat 1, HA-JDBC successfully reactivates MySQL using the truncate/insert re-synchronization strategy. However, if I repeat all the steps and execute step 5) on Tomcat 2 instead, HA-JDBC hangs indefinitely.
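When an activation call hangs, a thread dump taken at the moment of the hang usually shows where it is blocked (for example, which lock a thread is waiting on and which thread owns it). As a sketch, the call can be run on a worker thread with a timeout, and all thread stacks printed if it does not return in time. The class and method names are hypothetical, the timeout is arbitrary, and running `jstack <pid>` against the hung Tomcat process gives similar information without any code changes. The code avoids lambdas so it compiles on Java 7.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ActivationWatchdog {

    /**
     * Runs the activation call on a daemon worker thread and waits at most
     * timeoutSeconds. Returns true if the call finished, false if it timed out.
     * On timeout, a thread dump is written to stderr before giving up.
     */
    static boolean runWithTimeout(Callable<?> activation, long timeoutSeconds)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "activation-watchdog");
                t.setDaemon(true); // a stuck call must not keep the JVM alive
                return t;
            }
        });
        try {
            Future<?> result = executor.submit(activation);
            try {
                result.get(timeoutSeconds, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException e) {
                dumpAllThreads();
                return false;
            }
        } finally {
            // Interrupts the worker; a call blocked uninterruptibly keeps running.
            executor.shutdownNow();
        }
    }

    /** Prints every live thread with its state, the lock it waits on, and that lock's owner. */
    static void dumpAllThreads() {
        ThreadInfo[] threads = ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
        for (ThreadInfo info : threads) {
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            System.err.println(sb);
            for (StackTraceElement frame : info.getStackTrace()) {
                System.err.println("    at " + frame);
            }
        }
    }
}
```

To use it, the activation from step 5) would be wrapped in a `Callable` whose `call()` invokes `activate(databaseId, strategyId)` on the application's DatabaseClusterImpl instance and returns `null`. A `false` result then comes with a dump in which the threads waiting on a lock, and the owner of that lock, are listed explicitly.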
I can provide FINE logging from HA-JDBC and JGroups.
I noticed that the writeLock() messages are not balanced when activation is initiated from Tomcat 1 (coordinator) versus Tomcat 2 (non-coordinator). I am wondering whether there is a problem with my JGroups or HA-JDBC configuration.
a) If reactivate() is initiated from Tomcat 1 (coordinator), writeLock() messages appear only on Tomcat 2 (non-coordinator); there are none on Tomcat 1. This reactivation succeeded.
Tomcat 1
<no writeLock messages>
Tomcat 2
Nov 03, 2015 8:15:02 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.214
Nov 03, 2015 8:15:10 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received ReleaseLockCommand(writeLock()) from 10.0.0.214
b) If reactivate() is initiated from Tomcat 2 (non-coordinator), writeLock() messages appear on both Tomcat 1 and Tomcat 2. Tomcat 1 receives the AcquireLockCommand twice, and no ReleaseLockCommand appears on either node. This reactivation failed: the call to reactivate() hangs somewhere inside HA-JDBC or JGroups.
Tomcat 1
Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
Tomcat 2
Nov 03, 2015 8:28:49 PM net.sf.hajdbc.logging.slf4j.SLF4JLogger log
FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.215
<hung>
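The "not balanced" observation can be checked mechanically over the full FINER logs. The sketch below assumes every relevant line has the form `Received AcquireLockCommand(<lock>) from <address>` or `Received ReleaseLockCommand(<lock>) from <address>`, as in the excerpts above; the class and method names are made up for illustration. It counts acquires minus releases per lock and sender and keeps only the non-zero entries.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LockBalanceChecker {

    // Matches e.g. "FINE: Received AcquireLockCommand(writeLock()) from 10.0.0.214".
    // Group 1: Acquire/Release, group 2: lock description, group 3: sender address.
    private static final Pattern LOCK_LINE = Pattern.compile(
            "Received (Acquire|Release)LockCommand\\((.+)\\) from (\\S+)");

    /**
     * Returns acquires minus releases for each "lock from sender" key,
     * keeping only keys whose count is not zero (i.e. unbalanced locks).
     */
    static Map<String, Integer> unbalancedLocks(List<String> logLines) {
        Map<String, Integer> balance = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = LOCK_LINE.matcher(line);
            if (!m.find()) {
                continue; // timestamp lines and unrelated messages
            }
            String key = m.group(2) + " from " + m.group(3);
            int delta = "Acquire".equals(m.group(1)) ? 1 : -1;
            Integer current = balance.get(key);
            balance.put(key, (current == null ? 0 : current) + delta);
        }
        for (Iterator<Integer> it = balance.values().iterator(); it.hasNext();) {
            if (it.next() == 0) {
                it.remove(); // acquired and released equally often
            }
        }
        return balance;
    }
}
```

Applied to the Tomcat 1 excerpt from case b), it reports `writeLock() from 10.0.0.215` with a count of 2, while the Tomcat 2 excerpt from case a) comes out empty. A whole log file can be read with `java.nio.file.Files.readAllLines(path, StandardCharsets.UTF_8)`, which is available on Java 7.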
I invoked reactivate() on 10.0.0.215 (Tomcat 2) and it hung. I am attaching the log files; if you look at the 10.0.0.215 log first, you may be able to tell whether this is an HA-JDBC bug or a JGroups bug/configuration issue. Thanks.
Last edit: Justin Cranford 2015-11-03
Hi Paul,
Please ignore the previous attachments. I re-ran with FINER logging for net.sf.hajdbc and org.jgroups, so the new logs contain much more detail.
Tomcat #1 - 10.0.0.214 (coordinator)
Tomcat #2 - 10.0.0.215 (non-coordinator)
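FINER output like this can be enabled through java.util.logging. Judging by the "FINE:" log format, HA-JDBC's SLF4J output ends up in java.util.logging here; whether JGroups does too depends on which logging backend it picks up, so that part is an assumption. In Tomcat this is normally done in logging.properties (for example `net.sf.hajdbc.level = FINER` and `org.jgroups.level = FINER`, plus a handler whose level is FINER or lower). The sketch below does the same programmatically; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

final class FinerLogging {

    // java.util.logging holds loggers only weakly; keeping strong references
    // prevents a configured logger (and its level) from being garbage-collected.
    private static final List<Logger> CONFIGURED = new ArrayList<>();

    /** Sends FINER and above from the named loggers (and their children) to the console. */
    static synchronized void enableFiner(String... loggerNames) {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINER); // ConsoleHandler defaults to INFO
        for (String name : loggerNames) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(Level.FINER);
            logger.addHandler(handler);
            // Skip the root handlers so INFO+ records are not printed twice.
            logger.setUseParentHandlers(false);
            CONFIGURED.add(logger);
        }
    }
}
```

Calling `FinerLogging.enableFiner("net.sf.hajdbc", "org.jgroups")` early during startup would route FINER records from both packages, including child loggers such as `org.jgroups.protocols`, to stderr.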
Last edit: Justin Cranford 2015-11-04